
Choosing the Right Vector Embedding Model for Your Generative AI Use Case

In our previous post, we discussed considerations around choosing a vector database for our hypothetical retrieval augmented generation (RAG) use case. But when building a RAG application, we often need to make another important decision: choosing a vector embedding model, a critical component of many generative AI applications.

A vector embedding model is responsible for the transformation of unstructured data (text, images, audio, video) into a vector of numbers that capture semantic similarity between data objects. Embedding models are widely used beyond RAG applications, including recommendation systems, search engines, databases, and other data processing systems. 

Understanding their purpose, internals, advantages, and disadvantages is crucial, and that’s what we’ll cover today. While we’ll discuss text embedding models only, models for other types of unstructured data work similarly.

What Is an Embedding Model?

Machine learning models don’t work with text directly; they require numbers as input. Since text is ubiquitous, the ML community has, over time, developed many solutions for converting text to numbers. There are many approaches of varying complexity, but we’ll review just some of them.

A simple example is one-hot encoding: treat the words of a text as categorical variables and map each word to a vector of 0s and a single 1.
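As a toy illustration, here is what one-hot encoding might look like in Python (the three-word vocabulary is, of course, a made-up example):

```python
# One-hot encoding: each word maps to a vector of 0s with a single 1.
vocab = ["cat", "dog", "fish"]

def one_hot(word):
    vec = [0] * len(vocab)        # one dimension per vocabulary word
    vec[vocab.index(word)] = 1    # set the word's own position to 1
    return vec

print(one_hot("dog"))  # [0, 1, 0]
```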


Unfortunately, this embedding approach is not very practical: it leads to a large number of unique categories and, in most practical cases, produces output vectors of unmanageable dimensionality. One-hot encoding also does not place similar vectors closer to one another in a vector space.

Embedding models were invented to tackle these issues. Just like one-hot encoding, they take text as input and return vectors of numbers as output, but they are more complex because they are trained on supervised tasks, often using a neural network. A supervised task might be, for example, predicting the sentiment score of a product review. In this case, the resulting embedding model would place reviews of similar sentiment closer to each other in a vector space. The choice of supervised task is critical to producing relevant embeddings when building an embedding model.

Word embeddings projected onto 2D axes

In the diagram above we can see word embeddings only, but we often need more than that, since human language is more complex than just many words put together. Semantics, word order, and other linguistic parameters should all be taken into account, which means we need to take it to the next level: sentence embedding models.

Sentence embedding models associate an input sentence with a vector of numbers and, as expected, are far more complex internally, since they have to capture more complex relationships.
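To make this concrete, here is a minimal sketch using the open-source sentence-transformers library (the model name and example sentences are illustrative; any sentence embedding model would work the same way):

```python
# Embed sentences and compare them; similar meanings land closer together.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small BERT-derived model
sentences = [
    "The cat sits on the mat.",
    "A feline rests on a rug.",
    "Stock markets fell sharply today.",
]
embeddings = model.encode(sentences)  # one 384-dimensional vector per sentence

print(util.cos_sim(embeddings[0], embeddings[1]))  # high: similar meaning
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated topics
```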


Thanks to progress in deep learning, all state-of-the-art embedding models are created with deep neural nets, since they better capture the complex relationships inherent in human language.

A good embedding model should: 

  • Be fast, since it is often just a preprocessing step in a larger application
  • Return vectors of manageable dimensions
  • Return vectors that capture enough information about similarity to be practical

Let’s now quickly look into how most embedding models are organized internally.

Modern Neural Network Architectures

As we just mentioned, all well-performing state-of-the-art embedding models are deep neural networks. 

This is an actively developing field, and most top-performing models are associated with some novel architectural improvement. Let’s briefly cover two very important architectures: BERT and GPT.

BERT (Bidirectional Encoder Representations from Transformers) was published in 2018 by researchers at Google and described the application of bidirectional training of the Transformer, a popular attention-based model, to language modeling. A standard Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that makes a prediction.

BERT uses an encoder that reads the entire sequence of words at once, which allows the model to learn the context of a word based on all of its surroundings, left and right, unlike legacy approaches that read a text sequence from left to right or right to left. Before word sequences are fed into BERT, some words are replaced with [MASK] tokens, and the model then attempts to predict the original value of the masked words based on the context provided by the other, non-masked words in the sequence.
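A quick way to see this masked-word objective in action is Hugging Face’s fill-mask pipeline; a minimal sketch (the example sentence is arbitrary):

```python
# Ask BERT to predict the masked word from its left and right context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The chef [MASK] a delicious meal."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Likely candidates such as "prepared" or "cooked", with their probabilities.
```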

Standard BERT does not perform very well on most benchmarks, and BERT models require task-specific fine-tuning. But it is open source, has been around since 2018, and has relatively modest system requirements (it can be trained on a single mid-range GPU). As a result, it became very popular for many text-related tasks. It is fast, customizable, and small. For example, the very popular all-MiniLM model is a modified version of BERT.

GPT (Generative Pre-trained Transformer) by OpenAI is different. Unlike BERT, it is unidirectional, i.e., text is processed in one direction, and it uses the decoder part of the Transformer architecture, which is well suited to predicting the next word in a sequence. These models are slower and produce very high-dimensional embeddings, but they usually have many more parameters, do not require fine-tuning, and are more applicable to many tasks out of the box. GPT is not open source and is available as a paid API.

Context Length and Training Data

Another important parameter of an embedding model is its context length: the number of tokens a model can take into account when working with a text. A longer context window allows the model to understand more complex relationships within a wider body of text. As a result, models can provide higher-quality outputs, e.g., capture semantic similarity better.
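Since context length is measured in tokens rather than words or characters, it is worth checking how many tokens a text actually consumes. A minimal sketch using the tiktoken tokenizer (the encoding name matches OpenAI’s newer models; other model families use their own tokenizers):

```python
# Count tokens to check whether a text fits into a model's context window.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
text = "Context length is measured in tokens, not characters or words."
tokens = encoding.encode(text)
print(len(tokens))  # the number of context-window slots this text occupies
```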

To leverage a longer context, training data should include longer pieces of coherent text: books, articles, and so on. However, increasing the context window length increases the complexity of a model and raises the compute and memory requirements for training.

There are methods that help manage resource requirements, e.g., approximate attention, but they come at a cost to quality. This is another trade-off between quality and cost: longer context lengths capture more of the complex relationships of human language but require more resources.

Also, as always, the quality of training data is very important for all models. Embedding models are no exception. 

Semantic Search and Information Retrieval

Using embedding models for semantic search is a relatively new approach. For decades, people used other technologies: boolean models, latent semantic indexing (LSI), and various probabilistic models.

Some of these approaches work reasonably well for many existing use cases and are still widely used in the industry. 

One of the most popular traditional probabilistic models is BM25 (BM stands for “best matching”), a search relevance ranking function. It estimates the relevance of a document to a search query and ranks documents based on the query terms appearing in each indexed document. Only recently have embedding models started to consistently outperform it, but BM25 is still used a lot because it is simpler than an embedding model, has lower compute requirements, and produces explainable results.
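For a sense of how lightweight BM25 is in practice, here is a minimal sketch using the rank_bm25 package (the corpus and naive whitespace tokenization are illustrative):

```python
# Score documents against a query with BM25; no model inference required.
from rank_bm25 import BM25Okapi

corpus = [
    "the quick brown fox jumps over the lazy dog",
    "a fast auburn fox leapt over a sleepy hound",
    "stock markets closed higher on friday",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])
print(bm25.get_scores("quick fox".split()))  # higher score = more relevant
```

Note that BM25 matches exact terms, so the second document scores lower than the first despite meaning nearly the same thing; that lexical gap is precisely what embedding models close.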

Benchmarks

Not every model type has a comprehensive evaluation approach that helps to choose an existing model. 

Fortunately, text embedding models have common benchmark suites:

The article “BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models” proposed a reference set of benchmarks and datasets for information retrieval tasks. The original BEIR benchmark consists of 19 datasets and methods for search quality evaluation on tasks including question answering, fact checking, and entity retrieval. Now anyone who releases a text embedding model for information retrieval tasks can run the benchmark and see how their model ranks against the competition.

The Massive Text Embedding Benchmark (MTEB) incorporates BEIR and other components, covering 58 datasets and 112 languages. A public leaderboard of MTEB results is available online.

These benchmarks have been run on many existing models, and their leaderboards are very useful for making an informed choice during model selection.
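If you want to evaluate your own model rather than read the leaderboard, the mteb Python package can run individual tasks. A minimal sketch, assuming the package’s classic interface (the task choice is arbitrary, and a full benchmark run takes much longer):

```python
# Evaluate a sentence embedding model on a single MTEB task.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["Banking77Classification"])  # one small smoke-test task
evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
```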

Using Embedding Models in a Production Environment

Benchmark scores on standard tasks are very important, but they represent only one dimension.

When we use an embedding model for search, we run it twice:

  • When doing offline indexing of available data
  • When embedding a user query for a search request (both passes are sketched below)
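A minimal sketch of both passes, with an in-memory cosine-similarity search standing in for a real vector database (the corpus and model name are illustrative):

```python
# Pass 1: embed the corpus offline. Pass 2: embed the query at request time.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "How do I reset my password?",
    "Shipping takes 3-5 business days.",
    "Refunds are processed within a week.",
]
index = model.encode(corpus, normalize_embeddings=True)  # offline indexing

query = model.encode(["how long is delivery"], normalize_embeddings=True)
scores = index @ query.T  # cosine similarity (vectors are unit-normalized)
print(corpus[int(np.argmax(scores))])  # -> the shipping answer
```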

There are two important consequences of this. 

The first is that we have to reindex all existing data whenever we change or upgrade the embedding model. All systems built with embedding models should be designed with upgradability in mind, because newer and better models are released all the time, and upgrading the model is usually the easiest way to improve overall system performance. In this sense, the embedding model is one of the less stable components of the system infrastructure.

The second consequence of embedding user queries is that inference latency becomes very important as the number of users grows. Model inference takes more time for better-performing models, especially those that require a GPU: latency higher than 100 ms for a short query is not unheard of for models with more than 1B parameters. It turns out that smaller, leaner models are still very important in higher-load production scenarios.
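Measuring this is straightforward; a rough sketch (the absolute numbers depend entirely on your hardware and model):

```python
# Time a single query-time embedding after a warm-up call.
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
model.encode("warm-up")  # exclude one-time initialization from the measurement

start = time.perf_counter()
model.encode("how long is delivery")
print(f"{(time.perf_counter() - start) * 1000:.1f} ms per query")
```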

The trade-off between quality and latency is real, and we should always keep it in mind when choosing an embedding model.

As we mentioned above, the output vector dimensionality of an embedding model affects the performance of many downstream algorithms. Generally, the smaller the model, the shorter the output vector, but the vectors are often still too long even for smaller models. That’s when we need dimensionality reduction algorithms such as PCA (principal component analysis), SNE/t-SNE (stochastic neighbor embedding), and UMAP (uniform manifold approximation and projection).
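A minimal sketch of such a reduction with scikit-learn’s PCA (random vectors stand in for real embeddings, and the target dimensionality of 128 is a tunable assumption):

```python
# Shrink 384-dimensional embeddings down to 128 dimensions with PCA.
import numpy as np
from sklearn.decomposition import PCA

embeddings = np.random.rand(1000, 384)  # placeholder for real embeddings
pca = PCA(n_components=128)
reduced = pca.fit_transform(embeddings)

print(reduced.shape)                        # (1000, 128)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```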

Another place we can apply dimensionality reduction is before storing embeddings in a database. The resulting vectors occupy less space and retrieval is faster, but this comes at a price in downstream quality. Since vector databases are often not the primary storage, embeddings can always be regenerated with better precision from the original source data. Used this way, dimensionality reduction shortens the output vectors and, as a result, makes the system faster and leaner.

Making the Right Choice

There’s an abundance of factors and trade-offs to consider when choosing an embedding model for a use case. A candidate model’s score on common benchmarks is important, but we should not forget that it’s usually the larger models that score best. Larger models have higher inference times, which can severely limit their use in low-latency scenarios, since an embedding model is often a preprocessing step in a larger pipeline. Larger models also require GPUs to run.

If you intend to use a model in a low-latency scenario, it’s better to focus on latency first and then see which models with acceptable latency have best-in-class performance. Also, when building a system around an embedding model, plan for change: better models are released all the time, and swapping in a new one is often the simplest way to improve the performance of your system.


Researchers develop rapid safety check method that ensures a robot will avoid collisions

Before a robot can grab dishes off a shelf to set the table, it must ensure its gripper and arm won't crash into anything and potentially shatter the fine china. As part of its motion planning process, a robot typically runs "safety check" algorithms that verify its trajectory is collision-free.

An e-skin that can detect tactile information and produce tactile feedback

In recent years, materials scientists and engineers have introduced increasingly sophisticated materials for robotic and prosthetic applications. This includes a wide range of electronic skins, or e-skins, designed to sense the surrounding environment and artificially reproduce the sense of touch.

Advanced noise suppression technology for improved search and rescue drones

Unmanned Aerial Vehicles (UAVs) have received significant attention in recent years across many sectors, such as military, agriculture, construction, and disaster management. These versatile machines offer remote access to hard-to-get or hazardous areas and excellent surveillance capabilities.

Robotic-assisted surgery for gallbladder cancer as effective as traditional surgery

Approximately 2,000 people die of gallbladder cancer (GBC) each year in the U.S., with only one in five cases diagnosed at an early stage. With GBC the most common biliary tract cancer and the 17th most deadly cancer worldwide, proper management of the disease demands urgent attention. For diagnosed patients, surgery is the most promising curative treatment. While minimally invasive surgical techniques, including laparoscopic and robotic surgery, have been increasingly adopted for gastrointestinal malignancies, there are reservations about using minimally invasive surgery for gallbladder cancer. A new study has found that robotic-assisted surgery for GBC is as effective as traditional open and laparoscopic methods, with added benefits in precision and quicker post-operative recovery.

Systems That Create: The Growing Impact of Generative AI

We humans like to think we’re the only beings capable of creativity, but computers have been used as a generative force for decades, creating original pieces of writing, art, music, and design. This digital renaissance, powered by advancements in artificial intelligence and machine learning, has ushered in a new era where technology not only replicates but also innovates, blurring the lines between human and machine creativity. From algorithms that compose symphonies to software that drafts novels, the scope of computer-generated creativity is expanding, challenging our preconceived notions of artistry and originality.

A Brief Look Into the History of Creative AI

Generative Adversarial Networks (GANs) for image generation were introduced in 2014. Then, in 2016, DeepMind introduced WaveNet for audio generation. The next year, the Google research team proposed the Transformer architecture for text understanding and generation, and it became the basis for all the large language models we know today.

The research advancements quickly transformed into practical applications. In 2015, engineer and creative storyteller Samim trained a neural network on 14 million lines of passages from romance novels and asked the model to generate original stories based on new images.

Image from “Generating Stories from Images” (2015) by Samim Winiger

A year later, Flow Machines, a division of Sony, used an AI system trained on Beatles songs to generate its own hit, “Daddy’s Car,” which eerily resembles the musical style of the British rock group. They did the same with Bach’s music and managed to fool human evaluators, who had trouble differentiating between real Bach compositions and AI-generated imitations.

Then, in 2017, Autodesk, the leading producer of computer-aided design (CAD) software for industrial design, released Dreamcatcher, a program that generates thousands of possible design permutations based on initial constraints set by engineers. Dreamcatcher has produced bizarre yet highly effective designs that challenge traditional manufacturing assumptions and exceed what human designers can manually ideate.

Image from Autodesk Dreamcatcher, reprinted with permission


AI Text Generation

The recent advent of generative AI has sparked a renaissance in computational creativity. OpenAI’s ChatGPT has become probably the most widely known example of AI’s text-generation power, but it has many strong competitors, including Anthropic’s Claude, Google’s Gemini, Meta’s Llama, and others.

These large language models (LLMs) can craft text on virtually any subject while reflecting a tailored writing style. For example, imagine we task ChatGPT with writing a piece about artificial intelligence’s worldwide domination through authoring books, crafting images, and generating code, all in the dramatic style of a poetry slam. The resulting creation is quite impressive.
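For the curious, here is roughly what that request looks like through the OpenAI Python SDK (a sketch; it assumes an OPENAI_API_KEY environment variable, and the model name is just one available option):

```python
# Ask a chat model for a poetry-slam piece about AI creativity.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Write a dramatic poetry-slam piece about AI taking over "
                   "the world by authoring books, crafting images, "
                   "and generating code.",
    }],
)
print(response.choices[0].message.content)
```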


While this serves as a playful illustration, the potential applications of LLMs go well beyond simple entertainment:

  • Marketing teams are already tapping into the creative power of ChatGPT and similar models to craft captivating stories, blog posts, social media content, and advertisements that echo a brand’s unique voice.
  • Customer support teams utilize LLM-powered bots to offer round-the-clock assistance to their customers.
  • In software development, new AI-assisted engineering workflows are taking shape, powered by generative AI coding tools. These tools offer code suggestions and complete functions, drawing on natural language prompts and existing codebases.

However, LLM-based applications come with their own pitfalls. Their performance can be erratic, leading to instances of ‘hallucination.’ In several notable incidents, companies were forced to honor a refund policy fabricated by their chatbot, or users managed to trick a chatbot into selling them a car for $1. At this juncture, it’s imperative to consider these risks and, in high-stakes situations, to incorporate human oversight into the process. Yet it’s clear that this technology is already significantly influencing business processes, and its impact is set to increase further.

AI Image Generation

While large language models are revolutionizing the field of text generation, providing novel tools and challenges to writers, diffusion models are making waves in the world of art and design. 

Tools like Midjourney, Stable Diffusion by Stability AI, and DALL-E 3 by OpenAI can generate images so realistic they could be mistaken for actual photographs. 
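Programmatic access works much the same way as for text; a sketch of generating an image with DALL-E 3 through the OpenAI SDK (the prompt and size are arbitrary, and an API key is assumed):

```python
# Generate one image from a text prompt and print its URL.
from openai import OpenAI

client = OpenAI()
result = client.images.generate(
    model="dall-e-3",
    prompt="A photorealistic portrait of a lighthouse keeper at dawn",
    size="1024x1024",
)
print(result.data[0].url)  # temporary URL of the generated image
```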

Generated with Midjourney v5.2 (July 2023)

Industry titans like Adobe are also stepping up, placing an emphasis on the ethical and legal implications of AI-generated images. To assuage enterprise concerns, Adobe has restricted the training dataset of Firefly, its proprietary AI image generator, to licensed Adobe Stock and public-domain images. Moreover, they provide IP indemnity for content created using select Firefly workflows. Others, including Google, Microsoft, and OpenAI, have followed Adobe’s example to smooth enterprise customers’ transition to AI-generated content.

Despite significant advancements in AI image generation throughout 2023, the technology still faces notable limitations, akin to those experienced by LLMs. Chief among these challenges is the tendency of AI tools to deviate from the explicit instructions in prompts, produce images with occasional artifacts, and exhibit biases in diversity. Typically, AI image generators produce content that mirrors the available online databases, which often feature aesthetically appealing, model-like individuals, predominantly white women and men. To achieve more equitable representation, it is necessary to deliberately introduce diversity into the generated images. However, caution is advised to avoid the pitfalls of overcorrection, as evidenced by the controversy surrounding Google’s Gemini image generation. The tool faced criticism for its extreme bias in refusing to generate images of white individuals, particularly white men, and for producing unconventional representations, such as Black popes and female Nazi soldiers.

AI Video Generation

Last year marked notable advancements in text-to-video generation and editing, with pioneers like Runway leading the charge. They were at the forefront of creating new videos from text prompts and reference materials. However, the videos were limited to approximately four seconds in duration, were still of low quality, and exhibited significant issues with warping and morphing.

The year 2024 was anticipated to be a watershed moment for AI video generation, and it has already begun to fulfill those expectations. OpenAI recently unveiled Sora, its AI video generator which, based on available demonstrations, significantly surpasses the capabilities of alternative tools developed by Runway, Pika Labs, Genmo, Google (Lumiere), Meta (Emu), and ByteDance (MagicVideo-V2). 

While Sora distinguishes itself from its competitors, it remains inaccessible to the public, and the full scope of its capabilities has yet to be thoroughly evaluated beyond the sphere of meticulously crafted demonstrations.

Nonetheless, the technology’s capacity to transform various sectors, such as entertainment, filmmaking, and marketing, is immense. The full extent of how AI-generated videos will be utilized in business and their primary challenges remain to be seen. However, even now, there’s a growing concern over the proliferation of deepfake videos online, as it becomes increasingly straightforward to produce convincing videos depicting events that never occurred.

The Boundless Horizon of AI Creativity

AI systems that create have taken center stage in recent years, expanding their influence across a multitude of sectors, from art, design, music, and entertainment to software development, education, and drug development. As these systems grow more sophisticated, they promise to redefine what’s possible, opening up new avenues for innovation and creativity. The fusion of artificial intelligence with human ingenuity has the potential to accelerate breakthroughs, solve complex problems, and craft experiences that were once unimaginable. As we stand on the brink of this new frontier, it is crucial to navigate the ethical implications and ensure that these technologies are used responsibly and for the greater good.


A key to the future of robots could be hiding in liquid crystals

Robots and cameras of the future could be made of liquid crystals, thanks to a new discovery that significantly expands the potential of the chemicals already common in computer displays and digital watches. The findings are a simple and inexpensive way to manipulate the molecular properties of liquid crystals with light exposure.