April 9, 2024 | 4 min read
LLM2Vec: Large language models are secretly powerful text encoders
Parishad BehnamGhader, Research Scientist, ServiceNow
Vaibhav Adlakha, Visiting Researcher, ServiceNow
Marius Mosbach, Research Scientist, ServiceNow
Dzmitry Bahdanau, Research Lead, ServiceNow

Nicolas Chapados and Siva Reddy also contributed to this content.

Text-embedding models convert a piece of text, such as a search query, document, or piece of code, into a sequence of real-valued numbers. Given such embeddings, we can measure the similarity, or relatedness, of pieces of text. This facilitates various important applications, such as search, clustering, retrieval, and classification.
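For instance, two embeddings are commonly compared with cosine similarity. Here is a minimal sketch; the four-dimensional vectors are made up for illustration (real embedding models produce hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings": a query and two candidate documents.
query    = [0.9, 0.1, 0.0, 0.2]
doc_near = [0.8, 0.2, 0.1, 0.1]  # semantically close to the query
doc_far  = [0.0, 0.1, 0.9, 0.0]  # unrelated

assert cosine_similarity(query, doc_near) > cosine_similarity(query, doc_far)
```

Ranking documents by this score against a query embedding is the basic building block behind the search and retrieval applications mentioned above.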

With the widespread availability of decoder-only large language models (LLMs), such as GPT-4, LLaMA2, Mistral-7B, and StarCoder2, a pressing question in the natural language processing (NLP) research community is how best to use these models to construct powerful text embeddings.

We’re excited to present LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders, a simple and efficient method that transforms any decoder-only LLM into a powerful text encoder in an unsupervised fashion. It relies only on lightweight adapters (LoRA), so the base model's weights never need to be modified.

Below we give an overview of the key components of LLM2Vec and present the exciting results we got when benchmarking LLM2Vec models on the challenging Massive Text Embeddings Benchmark (MTEB).

Our LLM2Vec-Mistral model ranks first on the MTEB leaderboard in the unsupervised category, first in the supervised category among models trained only on publicly available embedding data (E5), and seventh overall (the six higher-ranked models are trained on synthetic data generated by GPT-4 or similar-scale models).

[Figure: The three steps of LLM2Vec: enabling bidirectional attention, masked next-token prediction, and unsupervised contrastive learning (Mila, McGill, ServiceNow)]

A simple and efficient recipe

At its core, LLM2Vec consists of three simple steps:

  1. Enabling bidirectional attention
  2. Adaptation via masked next-token prediction (MNTP)
  3. Adaptation via unsupervised contrastive learning
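Step 2 is a twist on standard masked language modeling: a fraction of input tokens is masked, but the masked token at position i is predicted from the model's output at position i - 1, matching the next-token setup the decoder was pretrained with. A toy sketch of the label construction, with made-up token ids and a hypothetical mask id:

```python
MASK_ID = 0  # hypothetical id of the mask token

def mntp_inputs_and_targets(token_ids, masked_positions):
    """Build masked inputs and (prediction_position, target_id) pairs for MNTP.

    The token at position i is replaced by the mask, but its identity is
    predicted from the model's output at position i - 1, unlike BERT-style
    masked language modeling, which predicts from position i itself.
    """
    inputs = list(token_ids)
    targets = []
    for i in masked_positions:
        inputs[i] = MASK_ID
        targets.append((i - 1, token_ids[i]))  # predict masked token from previous position
    return inputs, targets

tokens = [101, 7, 42, 13, 99]
inputs, targets = mntp_inputs_and_targets(tokens, masked_positions=[2, 4])
# inputs  -> [101, 7, 0, 13, 0]
# targets -> [(1, 42), (3, 99)]
```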

Adapting a model with the LLM2Vec approach is highly efficient and works with parameter-efficient fine-tuning methods such as LoRA. Additionally, the adaptation can be performed using a general domain corpus such as Wikipedia, requires only a few hundred training steps, and can be run on a single GPU.
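Step 3, unsupervised SimCSE-style contrastive learning, passes each sentence through the model twice with different dropout masks to obtain a positive pair, treating other sentences in the batch as negatives. A minimal numpy sketch of the resulting InfoNCE-style loss; the embeddings below are random stand-ins for two dropout views of a batch:

```python
import numpy as np

def info_nce_loss(view_a, view_b, temperature=0.05):
    """InfoNCE loss over a batch where view_a[i] and view_b[i] are a positive pair."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature             # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # cross-entropy on the matched pairs

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 32))
noisy = batch + 0.01 * rng.normal(size=(4, 32))  # stand-in for a second dropout view

loss_aligned = info_nce_loss(batch, noisy)
loss_random = info_nce_loss(batch, rng.normal(size=(4, 32)))
# Matched views give a much lower loss than randomly paired embeddings.
```

Minimizing this loss pulls the two views of the same sentence together while pushing apart embeddings of different sentences in the batch.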

State-of-the-art performance

LLM2Vec is not only simple and efficient, but it also leads to state-of-the-art performance on the challenging MTEB, both in the unsupervised and supervised setting (among models trained only on publicly available data).

Unsupervised results

We applied LLM2Vec to some of the best-performing LLMs available and evaluated the resulting text-embedding models on MTEB. In the unsupervised setting (i.e., without using any labeled training data for contrastive learning), our LLM2Vec-transformed models achieved a new state-of-the-art score of 56.80, outperforming the previous best unsupervised approach by a large margin.

[Table: Unsupervised MTEB results applying LLM2Vec to S-LLaMA-1.3B, LLaMA-2-7B, and Mistral-7B, compared to encoder-only models]
[Table: Supervised MTEB results for the same models, compared to previous work trained on public data only]

Supervised results

LLM2Vec can also be easily combined with supervised contrastive learning. As our results show, applying LLM2Vec before supervised contrastive learning leads to a substantial improvement.

Moreover, LLM2Vec in combination with Mistral-7B, currently the best-performing 7 billion-parameter LLM, leads to a new state-of-the-art performance of 64.80 on MTEB among models trained only with publicly available data.

Highly sample-efficient

LLM2Vec-transformed models require considerably less training data to perform well compared to models trained without the LLM2Vec transformation.

These results make us particularly excited about challenging real-world scenarios where large amounts of labeled data might be costly to acquire.

Use it on your own data

We’ve made it easy for you to use our LLM2Vec-transformed models. The LLM2Vec class is a wrapper on top of Hugging Face models that supports sequence encoding and pooling operations. The steps below show an example of how to use the library.

[Figure: Diagrams showing the amount of training data needed for Sheared-LLaMA-1.3B, Llama-2-7b-chat-hf, and Mistral-7B-Instruct-v0.2]

Preparing the model

Here, we first initialize the model and apply the MNTP-trained LoRA weights on top. After merging the MNTP weights into the model, we can either:

  • Load the unsupervised-trained LoRA weights (trained with the SimCSE objective on a Wikipedia corpus)
  • Load the supervised-trained LoRA weights (trained with contrastive learning on public E5 data)
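As a sketch, this preparation follows the checkpoints released on the Hugging Face Hub; the repository names below are the McGill-NLP releases and may change, and the weights are several gigabytes to download:

```python
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer
from peft import PeftModel

base = "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp"

# Load the MNTP-adapted base model (trust_remote_code enables the
# custom bidirectional-attention implementation).
tokenizer = AutoTokenizer.from_pretrained(base)
config = AutoConfig.from_pretrained(base, trust_remote_code=True)
model = AutoModel.from_pretrained(
    base, trust_remote_code=True, config=config, torch_dtype=torch.bfloat16
)

# Apply and merge the MNTP-trained LoRA weights.
model = PeftModel.from_pretrained(model, base)
model = model.merge_and_unload()

# Then load either the unsupervised (SimCSE) or the supervised LoRA weights on top.
model = PeftModel.from_pretrained(model, base + "-unsup-simcse")
```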

Applying LLM2Vec wrapper

Then, we define our LLM2Vec encoder model as follows:

from llm2vec import LLM2Vec 
 
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)
 

Inference

This model returns a text embedding for any input given in the form [[instruction1, text1], [instruction2, text2]] or [text1, text2]. During training, we provide instructions for both sentences in symmetric tasks and only for the queries in asymmetric tasks.
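Putting it together, encoding and scoring look roughly like this. This is a hedged sketch: the convenience constructor and checkpoint names follow the library's released examples, the instruction string and sample texts are illustrative, and running it requires downloading the full model:

```python
import torch
from llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-unsup-simcse",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
    pooling_mode="mean",
    max_length=512,
)

# Asymmetric task: instructions are attached to the queries only.
queries = [
    ["Retrieve relevant passages for the question:", "What is masked next-token prediction?"],
]
documents = ["MNTP adapts a decoder-only LLM by predicting masked tokens from the previous position."]

q_reps = l2v.encode(queries)    # shape: (num_queries, hidden_dim)
d_reps = l2v.encode(documents)  # shape: (num_documents, hidden_dim)

# Cosine similarity between each query and each document embedding.
q_reps = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps = torch.nn.functional.normalize(d_reps, p=2, dim=1)
scores = q_reps @ d_reps.T
```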


Summary

As demonstrated above, LLM2Vec is a simple unsupervised approach that can transform any pretrained decoder-only LLM into a strong text encoder.

If you’re as excited about LLM2Vec as we are, check out our hands-on tutorial, which walks you through the different steps of our method. We also welcome contributions on GitHub and invite the community to share their LLM2Vec-transformed models.

Research: Project page

Code: LLM2Vec on GitHub

Tutorial: Learn how to apply LLM2Vec to LLaMA-2

Find out more about ServiceNow AI Research.
