Chroma
This notebook covers how to get started with the Chroma vector store.
Chroma is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0. View the full docs of Chroma at this page, and find the API reference for the LangChain integration at this page.
Chroma Cloud powers serverless vector and full-text search. It's extremely fast, cost-effective, scalable and painless. Create a DB and try it out in under 30 seconds with $5 of free credits.
Setup
To access Chroma vector stores you'll need to install the langchain-chroma integration package.
pip install -qU "langchain-chroma>=0.1.2"
Credentials
You can use the Chroma vector store without any credentials; simply installing the package above is enough!
If you are a Chroma Cloud user, set your CHROMA_TENANT, CHROMA_DATABASE, and CHROMA_API_KEY environment variables.
When you install the chromadb package you also get access to the Chroma CLI, which can set these for you. First, log in via the CLI, and then use the connect command:
chroma db connect [db_name] --env-file
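Alternatively, you can set the three variables directly in Python. The values below are placeholders, not real credentials; substitute your own from the Chroma Cloud dashboard:

```python
import os

# Placeholder credentials -- replace these with your actual Chroma Cloud values.
os.environ["CHROMA_TENANT"] = "your-tenant-id"
os.environ["CHROMA_DATABASE"] = "your-database-name"
os.environ["CHROMA_API_KEY"] = "your-api-key"
```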
If you want to get best in-class automated tracing of your model calls you can also set your LangSmith API key by uncommenting below:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Initialization
Basic Initialization
Below is a basic initialization, including the use of a directory to save the data locally.
pip install -qU langchain-openai
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
Running Locally (In-Memory)
You can run Chroma in memory by simply instantiating a Chroma instance with a collection name and your embeddings provider:
from langchain_chroma import Chroma
vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
)
If you don't need data persistence, this is a great option for experimenting while building your AI application with LangChain.