Cohere

API Keys

import os 
os.environ["COHERE_API_KEY"] = ""

Usage

LiteLLM Python SDK

Cohere v1 API (Default)

import os
from litellm import completion

## set ENV variables
os.environ["COHERE_API_KEY"] = "cohere key"

# cohere v1 call
response = completion(
    model="command-a-03-2025",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)

Cohere v2 API

To use the Cohere v2/chat API, prefix your model name with cohere_chat/v2/:

import os
from litellm import completion

## set ENV variables
os.environ["COHERE_API_KEY"] = "cohere key"

# cohere v2 call
response = completion(
    model="cohere_chat/v2/command-r-plus",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
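When the same code has to target either API version, the prefix rule above can be wrapped in a small helper. This is only a sketch: `cohere_model_name` and `V2_PREFIX` are hypothetical names, not part of LiteLLM, and the helper assumes v1 models are passed without a prefix, as in the examples above.

```python
V2_PREFIX = "cohere_chat/v2/"  # prefix described above for the v2/chat API


def cohere_model_name(model: str, api_version: str = "v1") -> str:
    """Return the model string to pass to litellm.completion (hypothetical helper)."""
    if api_version == "v1":
        # v1 is the default route, so the model name is used unchanged
        return model
    if api_version == "v2":
        # Avoid double-prefixing if the caller already added the prefix
        return model if model.startswith(V2_PREFIX) else V2_PREFIX + model
    raise ValueError(f"unsupported Cohere API version: {api_version!r}")
```

It could then be used as `completion(model=cohere_model_name("command-r-plus", "v2"), messages=messages)`.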

Streaming

Cohere v1 Streaming:

import os
from litellm import completion

## set ENV variables
os.environ["COHERE_API_KEY"] = "cohere key"

# cohere v1 streaming
response = completion(
    model="command-a-03-2025",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)

# each chunk is a partial response; print them as they arrive
for chunk in response:
    print(chunk)

Cohere v2 Streaming:

import os
from litellm import completion

## set ENV variables
os.environ["COHERE_API_KEY"] = "cohere key"

# cohere v2 streaming
response = completion(
    model="cohere_chat/v2/command-r-plus",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)

# each chunk is a partial response; print them as they arrive
for chunk in response:
    print(chunk)
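Printing raw chunks is useful for debugging, but most callers want the assembled text. The sketch below joins the text deltas from a stream. It assumes the chunks follow the OpenAI-style streaming shape (`chunk.choices[0].delta.content`, where `content` may be `None`), and `collect_stream_text` is a hypothetical helper name, not a LiteLLM function.

```python
from typing import Any, Iterable


def collect_stream_text(chunks: Iterable[Any]) -> str:
    """Join the text deltas of a streamed completion into one string."""
    parts = []
    for chunk in chunks:
        # Assumed shape: chunk.choices[0].delta.content (OpenAI-style)
        choices = getattr(chunk, "choices", None) or []
        if not choices:
            continue
        delta = getattr(choices[0], "delta", None)
        content = getattr(delta, "content", None)
        if content:  # skip chunks that carry no text, such as the final one
            parts.append(content)
    return "".join(parts)
```

With either streaming call above, `text = collect_stream_text(response)` would replace the print loop.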

Usage with LiteLLM Proxy

Here's how to call Cohere with the LiteLLM Proxy Server:

1. Save key in your environment

export COHERE_API_KEY="your-api-key"

2. Start the proxy

Define the Cohere models you want to use in config.yaml.

For Cohere v1 models:

model_list:
  - model_name: command-a-03-2025
    litellm_params:
      model: command-a-03-2025
      api_key: "os.environ/COHERE_API_KEY"

For Cohere v2 models:

model_list:
  - model_name: command-a-03-2025-v2
    litellm_params:
      model: cohere_chat/v2/command-a-03-2025
      api_key: "os.environ/COHERE_API_KEY"

Then start the proxy with your config:

litellm --config /path/to/config.yaml

3. Test it

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <your-litellm-api-key>' \
  --data '{
    "model": "command-a-03-2025",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'
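The same request can be made from Python. The following is a minimal sketch using only the standard library, mirroring the URL, headers, and body of the curl command. `build_chat_request` and `send_chat_request` are hypothetical helper names, and the proxy is assumed to be listening at the address shown above.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl command."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> dict:
    """Send the request to a running proxy and decode the JSON reply."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send_chat_request(build_chat_request("http://0.0.0.0:4000", "<your-litellm-api-key>", "command-a-03-2025", "what llm are you"))` would issue the call once the proxy is running.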

Supported Models

| Model Name | Function Call |
|---|---|
| command-a-03-2025 | litellm.completion('command-a-03-2025', messages) |
| command-r-plus-08-2024 | litellm.completion('command-r-plus-08-2024', messages) |
| command-r-08-2024 | litellm.completion('command-r-08-2024', messages) |
| command-r-plus | litellm.completion('command-r-plus', messages) |
| command-r | litellm.completion('command-r', messages) |
| command-light | litellm.completion('command-light', messages) |
| command-nightly | litellm.completion('command-nightly', messages) |

Embedding

import os
from litellm import embedding

os.environ["COHERE_API_KEY"] = "cohere key"

# cohere call
response = embedding(
    model="embed-english-v3.0",
    input=["good morning from litellm", "this is another item"],
)

Setting - Input Type for v3 models

Cohere v3 embedding models require an input_type parameter. LiteLLM defaults to search_document when it is not set. It accepts one of the following four values:

  • input_type="search_document": (default) Use this for texts (documents) you want to store in your vector database
  • input_type="search_query": Use this for search queries to find the most relevant documents in your vector database
  • input_type="classification": Use this if you use the embeddings as an input for a classification system
  • input_type="clustering": Use this if you use the embeddings for text clustering

https://txt.cohere.com/introducing-embed-v3/

import os
from litellm import embedding

os.environ["COHERE_API_KEY"] = "cohere key"

# cohere call with an explicit input_type
response = embedding(
    model="embed-english-v3.0",
    input=["good morning from litellm", "this is another item"],
    input_type="search_document",
)
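Because documents and queries need different input_type values, it can help to validate the value before calling the API. The sketch below checks it against the four values listed above and returns keyword arguments for `embedding`. `build_embedding_kwargs` and `VALID_INPUT_TYPES` are hypothetical names introduced for illustration.

```python
from typing import Dict, Iterable

# The four input_type values listed above for v3 embedding models
VALID_INPUT_TYPES = ("search_document", "search_query", "classification", "clustering")


def build_embedding_kwargs(model: str, texts: Iterable[str], input_type: str = "search_document") -> Dict[str, object]:
    """Validate input_type and return keyword arguments for litellm.embedding."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(
            f"input_type must be one of {VALID_INPUT_TYPES}, got {input_type!r}"
        )
    return {"model": model, "input": list(texts), "input_type": input_type}
```

A search query would then be embedded with `embedding(**build_embedding_kwargs("embed-english-v3.0", ["what is litellm?"], "search_query"))`, while stored documents keep the default.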

Supported Embedding Models

| Model Name | Function Call |
|---|---|
| embed-english-v3.0 | embedding(model="embed-english-v3.0", input=["good morning from litellm", "this is another item"]) |
| embed-english-light-v3.0 | embedding(model="embed-english-light-v3.0", input=["good morning from litellm", "this is another item"]) |
| embed-multilingual-v3.0 | embedding(model="embed-multilingual-v3.0", input=["good morning from litellm", "this is another item"]) |
| embed-multilingual-light-v3.0 | embedding(model="embed-multilingual-light-v3.0", input=["good morning from litellm", "this is another item"]) |
| embed-english-v2.0 | embedding(model="embed-english-v2.0", input=["good morning from litellm", "this is another item"]) |
| embed-english-light-v2.0 | embedding(model="embed-english-light-v2.0", input=["good morning from litellm", "this is another item"]) |
| embed-multilingual-v2.0 | embedding(model="embed-multilingual-v2.0", input=["good morning from litellm", "this is another item"]) |

Rerank

Usage

LiteLLM supports the v1 and v2 clients for Cohere rerank. By default, the rerank endpoint uses the v2 client, but you can specify the v1 client by explicitly calling v1/rerank.

from litellm import rerank
import os

os.environ["COHERE_API_KEY"] = "sk-.."

query = "What is the capital of the United States?"
documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States since before it was a country.",
]

response = rerank(
    model="cohere/rerank-english-v3.0",
    query=query,
    documents=documents,
    top_n=3,
)
print(response)
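To turn the rerank output back into readable text, each result can be matched to its source document. The sketch below assumes each result is a mapping with an `index` into the original documents list and a `relevance_score`, following Cohere's rerank response shape. `rank_documents` is a hypothetical helper, and the exact response object may differ between LiteLLM versions.

```python
from typing import Iterable, List, Mapping, Sequence, Tuple


def rank_documents(
    results: Iterable[Mapping[str, object]],
    documents: Sequence[str],
) -> List[Tuple[float, str]]:
    """Pair each rerank result with its document text, best score first."""
    ranked = [
        # Assumed keys: "index" (position in documents) and "relevance_score"
        (float(result["relevance_score"]), documents[int(result["index"])])
        for result in results
    ]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked
```

If `response.results` holds entries of that shape, `rank_documents(response.results, documents)` would return the documents ordered by relevance to the query.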