Chat (Llama-3.1-nemotron)
POST /chat/completions
Nvidia's fine-tuned model, built on Llama-3.1, ranks just behind o1 in performance scores.
Price list: https://302.ai/pricing_api/
Request
model: The ID of the model to use. For detailed information on which models are applicable to the chat API, see Model endpoint compatibility.
messages: The messages to generate chat completions for, in the chat format.
temperature: What sampling temperature to use, between 0 and 2. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic. We generally recommend adjusting either this or top_p, but not both.
top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens within the top_p probability mass. For instance, top_p = 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend adjusting either this or temperature, but not both.
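To make that recommendation concrete, the sketch below (illustrative values only, not taken from this page) builds two request bodies that each set a single sampling control:

# Illustrative request bodies; set temperature OR top_p, not both.
creative_body = {
    "model": "llama-3.1-nemotron",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.8,  # higher temperature: more random output
}
focused_body = {
    "model": "llama-3.1-nemotron",
    "messages": [{"role": "user", "content": "Hello!"}],
    "top_p": 0.1,  # nucleus sampling: only the top 10% probability mass
}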
n: How many chat completion choices to generate for each input message. Note that you are charged based on the number of generated tokens across all choices. Keep n at 1 to minimize costs.
stream: If set, partial message deltas will be sent as they become available, as in ChatGPT. Tokens are sent as data-only server-sent events, with the stream terminated by a data: [DONE] message. For example code, see the OpenAI Cookbook.
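A minimal Python sketch of consuming the stream follows; the base URL and API key are placeholders (this page does not state them), and the chunk format is assumed to follow the OpenAI-compatible delta convention.

import json
import requests  # third-party HTTP client

# Placeholder endpoint and key; substitute the values from your 302.ai account.
url = "https://YOUR-302AI-HOST/chat/completions"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {
    "model": "llama-3.1-nemotron",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

# Server-sent events arrive as lines prefixed with "data: ";
# the stream ends with "data: [DONE]".
with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        data = line.decode("utf-8").removeprefix("data: ")
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content", "")
        print(delta, end="", flush=True)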
stop: Up to 4 sequences where the API will stop generating further tokens.
max_tokens: The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length.
presence_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they already appear in the text so far, increasing the model's likelihood to talk about new topics. See more information about frequency and presence penalties.
frequency_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. See more information about frequency and presence penalties.
logit_bias: Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (identified by their token IDs in the tokenizer) to a bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model before sampling. The exact effect varies by model, but values between -1 and 1 should slightly decrease or increase the likelihood of selection, while values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
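For illustration, the fragment below biases two made-up token IDs; real IDs depend on the model's tokenizer and the ones shown here are hypothetical.

# Hypothetical token IDs; look up actual IDs with the model's tokenizer.
body_with_bias = {
    "model": "llama-3.1-nemotron",
    "messages": [{"role": "user", "content": "Hello!"}],
    "logit_bias": {
        "15043": -100,  # effectively bans this token
        "50256": 5,     # mildly favors this token
    },
}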
user: A unique identifier representing your end user, which helps OpenAI monitor and detect abuse. View more
Request samples
{
  "model": "llama-3.1-nemotron",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}
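The same request issued from Python, as a sketch; the base URL and bearer-token header are assumptions, since this page does not specify them.

import requests

# Placeholder endpoint and key; substitute the values from your 302.ai account.
url = "https://YOUR-302AI-HOST/chat/completions"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {
    "model": "llama-3.1-nemotron",
    "messages": [{"role": "user", "content": "Hello!"}],
}

resp = requests.post(url, headers=headers, json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])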
Responses
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nHello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
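Reading the reply, stop reason, and token accounting out of a parsed response (field names as in the sample above; `body` is an assumed variable holding the parsed JSON):

# `body` is the response JSON above, parsed into a Python dict.
choice = body["choices"][0]
print(choice["message"]["content"])  # the assistant's reply
print(choice["finish_reason"])       # "stop" means generation ended naturally
usage = body["usage"]
print(usage["prompt_tokens"], "+", usage["completion_tokens"], "=", usage["total_tokens"])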