This feature adds image recognition capabilities to all models. There are two ways to enable it; choose either one:
1. Append -ocr to the model name, for example gpt-3.5-turbo-ocr (convenient for third-party software).
2. Specify the OCR model in the model field of the request, e.g. ocr_model:gpt-4o-mini, as shown in the sketch below (convenient for direct API calls).
Note: If a multimodal model's name already carries the -ocr suffix, it will also run the specified or default OCR model for image analysis, so avoid enabling this feature on models that are already multimodal.

How this works: before each request, the user's image is sent to the multimodal (OCR) model for analysis, and the analysis results are then incorporated into the model context as reference information. The specific process can be viewed in the logs during the API call. The default OCR model currently in use is gpt-4o-mini.

Image analysis prompt:
Price: the original model's cost plus the cost of the multimodal (OCR) model.
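As a minimal sketch, both ways of enabling OCR could be used from Python as follows. The base URL https://api.example.com/v1 and the key sk-xxxx are placeholders, and the vision-style content array (an image_url part alongside text) is an assumption; only the -ocr suffix and the ocr_model: prefix come from the description above.

```python
import requests

BASE_URL = "https://api.example.com/v1"  # placeholder; use your deployment's address
HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": "Bearer sk-xxxx",  # API KEY from the management backend
}

body = {
    # Method 1: append -ocr to the model name
    "model": "gpt-3.5-turbo-ocr",
    # Method 2 (alternative): name the OCR model in the model field instead,
    # e.g. "model": "ocr_model:gpt-4o-mini"
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this image say?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/receipt.png"}},
            ],
        }
    ],
}

resp = requests.post(f"{BASE_URL}/chat/completions", headers=HEADERS, json=body)
print(resp.json()["choices"][0]["message"]["content"])
```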
Request

Header Params

Content-Type (string, required)
Example: application/json

Accept (string, required)
Example: application/json

Authorization (string, required)
Put the API KEY generated in the management backend under API KEYS after "Bearer", for example Bearer sk-xxxx
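For instance, the three headers could be assembled like this in Python (sk-xxxx is a placeholder for your own key):

```python
API_KEY = "sk-xxxx"  # generated in the management backend under API KEYS

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    # the key goes after the word "Bearer", separated by a single space
    "Authorization": f"Bearer {API_KEY}",
}
```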
Body Params

temperature (number, optional)
The sampling temperature to use, between 0 and 2. A higher value (such as 0.8) makes the output more random, while a lower value (such as 0.2) makes it more focused and deterministic. We generally recommend changing this or top_p, but not both.
top_p (number, optional)
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend changing this or temperature, but not both.
n (integer, optional)
How many chat completion choices to generate for each input message.
stream (boolean, optional)
If set, partial message deltas will be sent, as in ChatGPT. Tokens are sent as data-only Server-Sent Events as they become available, and the stream is terminated by a data: [DONE] message. For example code, see the OpenAI Cookbook.
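A sketch of consuming such a stream with Python's requests, reusing the placeholder endpoint and key from above; the data: prefix and [DONE] terminator follow the description of this parameter.

```python
import json
import requests

resp = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer sk-xxxx"},
    json={
        "model": "gpt-3.5-turbo",
        "stream": True,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    stream=True,
)

for line in resp.iter_lines():
    if not line:
        continue  # skip blank SSE keep-alive lines
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break  # the stream is terminated by this message
    chunk = json.loads(payload)
    # each chunk carries an incremental delta of the assistant message
    print(chunk["choices"][0]["delta"].get("content", ""), end="")
```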
stop (string, optional)
Up to 4 sequences where the API will stop generating further tokens.
max_tokens (integer, optional)
The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length.
presence_penalty (number, optional)
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they have appeared in the text so far, increasing the model's likelihood of talking about new topics. See more information about frequency and presence penalties.
frequency_penalty (number, optional)
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim. See more information about frequency and presence penalties.
logit_bias (object, optional)
Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
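As an illustration only, a logit_bias map could look like this; the token IDs here are made up, and real IDs depend on the model's tokenizer.

```python
body = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "logit_bias": {
        # keys are token IDs (as strings) from the model's tokenizer;
        # these particular IDs are arbitrary examples
        "50256": -100,  # a value of -100 effectively bans the token
        "1234": 5,      # small positive values mildly encourage it
    },
}
```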
user (string, optional)
A unique identifier representing your end user, which can help OpenAI monitor and detect abuse.
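Putting the body parameters together, a complete non-streaming request might look like the sketch below; all values are illustrative, and the endpoint and key are placeholders.

```python
import requests

body = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.2,        # or set top_p instead, but not both
    "n": 1,                    # one completion choice
    "stop": "\n\n",            # up to 4 stop sequences are allowed
    "max_tokens": 256,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.5,  # discourage verbatim repetition
    "user": "user-1234",       # your own end-user identifier
}

resp = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": "Bearer sk-xxxx",
    },
    json=body,
)
print(resp.json())
```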
{"id":"chatcmpl-123","object":"chat.completion","created":1677652288,"choices":[{"index":0,"message":{"role":"assistant","content":"\n\nHello there, how may I assist you today?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}