This feature adds image recognition capabilities to all models. There are two ways to enable it; choose either one:
1. Append the `-ocr` suffix to any model name, for example `gpt-3.5-turbo-ocr` (convenient for third-party software).
2. Specify the OCR model in the model field when making the request, e.g. `ocr_model:gpt-4o-mini`, as shown in the example below (convenient for direct API calls).
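A minimal sketch of both activation styles, assuming an OpenAI-compatible `/v1/chat/completions` endpoint. The base URL and API key are placeholders, and exactly how the `ocr_model:` token combines with a target model name is an assumption based on the description above:

```python
import requests

API_BASE = "https://your-gateway.example.com"  # placeholder base URL
HEADERS = {"Authorization": "Bearer sk-..."}   # placeholder API key

# A user message carrying an image, in the standard OpenAI content-parts form.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What text is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ],
}

# Method 1: append the -ocr suffix to the model name.
resp = requests.post(
    f"{API_BASE}/v1/chat/completions",
    headers=HEADERS,
    json={"model": "gpt-3.5-turbo-ocr", "messages": [message]},
)
print(resp.json()["choices"][0]["message"]["content"])

# Method 2: name the OCR model explicitly via the ocr_model: token.
# The documented token is "ocr_model:gpt-4o-mini"; how it is combined
# with the target model's name is gateway-specific and assumed here.
resp = requests.post(
    f"{API_BASE}/v1/chat/completions",
    headers=HEADERS,
    json={"model": "ocr_model:gpt-4o-mini", "messages": [message]},
)
```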
Note: if a multimodal model's name carries the `-ocr` suffix, it will also route images through the specified or default OCR model for analysis, so avoid enabling this feature on multimodal models.

How this works: before each request, the user's image is sent to the multimodal model for analysis, and the analysis result is then incorporated into the model context as reference information (see the sketch below). The specific process can be inspected in the logs during the API call. The default OCR model currently in use is gpt-4o-mini.

Image analysis prompt:
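A rough sketch of this two-step flow, assuming an OpenAI-compatible endpoint. The helper names, URLs, and the placeholder analysis prompt are illustrative assumptions, not the gateway's actual implementation:

```python
import requests

API_BASE = "https://your-gateway.example.com"  # placeholder base URL
HEADERS = {"Authorization": "Bearer sk-..."}   # placeholder API key
OCR_MODEL = "gpt-4o-mini"                      # the default OCR model mentioned above

def analyze_image(image_url: str) -> str:
    """Step 1: send the user's image to the multimodal (OCR) model.
    The prompt below is a placeholder; the service's actual image
    analysis prompt is not reproduced here."""
    resp = requests.post(
        f"{API_BASE}/v1/chat/completions",
        headers=HEADERS,
        json={
            "model": OCR_MODEL,
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe this image in detail, including any text it contains."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }],
        },
    )
    return resp.json()["choices"][0]["message"]["content"]

def ask_with_image(question: str, image_url: str,
                   target_model: str = "gpt-3.5-turbo") -> str:
    """Step 2: fold the analysis into the context as reference information,
    then let the text-only target model answer from that description."""
    analysis = analyze_image(image_url)
    resp = requests.post(
        f"{API_BASE}/v1/chat/completions",
        headers=HEADERS,
        json={
            "model": target_model,
            "messages": [
                {"role": "system", "content": f"Reference image analysis:\n{analysis}"},
                {"role": "user", "content": question},
            ],
        },
    )
    return resp.json()["choices"][0]["message"]["content"]
```

Because two model calls are made, the billed usage is the target model's tokens plus the OCR model's tokens, which matches the pricing rule below.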
Price: the original model's cost plus the cost of the multimodal (OCR) model call.
{"id":"chatcmpl-123","object":"chat.completion","created":1677652288,"choices":[{"index":0,"message":{"role":"assistant","content":"\n\nHello there, how may I assist you today?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}