Google AI Studio inference integration
New API reference
For the most up-to-date API details, refer to Inference APIs.
Creates an inference endpoint to perform an inference task with the `googleaistudio` service.
Request ¶
PUT /_inference/<task_type>/<inference_id>
Path parameters ¶
`<inference_id>`
- (Required, string) The unique identifier of the inference endpoint.

`<task_type>`
- (Required, string) The type of the inference task that the model will perform. Available task types: `completion`, `text_embedding`.
Request body ¶
`chunking_settings`
- (Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.
  - `max_chunk_size`
    - (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to `250`. This value cannot be higher than `300` or lower than `20` (for the `sentence` strategy) or `10` (for the `word` strategy).
  - `overlap`
    - (Optional, integer) Only for the `word` chunking strategy. Specifies the number of overlapping words for chunks. Defaults to `100`. This value cannot be higher than half of `max_chunk_size`.
  - `sentence_overlap`
    - (Optional, integer) Only for the `sentence` chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either `1` or `0`. Defaults to `1`.
  - `strategy`
    - (Optional, string) Specifies the chunking strategy. It can be either `sentence` or `word`.
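For illustration, the chunking settings above could be combined in an endpoint creation request like the following sketch. The endpoint name and the specific values are illustrative, not defaults:

```console
PUT _inference/text_embedding/google_ai_studio_embeddings
{
  "service": "googleaistudio",
  "service_settings": {
    "api_key": "<api_key>",
    "model_id": "<model_id>"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
```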
`service`
- (Required, string) The type of service supported for the specified task type. In this case, `googleaistudio`.

`service_settings`
- (Required, object) Settings used to install the inference model. These settings are specific to the `googleaistudio` service.
  - `api_key`
    - (Required, string) A valid API key for the Google Gemini API.
  - `model_id`
    - (Required, string) The name of the model to use for the inference task. You can find the supported models at Gemini API models.
  - `rate_limit`
    - (Optional, object) By default, the `googleaistudio` service sets the number of requests allowed per minute to `360`. This helps to minimize the number of rate limit errors returned from Google AI Studio. To modify this, set the `requests_per_minute` setting of this object in your service settings:

      ```text
      "rate_limit": {
          "requests_per_minute": <<number_of_requests>>
      }
      ```
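Applied in context, a custom rate limit sits inside `service_settings` when the endpoint is created. In this sketch the value `120` is illustrative, not a recommendation:

```console
PUT _inference/completion/google_ai_studio_completion
{
  "service": "googleaistudio",
  "service_settings": {
    "api_key": "<api_key>",
    "model_id": "<model_id>",
    "rate_limit": {
      "requests_per_minute": 120
    }
  }
}
```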
Google AI Studio service example ¶
The following example shows how to create an inference endpoint called `google_ai_studio_completion` to perform a `completion` task type.
```console
PUT _inference/completion/google_ai_studio_completion
{
    "service": "googleaistudio",
    "service_settings": {
        "api_key": "<api_key>",
        "model_id": "<model_id>"
    }
}
```
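Once the endpoint exists, it can be called through the inference API. A minimal sketch, assuming the `google_ai_studio_completion` endpoint from the example above; the input text is illustrative:

```console
POST _inference/completion/google_ai_studio_completion
{
  "input": "What is the capital of France?"
}
```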