AlibabaCloud AI Search inference integration
New API reference
For the most up-to-date API details, refer to Inference APIs.
Creates an inference endpoint to perform an inference task with the alibabacloud-ai-search
service.
Request ¶
PUT /_inference/<task_type>/<inference_id>
Path parameters ¶
<inference_id>
- (Required, string) The unique identifier of the inference endpoint.
<task_type>
-
(Required, string) The type of the inference task that the model will perform.
Available task types:
-
completion
, -
rerank
-
sparse_embedding
, -
text_embedding
.
-
Request body ¶
chunking_settings
-
(Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.
-
max_chunk_size
- (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to
250
. This value cannot be higher than300
or lower than20
(forsentence
strategy) or10
(forword
strategy). -
overlap
- (Optional, integer) Only for
word
chunking strategy. Specifies the number of overlapping words for chunks. Defaults to100
. This value cannot be higher than the half ofmax_chunk_size
. -
sentence_overlap
- (Optional, integer) Only for
sentence
chunking strategy. Specifies the numnber of overlapping sentences for chunks. It can be either1
or0
. Defaults to1
. -
strategy
- (Optional, string) Specifies the chunking strategy. It could be either
sentence
orword
.
-
service
- (Required, string) The type of service supported for the specified task type. In this case,
alibabacloud-ai-search
. service_settings
-
(Required, object) Settings used to install the inference model.
These settings are specific to the
alibabacloud-ai-search
service.-
api_key
- (Required, string) A valid API key for the AlibabaCloud AI Search API.
-
service_id
- (Required, string) The name of the model service to use for the inference task.
Available service_ids for the
completion
task:*
ops-qwen-turbo
*qwen-turbo
*qwen-plus
*qwen-max
÷qwen-max-longcontext
For the supported
completion
service_ids, refer to the documentation.Available service_id for the
rerank
task is:*
ops-bge-reranker-larger
For the supported
rerank
service_id, refer to the documentation.Available service_id for the
sparse_embedding
task:*
ops-text-sparse-embedding-001
For the supported
sparse_embedding
service_id, refer to the documentation.Available service_ids for the
text_embedding
task:*
ops-text-embedding-001
*ops-text-embedding-zh-001
*ops-text-embedding-en-001
*ops-text-embedding-002
For the supported
text_embedding
service_ids, refer to the documentation.-
host
- (Required, string) The name of the host address used for the inference task. You can find the host address at the API keys section of the documentation.
-
workspace
- (Required, string) The name of the workspace used for the inference task.
-
rate_limit
- (Optional, object) By default, the
alibabacloud-ai-search
service sets the number of requests allowed per minute to1000
. This helps to minimize the number of rate limit errors returned from AlibabaCloud AI Search. To modify this, set therequests_per_minute
setting of this object in your service settings:
text "rate_limit": { "requests_per_minute": <<number_of_requests>> }
-
task_settings
-
(Optional, object) Settings to configure the inference task. These settings are specific to the
<task_type>
you specified.`task_settings` for the `text_embedding` task type
-
input_type
- (Optional, string) Specifies the type of input passed to the model. Valid values are:
*
ingest
: for storing document embeddings in a vector database.
*search
: for storing embeddings of search queries run against a vector database to find relevant documents.`task_settings` for the `sparse_embedding` task type
-
input_type
- (Optional, string) Specifies the type of input passed to the model. Valid values are:
*
ingest
: for storing document embeddings in a vector database.
*search
: for storing embeddings of search queries run against a vector database to find relevant documents.-
return_token
- (Optional, boolean) If
true
, the token name will be returned in the response. Defaults tofalse
which means only the token ID will be returned in the response.
-
AlibabaCloud AI Search service examples ¶
The following example shows how to create an inference endpoint called alibabacloud_ai_search_completion
to perform a completion
task type.
PUT _inference/completion/alibabacloud_ai_search_completion
{
"service": "alibabacloud-ai-search",
"service_settings": {
"host" : "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
"api_key": "{{API_KEY}}",
"service_id": "ops-qwen-turbo",
"workspace" : "default"
}
}
The next example shows how to create an inference endpoint called alibabacloud_ai_search_rerank
to perform a rerank
task type.
PUT _inference/rerank/alibabacloud_ai_search_rerank
{
"service": "alibabacloud-ai-search",
"service_settings": {
"api_key": "<api_key>",
"service_id": "ops-bge-reranker-larger",
"host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
"workspace": "default"
}
}
The following example shows how to create an inference endpoint called alibabacloud_ai_search_sparse
to perform a sparse_embedding
task type.
PUT _inference/sparse_embedding/alibabacloud_ai_search_sparse
{
"service": "alibabacloud-ai-search",
"service_settings": {
"api_key": "<api_key>",
"service_id": "ops-text-sparse-embedding-001",
"host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
"workspace": "default"
}
}
The following example shows how to create an inference endpoint called alibabacloud_ai_search_embeddings
to perform a text_embedding
task type.
PUT _inference/text_embedding/alibabacloud_ai_search_embeddings
{
"service": "alibabacloud-ai-search",
"service_settings": {
"api_key": "<api_key>",
"service_id": "ops-text-embedding-001",
"host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
"workspace": "default"
}
}