Machine Learning
Eland can transform trained models from the scikit-learn, XGBoost, and LightGBM libraries into a serialized form that Elasticsearch can use as an inference model.
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel
# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])
>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
es_client="http://localhost:9200",
model_id="xgb-classifier",
model=xgb_model,
feature_names=["f0", "f1", "f2", "f3", "f4"],
)
# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
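The same workflow applies to the other supported libraries. The following is a minimal sketch for a scikit-learn decision tree; the toy dataset, model ID, and cluster URL are illustrative placeholders rather than part of the original example:
>>> import numpy as np
>>> from sklearn.tree import DecisionTreeClassifier
>>> from eland.ml import MLModel
# Train a small decision tree on a toy dataset with five features
>>> X = np.random.rand(100, 5)
>>> y = (X[:, 0] > 0.5).astype(int)
>>> sk_model = DecisionTreeClassifier(max_depth=3).fit(X, y)
# Serialize the model and store it in Elasticsearch as an inference model
>>> es_model = MLModel.import_model(
    es_client="http://localhost:9200",
    model_id="sklearn-decision-tree",
    model=sk_model,
    feature_names=["f0", "f1", "f2", "f3", "f4"],
)
# Predictions served by Elasticsearch should match the local model
>>> es_model.predict(X[:10])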
To import an NLP model, you need the appropriate version of PyTorch. Install it by running python -m pip install 'eland[pytorch]'.
For NLP tasks, Eland enables you to import PyTorch models into Elasticsearch. Use the eland_import_hub_model
script to download and install supported transformer models from the Hugging Face model hub. For example:
$ eland_import_hub_model <authentication> \
    --url http://localhost:9200/ \
    --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
    --task-type ner \
    --start
- <authentication>: Use an authentication method to access your cluster. Refer to Authentication methods.
- --url: The cluster URL. Alternatively, use --cloud-id.
- --hub-model-id: Specify the identifier for the model in the Hugging Face model hub.
- --task-type: Specify the type of NLP task. Supported values are fill_mask, ner, question_answering, text_classification, text_embedding, text_expansion, text_similarity, and zero_shot_classification.
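Once the import finishes and the deployment has started, you can exercise the model through the trained models inference API. The snippet below is a minimal sketch using a recent (8.x) elasticsearch-py client; the derived model ID (Eland converts the / in the hub model ID to __) and the example sentence are assumptions, so check the actual model ID in your cluster before running it:
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch("http://localhost:9200")  # add authentication options as needed
# Run named entity recognition against the deployed model
>>> response = es.ml.infer_trained_model(
    model_id="elastic__distilbert-base-cased-finetuned-conll03-english",
    docs=[{"text_field": "Elastic is headquartered in Mountain View, California."}],
)
>>> response["inference_results"]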
For more information about the available options, run eland_import_hub_model with the --help option.
$ eland_import_hub_model --help
If you want to use Eland without installing it, you can use the Docker image. To use the Docker container, you need to clone the Eland repository: https://github.com/elastic/eland
You can use the container interactively:
$ docker run -it --rm --network host docker.elastic.co/eland/eland
Running installed scripts is also possible without an interactive shell, for example:
docker run -it --rm docker.elastic.co/eland/eland \
eland_import_hub_model \
--url $ELASTICSEARCH_URL \
--hub-model-id elastic/distilbert-base-uncased-finetuned-conll03-english \
--start
Replace the $ELASTICSEARCH_URL with the URL for your Elasticsearch cluster. For authentication purposes, include an administrator username and password in the URL in the following format: https://username:password@host:port.
You can install models in a restricted or closed network by pointing the eland_import_hub_model script to local files.
For an offline install of a Hugging Face model, the model first needs to be cloned locally. Git and Git Large File Storage (LFS) must be installed on your system.
1. Select the model you want to use from Hugging Face. Refer to the compatible third party model list for more information on the supported architectures.
2. Clone the selected model from Hugging Face by using the model URL. For example:

   git clone https://huggingface.co/dslim/bert-base-NER

   This command results in a local copy of the model in the directory bert-base-NER.
3. Use the eland_import_hub_model script with --hub-model-id set to the directory of the cloned model to install it:

   eland_import_hub_model \
     --url 'XXXX' \
     --hub-model-id /PATH/TO/MODEL \
     --task-type ner \
     --es-username elastic --es-password XXX \
     --es-model-id bert-base-ner

4. If you use the Docker image to run eland_import_hub_model, you must bind mount the model directory so that the container can read the files:

   docker run --mount type=bind,source=/PATH/TO/MODEL,destination=/model,readonly -it --rm docker.elastic.co/eland/eland \
     eland_import_hub_model \
     --url 'XXXX' \
     --hub-model-id /model \
     --task-type ner \
     --es-username elastic --es-password XXX \
     --es-model-id bert-base-ner
Once it’s uploaded to Elasticsearch, the model will have the ID specified by --es-model-id. If it is not set, the model ID is derived from --hub-model-id; spaces and path delimiters are converted to double underscores (__).
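To confirm the ID under which the model was stored, you can list it through the trained models API. The following is a minimal sketch with the elasticsearch-py client; the URL, credentials, and CA path are placeholders:
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "XXX"),
    ca_certs="/path/to/ca.crt",
)
# List the uploaded model by the ID chosen above
>>> resp = es.ml.get_trained_models(model_id="bert-base-ner")
>>> [config["model_id"] for config in resp["trained_model_configs"]]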
Behind the scenes, Eland uses the requests
Python library, which allows configuring proxies through an environment variable. For example, to use an HTTP proxy to connect to an HTTPS Elasticsearch cluster, you need to set the HTTPS_PROXY
environment variable when invoking Eland:
HTTPS_PROXY=http://proxy-host:proxy-port eland_import_hub_model ...
If you disabled security on your Elasticsearch cluster, you should use HTTP_PROXY
instead.
The following authentication options are available when using the import script:
- Elasticsearch username and password authentication (specified with the -u and -p options):

  eland_import_hub_model -u <username> -p <password> --cloud-id <cloud-id> ...

  These -u and -p options also work when you use --url.

- Elasticsearch username and password authentication (embedded in the URL):

  eland_import_hub_model --url https://<user>:<password>@<hostname>:<port> ...

- Elasticsearch API key authentication:

  eland_import_hub_model --es-api-key <api-key> --url https://<hostname>:<port> ...

- HuggingFace Hub access token (for private models):

  eland_import_hub_model --hub-access-token <access-token> ...
The following TLS/SSL options for Elasticsearch are available when using the import script:
- Specify an alternate CA bundle to verify the cluster certificate:

  eland_import_hub_model --ca-certs CA_CERTS ...

- Disable TLS/SSL verification altogether (strongly discouraged):

  eland_import_hub_model --insecure ...
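When importing scikit-learn, XGBoost, or LightGBM models with MLModel.import_model, the equivalent authentication and TLS settings can be supplied by passing a configured Elasticsearch client instead of a bare URL. A minimal sketch, reusing the xgb_model trained earlier; the host, API key, and CA path are placeholders:
>>> from elasticsearch import Elasticsearch
>>> from eland.ml import MLModel
# Configure authentication and certificate verification on the client
>>> es = Elasticsearch(
    "https://localhost:9200",
    api_key="<api-key>",
    ca_certs="/path/to/ca.crt",
)
# Pass the configured client to the importer
>>> es_model = MLModel.import_model(
    es_client=es,
    model_id="xgb-classifier",
    model=xgb_model,
    feature_names=["f0", "f1", "f2", "f3", "f4"],
)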