Large language model performance matrix
This page describes the performance of various large language models (LLMs) for different use cases in Elastic Security, based on our internal testing. To learn more about these use cases, refer to Attack discovery or AI Assistant.
Note
Excellent
is the best rating, followed by Great
, then by Good
, and finally by Poor
.
Proprietary models ¶
Models from third-party LLM providers.
Feature | Assistant - General | Assistant - ES|QL generation | Assistant - Alert questions | Assistant - Knowledge retrieval | Attack Discovery | |
---|---|---|---|---|---|---|
Model | Claude 3: Opus | Excellent | Excellent | Excellent | Good | Great |
Claude 3.5: Sonnet v2 | Excellent | Excellent | Excellent | Excellent | Great | |
Claude 3.5: Sonnet | Excellent | Excellent | Excellent | Excellent | Excellent | |
Claude 3.5: Haiku | Excellent | Excellent | Excellent | Excellent | Poor | |
Claude 3: Haiku | Excellent | Excellent | Excellent | Excellent | Poor | |
GPT-4o | Excellent | Excellent | Excellent | Excellent | Great | |
GPT-4o-mini | Excellent | Great | Great | Great | Poor | |
Gemini 1.5 Pro 002 | Excellent | Excellent | Excellent | Excellent | Excellent | |
Gemini 1.5 Flash 002 | Excellent | Poor | Good | Excellent | Poor |
Open-source models ¶
Models you can deploy yourself.
Feature | Assistant - General | Assistant - ES|QL generation | Assistant - Alert questions | Assistant - Knowledge retrieval | Attack Discovery | |
---|---|---|---|---|---|---|
Model | Mistral Nemo | Good | Good | Great | Good | Poor |
LLama 3.2 | Good | Poor | Good | Poor | Poor | |
LLama 3.1 405b | Good | Great | Good | Good | Poor | |
LLama 3.1 70b | Good | Good | Poor | Poor | Poor |