Large language model performance matrix

This page describes the performance of various large language models (LLMs) for different use cases in Elastic Security, based on our internal testing. To learn more about these use cases, refer to Attack discovery or AI Assistant.

Note

Excellent is the best rating, followed by Great, then by Good, and finally by Poor.

Proprietary models

Models from third-party LLM providers.

Model                  Assistant - General  Assistant - ES|QL generation  Assistant - Alert questions  Assistant - Knowledge retrieval  Attack Discovery
---------------------  -------------------  ----------------------------  ---------------------------  -------------------------------  ----------------
Claude 3: Opus         Excellent            Excellent                     Excellent                    Good                             Great
Claude 3.5: Sonnet v2  Excellent            Excellent                     Excellent                    Excellent                        Great
Claude 3.5: Sonnet     Excellent            Excellent                     Excellent                    Excellent                        Excellent
Claude 3.5: Haiku      Excellent            Excellent                     Excellent                    Excellent                        Poor
Claude 3: Haiku        Excellent            Excellent                     Excellent                    Excellent                        Poor
GPT-4o                 Excellent            Excellent                     Excellent                    Excellent                        Great
GPT-4o-mini            Excellent            Great                         Great                        Great                            Poor
Gemini 1.5 Pro 002     Excellent            Excellent                     Excellent                    Excellent                        Excellent
Gemini 1.5 Flash 002   Excellent            Poor                          Good                         Excellent                        Poor

Open-source models

Models you can deploy yourself.

Model                  Assistant - General  Assistant - ES|QL generation  Assistant - Alert questions  Assistant - Knowledge retrieval  Attack Discovery
---------------------  -------------------  ----------------------------  ---------------------------  -------------------------------  ----------------
Mistral Nemo           Good                 Good                          Great                        Good                             Poor
Llama 3.2              Good                 Poor                          Good                         Poor                             Poor
Llama 3.1 405b         Good                 Great                         Good                         Good                             Poor
Llama 3.1 70b          Good                 Good                          Poor                         Poor                             Poor
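
For readers who want to query the matrix programmatically, the following is a minimal Python sketch that encodes the rating scale as an ordered enum and filters models by a minimum rating for one feature. The names Rating, FEATURES, MATRIX, and models_rated_at_least are hypothetical and chosen for illustration; they are not part of any Elastic API. The ratings are copied from the tables on this page.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Rating scale from this page, ordered from worst to best."""
    POOR = 0
    GOOD = 1
    GREAT = 2
    EXCELLENT = 3


# Hypothetical short keys for the five feature columns, in table order.
FEATURES = (
    "general",
    "esql_generation",
    "alert_questions",
    "knowledge_retrieval",
    "attack_discovery",
)

# One row per model, ratings listed in the same order as FEATURES.
_ROWS = {
    "Claude 3: Opus": "Excellent Excellent Excellent Good Great",
    "Claude 3.5: Sonnet v2": "Excellent Excellent Excellent Excellent Great",
    "Claude 3.5: Sonnet": "Excellent Excellent Excellent Excellent Excellent",
    "Claude 3.5: Haiku": "Excellent Excellent Excellent Excellent Poor",
    "Claude 3: Haiku": "Excellent Excellent Excellent Excellent Poor",
    "GPT-4o": "Excellent Excellent Excellent Excellent Great",
    "GPT-4o-mini": "Excellent Great Great Great Poor",
    "Gemini 1.5 Pro 002": "Excellent Excellent Excellent Excellent Excellent",
    "Gemini 1.5 Flash 002": "Excellent Poor Good Excellent Poor",
    "Mistral Nemo": "Good Good Great Good Poor",
    "Llama 3.2": "Good Poor Good Poor Poor",
    "Llama 3.1 405b": "Good Great Good Good Poor",
    "Llama 3.1 70b": "Good Good Poor Poor Poor",
}

# Map each model to {feature: Rating}.
MATRIX = {
    model: dict(zip(FEATURES, (Rating[r.upper()] for r in row.split())))
    for model, row in _ROWS.items()
}


def models_rated_at_least(feature, minimum):
    """Return models whose rating for `feature` is `minimum` or better."""
    return [m for m, ratings in MATRIX.items() if ratings[feature] >= minimum]


if __name__ == "__main__":
    print(models_rated_at_least("attack_discovery", Rating.EXCELLENT))
```

Using IntEnum keeps the comparison explicit: a check such as rating >= Rating.GREAT follows the ordering stated in the note above instead of comparing strings alphabetically.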