
Large language model performance matrix

Elastic Stack Serverless Security

This page describes the performance of various large language models (LLMs) for different use cases in Elastic Security, based on our internal testing. To learn more about these use cases, refer to Attack discovery or AI Assistant.

Important

Excellent is the best rating, followed by Great, then by Good, and finally by Poor. Models rated Excellent or Great should produce quality results. Models rated Good or Poor are not recommended for that use case.

Models from third-party LLM providers.

Model                  Assistant - General  Assistant - ES|QL generation  Assistant - Alert questions  Assistant - Knowledge retrieval  Attack Discovery  AI-powered SIEM migration
Claude 3: Opus         Excellent            Excellent                     Excellent                    Good                             Great             Good
Claude 3.7: Sonnet     Excellent            Excellent                     Excellent                    Excellent                        Excellent         Excellent
Claude 3.5: Sonnet v2  Excellent            Excellent                     Excellent                    Excellent                        Great             Excellent
Claude 3.5: Sonnet     Excellent            Excellent                     Excellent                    Excellent                        Excellent         Excellent
Claude 3.5: Haiku      Excellent            Excellent                     Excellent                    Excellent                        Poor              Poor
Claude 3: Haiku        Excellent            Excellent                     Excellent                    Excellent                        Poor              Poor
GPT-4o                 Excellent            Excellent                     Excellent                    Excellent                        Great             Great
GPT-4o-mini            Excellent            Great                         Great                        Great                            Poor              Good
Gemini 1.5 Pro 002     Excellent            Excellent                     Excellent                    Excellent                        Excellent         Great
Gemini 1.5 Flash 002   Excellent            Poor                          Good                         Excellent                        Poor              Excellent
Gemini 2.0 Flash 001   Excellent            Excellent                     Excellent                    Excellent                        Excellent         Excellent

Models you can deploy yourself.

Model           Assistant - General  Assistant - ES|QL generation  Assistant - Alert questions  Assistant - Knowledge retrieval  Attack Discovery
Mistral Nemo    Good                 Good                          Great                        Good                             Poor
Llama 3.2       Good                 Poor                          Good                         Poor                             Poor
Llama 3.1 405b  Good                 Great                         Good                         Good                             Poor
Llama 3.1 70b   Good                 Good                          Poor                         Poor                             Poor
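
The rating rule from the note above (Excellent or Great is recommended, Good or Poor is not) can be applied mechanically when shortlisting a model for a use case. The following is a minimal Python sketch, not part of any Elastic tooling: the names `RATING_ORDER`, `RECOMMENDED`, `ATTACK_DISCOVERY`, and `recommended_models` are hypothetical, and the dictionary holds only a handful of Attack Discovery ratings copied from the tables above.

```python
# Ratings from best to worst, in the order given by the note above.
RATING_ORDER = ["Excellent", "Great", "Good", "Poor"]

# Per the note, only these two ratings are recommended for a use case.
RECOMMENDED = {"Excellent", "Great"}

# A small sample of Attack Discovery ratings copied from the tables above
# (illustrative subset, not the full matrix).
ATTACK_DISCOVERY = {
    "Claude 3.7: Sonnet": "Excellent",
    "GPT-4o": "Great",
    "GPT-4o-mini": "Poor",
    "Gemini 2.0 Flash 001": "Excellent",
    "Mistral Nemo": "Poor",
}


def recommended_models(ratings):
    """Return the models rated Excellent or Great, best rating first."""
    keep = [(model, rating) for model, rating in ratings.items()
            if rating in RECOMMENDED]
    # Stable sort: models with the same rating keep their original order.
    keep.sort(key=lambda item: RATING_ORDER.index(item[1]))
    return [model for model, _ in keep]


print(recommended_models(ATTACK_DISCOVERY))
```

With this sample, the models rated Poor for Attack Discovery are dropped and the remaining models are listed with Excellent ahead of Great.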