Large language model performance matrix

Elastic Stack Serverless Security

This page describes how various large language models (LLMs) perform across different use cases in Elastic Security, based on our internal testing. To learn more about these use cases, refer to Attack Discovery or AI Assistant.

Important

Ratings, from best to worst, are Excellent, Great, Good, and Poor. Models rated Excellent or Great for a use case should produce quality results. Models rated Good or Poor are not recommended for that use case.
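Because the rating scale is ordinal, it can be encoded as an ordered enum with a single "recommended" threshold at Great. The following is a minimal Python sketch under that reading; the names `Rating`, `is_recommended`, and `parse_rating` are hypothetical and are not part of any Elastic API.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Ratings used in the performance matrix, ordered from worst to best."""
    POOR = 1
    GOOD = 2
    GREAT = 3
    EXCELLENT = 4


def is_recommended(rating: Rating) -> bool:
    """Excellent or Great is recommended; Good or Poor is not."""
    return rating >= Rating.GREAT


def parse_rating(label: str) -> Rating:
    """Convert a table label such as 'Great' into a Rating (raises KeyError if unknown)."""
    return Rating[label.strip().upper()]


if __name__ == "__main__":
    for label in ["Excellent", "Great", "Good", "Poor"]:
        print(label, is_recommended(parse_rating(label)))
```

Using `IntEnum` keeps the ordering explicit, so comparisons such as `rating >= Rating.GREAT` follow the scale described above.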

Proprietary models

Models from third-party LLM providers.

Model	Assistant - General	Assistant - ES|QL generation	Assistant - Alert questions	Assistant - Knowledge retrieval	Attack Discovery	AI-powered SIEM migration
Claude 3: Opus	Excellent	Excellent	Excellent	Good	Great	Good
Claude 3.7: Sonnet	Excellent	Excellent	Excellent	Excellent	Excellent	Excellent
Claude 3.5: Sonnet v2	Excellent	Excellent	Excellent	Excellent	Great	Excellent
Claude 3.5: Sonnet	Excellent	Excellent	Excellent	Excellent	Excellent	Excellent
Claude 3.5: Haiku	Excellent	Excellent	Excellent	Excellent	Poor	Poor
Claude 3: Haiku	Excellent	Excellent	Excellent	Excellent	Poor	Poor
GPT-4o	Excellent	Excellent	Excellent	Excellent	Great	Great
GPT-4o-mini	Excellent	Great	Great	Great	Poor	Good
Gemini 1.5 Pro 002	Excellent	Excellent	Excellent	Excellent	Excellent	Great
Gemini 1.5 Flash 002	Excellent	Poor	Good	Excellent	Poor	Excellent
Gemini 2.0 Flash 001	Excellent	Excellent	Excellent	Excellent	Excellent	Excellent
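To shortlist models for a single use case, the table can be treated as a lookup keyed by model name and filtered with the Excellent-or-Great threshold. The sketch below assumes Python 3.9+ and copies the ratings from the proprietary models table above; the names `USE_CASES`, `PROPRIETARY`, and `recommended_models` are hypothetical helpers, not part of Elastic Security.

```python
# Column order matches the proprietary models table.
USE_CASES = [
    "Assistant - General",
    "Assistant - ES|QL generation",
    "Assistant - Alert questions",
    "Assistant - Knowledge retrieval",
    "Attack Discovery",
    "AI-powered SIEM migration",
]

# Ratings copied from the table, one entry per use case in USE_CASES order.
PROPRIETARY = {
    "Claude 3: Opus": ["Excellent", "Excellent", "Excellent", "Good", "Great", "Good"],
    "Claude 3.7: Sonnet": ["Excellent"] * 6,
    "Claude 3.5: Sonnet v2": ["Excellent", "Excellent", "Excellent", "Excellent", "Great", "Excellent"],
    "Claude 3.5: Sonnet": ["Excellent"] * 6,
    "Claude 3.5: Haiku": ["Excellent", "Excellent", "Excellent", "Excellent", "Poor", "Poor"],
    "Claude 3: Haiku": ["Excellent", "Excellent", "Excellent", "Excellent", "Poor", "Poor"],
    "GPT-4o": ["Excellent", "Excellent", "Excellent", "Excellent", "Great", "Great"],
    "GPT-4o-mini": ["Excellent", "Great", "Great", "Great", "Poor", "Good"],
    "Gemini 1.5 Pro 002": ["Excellent", "Excellent", "Excellent", "Excellent", "Excellent", "Great"],
    "Gemini 1.5 Flash 002": ["Excellent", "Poor", "Good", "Excellent", "Poor", "Excellent"],
    "Gemini 2.0 Flash 001": ["Excellent"] * 6,
}

# Only these ratings are considered recommended.
RECOMMENDED = {"Excellent", "Great"}


def recommended_models(use_case: str) -> list[str]:
    """Return models rated Excellent or Great for use_case, in table order.

    Raises ValueError if use_case is not one of USE_CASES.
    """
    col = USE_CASES.index(use_case)
    return [model for model, ratings in PROPRIETARY.items() if ratings[col] in RECOMMENDED]


if __name__ == "__main__":
    for model in recommended_models("Attack Discovery"):
        print(model)
```

Keeping the ratings as data rather than hard-coding model names means the filter still works if the table is revised; only `PROPRIETARY` needs updating.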

Open-source models

Models you can deploy yourself.

Model	Assistant - General	Assistant - ES|QL generation	Assistant - Alert questions	Assistant - Knowledge retrieval	Attack Discovery
Mistral Nemo	Good	Good	Great	Good	Poor
Llama 3.2	Good	Poor	Good	Poor	Poor
Llama 3.1 405B	Good	Great	Good	Good	Poor
Llama 3.1 70B	Good	Good	Poor	Poor	Poor