Large language model performance matrix

This page describes the performance of various large language models (LLMs) for different use cases in Elastic Security, based on our internal testing. To learn more about these use cases, refer to Attack discovery or AI Assistant.

Note

Excellent is the best rating, followed by Great, then by Good, and finally by Poor.

Proprietary models

Models from third-party LLM providers.

Model	Assistant - General	Assistant - ES|QL generation	Assistant - Alert questions	Assistant - Knowledge retrieval	Attack Discovery
Claude 3: Opus	Excellent	Excellent	Excellent	Good	Great
Claude 3.5: Sonnet v2	Excellent	Excellent	Excellent	Excellent	Great
Claude 3.5: Sonnet	Excellent	Excellent	Excellent	Excellent	Excellent
Claude 3.5: Haiku	Excellent	Excellent	Excellent	Excellent	Poor
Claude 3: Haiku	Excellent	Excellent	Excellent	Excellent	Poor
GPT-4o	Excellent	Excellent	Excellent	Excellent	Great
GPT-4o-mini	Excellent	Great	Great	Great	Poor
Gemini 1.5 Pro 002	Excellent	Excellent	Excellent	Excellent	Excellent
Gemini 1.5 Flash 002	Excellent	Poor	Good	Excellent	Poor

Open-source models

Models you can deploy yourself.

Model	Assistant - General	Assistant - ES|QL generation	Assistant - Alert questions	Assistant - Knowledge retrieval	Attack Discovery
Mistral Nemo	Good	Good	Great	Good	Poor
Llama 3.2	Good	Poor	Good	Poor	Poor
Llama 3.1 405b	Good	Great	Good	Good	Poor
Llama 3.1 70b	Good	Good	Poor	Poor	Poor
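
If you want to shortlist models programmatically, the ratings on this page can be encoded as a small lookup table. The following is a minimal sketch, not part of any Elastic API: the names `RATING_ORDER`, `USE_CASES`, `MATRIX`, and `models_rated_at_least` are hypothetical and exist only for this illustration. The rating order follows the note at the top of this page, and the ratings are copied from the two tables.

```python
# Rating order from the note on this page: Excellent > Great > Good > Poor.
RATING_ORDER = {"Poor": 0, "Good": 1, "Great": 2, "Excellent": 3}

# Column order used by both tables on this page.
USE_CASES = (
    "Assistant - General",
    "Assistant - ES|QL generation",
    "Assistant - Alert questions",
    "Assistant - Knowledge retrieval",
    "Attack Discovery",
)

# Ratings copied from the tables, listed in USE_CASES order.
MATRIX = {
    # Proprietary models
    "Claude 3: Opus": ("Excellent", "Excellent", "Excellent", "Good", "Great"),
    "Claude 3.5: Sonnet v2": ("Excellent", "Excellent", "Excellent", "Excellent", "Great"),
    "Claude 3.5: Sonnet": ("Excellent", "Excellent", "Excellent", "Excellent", "Excellent"),
    "Claude 3.5: Haiku": ("Excellent", "Excellent", "Excellent", "Excellent", "Poor"),
    "Claude 3: Haiku": ("Excellent", "Excellent", "Excellent", "Excellent", "Poor"),
    "GPT-4o": ("Excellent", "Excellent", "Excellent", "Excellent", "Great"),
    "GPT-4o-mini": ("Excellent", "Great", "Great", "Great", "Poor"),
    "Gemini 1.5 Pro 002": ("Excellent", "Excellent", "Excellent", "Excellent", "Excellent"),
    "Gemini 1.5 Flash 002": ("Excellent", "Poor", "Good", "Excellent", "Poor"),
    # Open-source models
    "Mistral Nemo": ("Good", "Good", "Great", "Good", "Poor"),
    "Llama 3.2": ("Good", "Poor", "Good", "Poor", "Poor"),
    "Llama 3.1 405b": ("Good", "Great", "Good", "Good", "Poor"),
    "Llama 3.1 70b": ("Good", "Good", "Poor", "Poor", "Poor"),
}


def models_rated_at_least(use_case, minimum):
    """Return the models whose rating for use_case is at least minimum."""
    column = USE_CASES.index(use_case)
    threshold = RATING_ORDER[minimum]
    return [
        model
        for model, ratings in MATRIX.items()
        if RATING_ORDER[ratings[column]] >= threshold
    ]


if __name__ == "__main__":
    # List the models rated Excellent for Attack Discovery.
    for name in models_rated_at_least("Attack Discovery", "Excellent"):
        print(name)
```

Storing ratings as ordered tuples keeps the sketch close to the table layout, so a row can be checked against the page at a glance.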