---
title: Large language model performance matrix for Elastic Security
description: This page summarizes internal test results comparing large language models (LLMs) across Elastic Security AI chat and AI-powered feature use cases. These...
url: https://www.elastic.co/elastic/docs-builder/docs/3016/solutions/security/ai/large-language-model-performance-matrix
products:
  - Elastic Cloud Serverless
  - Elastic Security
applies_to:
  - Serverless Security projects: Generally available
  - Elastic Stack: Generally available
---

# Large language model performance matrix for Elastic Security
This page summarizes internal test results comparing large language models (LLMs) across Elastic Security [AI chat](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/ai-features/ai-chat-experiences) and AI-powered feature use cases. These ratings apply equally whether you're using [AI Assistant](https://www.elastic.co/elastic/docs-builder/docs/3016/solutions/security/ai/ai-assistant) or [Agent Builder](https://www.elastic.co/elastic/docs-builder/docs/3016/solutions/security/ai/agent-builder/agent-builder). To learn more about these use cases, refer to [AI-powered features](/elastic/docs-builder/docs/3016/explore-analyze/ai-features#security-features).
<important>
  Higher scores indicate better performance. A score of 10 on a task means the model met or exceeded all task-specific benchmarks. Models rated "Not recommended" for a task failed testing on that task. This could be due to various issues, including context window constraints.
</important>


## Proprietary models

Models from third-party LLM providers.

| **Model**            | **Alerts** | **Security Knowledge** | **ES\|QL Query Generation** | **Knowledge Base Retrieval** | **Attack Discovery** | **Automatic Migration** | **Average Score** |
|----------------------|------------|------------------------|----------------------------|------------------------------|----------------------|-------------------------|-------------------|
| **Opus 4.6**         | 8.9        | 9.5                    | 8.5                        | 8.42                         | 8.7                  | 10                      | **9**             |
| **Sonnet 4.5**       | 8.6        | 7.6                    | 7.7                        | 7.23                         | 8                    | 10                      | **8.19**          |
| **Opus 4.5**         | 9          | 8.2                    | 7.5                        | 7.94                         | 8.5                  | 7.3                     | **8.07**          |
| **GPT 5.2**          | 8.6        | 6.6                    | 8                          | 6                            | 8.5                  | 10                      | **7.95**          |
| **Sonnet 4**         | 7.5        | 7.4                    | 8                          | 7.85                         | 7                    | 7.5                     | **7.54**          |
| **Sonnet 4.6**       | 9.3        | 9.5                    | 8.4                        | 7.45                         | Not recommended      | 10                      | **7.44**          |
| **Sonnet 3.7**       | 7.4        | 6.9                    | 6.1                        | 7.04                         | 7                    | 9.7                     | **7.36**          |
| **GPT 5.1**          | 9.3        | 4.3                    | 7.2                        | 6                            | 6.5                  | 9.8                     | **7.18**          |
| **GPT 4.1 Mini**     | 6.5        | 6.4                    | 6                          | 6.96                         | 4.5                  | 9.9                     | **6.71**          |
| **Gemini 2.5 Flash** | 7.8        | 6.2                    | 4.4                        | 5.71                         | 6                    | 9.81                    | **6.65**          |
| **Gemini 2.5 Pro**   | 8          | 5.6                    | 1.9                        | 5.3                          | 8.7                  | 6.3                     | **5.97**          |
| **GPT 4.1**          | 7.4        | 5.7                    | 4.4                        | 5.85                         | 8                    | 3.1                     | **5.74**          |


## Open-source models

Models you can [deploy yourself](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/ai-features/llm-guides/local-llms-overview).

| **Model**        | **Alerts** | **Security Knowledge** | **ES\|QL Query Generation** | **Knowledge Base Retrieval** | **Attack Discovery** | **Automatic Migration** | **Average Score** |
|------------------|------------|------------------------|----------------------------|------------------------------|----------------------|-------------------------|-------------------|
| **GPT OSS 120B** | 7.6        | 3.7                    | 5.5                        | 6                            | 3.5                  | 9.7                     | **6**             |
| **GPT OSS 20B**  | 8.2        | 1.5                    | 2.5                        | Not recommended              | Not recommended      | Not recommended         | **2.03**          |
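
The page does not state how the **Average Score** column is calculated. The published values in both tables are consistent with a plain arithmetic mean over the six task columns, with "Not recommended" counted as 0 and the result rounded to two decimal places. The sketch below checks that assumption against four rows. The `SCORES` dictionary and the `average_score` helper are illustrative names, not part of any Elastic tooling.

```python
from typing import Optional

# Task scores copied from the tables above, in column order:
# Alerts, Security Knowledge, ES|QL Query Generation,
# Knowledge Base Retrieval, Attack Discovery, Automatic Migration.
# None stands for "Not recommended".
SCORES: dict[str, list[Optional[float]]] = {
    "Opus 4.6": [8.9, 9.5, 8.5, 8.42, 8.7, 10],
    "Sonnet 4.6": [9.3, 9.5, 8.4, 7.45, None, 10],
    "GPT OSS 120B": [7.6, 3.7, 5.5, 6, 3.5, 9.7],
    "GPT OSS 20b": [8.2, 1.5, 2.5, None, None, None],
}


def average_score(task_scores: list[Optional[float]]) -> float:
    """Mean over all tasks, counting "Not recommended" as 0.

    Treating "Not recommended" as 0 is an assumption inferred from
    the published averages, not a documented rule.
    """
    total = sum(score if score is not None else 0.0 for score in task_scores)
    return round(total / len(task_scores), 2)


for model, task_scores in SCORES.items():
    print(f"{model}: {average_score(task_scores)}")
```

Under this assumption, the helper reproduces the published averages for these rows (9, 7.44, 6, and 2.03). One practical consequence is that a single "Not recommended" rating pulls the average down sharply, so a model such as Sonnet 4.6 can rank below models it outperforms on most individual tasks.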