---
title: Large language model performance matrix for Elastic Security
description: This page summarizes internal test results comparing large language models (LLMs) across Elastic Security AI chat and AI-powered feature use cases. These...
url: https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6960/solutions/security/ai/large-language-model-performance-matrix
products:
  - Elastic Cloud Serverless
  - Elastic Security
applies_to:
  - Serverless Security projects: Generally available
  - Elastic Stack: Generally available
---

# Large language model performance matrix for Elastic Security
This page summarizes internal test results comparing large language models (LLMs) across Elastic Security [AI chat](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6960/explore-analyze/ai-features/ai-chat-experiences) and AI-powered feature use cases. These ratings apply equally whether you're using [AI Assistant](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6960/solutions/security/ai/ai-assistant) or [Agent Builder](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6960/solutions/security/ai/agent-builder/agent-builder). To learn more about these use cases, refer to [AI-powered features](/elastic/docs-content/pull/6960/explore-analyze/ai-features#security-features).
<important>
  Higher scores indicate better performance. A score of 10 on a task means the model met or exceeded all task-specific benchmarks. Models rated "Not recommended" failed testing, which can happen for various reasons, including context window constraints.
</important>


## Proprietary models

Models from third-party LLM providers.
**Scroll horizontally to view more information.**

| Model             | Alert Triage | Detection Engineering | Investigation | KB Retrieval | Workflow Execution | Overall |
|-------------------|--------------|-----------------------|---------------|--------------|--------------------|---------|
| Claude Sonnet 4.6 | 10           | 4.88                  | 6.44          | 6.26         | 10                 | 7.52    |
| Claude Opus 4.6   | 10           | 4.31                  | 6.58          | 6.41         | 9.71               | 7.4     |
| Gemini 3.1 Pro    | 10           | 4.69                  | 6.21          | 6.02         | 9.62               | 7.31    |
| GPT-5.4           | 10           | 4.41                  | 6.83          | 6.67         | 8.6                | 7.3     |
| Gemini 3.0 Flash  | 8.43         | 4.09                  | 5.71          | 5.49         | 9.14               | 6.57    |


## Open-source models

Models you can [deploy yourself](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6960/explore-analyze/ai-features/llm-guides/local-llms-overview).
**Scroll horizontally to view more information.**

| Model        | Alert Triage | Detection Engineering | Investigation | KB Retrieval | Workflow Execution | Overall |
|--------------|--------------|-----------------------|---------------|--------------|--------------------|---------|
| GPT OSS 120B | 7.31         | 1.81                  | 6.94          | 6.79         | 5.17               | 5.6     |
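
This page does not state how the Overall column is derived. The published values are consistent with an unweighted mean of the five task scores, rounded to two decimals with trailing zeros dropped. The sketch below checks that assumption against the scores in both tables. The names `TASK_SCORES`, `PUBLISHED_OVERALL`, and `overall_score` are hypothetical and chosen for illustration; the aggregation method is an inference from the numbers, not a documented formula.

```python
# Per-task scores copied from the tables on this page, in column order:
# Alert Triage, Detection Engineering, Investigation, KB Retrieval, Workflow Execution.
TASK_SCORES = {
    "Claude Sonnet 4.6": [10, 4.88, 6.44, 6.26, 10],
    "Claude Opus 4.6": [10, 4.31, 6.58, 6.41, 9.71],
    "Gemini 3.1 Pro": [10, 4.69, 6.21, 6.02, 9.62],
    "GPT-5.4": [10, 4.41, 6.83, 6.67, 8.6],
    "Gemini 3.0 Flash": [8.43, 4.09, 5.71, 5.49, 9.14],
    "GPT OSS 120B": [7.31, 1.81, 6.94, 6.79, 5.17],
}

# Overall values as published in the tables.
PUBLISHED_OVERALL = {
    "Claude Sonnet 4.6": 7.52,
    "Claude Opus 4.6": 7.4,
    "Gemini 3.1 Pro": 7.31,
    "GPT-5.4": 7.3,
    "Gemini 3.0 Flash": 6.57,
    "GPT OSS 120B": 5.6,
}


def overall_score(scores):
    """Assumed aggregation: unweighted mean, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for model, scores in TASK_SCORES.items():
        computed = overall_score(scores)
        published = PUBLISHED_OVERALL[model]
        # Compare with a small tolerance rather than exact float equality.
        status = "matches" if abs(computed - published) < 1e-9 else "differs"
        print(f"{model}: computed {computed}, published {published} ({status})")
```

Because the mean weights every task equally, a model with a low Detection Engineering score can still rank well overall. If one task matters more for your use case, compare the per-task columns directly rather than relying on the Overall value.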