Best practices for Scout tests

This guide covers best practices for writing Scout UI and API tests that are reliable, maintainable, and fast.

Scout is built on Playwright, so the official Playwright Best Practices apply.

Tip

New to Scout? Start with our Scout introduction page.

Best practices that apply to both UI and API tests.

Scout is deployment-agnostic: write once, run locally and on Elastic Cloud.

  • Every suite must have deployment tags. Use tags to target the environments where your tests apply (for example, a feature that only exists in stateful deployments).
  • Within a test, avoid relying on configuration, data, or behavior specific to a single deployment. Test logic should produce the same result locally and on Cloud.
  • Run your tests against a real Elastic Cloud project before merging to catch environment-specific surprises early. See Run tests on Elastic Cloud for setup instructions.

When a feature is gated behind a flag, enable it at runtime with apiServices.core.settings() rather than creating a custom server config. Runtime flags work locally and on Cloud, don’t require a server restart, and avoid the CI cost of a dedicated server instance.
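A minimal sketch of the runtime approach; the flag key is hypothetical, and the exact signature of apiServices.core.settings() may differ in your checkout:

```typescript
import { test } from '@kbn/scout';

test.beforeAll(async ({ apiServices }) => {
  // Enable the feature at runtime: works locally and on Cloud,
  // with no server restart and no dedicated CI server instance.
  await apiServices.core.settings({ 'myPlugin:newFeatureEnabled': true });
});
```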

For the full guide (including when a custom server config is unavoidable), see Feature flags.

When you add new tests, fix flakes, or make significant changes, run the same tests multiple times to catch flakiness early. A good starting point is 20–50 runs.

Prefer doing this locally first (faster feedback), and use the Flaky Test Runner in CI when needed. See Debug flaky tests for guidance.
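A sketch of repeating a spec locally; the Scout config path is hypothetical and CLI flags may differ in your Kibana checkout:

```shell
# Playwright's --repeat-each runs each test N times in one invocation:
npx playwright test my_feature.spec.ts --repeat-each=30

# If you run through the Scout runner, pass your Playwright config
# (the path below is hypothetical):
node scripts/scout run-tests --stateful --config path/to/playwright.config.ts
```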

  • Keep one top-level suite per file (test.describe).
  • Avoid nested describe blocks. Use test.step for structure inside a test.
  • Don’t rely on test file execution order (it’s not guaranteed).

Test names should read like a sentence describing expected behavior. Clear names make failures self-explanatory and test suites scannable.
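A sketch of the recommended shape; the tags, page object, and its methods are hypothetical:

```typescript
import { test } from '@kbn/scout';

// One top-level suite per file; no nested describe blocks.
test.describe('Discover histogram', { tag: ['@ess', '@svlSearch'] }, () => {
  // The name reads like a sentence describing expected behavior.
  test('renders a histogram for the selected time range', async ({ pageObjects }) => {
    // Structure inside the test with test.step, not nested describes.
    await test.step('open Discover', async () => {
      await pageObjects.discover.goto();
    });
    await test.step('select the last 15 minutes', async () => {
      await pageObjects.discover.setTimeRange('now-15m', 'now');
    });
  });
});
```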

Prefer “one role + one flow per file” and keep spec files small (roughly 4–5 short tests or 2–3 longer ones). The test runner balances work at the spec-file level, so oversized files become bottlenecks during parallel execution. Put shared login/navigation in beforeEach.

If many files share the same “one-time” work (archives, API calls, settings), move it to a global setup hook.
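A sketch of shared per-test setup; fixture names follow Scout conventions but the page object is hypothetical:

```typescript
import { test } from '@kbn/scout';

test.beforeEach(async ({ browserAuth, pageObjects }) => {
  // Shared login/navigation belongs in beforeEach, not in every test body.
  await browserAuth.loginAsViewer();
  await pageObjects.dashboard.goto();
});
```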

It’s common for test suites to load Elasticsearch or Kibana archives that are barely used (or not used at all). Unused archives slow down setup, waste resources, and make it harder to understand what a test actually depends on. Check that your tests ingest only the data they actually need.

Use esArchiver.loadIfNeeded(), which skips ingestion if the index and documents already exist (useful when multiple suites share the same data).
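For example (the archive path is hypothetical):

```typescript
import { test } from '@kbn/scout';

test.beforeAll(async ({ esArchiver }) => {
  // Skips ingestion when the index and its documents already exist,
  // so multiple suites can share the same data without re-loading it.
  await esArchiver.loadIfNeeded(
    'x-pack/test/fixtures/es_archives/my_plugin/logs'
  );
});
```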

Cleanup in the test body doesn’t run after a failure. Prefer afterEach / afterAll.

Tests should be clean and declarative. If a helper might return an expected error (for example, 404 during cleanup), the helper should handle it internally, for example by accepting an ignoreErrors option or treating a 404 during deletion as a success.
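A sketch combining both points; the service method, rule ID, and `ignoreErrors` option are hypothetical, and the detail to notice is that expected-error handling lives inside the helper:

```typescript
import { test } from '@kbn/scout';

const RULE_ID = 'my-test-rule'; // hypothetical fixture data

test.afterAll(async ({ apiServices }) => {
  // afterAll still runs when a test fails, unlike cleanup in the test body.
  // A 404 during deletion is treated as success inside the helper.
  await apiServices.myPlugin.deleteRule(RULE_ID, { ignoreErrors: [404] });
});
```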

If a value is reused across suites (archive paths, fixed time ranges, endpoints, common headers), extract it into a shared constants.ts file. This reduces duplication and typos, and makes updates safer.
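A sketch of such a file; all values are hypothetical placeholders:

```typescript
// constants.ts — shared across suites instead of repeating literals per spec.
export const ES_ARCHIVE_LOGS = 'x-pack/test/fixtures/es_archives/my_plugin/logs';

export const DEFAULT_TIME_RANGE = {
  from: '2024-01-01T00:00:00.000Z',
  to: '2024-01-07T00:00:00.000Z',
} as const;

// 'kbn-xsrf' is required by Kibana for non-GET API requests.
export const COMMON_HEADERS = { 'kbn-xsrf': 'true' } as const;
```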

Avoid admin unless there’s no alternative. Minimal permissions catch real permission bugs and keep tests realistic. Also test the forbidden path: verify that an under-privileged role receives 403 for endpoints it shouldn’t access.
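A sketch of the forbidden path; the route, the API test entry point (named apiTest here), and the response shape are all hypothetical:

```typescript
import { apiTest, expect } from '@kbn/scout';

apiTest('viewer receives 403 for the admin-only settings endpoint', async ({ apiClient }) => {
  const response = await apiClient.put('api/my_plugin/settings', {
    body: { retention: 30 },
  });
  // An under-privileged role must be rejected, proving the permission model.
  expect(response.statusCode).toBe(403);
});
```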

See browser authentication and API authentication.


Best practices specific to UI tests.

Default to parallel UI suites when possible. Parallel workers share the same Kibana/ES deployment, but run in isolated Spaces.

Mode       | When to use
-----------|------------
Parallel   | UI tests (most suites); suites that share pre-ingested data (often using the global setup hook)
Sequential | API tests; suites that require a “clean” Elasticsearch state

UI tests should answer “does this feature work for the user?” Verify that components render, respond to interaction, and navigate correctly. Leave exact data validation (computed values, aggregation results, edge cases) to API or unit tests, which are faster and less brittle.

What you’re testing | Recommended layer
--------------------|------------------
User flows, navigation, rendering | Scout UI test
Data correctness, API contracts, edge cases | Scout API test
Isolated component logic (loading/error states, tooltips, field validation) | RTL/Jest unit test

Use test.step() to structure a multi-step flow while keeping one browser context (faster, clearer reporting).

Setup/teardown using UI is slow and brittle. Prefer Kibana APIs and fixtures.

Playwright actions and web-first assertions already wait/retry. Don’t add redundant waits, and never use page.waitForTimeout() as it’s a hard sleep with no readiness signal and a common source of flakiness.

When an action triggers async UI work (navigation, saving, loading data), wait for the resulting state before your next step. This ensures the UI is ready and prevents flaky interactions with elements that haven’t rendered yet.
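A sketch of waiting on the resulting state; selectors are hypothetical:

```typescript
import { test, expect } from '@kbn/scout';

test('saving shows a confirmation toast', async ({ page }) => {
  // Playwright actions auto-wait for actionability; no manual wait needed.
  await page.testSubj.click('saveButton');
  // Wait for the resulting state with a web-first assertion, not a sleep:
  await expect(page.testSubj.locator('saveSuccessToast')).toBeVisible();
  // Anti-pattern: await page.waitForTimeout(5000);
});
```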

If an action fails, don't wrap it in a retry loop. Playwright already waits for actionability; repeated failures usually point to an app issue (unstable DOM, non-unique selectors, re-render bugs). Fix the component or make your waiting/locators explicit and stable.

Prefer stable data-test-subj attributes accessed using page.testSubj. If data-test-subj is missing, prefer adding one to source code. If that’s not possible, use getByRole inside a scoped container.
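A sketch of both locator strategies; the selectors are hypothetical:

```typescript
import { test } from '@kbn/scout';

test('saves the panel from the edit flyout', async ({ page }) => {
  // Preferred: a stable data-test-subj, accessed via page.testSubj.
  await page.testSubj.click('dashboardEditPanelButton');
  // Fallback when no data-test-subj exists: a role-based locator
  // scoped to a container rather than a global query.
  const flyout = page.testSubj.locator('editPanelFlyout');
  await flyout.getByRole('button', { name: 'Save' }).click();
});
```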

Scout configures Playwright timeouts (source). Prefer defaults.

  • Don’t override suite-level timeouts/retries with test.describe.configure() unless you have a strong reason.
  • If you increase a timeout for one operation, keep it well below the test timeout and leave a short rationale. An assertion timeout that exceeds the test timeout is ignored.
  • Time spent in hooks (beforeEach, afterEach) counts toward the test timeout. If setup is slow, the test itself may time out even though its assertions are fast.

Tables/maps/visualizations can appear before data is rendered. Prefer waiting on a component-specific “loaded” signal rather than global indicators like the Kibana chrome spinner (our data shows they are unreliable for confirming that a particular component has finished rendering).
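For example (the “loaded” selector and page object are hypothetical):

```typescript
import { test, expect } from '@kbn/scout';

test('dashboard grid finishes rendering', async ({ page, pageObjects }) => {
  await pageObjects.dashboard.goto();
  // Wait on the component's own "loaded" signal,
  // not the global chrome loading spinner.
  await expect(page.testSubj.locator('dashboardGridLoaded')).toBeVisible();
});
```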

Prefer existing page objects (and their methods) over rebuilding EUI interactions in test files.

Create methods for repeated flows (and make them wait for readiness).

Playwright creates a fresh browser context for each test, so there is no cached state to work around. Both page object methods and test code should be explicit about the action they perform, not defensive about the current state. Conditional flows (like "if modal is open, close it first") hide bugs, waste time, and make failures harder to understand.

Prefer explicit expect() in the test file so reviewers can see intent and failure modes. Also prefer expect() over manual boolean checks, as Playwright’s error output includes the locator, call log, and a clear message, which if/throw patterns lose.

When a test verifies multiple independent items (KPI tiles, chart counts, table columns), use expect.soft() so the test continues checking everything instead of stopping at the first failure. Playwright still fails the test at the end if any soft assertion failed.
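A sketch with hypothetical locators and values:

```typescript
import { test, expect } from '@kbn/scout';

test('overview KPIs show the ingested totals', async ({ page }) => {
  // Soft assertions keep checking every independent item; the test
  // still fails at the end if any of them did not pass.
  await expect.soft(page.testSubj.locator('kpiVisits')).toHaveText('1,234');
  await expect.soft(page.testSubj.locator('kpiErrors')).toHaveText('0');
  await expect.soft(page.testSubj.locator('kpiLatency')).toHaveText('250 ms');
});
```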

If you must interact with EUI internals, use wrappers from Scout to keep that complexity out of tests.

Scout supports automated accessibility (a11y) scanning via page.checkA11y. Add checks at high-value points in your UI tests (landing pages, modals, flyouts, wizard steps) rather than on every interaction.
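A sketch of scanning at a high-value point; the selectors are hypothetical and the exact options page.checkA11y accepts may differ:

```typescript
import { test, expect } from '@kbn/scout';

test('settings flyout is accessible', async ({ page }) => {
  await page.testSubj.click('openSettingsButton');
  await expect(page.testSubj.locator('settingsFlyout')).toBeVisible();
  // Scan at a high-value state (a newly opened flyout),
  // not after every interaction.
  await page.checkA11y();
});
```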

For the full guide (scoping, exclusions, handling pre-existing violations), see Accessibility testing.

If a page has onboarding/getting-started state, set localStorage before navigation.
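For example, using Playwright's addInitScript so the value is set before any page script runs (the localStorage key is hypothetical):

```typescript
import { test } from '@kbn/scout';

test.beforeEach(async ({ page }) => {
  // Runs before the page's own scripts on every navigation,
  // so the onboarding state never renders.
  await page.addInitScript(() => {
    window.localStorage.setItem('myPlugin.onboardingDismissed', 'true');
  });
});
```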

If you build a helper that will benefit other tests, consider upstreaming it:

  • Reusable across many plugins/teams: contribute to @kbn/scout
  • Reusable but solution-scoped: contribute to the relevant solution Scout package
  • Plugin-specific: keep it in your plugin’s test/scout tree

For the full guidance, see Scout.


Best practices specific to API tests.

Use the right fixture for the right purpose:

Fixture | Use for
--------|--------
apiClient | The endpoint under test (with scoped credentials from API auth)
apiServices | Setup/teardown and side effects
kbnClient, esClient, etc. | Lower-level setup when apiServices doesn’t have a suitable helper

Prefer tests that read like “call endpoint X as role Y, assert outcome”.

This pattern validates both endpoint behavior and the permission model.

Status code assertions are necessary but not sufficient. Also validate shape and key fields.
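A sketch following the “call endpoint X as role Y, assert outcome” pattern; the endpoint, the API test entry point (named apiTest here), and the response shape are hypothetical:

```typescript
import { apiTest, expect } from '@kbn/scout';

apiTest('GET /api/my_plugin/items as viewer returns the item list', async ({ apiClient }) => {
  const response = await apiClient.get('api/my_plugin/items');
  // Status code alone is not sufficient: also validate shape and key fields.
  expect(response.statusCode).toBe(200);
  expect(Array.isArray(response.body.items)).toBe(true);
  expect(response.body.items.length).toBeGreaterThan(0);
  expect(response.body.items[0]).toHaveProperty('id');
});
```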