Windmill is a boutique digital product delivery company. Our team of designers, strategists, and engineers love to create great experiences. We design and develop delightful and functional digital products that solve tough problems and enable new opportunities for enterprises in complex industries such as banking & finance, healthcare, and compliance.
19 January 2026

AI Evaluation & Verification QA Engineer (vacancy inactive)

Kraków (Poland)

For more information, please check the company website at windmill.digital

Role Overview

We are looking for an engineer with a QA background, a strong foundation in manual testing, and a growing or established focus on AI evaluation, verification, and validation.
This role is dedicated to ensuring the correctness, reliability, safety, and consistency of AI-driven and automation-heavy systems.
The ideal candidate comes from a traditional QA background, understands system behavior deeply, and is either experienced in or actively transitioning toward modern AI evaluation, QA Vibe, and next-generation automation technologies.

Key Responsibilities
Design and execute evaluation strategies for AI-powered features (LLM-based flows, recommendations, decision logic).
Validate AI outputs for correctness, consistency, bias, hallucination risk, and edge cases.
Define qualitative and quantitative AI evaluation criteria, metrics, and acceptance thresholds.
Perform deep manual, exploratory, and scenario-based testing on AI and non-AI features.
Build and maintain automated evaluation and regression pipelines for AI-enabled systems.
Collaborate with engineers and product teams to improve AI testability and observability.
Review and verify fixes, prevent regressions, and ensure enterprise-grade quality before releases.

Required Background
Strong QA foundation with hands-on manual testing experience.
3–5+ years in QA / software testing, including complex or enterprise systems.
Experience defining test strategies, test plans, and acceptance criteria.
Understanding of AI system behavior, non-determinism, and evaluation challenges.
Exposure to test automation frameworks (UI and/or API).
Strong analytical skills and attention to detail.

Preferred
Experience with AI evaluation frameworks, prompt testing, or model validation.
Familiarity with Playwright, Cypress, Selenium, or similar tools.
Interest in agentic QA, AI-assisted testing, and QA automation evolution.

We offer:
Flexible and fast-paced environment where the product comes first
Competitive salary
Challenges and personal growth
Flexible working practices
Friendly environment
Training program allowance
Team-building activities
Respectful and inclusive environment