1 change: 1 addition & 0 deletions CHANGELOG.MD
@@ -12,6 +12,7 @@ _released 04--2026

### Added
- **AI Evaluation Template Support**: Support for uploading test results to TestRail's AI Evaluation Template with multi-dimensional quality ratings. See README "AI Evaluation Template Support" section for complete examples.
- **Multi-Step AI Evaluation Workflows**: Support for combining step-level execution tracking (`testrail_result_step`) with overall quality ratings in AI Evaluation tests. See README "Multi-Step AI Evaluation Workflows" section.
- **Global Quality Rating via `--result-fields`**: Added support for applying quality ratings to all test results using `--result-fields quality_rating:'{"category": value}'`. Test-specific quality ratings in XML/JSON properties take precedence over CLI global ratings.

## [1.14.1]
73 changes: 73 additions & 0 deletions README.md
@@ -690,6 +690,79 @@ trcli parse_robot \
--suite-id 100
```

### Multi-Step AI Evaluation Workflows

For complex AI systems with multiple pipeline stages (such as RAG, multi-agent systems, or sequential AI workflows), you can combine **step-level execution tracking** with an **overall quality assessment** in your AI Evaluation tests. The `quality_rating` result field can also be added to test cases that use the Test Case (Steps) template.

#### How It Works

**Step-Level Tracking:**
- Each step has its own **status** (passed, failed, skipped, untested)
- See exactly where in the pipeline the failure occurred

**Overall Quality Rating:**
- One **quality_rating** applies to the entire test result
- Assess the final output quality across multiple dimensions

#### JUnit XML Example

```xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="RAG Pipeline Tests" tests="1" failures="1" time="10.5">
<testsuite name="Document QA" tests="1" failures="1" time="10.5">

<testcase classname="ai.rag.DocumentQA" name="C1000_test_rag_pipeline" time="10.5">
<properties>
<property name="test_id" value="C1000"/>

<!-- Step-Level Execution Tracking -->
<property name="testrail_result_step" value="passed:Step 1 Query Understanding"/>
<property name="testrail_result_step" value="passed:Step 2 Document Retrieval"/>
<property name="testrail_result_step" value="failed:Step 3 Answer Generation"/>
<property name="testrail_result_step" value="untested:Step 4 Response Validation"/>

<!-- Overall Quality Rating -->
<property name="quality_rating" value='{"factual_accuracy": 2, "coherence": 3, "completeness": 1}'/>

        <!-- AI Context Fields (not applicable to Test Case (Steps)) -->
<property name="testrail_result_field" value="custom_ai_input:What programming language is used for machine learning?"/>
<property name="testrail_result_field" value="custom_ai_output:JavaScript is the primary language for machine learning."/>
<property name="testrail_result_field" value="custom_ai_traces:https://logs.example.com/trace/rag-001"/>
<property name="testrail_result_field" value="custom_ai_latency:10.5 seconds"/>
</properties>
<failure message="Answer generation produced factually incorrect response"/>
</testcase>

</testsuite>
</testsuites>
```

**Upload Command:**
```bash
trcli parse_junit \
-f rag_pipeline_results.xml \
--project-id 1 \
--suite-id 100
```
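Because a malformed `quality_rating` JSON string or an unrecognized step status only surfaces at upload time, it can help to sanity-check the report file first. The following is a minimal sketch of a hypothetical pre-upload validator (not part of trcli) that checks the two property formats shown above:

```python
import json
import xml.etree.ElementTree as ET

# Step statuses documented for testrail_result_step
VALID_STEP_STATUSES = {"passed", "untested", "skipped", "failed"}

def validate_junit_properties(xml_text: str) -> list[str]:
    """Return validation errors for AI Evaluation properties in a JUnit report."""
    errors = []
    root = ET.fromstring(xml_text)
    for case in root.iter("testcase"):
        for prop in case.iter("property"):
            name, value = prop.get("name"), prop.get("value", "")
            if name == "quality_rating":
                # quality_rating must be a JSON object of integer ratings
                try:
                    ratings = json.loads(value)
                    if not all(isinstance(v, int) for v in ratings.values()):
                        errors.append(f"{case.get('name')}: non-integer rating in {value}")
                except json.JSONDecodeError:
                    errors.append(f"{case.get('name')}: quality_rating is not valid JSON")
            elif name == "testrail_result_step":
                # steps use the status:description format
                status = value.split(":", 1)[0]
                if status not in VALID_STEP_STATUSES:
                    errors.append(f"{case.get('name')}: unknown step status '{status}'")
    return errors
```

Running this over the example file above before calling `trcli parse_junit` would catch typos such as `pased:Step 1` or a quality rating written with single quotes (which is not valid JSON).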

#### Important Notes

1. **Quality Rating Scope**: The `quality_rating` applies to the **entire test result**, not individual steps. It represents the overall quality of the AI system's final output.

2. **Step Status Format**: Use `status:description` format for step-level tracking:
- `passed:Step 1 Query Understanding`
- `failed:Step 3 Answer Generation`
- `skipped:Optional Enhancement`
- `untested:Step 4 Response Validation`

3. **Available Step Statuses**:
- `passed` (status_id: 1) - Step completed successfully
- `untested` (status_id: 3) - Step not executed
- `skipped` (status_id: 4) - Step intentionally skipped
- `failed` (status_id: 5) - Step failed

4. **Test Status Aggregation**: The overall test status follows **fail-fast** logic - if any step fails, the entire test fails.
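The status-to-ID mapping and the fail-fast aggregation described in the notes above can be sketched as follows. This is an illustrative helper, not trcli's actual implementation; in particular, how all-`untested` or mixed `skipped` runs aggregate is an assumption here, while the fail-fast rule itself comes from the documentation above:

```python
# TestRail status IDs for step statuses (per the list above)
STEP_STATUS_IDS = {"passed": 1, "untested": 3, "skipped": 4, "failed": 5}

def aggregate_test_status(step_values: list[str]) -> str:
    """Derive an overall test status from 'status:description' step strings.

    Fail-fast: any failed step fails the whole test. Otherwise the test
    passes unless no step ran at all (assumed behavior for illustration).
    """
    statuses = [value.split(":", 1)[0] for value in step_values]
    if "failed" in statuses:
        return "failed"
    if statuses and all(s == "untested" for s in statuses):
        return "untested"
    return "passed"

steps = [
    "passed:Step 1 Query Understanding",
    "passed:Step 2 Document Retrieval",
    "failed:Step 3 Answer Generation",
    "untested:Step 4 Response Validation",
]
print(aggregate_test_status(steps))  # -> failed
```

This mirrors the JUnit example above: the Step 3 failure makes the whole test fail even though Steps 1 and 2 passed.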

## Behavior-Driven Development (BDD) Support

The TestRail CLI provides comprehensive support for Behavior-Driven Development workflows using Gherkin syntax. The BDD features enable you to manage test cases written in Gherkin format, execute BDD tests with various frameworks (Cucumber, Behave, pytest-bdd, etc.), and seamlessly upload results to TestRail.
90 changes: 90 additions & 0 deletions tests/test_data/XML/sample_ai_eval_multistep_workflow.xml
@@ -0,0 +1,90 @@
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="RAG Pipeline - AI Evaluation" tests="3" failures="2" errors="0" time="35.2">

<!-- Suite 1: Document QA Tests -->
<testsuite name="Document QA RAG Pipeline" tests="3" failures="2" errors="0" time="35.2">

<!-- Test 1: Successful RAG Pipeline (All Steps Pass) -->
<testcase classname="ai.rag.DocumentQA" name="C1000_test_rag_pipeline_success" time="12.5">
<properties>
<property name="test_id" value="C1000"/>

<!-- Step-Level Execution Tracking -->
<property name="testrail_result_step" value="passed:Step 1 Query Understanding"/>
<property name="testrail_result_step" value="passed:Step 2 Document Retrieval"/>
<property name="testrail_result_step" value="passed:Step 3 Answer Generation"/>
<property name="testrail_result_step" value="passed:Step 4 Response Validation"/>

<!-- Overall Quality Rating (High Quality) -->
<property name="quality_rating" value='{"factual_accuracy": 5, "coherence": 5, "completeness": 4, "relevance": 5}'/>

<!-- AI Context Fields -->
<property name="testrail_result_field" value="custom_ai_input:What is the capital of France?"/>
<property name="testrail_result_field" value="custom_ai_output:The capital of France is Paris. Paris is the largest city in France and has been the capital since 987 AD."/>
<property name="testrail_result_field" value="custom_ai_traces:https://logs.example.com/trace/rag-success-001"/>
<property name="testrail_result_field" value="custom_ai_latency:12.5 seconds"/>
</properties>
</testcase>

<!-- Test 2: Failed Answer Generation (Step 3 Fails) -->
<testcase classname="ai.rag.DocumentQA" name="C1001_test_rag_pipeline_factual_error" time="10.5">
<properties>
<property name="test_id" value="C1001"/>

<!-- Step-Level Execution Tracking -->
<property name="testrail_result_step" value="passed:Step 1 Query Understanding"/>
<property name="testrail_result_step" value="passed:Step 2 Document Retrieval"/>
<property name="testrail_result_step" value="failed:Step 3 Answer Generation"/>
<property name="testrail_result_step" value="untested:Step 4 Response Validation"/>

<!-- Overall Quality Rating (Low Due to Factual Error) -->
<property name="quality_rating" value='{"factual_accuracy": 1, "coherence": 3, "completeness": 2, "relevance": 2}'/>

<!-- AI Context Fields -->
<property name="testrail_result_field" value="custom_ai_input:What programming language is primarily used for machine learning?"/>
<property name="testrail_result_field" value="custom_ai_output:JavaScript is the primary language for machine learning, widely used in neural networks and deep learning."/>
<property name="testrail_result_field" value="custom_ai_traces:https://logs.example.com/trace/rag-failure-001"/>
<property name="testrail_result_field" value="custom_ai_latency:10.5 seconds"/>
</properties>
<failure message="Answer generation produced factually incorrect response">
Expected: Python is the primary language for machine learning
Actual: JavaScript is the primary language for machine learning

Issue: Model hallucinated incorrect information despite correct document retrieval
Impact: Users receive misleading information that could affect decision-making
</failure>
</testcase>

<!-- Test 3: Document Retrieval Failure (Step 2 Fails) -->
<testcase classname="ai.rag.DocumentQA" name="C1002_test_rag_pipeline_retrieval_failure" time="12.2">
<properties>
<property name="test_id" value="C1002"/>

<!-- Step-Level Execution Tracking -->
<property name="testrail_result_step" value="passed:Step 1 Query Understanding"/>
<property name="testrail_result_step" value="failed:Step 2 Document Retrieval"/>
<property name="testrail_result_step" value="untested:Step 3 Answer Generation"/>
<property name="testrail_result_step" value="untested:Step 4 Response Validation"/>

<!-- Overall Quality Rating (Low Due to No Relevant Documents) -->
<property name="quality_rating" value='{"factual_accuracy": 0, "coherence": 1, "completeness": 0, "relevance": 1}'/>

<!-- AI Context Fields -->
<property name="testrail_result_field" value="custom_ai_input:Explain the Heisenberg uncertainty principle in quantum mechanics"/>
<property name="testrail_result_field" value="custom_ai_output:I don't have enough information to answer your question about the Heisenberg uncertainty principle."/>
<property name="testrail_result_field" value="custom_ai_traces:https://logs.example.com/trace/rag-retrieval-failure-001"/>
<property name="testrail_result_field" value="custom_ai_latency:12.2 seconds"/>
</properties>
<failure message="Document retrieval failed to find relevant sources">
Expected: Retrieved at least 3 relevant documents about quantum mechanics
Actual: Retrieved 0 relevant documents (only found documents about classical physics)

Issue: Vector search embeddings failed to capture semantic meaning of quantum mechanics query
Impact: System cannot provide accurate answers for domain-specific questions
Recommendation: Retrain embedding model with physics-domain knowledge or use specialized vector database
</failure>
</testcase>

</testsuite>

</testsuites>