gurock · acuanico-tr-galt · May 15, 2026 · May 5, 2026 · May 8, 2026 · May 8, 2026
diff --git a/CHANGELOG.MD b/CHANGELOG.MD
@@ -12,7 +12,9 @@ _released 04--2026
 
 ### Added
  - **AI Evaluation Template Support**: Uploading test result support for TestRail's AI Evaluation Template with multi-dimensional quality ratings. See README "AI Evaluation Template Support" section for complete examples.
+ - **Multi-Step AI Evaluation Workflows**: Support for combining step-level execution tracking (`testrail_result_step`) with overall quality ratings in AI Evaluation tests. See README "Multi-Step AI Evaluation Workflows" section.
  - **Global Quality Rating via `--result-fields`**: Added support for applying quality ratings to all test results using `--result-fields quality_rating:'{"category": value}'`. Test-specific quality ratings in XML/JSON properties take precedence over CLI global ratings.
+ - **Automatic AI Evaluation Template Detection**: When using `-y` (auto-creation mode), TRCLI now detects AI Evaluation indicators (quality ratings or AI case fields) and creates new test cases with the AI Evaluation template. See README "Automatic Case Creation for AI Evaluation Template" section.
 
 ## [1.14.1]
 

diff --git a/README.md b/README.md
@@ -690,6 +690,235 @@ trcli parse_robot \
   --suite-id 100
 ```
 
+### Multi-Step AI Evaluation Workflows
+
+For complex AI systems with multiple pipeline stages (such as RAG, multi-agent systems, or sequential AI workflows), you can combine **step-level execution tracking** with **overall quality assessment** in your AI Evaluation tests. The `quality_rating` result field can also be added to the Test Case (Steps) template.
+
+#### How It Works
+
+**Step-Level Tracking:**
+- Each step has its own **status** (passed, failed, skipped, untested)
+- See exactly where in the pipeline the failure occurred
+
+**Overall Quality Rating:**
+- One **quality_rating** applies to the entire test result
+- Assess the final output quality across multiple dimensions
+
+#### JUnit XML Example
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<testsuites name="RAG Pipeline Tests" tests="1" failures="1" time="10.5">
+  <testsuite name="Document QA" tests="1" failures="1" time="10.5">
+
+    <testcase classname="ai.rag.DocumentQA" name="C1000_test_rag_pipeline" time="10.5">
+      <properties>
+        <property name="test_id" value="C1000"/>
+
+        <!-- Step-Level Execution Tracking -->
+        <property name="testrail_result_step" value="passed:Step 1 Query Understanding"/>
+        <property name="testrail_result_step" value="passed:Step 2 Document Retrieval"/>
+        <property name="testrail_result_step" value="failed:Step 3 Answer Generation"/>
+        <property name="testrail_result_step" value="untested:Step 4 Response Validation"/>
+
+        <!-- Overall Quality Rating -->
+        <property name="quality_rating" value='{"factual_accuracy": 2, "coherence": 3, "completeness": 1}'/>
+
+        <!-- AI Context Fields (not applicable to the Test Case (Steps) template) -->
+        <property name="testrail_result_field" value="custom_ai_input:What programming language is used for machine learning?"/>
+        <property name="testrail_result_field" value="custom_ai_output:JavaScript is the primary language for machine learning."/>
+        <property name="testrail_result_field" value="custom_ai_traces:https://logs.example.com/trace/rag-001"/>
+        <property name="testrail_result_field" value="custom_ai_latency:10.5 seconds"/>
+      </properties>
+      <failure message="Answer generation produced factually incorrect response"/>
+    </testcase>
+
+  </testsuite>
+</testsuites>
+```
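If your test harness does not already emit these properties, the following is a minimal sketch of generating an equivalent report with Python's standard-library `xml.etree.ElementTree`. The helpers `build_testcase` and `build_report` and their parameters are hypothetical; only the property names (`test_id`, `testrail_result_step`, `quality_rating`, `testrail_result_field`) come from this README. Serializing the rating with `json.dumps` avoids hand-quoting JSON inside an XML attribute.

```python
import json
import xml.etree.ElementTree as ET


def build_testcase(case_id, steps, rating, result_fields, failure=None):
    """Build a JUnit <testcase> carrying TestRail AI Evaluation properties.

    Hypothetical helper: `steps` is a list of (status, description) tuples,
    `rating` maps quality dimensions to scores, and `result_fields` maps
    TestRail result field names to values.
    """
    testcase = ET.Element("testcase", classname="ai.rag.DocumentQA",
                          name=f"{case_id}_test_rag_pipeline")
    props = ET.SubElement(testcase, "properties")
    ET.SubElement(props, "property", name="test_id", value=case_id)
    # One testrail_result_step property per pipeline step, in execution order
    for status, description in steps:
        ET.SubElement(props, "property", name="testrail_result_step",
                      value=f"{status}:{description}")
    # json.dumps yields valid JSON; ElementTree escapes its quotes in the attribute
    ET.SubElement(props, "property", name="quality_rating", value=json.dumps(rating))
    for field, value in result_fields.items():
        ET.SubElement(props, "property", name="testrail_result_field",
                      value=f"{field}:{value}")
    if failure is not None:
        ET.SubElement(testcase, "failure", message=failure)
    return testcase


def build_report(testcase, suite_name="Document QA"):
    """Wrap one testcase in <testsuites>/<testsuite> and return the XML text."""
    failures = str(len(testcase.findall("failure")))
    suites = ET.Element("testsuites", name="RAG Pipeline Tests", tests="1", failures=failures)
    suite = ET.SubElement(suites, "testsuite", name=suite_name, tests="1", failures=failures)
    suite.append(testcase)
    return ET.tostring(suites, encoding="unicode")


case = build_testcase(
    "C1000",
    steps=[
        ("passed", "Step 1 Query Understanding"),
        ("passed", "Step 2 Document Retrieval"),
        ("failed", "Step 3 Answer Generation"),
        ("untested", "Step 4 Response Validation"),
    ],
    rating={"factual_accuracy": 2, "coherence": 3, "completeness": 1},
    result_fields={"custom_ai_latency": "10.5 seconds"},
    failure="Answer generation produced factually incorrect response",
)
xml_text = build_report(case)
```

Writing `xml_text` to `rag_pipeline_results.xml` gives a file suitable for the upload command.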
+
+**Upload Command:**
+```bash
+trcli --project-id 1 \
+  parse_junit \
+  -f rag_pipeline_results.xml \
+  --suite-id 100
+```
+
+#### Important Notes
+
+1. **Quality Rating Scope**: The `quality_rating` applies to the **entire test result**, not individual steps. It represents the overall quality of the AI system's final output.
+
+2. **Step Status Format**: Use `status:description` format for step-level tracking:
+   - `passed:Step 1 Query Understanding`
+   - `failed:Step 3 Answer Generation`
+   - `skipped:Optional Enhancement`
+   - `untested:Step 4 Response Validation`
+
+3. **Available Step Statuses**:
+   - `passed` (status_id: 1) - Step completed successfully
+   - `untested` (status_id: 3) - Step not executed
+   - `skipped` (status_id: 4) - Step intentionally skipped
+   - `failed` (status_id: 5) - Step failed
+
+4. **Test Status Aggregation**: The overall test status follows **fail-fast** logic - if any step fails, the entire test fails.
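The step format, status IDs, and fail-fast rule above can be illustrated with a short Python sketch. This is not TRCLI's implementation: `parse_step`, `aggregate_status`, and splitting on the first colon only are assumptions for illustration. Because this README only specifies the failing case, the sketch returns `None` when no step failed and leaves the final status to the JUnit testcase outcome.

```python
# Step status IDs as listed in the notes above
STEP_STATUS_IDS = {"passed": 1, "untested": 3, "skipped": 4, "failed": 5}


def parse_step(value):
    """Split a testrail_result_step value into (status_id, description).

    Only the first colon separates status from description, so a
    description may itself contain colons (an assumption of this sketch).
    """
    status, sep, description = value.partition(":")
    if not sep or status not in STEP_STATUS_IDS:
        raise ValueError(f"invalid step value: {value!r}")
    return STEP_STATUS_IDS[status], description


def aggregate_status(step_values):
    """Fail-fast aggregation: any failed step makes the whole test failed.

    Returns None when no step failed; deferring to the JUnit testcase
    outcome in that case is an assumption, not documented TRCLI behavior.
    """
    steps = [parse_step(v) for v in step_values]
    if any(status_id == STEP_STATUS_IDS["failed"] for status_id, _ in steps):
        return STEP_STATUS_IDS["failed"]
    return None
```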
+
+### Automatic Case Creation for AI Evaluation Template
+
+When using the `-y` flag (auto-creation mode), TRCLI can automatically detect and create test cases with the **AI Evaluation template**. This eliminates the need to manually select templates or pre-create cases.
+
+#### How Auto-Detection Works
+
+TRCLI detects AI Evaluation indicators through three methods:
+
+1. **Quality Rating in Test Results**: When `quality_rating` is present in any test result
+2. **AI Case Fields in CLI**: When `--case-fields` includes `custom_ai_type` or `custom_ai_model`
+3. **AI Case Fields in XML Properties**: When `testrail_case_field` properties include AI fields
+
+If any of these indicators is detected, TRCLI validates that the AI Evaluation template exists in your project and exits with an error if it is not found.
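The three detection rules can be sketched in Python as a single file-wide decision: one positive indicator in any test is enough. The parsed data shape (a list of dicts with optional `quality_rating` and a `case_fields` list of `name:value` strings) and the function `needs_ai_template` are hypothetical. The sketch also assumes that the "AI fields" checked in XML properties are the same two fields named for `--case-fields`.

```python
AI_CASE_FIELDS = {"custom_ai_type", "custom_ai_model"}


def field_names(entries):
    """Return the field names from 'name:value' strings."""
    return {entry.partition(":")[0] for entry in entries}


def needs_ai_template(cli_case_fields, test_cases):
    """Return True if any documented AI Evaluation indicator is present.

    cli_case_fields: values passed via --case-fields, e.g. ['custom_ai_type:1'].
    test_cases: hypothetical parsed results; each is a dict with an optional
    'quality_rating' and a 'case_fields' list of testrail_case_field values.
    """
    # AI case fields given globally on the command line
    if field_names(cli_case_fields) & AI_CASE_FIELDS:
        return True
    for case in test_cases:
        # quality_rating present on any test result
        if case.get("quality_rating") is not None:
            return True
        # AI case fields in testrail_case_field properties
        if field_names(case.get("case_fields", [])) & AI_CASE_FIELDS:
            return True
    return False
```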
+
+#### Example: Auto-Create with Quality Rating
+
+```bash
+trcli -y \
+  -h https://your-instance.testrail.io \
+  --project "AI Testing" \
+  parse_junit \
+  --title "RAG Pipeline Tests" \
+  -f junit_results.xml
+```
+
+**junit_results.xml:**
+```xml
+<testsuites name="RAG Tests">
+  <testsuite name="Document QA">
+    <testcase name="test_rag_pipeline">
+      <properties>
+        <!-- Automation ID for case matching -->
+        <property name="test_id" value="ai.rag.test_rag_pipeline"/>
+
+        <!-- Quality rating triggers AI Evaluation template -->
+        <property name="quality_rating" value='{"factual_accuracy": 5, "coherence": 4}'/>
+
+        <!-- AI context fields for observability -->
+        <property name="testrail_result_field" value="custom_ai_input:What is ML?"/>
+        <property name="testrail_result_field" value="custom_ai_output:ML is a subset of AI..."/>
+      </properties>
+    </testcase>
+  </testsuite>
+</testsuites>
+```
+
+#### Example: Auto-Create with AI Case Fields
+
+You can specify AI case fields either via CLI or in XML properties:
+
+**Via CLI `--case-fields`:**
+```bash
+trcli -y \
+  -h https://your-instance.testrail.io \
+  --project "AI Testing" \
+  parse_junit \
+  --case-fields custom_ai_type:1 \
+  --case-fields custom_ai_model:2 \
+  -f test_results.xml
+```
+
+**Via XML Properties:**
+```xml
+<testcase name="test_llm_chatbot">
+  <properties>
+    <property name="test_id" value="ai.llm.test_chatbot"/>
+
+    <!-- AI case fields trigger AI Evaluation template -->
+    <!-- custom_ai_type: 1=RAG, 2=ML, 3=LLM -->
+    <property name="testrail_case_field" value="custom_ai_type:3"/>
+    <!-- custom_ai_model: 1=GPT-5, 2=Gemini 3, 3=Sonnet 3.5 -->
+    <property name="testrail_case_field" value="custom_ai_model:1"/>
+
+    <!-- Optional: Add quality rating -->
+    <property name="quality_rating" value='{"factual_accuracy": 4}'/>
+  </properties>
+</testcase>
+```
+
+#### AI Case Field Values
+
+The AI Evaluation template includes two dropdown case fields:
+
+**`custom_ai_type`** - Type of AI system:
+- `1` = RAG (Retrieval-Augmented Generation)
+- `2` = ML (Machine Learning)
+- `3` = LLM (Large Language Model)
+
+**`custom_ai_model`** - AI model used:
+- `1` = GPT-5
+- `2` = Gemini 3
+- `3` = Sonnet 3.5
+
+**Note:** Values must be integers (1-3), not strings.
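To avoid passing names where integer IDs are required, one option is to map readable names to the dropdown IDs before building `name:value` strings for `--case-fields` or `testrail_case_field`. The following Python sketch hard-codes the IDs listed above. The helper `ai_case_fields` is hypothetical, and your project's dropdown IDs may differ.

```python
# Dropdown IDs as listed in this README; adjust if your project differs
AI_TYPE_IDS = {"RAG": 1, "ML": 2, "LLM": 3}
AI_MODEL_IDS = {"GPT-5": 1, "Gemini 3": 2, "Sonnet 3.5": 3}


def ai_case_fields(ai_type, ai_model):
    """Return 'name:value' strings with integer dropdown IDs, not names."""
    try:
        type_id = AI_TYPE_IDS[ai_type]
        model_id = AI_MODEL_IDS[ai_model]
    except KeyError as exc:
        raise ValueError(f"unknown AI type or model: {exc.args[0]!r}") from None
    return [f"custom_ai_type:{type_id}", f"custom_ai_model:{model_id}"]
```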
+
+#### Combining Auto-Creation with Multi-Step Results
+
+Auto-creation also works with step-level results for the Test Case (Steps) template. Include both `quality_rating` and `testrail_result_step` properties:
+
+```xml
+<testcase name="test_rag_full_pipeline">
+  <properties>
+    <property name="test_id" value="ai.rag.test_full_pipeline"/>
+
+    <!-- Step-level execution tracking -->
+    <property name="testrail_result_step" value="passed:Step 1 Query Understanding"/>
+    <property name="testrail_result_step" value="passed:Step 2 Vector Search"/>
+    <property name="testrail_result_step" value="failed:Step 3 Answer Generation"/>
+
+    <!-- Overall quality rating (applies to entire test) -->
+    <property name="quality_rating" value='{"factual_accuracy": 2, "coherence": 4}'/>
+
+    <!-- AI case fields for metadata -->
+    <property name="testrail_case_field" value="custom_ai_type:1"/>
+    <property name="testrail_case_field" value="custom_ai_model:3"/>
+  </properties>
+</testcase>
+```
+
+#### Template Validation
+
+Before creating cases, TRCLI validates that the AI Evaluation template exists in your project. If the template is not found, you'll see:
+
+```
+ERROR: Cannot auto-create cases with AI Evaluation template.
+AI Evaluation template not found in project (ID: 1).
+
+Please enable the AI Evaluation template in your TestRail project:
+1. Go to Administration > Customizations > Templates
+2. Enable 'AI Evaluation' template for your project
+```
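To check ahead of an upload whether the template is available, you can query the TestRail API v2 `get_templates/{project_id}` endpoint yourself. The following sketch uses only the Python standard library. The helper names are hypothetical, matching on the exact name `AI Evaluation` is an assumption, and this is not how TRCLI performs its own check.

```python
import base64
import json
import urllib.request


def fetch_templates(host, user, api_key, project_id):
    """GET the templates available to a project via TestRail API v2."""
    url = f"{host.rstrip('/')}/index.php?/api/v2/get_templates/{project_id}"
    token = base64.b64encode(f"{user}:{api_key}".encode()).decode()
    request = urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_template_id(templates, name="AI Evaluation"):
    """Return the id of the first template whose name matches, else None."""
    for template in templates:
        if template.get("name") == name:
            return template.get("id")
    return None
```

For example, `find_template_id(fetch_templates("https://your-instance.testrail.io", "user@example.com", "API_KEY", 1))` returns `None` when the template is not enabled for project 1.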
+
+#### Robot Framework Support
+
+Robot Framework tests also support auto-creation with AI Evaluation template:
+
+```robot
+*** Test Cases ***
+Test RAG Pipeline
+    [Documentation]    - testrail_case_field:custom_ai_type:1
+    ...                - testrail_case_field:custom_ai_model:3
+    ...                - quality_rating:{"factual_accuracy": 5, "relevance": 4}
+    ...                - testrail_result_field:custom_ai_input:What is quantum computing?
+    ...                - testrail_result_field:custom_ai_output:Quantum computing uses...
+    [Tags]    ai-evaluation
+
+    # Test steps here
+    Should Be Equal    ${status}    success
+```
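When Robot Framework joins `...` continuation rows, the documentation becomes one `- key:value` entry per line. The following Python sketch shows how such entries can be split by partitioning on the first colon only, which keeps nested values like `custom_ai_type:1` and the JSON rating intact. The function `parse_testrail_doc` is hypothetical and is not TRCLI's parser.

```python
import json


def parse_testrail_doc(doc):
    """Collect TestRail entries from a test's documentation text.

    Hypothetical sketch: each entry is '- key:value' on its own line.
    """
    case_fields, result_fields, quality_rating = {}, {}, None
    for raw in doc.splitlines():
        line = raw.strip()
        if not line.startswith("- "):
            continue
        key, _, value = line[2:].partition(":")
        if key == "quality_rating":
            # The rest of the line is a JSON object of quality dimensions
            quality_rating = json.loads(value)
        elif key in ("testrail_case_field", "testrail_result_field"):
            name, _, field_value = value.partition(":")
            target = case_fields if key == "testrail_case_field" else result_fields
            target[name] = field_value
    return case_fields, result_fields, quality_rating
```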
+
+#### Important Notes
+
+1. **Template Requirement**: The AI Evaluation template must be enabled in your TestRail project
+2. **Global vs. Test-Specific**: AI case fields can be specified globally via `--case-fields` or per-test via XML properties
+3. **Field Type**: AI case field values are dropdown IDs (integers 1-3), not strings
+4. **Detection Scope**: Detection checks ALL test cases in the file - if any test has AI indicators, ALL auto-created cases will use the AI Evaluation template
+5. **Not Compatible with BDD**: Auto-creation with the AI Evaluation template is NOT supported for BDD workflows (Cucumber/Gherkin), which have their own template assignment logic
+
 ## Behavior-Driven Development (BDD) Support
 
 The TestRail CLI provides comprehensive support for Behavior-Driven Development workflows using Gherkin syntax. The BDD features enable you to manage test cases written in Gherkin format, execute BDD tests with various frameworks (Cucumber, Behave, pytest-bdd, etc.), and seamlessly upload results to TestRail.