-
Notifications
You must be signed in to change notification settings - Fork 125
Description
Description
In the test environment, running kubectl get pods -n docker or kubectl get all -n docker sometimes returns:
No resources found in docker namespace.
The detection agent interprets this as a K8s issue and marks it as such.
However, in the production env this namespace always contains resources, so this case should not be treated as an issue. This leads to false positives and inflates detection accuracy metrics.
Trace
"agent": "openrouter",
"session_id": "292a425f-0b28-45f5-96bc-e4f02fa42623",
"problem_id": "flower_model_misconfig-detection",
"start_time": 1756135764.7457962,
"end_time": 1756135765.7294734,
"trace": [
{
"role": "assistant",
"content": "```\nexec_shell(\"kubectl get pods -n docker\")\n```"
},
{
"role": "env",
"content": "No resources found in docker namespace.\n"
},
{
"role": "assistant",
"content": "```\nexec_shell(\"kubectl get all -n docker\")\n```"
},
{
"role": "env",
"content": "No resources found in docker namespace.\n"
},
{
"role": "assistant",
"content": "```\nsubmit(\"Yes\")\n```"
},
{
"role": "env",
"content": "1"
}
],
"results": {
"Detection Accuracy": "Correct",
"TTD": 0.9836771488189697,
"steps": 3,
"in_tokens": 15,
"out_tokens": 33
}
Open Question
Would it make sense to introduce a supervisor agent to validate conversations before marking them as issues?
The idea is that if no actual problem is evidenced, the supervisor would return NoIssue or Inconclusive instead of allowing a false positive.
One option could be adding a lightweight GPT-5–based supervisor to evaluate detection tasks.
Any suggestions or alternative approaches are welcome. 🤔