relai-ai / relai-sdk

A platform for building reliable AI agents

ai-agents ai-reliability

Updated Apr 3, 2026
Python

najeed / ai-agent-eval-harness

The open-source MultiAgentOps evaluation and verification harness for any industry business workflow.

Updated May 15, 2026
Python

vishwanathakuthota / openvals

Open-source AI model evaluation and benchmarking framework for LLMs (OpenAI, Ollama, Claude, Gemini)

machine-learning gemini openai ai-safety ai-agents ai-evaluation ai-testing ai-quality llm-tools ollama llm-benchmarking ai-evaluation-framework calude ai-reliability vishwanath-akuthota

Updated May 16, 2026
Python

Harshit-J004 / toolguard

The "Cloudflare for AI Agents". 7-layer security interceptor, real-time observability dashboard, and automated reliability testing for MCP and AI tool chains. Prevent hallucinations, prompt injection, and destructive tool calls.

Updated May 4, 2026
Python

elsium-ai / elsium-ai

Production-grade TypeScript AI runtime focused on reliability, governance, and reproducible LLM systems. Multi-provider gateway, agents, RAG, workflows, policy engine, audit trails, and deterministic testing — built for teams shipping AI in production.

typescript ai-framework rag agent-framework ai-compliance llm ai-governance ai-runtime open-source-ai ai-infrastructure llm-gateway reproducible-ai llm-runtime ai-reliability deterministic-ai ai-production

Updated May 13, 2026
TypeScript

ejentum / ejentum-mcp

MCP server for the Ejentum Logic API. Exposes the four cognitive harnesses (reasoning, code, anti-deception, memory) as MCP tools any agentic client can call.

typescript mcp code-review claude anthropic llm-tools agentic-ai model-context-protocol mcp-server ai-reliability reasoning-harness ejentum anti-deception cognitive-scaffold

Updated May 17, 2026
JavaScript

exospherehost / ai-reliability-standards

Architectural standards and best practices for building reliable AI Agents and LLM workflows. Defining the framework for AI Reliability Engineering (AIRE).

enterprise ai reliability-engineering evaluation sre observability ai-agents aiops evals durable-execution ai-reliability

Updated Feb 14, 2026
Dockerfile

vbepipe / vmrrb-benchmark

Benchmark for evaluating advanced reasoning, recursive dependency resolution, and robustness capabilities of large language models in dynamic, noisy, and structurally challenging environments.

benchmark dependency-resolution ai-accuracy multistep-reasoning ai-evaluation large-language-models ai-reasoning llm-evaluation reasoning-benchmark llm-benchmark ai-reliability recursive-reasoning ai-stability long-chain-reasoning

Updated May 15, 2026
Python

AionSystem / VERITAS

Sheldon K. Salmon — AI Reliability Architect. Creator of the AION Constitutional Stack and the CERTUS certainty‑engineering methodology. He designed, directed, and red‑teamed VERITAS — applying epistemic scoring, Uncertainty Mass, and permanent STP seals to community crisis data. Code is open source. The judgment is not.

community ai humanitarian photo undp crisis-support disaster-response disaster-relief damage-assessment crisis-response ai-audit ai-reliability aion-system sheldon-k-salmon fsve certus-engine

Updated May 16, 2026
JavaScript

AionSystem / AION-SCAFFOLDING

AION Scaffold — Intelligent tree-to-filesystem generator. Built by Sheldon K. Salmon, AI Reliability Architect. Part of the AION Constitutional Stack. Free forever. No tracking.

tools scaffold dev developer-tools scaffolder scaffolding dev-tools ai-architect ai-audit ai-reliability aion-system

Updated May 6, 2026
HTML

aditikhare007 / ai-evaluation-trust-lens

Star

AI evaluation and reliability system for reasoning about trust, performance, and failure modes in AI systems | AditiKhare.com — AI Product Ecosystem

model-evaluation ai-safety mlops ai-systems trustworthy-ai decision-intelligence ai-evaluation generative-ai llm-evaluation agentic-ai ai-architecture ai-reliability

Updated Apr 20, 2026

aditikhare007 / ai-decision-intelligence-system

Star

Enterprise AI system for decision intelligence — transforming research into scalable, context-aware insights at production scale | AditiKhare.com — AI Product Ecosystem

ai mlops ai-systems inference-optimization ai-platform ai-product decision-intelligence ai-evaluation llm generative-ai context-engineering ai-reliability

Updated Apr 20, 2026

meunier-jc / authentic-fluency

Star

A behavioral framework opposing native fluency to authentic fluency — the structural tension RLHF creates and Claude Mythos Preview makes urgent.

ai-safety claude-ai sycophancy human-ai-collaboration llm-alignment ai-reliability behavioral-framework truth-before-fluency integrity-before-agreement coexistence-through-reliability reliability-or-obsolescence existential-co-regulation authentic-fluency hallucinatory-recursive-embedding ai-alignment-case-study native-fluency rlhf-critique

Updated May 5, 2026

amulbham / cognitive-team-architecture

Star

A multi-agent cognitive architecture solving the LLM state-dependency problem with persistent memory and a mandatory self-correction loop. It is built on a more profound and biologically resonant principle: memory is an active component of intelligence itself.

persistent-memory multi-agent-systems cognitive-architecture self-correction llm prompt-engineering ai-reliability

Updated Aug 23, 2025

Nefza99 / Rebis-AI-auditing-Architecture

Star

Orchestration runtime for AI agent workflows that preserves task-state fidelity, prevents reasoning drift, and reduces wasted computation in long-horizon pipelines.

Updated Mar 19, 2026
JavaScript

dhirajxai / prompt-engineering-foundations

Star

Developer-first prompt engineering patterns for grounded, testable, and reliable AI outputs.

developer-tools grounded-generation llm prompt-engineering prompting ai-workflows ai-reliability hallucination-reduction

Updated Mar 27, 2026

thegeekajay / WorkflowBench

Star

Lightweight benchmark harness for AI-driven business workflows

python cli benchmark openai workflow-automation ai-evaluation llm prompt-engineering anthropic ai-reliability

Updated Apr 24, 2026
HTML

LivingFramework / LC-OS

Star

Research archive — eight published papers, Mahdi Ledger, and empirical foundations of the LC-OS governance framework.

ai-safety ai-research prompt-engineering ai-governance llm-framework human-ai-collaboration context-engineering ai-reliability

Updated May 9, 2026

hermes-labs-ai / hermeneutic

Star

Mine corrections from AI chat logs. Gate the next response before drift ships.

llm-evaluation ai-audit ai-reliability hermes-labs ai-overclaim retrieval-scaffold

Updated Apr 30, 2026
Python

UAICP / uaicp

Star

UAICP (Universal Agentic Interoperability Control Protocol): open reliability contract for AI agent workflows with evidence gating, policy controls, and auditability.

python rust typescript orchestration interoperability compliance risk-management policy-engine ai-agents audit-trail autogen guardrails ai-governance llmops langgraph agentic-ai ai-reliability uaicp rig-rs

Updated Feb 27, 2026
TypeScript

ai-reliability

Here are 57 public repositories matching this topic...