audio-understanding

Voxtral is a state-of-the-art model for speech transcription and audio understanding. This demo interface lets you run Voxtral on GPUs to evaluate its performance and explore its use for transcription and deeper audio analysis.

audio speech speech-recognition audio-understanding voxtral speech-understanding

Updated Jul 26, 2025
Python

ZXXZ1000 / LATRACE-AI

Open-source AI memory engine for agents: multimodal memory, temporal knowledge graphs, graph RAG, and evidence-backed long-term context.

neo4j knowledge-graph video-understanding ai-agents memory-engine long-term-memory rag qdrant tenant-isolation graph-rag ai-memory audio-understanding agent-memory memory-service multimodal-memory

Updated May 8, 2026
Python

danielrosehill / Audio-Multimodal-AI-Resources

A compilation of resources (model profiles, benchmarks, docs) for multimodal AI models with audio understanding, focused especially on ASR and transcription use cases.

stt asr audio-understanding multimodal-ai audio-multimodal audio-text-to-text

Updated Dec 8, 2025

danielrosehill / Audio-Understanding-Bitrate-Eval-0426

Empirical eval: how MP3 bitrate affects transcription accuracy across every audio-input LLM on OpenRouter (Gemini, GPT-Audio, Voxtral, MiMo). April 2026.

multimodal audio-understanding

Updated Apr 17, 2026
Python

danielrosehill / Gemini-31-Lite-Audio-Understanding-Eval

One voice recording for testing with TTS/cloning

multimodal ai-experiments audio-understanding

Updated Mar 26, 2026
Python

api-evangelist / gemini

Google's Gemini API provides access to state-of-the-art generative AI models for text generation, multimodal understanding, code generation, and more.

machine-learning text-to-speech embeddings artificial-intelligence image-generation batch-processing agents video-understanding structured-output multimodal video-generation document-understanding apis-json large-language-models generative-ai function-calling audio-understanding deep-research naftiko

Updated May 20, 2026

pro6692abou / llm-audio

Provides Whisper-based audio transcription and translation through lightweight C++ libraries for easy integration into LLM projects.

text music-information-retrieval neural-networks speech-to-text text-to-image music-ai large-language-models foundational-models speech-ai vision-language-model audio-language large-vision-language-models large-audio-models speech-llms audio-understanding

Updated May 20, 2026
C++

Here are 12 public repositories matching this topic...

FunAudioLLM / Fun-ASR

AudioLLMs / Awesome-Audio-LLM

HKUDS / VideoAgent

NKU-HLT / DIFFA

JHU-LCAP / FlexSED

khalooei / Voxtral-AI-Demo-Local-Interface

ZXXZ1000 / LATRACE-AI

danielrosehill / Audio-Multimodal-AI-Resources

danielrosehill / Audio-Understanding-Bitrate-Eval-0426

danielrosehill / Gemini-31-Lite-Audio-Understanding-Eval

api-evangelist / gemini

pro6692abou / llm-audio