Releases: automatika-robotics/embodied-agents
0.4.3
🚀 EmbodiedAgents v0.4.3
EmbodiedAgents v0.4.3 introduces a modernized, web-based fully dynamic UI experience, enhanced visual and multimodal interaction capabilities, and numerous stability improvements across LLM/VLM components and the vision pipeline.
✨ New Features
Dynamically generated UI for Agents
- Extended Sugarcoat's dynamic web UI elements for Agents so each recipe can create its own interaction client.
- Added dynamic UI callbacks for:
- RGBD images
- Detections and DetectionsMultiSource messages
- Points of interest messages
- Added UI element definitions for custom message types (extensible by sugar-derived packages).
- Added logging output from StreamingString directly in the web UI.
- Added utility for drawing images with bounding boxes for visualization of detections.
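The bounding-box drawing utility is not named in these notes, so here is a generic sketch of the idea using plain OpenCV; the helper name and the detection format below are assumptions made for illustration, not the library's API.

```python
# Generic illustration of drawing detection boxes on an image with OpenCV.
# This is NOT the new utility itself; the helper name and detection format
# below are assumptions made for the example.
import cv2


def draw_detections(image, detections):
    """Draw labelled boxes; each detection is assumed to be a dict like
    {"box": (x_min, y_min, x_max, y_max), "label": str, "score": float}."""
    for det in detections:
        x1, y1, x2, y2 = (int(v) for v in det["box"])
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        caption = f'{det["label"]} {det["score"]:.2f}'
        cv2.putText(image, caption, (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return image
```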
Components & Message Handling
- Added Detection2D as an allowed input in the map encoding component.
- Added callbacks for Detections2D and integrated their handling in LLM/VLM components.
- Streamlined naming conventions for detection and tracking message types.
- Added handling of additional custom message types from sugar-derived packages in agent components.
- Added support for publishing single detection or tracking messages from the vision component.
- Added UDP streaming to IP:PORT as an option in the TextToSpeech component when `play_on_device` is enabled.
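As a rough sketch of how the new UDP streaming option might be wired up: `play_on_device` is the option named above, while the config class, import paths, and the stream-target parameter name are assumptions, and other required arguments (such as a model client) are omitted.

```python
# Sketch only: configure TextToSpeech to play audio on-device and also stream
# it over UDP to IP:PORT. The config class, import paths, and the `stream_to`
# parameter name are assumptions; `play_on_device` comes from the notes.
from agents.components import TextToSpeech
from agents.config import TextToSpeechConfig
from agents.ros import Topic

answer_text = Topic(name="/answer", msg_type="String")

tts_config = TextToSpeechConfig(
    play_on_device=True,            # named in this release's notes
    stream_to="192.168.1.50:5004",  # hypothetical name for the IP:PORT option
)

tts = TextToSpeech(
    inputs=[answer_text],
    trigger=answer_text,
    config=tts_config,
    component_name="speech_out",
)  # model client and other required arguments omitted for brevity
```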
LLM / MLLM Enhancements
- Added alias `VLM` for the MLLM component (see the sketch after this list).
- Added warning messages when streaming is enabled without using the proper StreamingString message type.
- Added StreamingString message type for managing text/audio streams in external clients.
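Here is a minimal sketch of the `VLM` alias in use; the constructor arguments and message-type strings are assumptions, while the alias itself and the use of StreamingString for streamed output come from the notes above.

```python
# Minimal sketch: VLM is the new alias for the MLLM component. Constructor
# arguments and msg_type strings are assumptions for illustration.
from agents.components import VLM
from agents.ros import Topic

image_in = Topic(name="/camera/image_raw", msg_type="Image")
question_in = Topic(name="/question", msg_type="String")
# With streaming enabled, the notes recommend StreamingString for the output.
answer_out = Topic(name="/answer", msg_type="StreamingString")

vlm = VLM(
    inputs=[question_in, image_in],
    outputs=[answer_out],
    trigger=question_in,
    component_name="vlm_component",
)  # a model client is also required but omitted here
```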
🧰 Fixes & Improvements
- Fixed image publishing in detection messages and empty image inputs in `Detection2D` publications.
- Fixed raw message data access during execution to avoid redundant output calls.
- Fixed topic passing from sugar-derived packages to agent components.
- Fixed websocket receiving issues in Text-to-Speech.
- Fixed keyword argument mismatches in detection and tracking message publishing.
- Refactored callbacks for video and RGBD message types.
- Added detections as handled output in the VLM component.
- Removed StreamingString as an input option for LLM/VLM components to streamline behavior.
⚙️ Chores & Maintenance
- Added websockets as an explicit dependency.
- Added warnings for template usage with StreamingString.
- Removed the tiny web client and the Chainlit-based client; replaced them with dynamic UIs for every recipe.
📚 Documentation Updates
- Added instructions for using the new dynamic web UI.
- Updated examples and guides for the new web-based client.
- Updated planning model documentation.
- Added a recipe for vision-guided point navigation.
🧩 Summary
This release completes the transition to a dynamic, ROS-integrated web interface for EmbodiedAgents, offering a more flexible and interactive experience for managing Physical AI agents.
It also enhances support for visual grounding, audio streams, and extended message types, and includes various bug fixes and improvements.
Full Changelog: 0.4.1...0.4.3
📖 Dive Deeper
Ready to explore these new features? Check out our updated documentation for detailed information, examples, and installation guide.
0.4.1
🚀 EmbodiedAgents 0.4.1 Release: Faster, More Flexible, and Ready for Anything!
We're thrilled to announce the release of EmbodiedAgents 0.4.1! This version marks a significant leap forward, bringing substantial performance improvements, enhanced flexibility for integrating various models (any server that is OpenAI API compatible), and a refined user experience.
✨ What's New?
This release is packed with exciting new features designed to make development smoother and your agents smarter:
- Project Renamed! We've officially rebranded from "ROS Agents" to EmbodiedAgents to better reflect the package's purpose and capabilities.
- Blazing Fast Streaming: Experience significantly faster handling of model output! We've added streaming support to LLM, MLLM, STT (Speech-to-Text), and TTS (Text-to-Speech) components, making your agents more responsive than ever.
- Onboard Object Detection: The Vision component now includes a local classifier model for efficient, onboard object detection.
- Universal Model Integration: Integrate with any model serving platform that offers an OpenAI-style API using our new generic HTTP client (see the sketch after this list).
- Planning with Vision-Language Models: We've added support for planning MLLM models, starting with RoboBrain2.0 by BAAI, which allows for task-based outputs from the MLLM component (e.g., grounding, pointing), opening the door to multimodal planning and perception-driven action.
- Enhanced Ollama Support: The OllamaClient now boasts embeddings model support, expanding your options for local model inference.
- Direct ChromaDB Integration: We've added a ChromaDB client for direct interaction with a ChromaDB server, supporting embeddings models served via Ollama or `sentence-transformers`. No extra dependencies needed!
- Simplified & Streamlined: We've reduced external Python package dependencies and simplified models and clients for a cleaner, more efficient codebase.
- Python 3.8 Compatibility Fixes: We've addressed backward compatibility issues to ensure smoother installations for Python 3.8 users.
- Dynamic System Prompts: The LLM/MLLM components now support `set_system_prompt`, allowing for dynamic system prompt configuration without modifying model settings (see the sketch below).
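A rough sketch of the generic client and the new dynamic system prompt together: the client class name (`OpenAICompatibleClient`) and its parameters are placeholders, not necessarily the real names; only the OpenAI-style client and `set_system_prompt` themselves are taken from the notes above.

```python
# Sketch only: point an LLM component at any OpenAI-API-compatible server and
# set the system prompt at runtime. The client class name and its parameters
# are placeholders; the generic client and set_system_prompt are the features
# named in this release.
from agents.clients import OpenAICompatibleClient  # placeholder class name
from agents.components import LLM
from agents.ros import Topic

client = OpenAICompatibleClient(
    host="http://localhost:8000",  # e.g. a vLLM or llama.cpp server
    model_name="my-local-model",
)

question = Topic(name="/question", msg_type="String")
answer = Topic(name="/answer", msg_type="String")

llm = LLM(
    inputs=[question],
    outputs=[answer],
    model_client=client,
    trigger=question,
    component_name="assistant",
)

# New in 0.4.1: change the system prompt without touching model settings
llm.set_system_prompt("You are a concise assistant running on a mobile robot.")
```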
🛠️ Under the Hood Improvements
Beyond the big features, we've also implemented a host of improvements and fixes:
- Improved connection error messages and updated installation instructions for a smoother onboarding experience.
- Added a Debian packaging workflow for easier deployment.
- Introduced Ollama-specific inference options for better control over your Ollama models.
- Integrated the MeloTTS model and enhanced the Text-to-Speech component with a "say text" method for event-driven invocation.
- Implemented streaming playback for input in the Speech-to-Text component.
- Refactored components to remove `sounddevice` as a dependency for Text-to-Speech, simplifying setup.
- Added warnings for local models when GPU is set but the runtime isn't available.
- Enhanced the Speech-to-Text component with a hypothesis buffer for confirmed transcripts and asynchronous receiving for streaming websockets.
- Improved asynchronous publishing of responses in the LLM component when streaming.
- Ensured smoother termination by marking child threads as daemons.
- Added a `break_character` option to the LLM component config for chunking streaming output (see the sketch after this list).
- Introduced support for RGBD messages (in RealSense style).
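For `break_character`, a short sketch: the config class name and the streaming flag are assumptions; only `break_character` itself is named above.

```python
# Sketch: chunk streamed LLM output on sentence boundaries. The config class
# name (LLMConfig) and the streaming flag are assumptions; break_character is
# the option named in these notes.
from agents.config import LLMConfig

llm_config = LLMConfig(
    stream=True,          # assumed name of the flag enabling streaming output
    break_character=".",  # flush a chunk each time a sentence ends
)
```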
For a comprehensive list of all changes, please refer to the [full changelog](0.3.3...0.4.0).
📖 Dive Deeper
Ready to explore these new features? Check out our updated documentation for detailed information, examples, and installation guide.
0.3.3
0.3.1
Synopsis
This release adds significant updates to allow components to be launched as separate processes. This makes the system more robust to component level failures. To learn more about how to make your embodied agent graph robust and fault tolerant, check out this example.
If you are following the install from source procedure provided in the README, be sure to use the latest release of ROS Sugar.
NB: Release tags will be added in a new format starting from this release. The preceding 'v' has been removed.
What's Changed
- This patch adds support for tool calling in LLM components during multiprocess execution. Tool calling is supported with OllamaClient. To learn more about it, check out this example.
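As a rough sketch of how tool calling with the OllamaClient might look: the registration method name, its signature, and the model class are assumptions; the notes only state that tool calling works with OllamaClient under multiprocess execution.

```python
# Sketch only: attach a Python function as a tool to an LLM component backed
# by OllamaClient. The register_tool method, its signature, and the model
# class name are assumptions, shown purely for illustration.
from agents.clients.ollama import OllamaClient
from agents.components import LLM
from agents.models import Llama3_1  # assumed model class name
from agents.ros import Topic


def get_battery_level() -> str:
    """Example tool: report the robot's battery level (stubbed here)."""
    return "Battery at 87%"


question = Topic(name="/question", msg_type="String")
answer = Topic(name="/answer", msg_type="String")

llm = LLM(
    inputs=[question],
    outputs=[answer],
    model_client=OllamaClient(Llama3_1(name="llama")),
    trigger=question,
    component_name="tool_using_llm",
)

# Hypothetical registration call
llm.register_tool(
    tool=get_battery_level,
    tool_description="Returns the robot's current battery level.",
)
```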
0.3.0
Synopsis
This release adds significant updates to allow components to be launched as separate processes. This makes the system more robust to component level failures. To learn more about how to make your embodied agent graph robust and fault tolerant, check out this example.
If you are following the install from source procedure provided in the README, be sure to use the latest release of ROS Sugar.
NB: Release tags will be added in a new format starting from this release. The preceding 'v' has been removed.
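To illustrate launching components as separate processes, here is a sketch following the usual EmbodiedAgents launcher pattern; the `multiprocessing` keyword and the `add_pkg`/`bringup` calls are assumptions based on that pattern, so refer to the linked example and the matching ROS Sugar release for the exact API.

```python
# Sketch only: run each component in its own process so a crash in one does
# not bring down the whole agent graph. The add_pkg signature and the
# multiprocessing keyword are assumptions based on the usual launcher pattern.
from agents.components import LLM, TextToSpeech
from agents.ros import Launcher, Topic

question = Topic(name="/question", msg_type="String")
answer = Topic(name="/answer", msg_type="String")

llm = LLM(inputs=[question], outputs=[answer], trigger=question,
          component_name="llm")           # model client omitted for brevity
tts = TextToSpeech(inputs=[answer], trigger=answer,
                   component_name="tts")  # model client omitted for brevity

launcher = Launcher()
launcher.add_pkg(
    components=[llm, tts],
    multiprocessing=True,  # launch each component as a separate process
)
launcher.bringup()
```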