Releases: automatika-robotics/embodied-agents
0.4.3
🚀 EmbodiedAgents v0.4.3
EmbodiedAgents v0.4.3 introduces a modernized, web-based fully dynamic UI experience, enhanced visual and multimodal interaction capabilities, and numerous stability improvements across LLM/VLM components and the vision pipeline.
✨ New Features
Dynamically generated UI for Agents
- Extended Sugarcoat's dynamic web UI elements for Agents so each recipe can create its own interaction client.
- Added dynamic UI callbacks for:
- RGBD images
- Detections and DetectionsMultiSource messages
- Points of interest messages
- Added UI element definitions for custom message types (extensible by sugar-derived packages).
- Added logging output from StreamingString directly in the web UI.
- Added utility for drawing images with bounding boxes for visualization of detections.
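The bounding-box drawing utility is not named in these notes, so here is a generic sketch of the idea using plain OpenCV; the helper name and the detection format below are assumptions made for illustration, not the library's API.

```python
# Generic illustration of drawing detection boxes on an image with OpenCV.
# This is NOT the new utility itself; the helper name and detection format
# below are assumptions made for the example.
import cv2


def draw_detections(image, detections):
    """Draw labelled boxes; each detection is assumed to be a dict like
    {"box": (x_min, y_min, x_max, y_max), "label": str, "score": float}."""
    for det in detections:
        x1, y1, x2, y2 = (int(v) for v in det["box"])
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        caption = f'{det["label"]} {det["score"]:.2f}'
        cv2.putText(image, caption, (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return image
```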
Components & Message Handling
- Added Detection2D as an allowed input in the map encoding component.
- Added callbacks for Detections2D and integrated their handling in LLM/VLM components.
- Streamlined naming conventions for detection and tracking message types.
- Added handling of additional custom message types from sugar-derived packages in agent components.
- Added support for publishing single detection or tracking messages from the vision component.
- Added UDP streaming to IP:PORT as an option in the TextToSpeech component when `play_on_device` is enabled.
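As a rough sketch of how the new UDP streaming option might be wired up: `play_on_device` is the option named above, while the config class, import paths, and the stream-target parameter name are assumptions, and other required arguments (such as a model client) are omitted.

```python
# Sketch only: configure TextToSpeech to play audio on-device and also stream
# it over UDP to IP:PORT. The config class, import paths, and the `stream_to`
# parameter name are assumptions; `play_on_device` comes from the notes.
from agents.components import TextToSpeech
from agents.config import TextToSpeechConfig
from agents.ros import Topic

answer_text = Topic(name="/answer", msg_type="String")

tts_config = TextToSpeechConfig(
    play_on_device=True,            # named in this release's notes
    stream_to="192.168.1.50:5004",  # hypothetical name for the IP:PORT option
)

tts = TextToSpeech(
    inputs=[answer_text],
    trigger=answer_text,
    config=tts_config,
    component_name="speech_out",
)  # model client and other required arguments omitted for brevity
```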
LLM / MLLM Enhancements
- Added alias `VLM` for the MLLM component (see the sketch after this list).
- Added warning messages when streaming is enabled without using the proper StreamingString message type.
- Added StreamingString message type for managing text/audio streams in external clients.
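Here is a minimal sketch of the `VLM` alias in use; the constructor arguments and message-type strings are assumptions, while the alias itself and the use of StreamingString for streamed output come from the notes above.

```python
# Minimal sketch: VLM is the new alias for the MLLM component. Constructor
# arguments and msg_type strings are assumptions for illustration.
from agents.components import VLM
from agents.ros import Topic

image_in = Topic(name="/camera/image_raw", msg_type="Image")
question_in = Topic(name="/question", msg_type="String")
# With streaming enabled, the notes recommend StreamingString for the output.
answer_out = Topic(name="/answer", msg_type="StreamingString")

vlm = VLM(
    inputs=[question_in, image_in],
    outputs=[answer_out],
    trigger=question_in,
    component_name="vlm_component",
)  # a model client is also required but omitted here
```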
🧰 Fixes & Improvements
- Fixed image publishing in detection messages and empty image inputs in `Detection2D` publications.
- Fixed raw message data access during execution to avoid redundant output calls.
- Fixed topic passing from sugar-derived packages to agent components.
- Fixed websocket receiving issues in Text-to-Speech.
- Fixed keyword argument mismatches in detection and tracking message publishing.
- Refactored callbacks for video and RGBD message types.
- Added detections as handled output in the VLM component.
- Removed StreamingString as an input option for LLM/VLM components to streamline behavior.
⚙️ Chores & Maintenance
- Added websockets as an explicit dependency.
- Added warnings for template usage with StreamingString.
- Removed the tiny web client and the Chainlit-based client; replaced them with dynamic UIs for every recipe.
📚 Documentation Updates
- Added instructions for using the new dynamic web UI.
- Updated examples and guides for the new web-based client.
- Updated planning model documentation.
- Added a recipe for vision-guided point navigation.
🧩 Summary
This release completes the transition to a dynamic, ROS-integrated web interface for EmbodiedAgents, offering a more flexible and interactive experience for managing Physical AI agents.
It also enhances support for visual grounding, audio streams, and extended message types, and includes various bug fixes and improvements.
Full Changelog: 0.4.1...0.4.3
📖 Dive Deeper
Ready to explore these new features? Check out our updated documentation for detailed information, examples, and installation guide.
0.4.1
🚀 EmbodiedAgents 0.4.1 Release: Faster, More Flexible, and Ready for Anything!
We're thrilled to announce the release of EmbodiedAgents 0.4.1! This version marks a significant leap forward, bringing substantial performance improvements, enhanced flexibility for integrating various models (any server that is OpenAI API compatible), and a refined user experience.
✨ What's New?
This release is packed with exciting new features designed to make development smoother and your agents smarter:
- Project Renamed! We've officially rebranded from "ROS Agents" to EmbodiedAgents to better reflect the package's purpose and capabilities.
- Blazing Fast Streaming: Experience significantly faster handling of model output! We've added streaming support to LLM, MLLM, STT (Speech-to-Text), and TTS (Text-to-Speech) components, making your agents more responsive than ever.
- Onboard Object Detection: The Vision component now includes a local classifier model for efficient, onboard object detection.
- Universal Model Integration: Integrate with any model serving platform that offers an OpenAI-style API using our new generic HTTP client (see the sketch after this list).
- Planning with Vision-Language Models: We've added support for planning MLLM models, starting with RoboBrain2.0 by BAAI, which allows for task-based outputs from the MLLM component (e.g., grounding, pointing), opening the door to multimodal planning and perception-driven action.
- Enhanced Ollama Support: The OllamaClient now boasts embeddings model support, expanding your options for local model inference.
- Direct ChromaDB Integration: We've added a ChromaDB client for direct interaction with a ChromaDB server, supporting embeddings models served via Ollama or `sentence-transformers`. No extra dependencies needed!
- Simplified & Streamlined: We've reduced external Python package dependencies and simplified models and clients for a cleaner, more efficient codebase.
- Python 3.8 Compatibility Fixes: We've addressed backward compatibility issues to ensure smoother installations for Python 3.8 users.
- Dynamic System Prompts: The LLM/MLLM components now support `set_system_prompt`, allowing for dynamic system prompt configuration without modifying model settings (see the sketch below).
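A rough sketch of the generic client and the new dynamic system prompt together: the client class name (`OpenAICompatibleClient`) and its parameters are placeholders, not necessarily the real names; only the OpenAI-style client and `set_system_prompt` themselves are taken from the notes above.

```python
# Sketch only: point an LLM component at any OpenAI-API-compatible server and
# set the system prompt at runtime. The client class name and its parameters
# are placeholders; the generic client and set_system_prompt are the features
# named in this release.
from agents.clients import OpenAICompatibleClient  # placeholder class name
from agents.components import LLM
from agents.ros import Topic

client = OpenAICompatibleClient(
    host="http://localhost:8000",  # e.g. a vLLM or llama.cpp server
    model_name="my-local-model",
)

question = Topic(name="/question", msg_type="String")
answer = Topic(name="/answer", msg_type="String")

llm = LLM(
    inputs=[question],
    outputs=[answer],
    model_client=client,
    trigger=question,
    component_name="assistant",
)

# New in 0.4.1: change the system prompt without touching model settings
llm.set_system_prompt("You are a concise assistant running on a mobile robot.")
```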
🛠️ Under the Hood Improvements
Beyond the big features, we've also implemented a host of improvements and fixes:
- Improved connection error messages and updated installation instructions for a smoother onboarding experience.
- Added a Debian packaging workflow for easier deployment.
- Introduced Ollama-specific inference options for better control over your Ollama models.
- Integrated the MeloTTS model and enhanced the Text-to-Speech component with a "say text" method for event-driven invocation.
- Implemented streaming playback for input in the Speech-to-Text component.
- Refactored components to remove `sounddevice` as a dependency for Text-to-Speech, simplifying setup.
- Added warnings for local models when GPU is set but the runtime isn't available.
- Enhanced the Speech-to-Text component with a hypothesis buffer for confirmed transcripts and asynchronous receiving for streaming websockets.
- Improved asynchronous publishing of responses in the LLM component when streaming.
- Ensured smoother termination by marking child threads as daemons.
- Added a `break_character` option to the LLM component config for chunking streaming output (see the sketch after this list).
- Introduced support for RGBD messages (in RealSense style).
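For `break_character`, a short sketch: the config class name and the streaming flag are assumptions; only `break_character` itself is named above.

```python
# Sketch: chunk streamed LLM output on sentence boundaries. The config class
# name (LLMConfig) and the streaming flag are assumptions; break_character is
# the option named in these notes.
from agents.config import LLMConfig

llm_config = LLMConfig(
    stream=True,          # assumed name of the flag enabling streaming output
    break_character=".",  # flush a chunk each time a sentence ends
)
```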
For a comprehensive list of all changes, please refer to the [full changelog](0.3.3...0.4.0).
📖 Dive Deeper
Ready to explore these new features? Check out our updated documentation for detailed information, examples, and installation guide.
0.3.3
0.3.1
Synopsis
This release adds significant updates to allow components to be launched as separate processes. This makes the system more robust to component level failures. To learn more about how to make your embodied agent graph robust and fault tolerant, check out this example.
If you are following the install from source procedure provided in the README, be sure to use the latest release of ROS Sugar.
NB: Release tags will be added in a new format starting from this release. The preceding 'v' has been removed.
What's Changed
- This patch adds support for tool calling in LLM components during multiprocess execution. Tool calling is supported with OllamaClient. To learn more about it, check out this example.
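As a rough sketch of how tool calling with the OllamaClient might look: the registration method name, its signature, and the model class are assumptions; the notes only state that tool calling works with OllamaClient under multiprocess execution.

```python
# Sketch only: attach a Python function as a tool to an LLM component backed
# by OllamaClient. The register_tool method, its signature, and the model
# class name are assumptions, shown purely for illustration.
from agents.clients.ollama import OllamaClient
from agents.components import LLM
from agents.models import Llama3_1  # assumed model class name
from agents.ros import Topic


def get_battery_level() -> str:
    """Example tool: report the robot's battery level (stubbed here)."""
    return "Battery at 87%"


question = Topic(name="/question", msg_type="String")
answer = Topic(name="/answer", msg_type="String")

llm = LLM(
    inputs=[question],
    outputs=[answer],
    model_client=OllamaClient(Llama3_1(name="llama")),
    trigger=question,
    component_name="tool_using_llm",
)

# Hypothetical registration call
llm.register_tool(
    tool=get_battery_level,
    tool_description="Returns the robot's current battery level.",
)
```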
0.3.0
Synopsis
This release adds significant updates to allow components to be launched as separate processes. This makes the system more robust to component level failures. To learn more about how to make your embodied agent graph robust and fault tolerant, check out this example.
If you are following the install from source procedure provided in the README, be sure to use the latest release of ROS Sugar.
NB: Release tags will be added in a new format starting from this release. The preceding 'v' has been removed.
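To illustrate launching components as separate processes, here is a sketch following the usual EmbodiedAgents launcher pattern; the `multiprocessing` keyword and the `add_pkg`/`bringup` calls are assumptions based on that pattern, so refer to the linked example and the matching ROS Sugar release for the exact API.

```python
# Sketch only: run each component in its own process so a crash in one does
# not bring down the whole agent graph. The add_pkg signature and the
# multiprocessing keyword are assumptions based on the usual launcher pattern.
from agents.components import LLM, TextToSpeech
from agents.ros import Launcher, Topic

question = Topic(name="/question", msg_type="String")
answer = Topic(name="/answer", msg_type="String")

llm = LLM(inputs=[question], outputs=[answer], trigger=question,
          component_name="llm")           # model client omitted for brevity
tts = TextToSpeech(inputs=[answer], trigger=answer,
                   component_name="tts")  # model client omitted for brevity

launcher = Launcher()
launcher.add_pkg(
    components=[llm, tts],
    multiprocessing=True,  # launch each component as a separate process
)
launcher.bringup()
```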