Commit 173c415

Merge pull request #658 from wei-m-teh/marengo-3.0-update
updated video embedding sample notebook with Marengo 3.0
2 parents 20abbae + 013cbd3 commit 173c415

2 files changed: +78 -41 lines changed


multi-modal/TwelveLabs/bedrock-twelvelabs-embedding-opensearchserverless.ipynb

Lines changed: 19 additions & 11 deletions
@@ -13,7 +13,7 @@
  "id": "6bea4d77",
  "metadata": {},
  "source": [
-  "TwelveLabs is a leading provider of multimodal AI models specializing in video understanding and analysis. TwelveLabs' advanced models enable sophisticated video search, analysis, and content generation capabilities through state-of-the-art computer vision and natural language processing technologies. [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) now offers two TwelveLabs models: [TwelveLabs Pegasus 1.2](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-pegasus.html), which provides comprehensive video understanding and analysis, and [TwelveLabs Marengo Embed 2.7](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo.html), which generates high-quality embeddings for video, text, audio, and image content. These models empower developers to build applications that can intelligently process, analyze, and derive insights from video data at scale.\n",
+  "TwelveLabs is a leading provider of multimodal AI models specializing in video understanding and analysis. TwelveLabs' advanced models enable sophisticated video search, analysis, and content generation capabilities through state-of-the-art computer vision and natural language processing technologies. [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) now offers two TwelveLabs models: [TwelveLabs Pegasus 1.2](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-pegasus.html), which provides comprehensive video understanding and analysis, and [TwelveLabs Marengo Embed 3.0](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo-3.html), which generates high-quality embeddings for video, text, audio, and image content. These models empower developers to build applications that can intelligently process, analyze, and derive insights from video data at scale.\n",
  "\n",
  "In this notebook, we'll be using TwelveLabs Marengo model for generating embeddings for content in texts, images and videos to enable multimodal search and analysis capabilities across different media types. "
  ]
@@ -106,7 +106,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
-  "%uv pip install -r requirements.txt"
+  "!uv pip install -r requirements.txt"
  ]
 },
 {
@@ -317,8 +317,8 @@
  "bedrock_client = boto3.client(\"bedrock-runtime\")\n",
  "s3_client = boto3.client(\"s3\")\n",
  "aws_account_id = boto3.client('sts').get_caller_identity()[\"Account\"]\n",
-  "model_id = \"twelvelabs.marengo-embed-2-7-v1:0\"\n",
-  "cris_model_id = \"us.twelvelabs.marengo-embed-2-7-v1:0\"\n",
+  "model_id = \"twelvelabs.marengo-embed-3-0-v1:0\"\n",
+  "cris_model_id = \"us.twelvelabs.marengo-embed-3-0-v1:0\"\n",
  "s3_bucket_name = '<an S3 bucket for storing the outputs>'\n",
  "\n",
  "bedrock_twelvelabs_helper = BedrockTwelvelabsHelper(bedrock_client=bedrock_client, \n",
@@ -359,7 +359,7 @@
  "## Download a Sample Video and Upload to S3 as Input\n",
  "We'll use the TwelveLabs Marengo model to generate embeddings from this video and perform content-based search.\n",
  "\n",
-  "![Meridian](./assets/images/sample-video-meridian.png)\n",
+  "![Meridian](./images/sample-video-meridian.png)\n",
  "We will use an open-source sample video, [Meridian](https://en.wikipedia.org/wiki/Meridian_(film)), as input to generate embeddings."
  ]
 },
@@ -421,20 +421,20 @@
  "id": "6e9914e4",
  "metadata": {},
  "source": [
-  "#### Marengo Embed 2.7 on Bedrock\n",
+  "#### Marengo Embed 3.0 on Bedrock\n",
  "\n",
  "A multimodal embedding model that generates high-quality vector representations of video, text, audio, and image content for similarity search, clustering, and other machine learning tasks. The model supports multiple input modalities and provides specialized embeddings optimized for different use cases.\n",
  "\n",
  "The model supports asynchronous inference through the [StartAsyncInvoke API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html).\n",
  "- Provider — TwelveLabs\n",
  "- Categories — Embeddings, multimodal\n",
-  "- Model ID — `twelvelabs.marengo-embed-2-7-v1:0`\n",
+  "- Model ID — `us.twelvelabs.marengo-embed-3-0-v1:0`\n",
  "- Input modality — Video, Text, Audio, Image\n",
  "- Output modality — Embeddings\n",
  "- Max video size — 2 hours long video (< 2GB file size)\n",
  "\n",
  "**Resources:**\n",
-  "- [AWS Docs: TwelveLabs Marengo Embed 2.7](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo.html)\n",
+  "- [AWS Docs: TwelveLabs Marengo Embed 3.0](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo-3.html)\n",
  "- [TwelveLabs Docs: Marengo](https://docs.twelvelabs.io/v1.3/docs/concepts/models/marengo)\n"
  ]
 },
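Editor's note: for orientation, here is a minimal sketch (not part of the commit) of an asynchronous Marengo Embed 3.0 call via StartAsyncInvoke. The output bucket name is a placeholder, and the payload uses the nested input format this commit introduces in utils.py further down.

```python
import boto3

# Minimal sketch, assuming Bedrock access in a region where Marengo 3.0 is
# available. "my-output-bucket" is a placeholder for your own S3 bucket.
bedrock = boto3.client("bedrock-runtime")

response = bedrock.start_async_invoke(
    modelId="twelvelabs.marengo-embed-3-0-v1:0",
    modelInput={
        "inputType": "text",
        # Marengo 3.0 nests text input under a "text" key (see utils.py diff below)
        "text": {"inputText": "a person smoking in a room"},
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-output-bucket/embeddings/"}
    },
)
# Poll GetAsyncInvoke with this ARN until the job status is "Completed".
print(response["invocationArn"])
```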
@@ -491,14 +491,22 @@
  "print(f\"✅ Video embedding created successfully with {len(video_embedding_data)} segment(s)\")"
  ]
 },
+ {
+  "cell_type": "markdown",
+  "id": "beeb5d2b",
+  "metadata": {},
+  "source": [
+  "Prints the video embedding for reference"
+  ]
+ },
 {
  "cell_type": "code",
  "execution_count": null,
  "id": "7864241b",
  "metadata": {},
  "outputs": [],
  "source": [
-  "[x for x in video_embedding_data if x[\"embeddingOption\"] == \"visual-image\"][0]"
+  "[x for x in video_embedding_data if x[\"embeddingOption\"] == \"visual\"][0]"
  ]
 },
 {
@@ -583,7 +591,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
-  "text_query = \"a person smoking in a room\"\n",
+  "text_query = \"A person smoking in a room\"\n",
  "text_search_results = bedrock_twelvelabs_helper.search_videos_by_text(text_query, top_k=3)\n"
  ]
 },
@@ -776,7 +784,7 @@
  ],
  "metadata": {
  "kernelspec": {
-  "display_name": "aws3",
+  "display_name": ".venv",
  "language": "python",
  "name": "python3"
 },

multi-modal/TwelveLabs/utils.py

Lines changed: 59 additions & 30 deletions
@@ -375,7 +375,9 @@ def create_text_embedding_async(self, text_query: str) -> list:
             modelId=self.model_id,
             modelInput={
                 "inputType": "text",
-                "inputText": text_query
+                "text": {
+                    "inputText": text_query
+                }
             },
             outputDataConfig={
                 "s3OutputDataConfig": {
@@ -409,7 +411,9 @@ def create_text_embedding(self, text_query: str) -> list:
 
         modelInput={
             "inputType": "text",
-            "inputText": text_query
+            "text": {
+                "inputText": text_query
+            }
         }
         response = self.bedrock_client.invoke_model(
             modelId=self.cris_model_id,
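Editor's note: to make the shape change in these two hunks concrete, here are the old and new text payloads side by side. A sketch with an illustrative query string, not authoritative API documentation.

```python
# Marengo 2.7 (before this commit): flat text field.
legacy_model_input = {
    "inputType": "text",
    "inputText": "a person smoking in a room",
}

# Marengo 3.0 (after this commit): text input nested under a "text" object.
model_input = {
    "inputType": "text",
    "text": {"inputText": "a person smoking in a room"},
}
```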
@@ -441,10 +445,12 @@ def create_image_embedding(self, image_path: str) -> list:
         s3_image_uri = f's3://{self.s3_bucket_name}/{self.s3_images_path}/{image_path_basename}'
         modelInput={
             "inputType": "image",
-            "mediaSource": {
-                "s3Location": {
-                    "uri": s3_image_uri,
-                    "bucketOwner": self.aws_account_id
+            "image" : {
+                "mediaSource": {
+                    "s3Location": {
+                        "uri": s3_image_uri,
+                        "bucketOwner": self.aws_account_id
+                    }
                 }
             }
         }
@@ -526,10 +532,12 @@ def create_video_embedding(self, video_s3_uri: str) -> list:
             modelId=self.model_id,
             modelInput={
                 "inputType": "video",
-                "mediaSource": {
-                    "s3Location": {
-                        "uri": video_s3_uri,
-                        "bucketOwner": self.aws_account_id
+                "video" : {
+                    "mediaSource": {
+                        "s3Location": {
+                            "uri": video_s3_uri,
+                            "bucketOwner": self.aws_account_id
+                        }
                     }
                 }
             },
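Editor's note: the same nesting applies to media inputs, where mediaSource now sits under a modality key ("image" or "video"). A sketch of the new video payload shape; the URI and account ID are placeholders.

```python
# Sketch of the Marengo 3.0 video payload per the two hunks above.
# URI and bucketOwner are placeholder values.
video_model_input = {
    "inputType": "video",
    "video": {
        "mediaSource": {
            "s3Location": {
                "uri": "s3://my-input-bucket/videos/sample.mp4",
                "bucketOwner": "111122223333",  # AWS account that owns the bucket
            }
        }
    },
}
```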
@@ -577,7 +585,7 @@ def create_opensearch_index(self, index_name_prefix: str):
                 "properties": {
                     "embedding": {
                         "type": "knn_vector",
-                        "dimension": 1024,
+                        "dimension": 512,
                         "method": {
                             "engine": "faiss",
                             "name": "hnsw",
@@ -622,7 +630,7 @@ def index_video_embeddings(self, video_embeddings: list, video_id: str = "sample
                 "end_time": segment["endSec"],
                 "video_id": video_id,
                 "segment_id": i,
-                "embedding_option": segment.get("embeddingOption", "visual-text")
+                "embedding_option": segment.get("embeddingOption", "visual")
             }
             documents.append(document)
 
@@ -662,41 +670,62 @@ def search_videos_by_text(self, query_text: str, top_k: int=5) -> list:
         print(f"Generating embedding for query: '{query_text}'")
         query_embedding_data = self.create_text_embedding(query_text)
         query_embedding = query_embedding_data[0]["embedding"]
-
-        # Search OpenSearch index
+
         search_body = {
-            "query": {
-                "knn": {
-                    "embedding": {
-                        "vector": query_embedding,
-                        "k": top_k
+            "query": {
+                "script_score": {
+                    "query": {
+                        "bool": {
+                            "filter": {
+                                "bool": {
+                                    "must": [
+                                        {
+                                            "term": {
+                                                "embedding_option": "visual"
+                                            }
+                                        }
+                                    ]
+                                }
+                            }
+                        }
+                    },
+                    "script": {
+                        "source": "knn_score",
+                        "lang": "knn",
+                        "params": {
+                            "field": "embedding",
+                            "query_value": query_embedding,
+                            "space_type": "l2"
+                        }
+                    }
+                }
+            },
+            "_source": ["start_time", "end_time", "video_id", "segment_id", "embedding_option"],
+            "size": top_k,
         }
-            }
-        },
-        "size": top_k,
-        "_source": ["start_time", "end_time", "video_id", "segment_id"]
-        }
-
+
         response = self.opensearch_client.search(index=self.index_name, body=search_body)
-
+
         print(f"\n✅ Found {len(response['hits']['hits'])} matching segments:")
         results = []
-
+
         for hit in response['hits']['hits']:
+            print(hit)
             result = {
                 "score": hit["_score"],
                 "video_id": hit["_source"]["video_id"],
                 "segment_id": hit["_source"]["segment_id"],
                 "start_time": hit["_source"]["start_time"],
-                "end_time": hit["_source"]["end_time"]
+                "end_time": hit["_source"]["end_time"],
+                "embedding_option": hit["_source"]["embedding_option"]
             }
             results.append(result)
 
             print(f" Score: {result['score']:.4f} | Video: {self.video_embedding_mapping[result['video_id']]} | "
-                  f"Segment: {result['segment_id']} | Time: {result['start_time']:.1f}s - {result['end_time']:.1f}s")
-
+                  f"Segment: {result['segment_id']} | Embedding Option: {result['embedding_option']} | Time: {result['start_time']:.1f}s - {result['end_time']:.1f}s")
         return results
 
+
     # Image Query Search Function
     def search_videos_by_image(self, image_path: str, top_k: int=5) -> list:
         """