Commit 173c415

Merge pull request #658 from wei-m-teh/marengo-3.0-update
updated video embedding sample notebook with Marengo 3.0
2 parents 20abbae + 013cbd3 commit 173c415

2 files changed: +78 -41 lines changed


multi-modal/TwelveLabs/bedrock-twelvelabs-embedding-opensearchserverless.ipynb

Lines changed: 19 additions & 11 deletions
@@ -13,7 +13,7 @@
  "id": "6bea4d77",
  "metadata": {},
  "source": [
-  "TwelveLabs is a leading provider of multimodal AI models specializing in video understanding and analysis. TwelveLabs' advanced models enable sophisticated video search, analysis, and content generation capabilities through state-of-the-art computer vision and natural language processing technologies. [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) now offers two TwelveLabs models: [TwelveLabs Pegasus 1.2](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-pegasus.html), which provides comprehensive video understanding and analysis, and [TwelveLabs Marengo Embed 2.7](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo.html), which generates high-quality embeddings for video, text, audio, and image content. These models empower developers to build applications that can intelligently process, analyze, and derive insights from video data at scale.\n",
+  "TwelveLabs is a leading provider of multimodal AI models specializing in video understanding and analysis. TwelveLabs' advanced models enable sophisticated video search, analysis, and content generation capabilities through state-of-the-art computer vision and natural language processing technologies. [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) now offers two TwelveLabs models: [TwelveLabs Pegasus 1.2](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-pegasus.html), which provides comprehensive video understanding and analysis, and [TwelveLabs Marengo Embed 3.0](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo-3.html), which generates high-quality embeddings for video, text, audio, and image content. These models empower developers to build applications that can intelligently process, analyze, and derive insights from video data at scale.\n",
  "\n",
  "In this notebook, we'll be using TwelveLabs Marengo model for generating embeddings for content in texts, images and videos to enable multimodal search and analysis capabilities across different media types. "
  ]
@@ -106,7 +106,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
-  "%uv pip install -r requirements.txt"
+  "!uv pip install -r requirements.txt"
  ]
 },
 {
@@ -317,8 +317,8 @@
  "bedrock_client = boto3.client(\"bedrock-runtime\")\n",
  "s3_client = boto3.client(\"s3\")\n",
  "aws_account_id = boto3.client('sts').get_caller_identity()[\"Account\"]\n",
-  "model_id = \"twelvelabs.marengo-embed-2-7-v1:0\"\n",
-  "cris_model_id = \"us.twelvelabs.marengo-embed-2-7-v1:0\"\n",
+  "model_id = \"twelvelabs.marengo-embed-3-0-v1:0\"\n",
+  "cris_model_id = \"us.twelvelabs.marengo-embed-3-0-v1:0\"\n",
  "s3_bucket_name = '<an S3 bucket for storing the outputs>'\n",
  "\n",
  "bedrock_twelvelabs_helper = BedrockTwelvelabsHelper(bedrock_client=bedrock_client, \n",
@@ -359,7 +359,7 @@
  "## Download a Sample Video and Upload to S3 as Input\n",
  "We'll use the TwelveLabs Marengo model to generate embeddings from this video and perform content-based search.\n",
  "\n",
-  "![Meridian](./assets/images/sample-video-meridian.png)\n",
+  "![Meridian](./images/sample-video-meridian.png)\n",
  "We will use an open-source sample video, [Meridian](https://en.wikipedia.org/wiki/Meridian_(film)), as input to generate embeddings."
  ]
 },
@@ -421,20 +421,20 @@
  "id": "6e9914e4",
  "metadata": {},
  "source": [
-  "#### Marengo Embed 2.7 on Bedrock\n",
+  "#### Marengo Embed 3.0 on Bedrock\n",
  "\n",
  "A multimodal embedding model that generates high-quality vector representations of video, text, audio, and image content for similarity search, clustering, and other machine learning tasks. The model supports multiple input modalities and provides specialized embeddings optimized for different use cases.\n",
  "\n",
  "The model supports asynchronous inference through the [StartAsyncInvoke API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html).\n",
  "- Provider — TwelveLabs\n",
  "- Categories — Embeddings, multimodal\n",
-  "- Model ID — `twelvelabs.marengo-embed-2-7-v1:0`\n",
+  "- Model ID — `us.twelvelabs.marengo-embed-3-0-v1:0`\n",
  "- Input modality — Video, Text, Audio, Image\n",
  "- Output modality — Embeddings\n",
  "- Max video size — 2 hours long video (< 2GB file size)\n",
  "\n",
  "**Resources:**\n",
-  "- [AWS Docs: TwelveLabs Marengo Embed 2.7](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo.html)\n",
+  "- [AWS Docs: TwelveLabs Marengo Embed 3.0](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo-3.html)\n",
  "- [TwelveLabs Docs: Marengo](https://docs.twelvelabs.io/v1.3/docs/concepts/models/marengo)\n"
  ]
 },
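Editor's note: for orientation, here is a minimal sketch (not part of the commit) of an asynchronous Marengo Embed 3.0 call via StartAsyncInvoke. The output bucket name is a placeholder, and the payload uses the nested input format this commit introduces in utils.py further down.

```python
import boto3

# Minimal sketch, assuming Bedrock access in a region where Marengo 3.0 is
# available. "my-output-bucket" is a placeholder for your own S3 bucket.
bedrock = boto3.client("bedrock-runtime")

response = bedrock.start_async_invoke(
    modelId="twelvelabs.marengo-embed-3-0-v1:0",
    modelInput={
        "inputType": "text",
        # Marengo 3.0 nests text input under a "text" key (see utils.py diff below)
        "text": {"inputText": "a person smoking in a room"},
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-output-bucket/embeddings/"}
    },
)
# Poll GetAsyncInvoke with this ARN until the job status is "Completed".
print(response["invocationArn"])
```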
@@ -491,14 +491,22 @@
  "print(f\"✅ Video embedding created successfully with {len(video_embedding_data)} segment(s)\")"
  ]
 },
+ {
+  "cell_type": "markdown",
+  "id": "beeb5d2b",
+  "metadata": {},
+  "source": [
+  "Prints the video embedding for reference"
+  ]
+ },
 {
  "cell_type": "code",
  "execution_count": null,
  "id": "7864241b",
  "metadata": {},
  "outputs": [],
  "source": [
-  "[x for x in video_embedding_data if x[\"embeddingOption\"] == \"visual-image\"][0]"
+  "[x for x in video_embedding_data if x[\"embeddingOption\"] == \"visual\"][0]"
  ]
 },
 {
@@ -583,7 +591,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
-  "text_query = \"a person smoking in a room\"\n",
+  "text_query = \"A person smoking in a room\"\n",
  "text_search_results = bedrock_twelvelabs_helper.search_videos_by_text(text_query, top_k=3)\n"
  ]
 },
@@ -776,7 +784,7 @@
  ],
  "metadata": {
  "kernelspec": {
-  "display_name": "aws3",
+  "display_name": ".venv",
  "language": "python",
  "name": "python3"
 },

multi-modal/TwelveLabs/utils.py

Lines changed: 59 additions & 30 deletions
@@ -375,7 +375,9 @@ def create_text_embedding_async(self, text_query: str) -> list:
             modelId=self.model_id,
             modelInput={
                 "inputType": "text",
-                "inputText": text_query
+                "text": {
+                    "inputText": text_query
+                }
             },
             outputDataConfig={
                 "s3OutputDataConfig": {
@@ -409,7 +411,9 @@ def create_text_embedding(self, text_query: str) -> list:
 
         modelInput={
             "inputType": "text",
-            "inputText": text_query
+            "text": {
+                "inputText": text_query
+            }
         }
         response = self.bedrock_client.invoke_model(
             modelId=self.cris_model_id,
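Editor's note: to make the shape change in these two hunks concrete, here are the old and new text payloads side by side. A sketch with an illustrative query string, not authoritative API documentation.

```python
# Marengo 2.7 (before this commit): flat text field.
legacy_model_input = {
    "inputType": "text",
    "inputText": "a person smoking in a room",
}

# Marengo 3.0 (after this commit): text input nested under a "text" object.
model_input = {
    "inputType": "text",
    "text": {"inputText": "a person smoking in a room"},
}
```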
@@ -441,10 +445,12 @@ def create_image_embedding(self, image_path: str) -> list:
         s3_image_uri = f's3://{self.s3_bucket_name}/{self.s3_images_path}/{image_path_basename}'
         modelInput={
             "inputType": "image",
-            "mediaSource": {
-                "s3Location": {
-                    "uri": s3_image_uri,
-                    "bucketOwner": self.aws_account_id
+            "image" : {
+                "mediaSource": {
+                    "s3Location": {
+                        "uri": s3_image_uri,
+                        "bucketOwner": self.aws_account_id
+                    }
                 }
             }
         }
@@ -526,10 +532,12 @@ def create_video_embedding(self, video_s3_uri: str) -> list:
             modelId=self.model_id,
             modelInput={
                 "inputType": "video",
-                "mediaSource": {
-                    "s3Location": {
-                        "uri": video_s3_uri,
-                        "bucketOwner": self.aws_account_id
+                "video" : {
+                    "mediaSource": {
+                        "s3Location": {
+                            "uri": video_s3_uri,
+                            "bucketOwner": self.aws_account_id
+                        }
                     }
                 }
             },
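Editor's note: the same nesting applies to media inputs, where mediaSource now sits under a modality key ("image" or "video"). A sketch of the new video payload shape; the URI and account ID are placeholders.

```python
# Sketch of the Marengo 3.0 video payload per the two hunks above.
# URI and bucketOwner are placeholder values.
video_model_input = {
    "inputType": "video",
    "video": {
        "mediaSource": {
            "s3Location": {
                "uri": "s3://my-input-bucket/videos/sample.mp4",
                "bucketOwner": "111122223333",  # AWS account that owns the bucket
            }
        }
    },
}
```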
@@ -577,7 +585,7 @@ def create_opensearch_index(self, index_name_prefix: str):
                 "properties": {
                     "embedding": {
                         "type": "knn_vector",
-                        "dimension": 1024,
+                        "dimension": 512,
                         "method": {
                             "engine": "faiss",
                             "name": "hnsw",
@@ -622,7 +630,7 @@ def index_video_embeddings(self, video_embeddings: list, video_id: str = "sample
                 "end_time": segment["endSec"],
                 "video_id": video_id,
                 "segment_id": i,
-                "embedding_option": segment.get("embeddingOption", "visual-text")
+                "embedding_option": segment.get("embeddingOption", "visual")
             }
             documents.append(document)
 
@@ -662,41 +670,62 @@ def search_videos_by_text(self, query_text: str, top_k: int=5) -> list:
         print(f"Generating embedding for query: '{query_text}'")
         query_embedding_data = self.create_text_embedding(query_text)
         query_embedding = query_embedding_data[0]["embedding"]
-
-        # Search OpenSearch index
+
         search_body = {
-            "query": {
-                "knn": {
-                    "embedding": {
-                        "vector": query_embedding,
-                        "k": top_k
+            "query": {
+                "script_score": {
+                    "query": {
+                        "bool": {
+                            "filter": {
+                                "bool": {
+                                    "must": [
+                                        {
+                                            "term": {
+                                                "embedding_option": "visual"
+                                            }
+                                        }
+                                    ]
+                                }
+                            }
+                        }
+                    },
+                    "script": {
+                        "source": "knn_score",
+                        "lang": "knn",
+                        "params": {
+                            "field": "embedding",
+                            "query_value": query_embedding,
+                            "space_type": "l2"
+                        }
+                    }
+                }
+            },
+            "_source": ["start_time", "end_time", "video_id", "segment_id", "embedding_option"],
+            "size": top_k,
         }
-            }
-        },
-        "size": top_k,
-        "_source": ["start_time", "end_time", "video_id", "segment_id"]
-        }
-
+
         response = self.opensearch_client.search(index=self.index_name, body=search_body)
-
+
         print(f"\n✅ Found {len(response['hits']['hits'])} matching segments:")
         results = []
-
+
         for hit in response['hits']['hits']:
+            print(hit)
             result = {
                 "score": hit["_score"],
                 "video_id": hit["_source"]["video_id"],
                 "segment_id": hit["_source"]["segment_id"],
                 "start_time": hit["_source"]["start_time"],
-                "end_time": hit["_source"]["end_time"]
+                "end_time": hit["_source"]["end_time"],
+                "embedding_option": hit["_source"]["embedding_option"]
             }
             results.append(result)
 
             print(f" Score: {result['score']:.4f} | Video: {self.video_embedding_mapping[result['video_id']]} | "
-                  f"Segment: {result['segment_id']} | Time: {result['start_time']:.1f}s - {result['end_time']:.1f}s")
-
+                  f"Segment: {result['segment_id']} | Embedding Option: {result['embedding_option']} | Time: {result['start_time']:.1f}s - {result['end_time']:.1f}s")
         return results
 
+
     # Image Query Search Function
     def search_videos_by_image(self, image_path: str, top_k: int=5) -> list:
         """