
Conversation


@b3nw b3nw commented Jan 31, 2026

The embedding endpoint existed but was broken: the executor refactoring (Nov 2025) never added a litellm.aembedding() dispatch path, so all embedding requests were routed through litellm.acompletion() and failed.

Changes:

  • Add request_type field to RequestContext (completion/embedding)
  • Update RotatingClient.aembedding() to set request_type='embedding'
  • Add conditional dispatch in RequestExecutor._execute_non_streaming() to call litellm.aembedding() for embedding requests
  • Refactor dispatch logic to reduce code duplication

Tested with Mistral (mistral-embed) - returns proper 1024-dim vectors.
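The conditional dispatch described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the RequestContext here is simplified, and the two fake_* coroutines stand in for litellm.acompletion()/litellm.aembedding():

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class RequestContext:
    # Simplified stand-in for the PR's RequestContext (types.py).
    model: str
    kwargs: dict = field(default_factory=dict)
    streaming: bool = False
    request_type: str = "completion"  # "completion" or "embedding"

async def fake_acompletion(**kwargs):
    # Stand-in for litellm.acompletion().
    return {"object": "chat.completion"}

async def fake_aembedding(**kwargs):
    # Stand-in for litellm.aembedding(); shape loosely follows the
    # OpenAI-style embedding payload.
    return {"object": "list", "data": [{"embedding": [0.0] * 1024}]}

async def execute_non_streaming(context: RequestContext):
    # The fix: choose the underlying call from request_type instead of
    # unconditionally dispatching to the completion endpoint.
    call = fake_aembedding if context.request_type == "embedding" else fake_acompletion
    return await call(model=context.model, **context.kwargs)

ctx = RequestContext(model="mistral/mistral-embed", request_type="embedding")
result = asyncio.run(execute_non_streaming(ctx))
print(result["object"])  # list
```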


Important

Fix embedding dispatch in RequestExecutor by adding request_type to RequestContext and updating logic to correctly handle embedding requests.

  • Behavior:
    • Fix embedding dispatch in RequestExecutor._execute_non_streaming() to call litellm.aembedding() for embedding requests.
    • Add request_type field to RequestContext in types.py to distinguish between completion and embedding requests.
    • Update RotatingClient.aembedding() to set request_type='embedding'.
  • Testing:
    • Tested with Mistral (mistral-embed) to ensure correct 1024-dim vector results.

This description was created by Ellipsis for 2c755e8.

@b3nw b3nw requested a review from Mirrowel as a code owner on January 31, 2026 05:29

b3nw commented Jan 31, 2026

@mirrobot-agent your PR review workflow failed; please re-trigger it.

@mirrobot-agent

@b3nw, I've received your request to re-trigger the review. I'm starting my analysis of the embedding dispatch fixes now and will post the results shortly.


@mirrobot-agent mirrobot-agent bot left a comment


Overall Assessment

This PR correctly identifies and fixes the issue where embedding requests were being dispatched to litellm.acompletion instead of litellm.aembedding. The introduction of request_type in RequestContext is a solid approach and the refactor in RequestExecutor to dynamically select the call function is clean.

Architectural Feedback

The addition of request_type to RequestContext is a good way to handle different API operation types within the same executor logic. This pattern could potentially be extended to other operation types in the future.
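As a sketch of how such a field might be declared (assuming a dataclass-based RequestContext; the real types.py may differ), a Literal type keeps the set of operation types explicit and easy to extend:

```python
from dataclasses import dataclass, field
from typing import Any, Literal

# Hypothetical alias; extending to e.g. "rerank" later would be one edit here.
RequestType = Literal["completion", "embedding"]

@dataclass
class RequestContext:
    model: str
    provider: str
    kwargs: dict[str, Any] = field(default_factory=dict)
    streaming: bool = False
    request_type: RequestType = "completion"

ctx = RequestContext(model="mistral-embed", provider="mistral",
                     request_type="embedding")
print(ctx.request_type)  # embedding
```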

Key Suggestions

  • Enhance aembedding feature parity: Currently, aembedding in RotatingClient is missing model resolution and transaction logging.
  • Async Consistency: Consider making aembedding an async def to match acompletion.

Warnings

None.

This review was generated by an AI assistant.

            provider=provider,
            kwargs=kwargs,
            streaming=False,
            request_type="embedding",

While adding request_type="embedding" here is correct, the aembedding method is still missing feature parity with acompletion. Specifically, it lacks model resolution and transaction logging.

Also, for consistency with the rest of the client, this should ideally be an async def and await the executor.

Suggested change
-        request_type="embedding",
+        # Resolve model ID
+        resolved_model = self._model_resolver.resolve_model_id(model, provider)
+        kwargs["model"] = resolved_model
+
+        # Create transaction logger if enabled
+        transaction_logger = None
+        if self.enable_request_logging:
+            transaction_logger = TransactionLogger(
+                provider=provider,
+                model=resolved_model,
+                enabled=True,
+                parent_dir=kwargs.pop("_parent_log_dir", None),
+            )
+            transaction_logger.log_request(kwargs)
+
+        # Build request context (embeddings are never streaming)
+        context = RequestContext(
+            model=resolved_model,
+            provider=provider,
+            kwargs=kwargs,
+            streaming=False,
+            request_type="embedding",
+            credentials=self.all_credentials.get(provider, []),
+            deadline=time.time() + self.global_timeout,
+            request=request,
+            pre_request_callback=pre_request_callback,
+            transaction_logger=transaction_logger,
+        )
+        return await self._executor.execute(context)

Note: This would also require changing the function signature to async def aembedding(...).
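The async-parity point can be illustrated with stand-ins (Client and Executor below are hypothetical, not the library's RotatingClient/RequestExecutor): if aembedding were a plain def returning the executor call without awaiting it, callers would receive an un-awaited coroutine instead of the response.

```python
import asyncio

class Executor:
    async def execute(self, context: dict):
        # Stand-in for RequestExecutor.execute(); returns a fake payload.
        return {"data": [{"embedding": [0.0] * 1024}]}

class Client:
    # Minimal stand-in for a rotating client with the suggested async parity.
    def __init__(self):
        self._executor = Executor()

    async def aembedding(self, model: str, **kwargs):
        context = {"model": model, "request_type": "embedding", **kwargs}
        # async def + await: callers get the resolved response, matching
        # the acompletion calling convention.
        return await self._executor.execute(context)

result = asyncio.run(Client().aembedding("mistral/mistral-embed", input=["hello"]))
print(len(result["data"][0]["embedding"]))  # 1024
```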

@mirrobot-agent

@b3nw, I've completed the re-triggered review for PR #112.

Summary

The fixes for embedding dispatch are technically sound and resolve the core issue where embeddings were incorrectly using the completion endpoint. I've suggested some further enhancements to bring the aembedding method to full feature parity with acompletion.

Key Findings

  • Correct Dispatch: The use of request_type to distinguish between call types is a good architectural choice.
  • Improved Executor Logic: The dynamic selection of aembedding vs acompletion in the executor reduces code duplication.
  • Enhancement Opportunity: aembedding is currently missing model resolution and transaction logging, which I've highlighted in the review comments.

You can find the detailed feedback and code suggestions in the formal review.

This update was generated by an AI assistant.

The embedding endpoint existed but was broken because the executor
refactoring (Nov 2025) forgot to add litellm.aembedding() dispatch.
All requests were going through litellm.acompletion(), causing failures.

Changes:
- Add request_type field to RequestContext (completion/embedding)
- Update RotatingClient.aembedding() to set request_type='embedding'
- Add conditional dispatch in RequestExecutor._execute_non_streaming()
  to call litellm.aembedding() for embedding requests
- Refactored dispatch logic to reduce code duplication
- Added feature parity to aembedding: async signature, model resolution,
  and transaction logging (matching acompletion)

Tested with Mistral (mistral-embed) - returns proper 1024-dim vectors.
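A shape check along these lines is presumably what the test verified (hedged sketch: the real test called mistral-embed over the network; here the response is stubbed):

```python
# Stubbed response shaped like an OpenAI-style embedding payload; the
# real check hit the Mistral API with model "mistral-embed".
response = {"data": [{"index": 0, "embedding": [0.1] * 1024}]}

vector = response["data"][0]["embedding"]
assert len(vector) == 1024, f"expected 1024 dims, got {len(vector)}"
assert all(isinstance(x, float) for x in vector)
print("ok:", len(vector))
```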
@b3nw b3nw force-pushed the feature/embedding-support branch from 2c755e8 to 68babe7 on January 31, 2026 06:02

b3nw commented Jan 31, 2026

@mirrobot-agent addressed your items, re-review

@mirrobot-agent

@b3nw, I've received your request for a re-review. I'll check the latest changes to ensure the embedding dispatch logic is fully aligned with the suggestions. I'll post my findings shortly! 🔍


@mirrobot-agent mirrobot-agent bot left a comment


Overall Assessment

The updates successfully address all the points raised in the previous review. The aembedding method now has full feature parity with acompletion, and the dispatch logic in the RequestExecutor is clean and robust.

Key Changes Verified

  • Async Parity: aembedding is now an async def and properly awaits the executor.
  • Model Resolution: Added _model_resolver.resolve_model_id to ensure embedding requests use resolved model IDs.
  • Transaction Logging: Integrated TransactionLogger into the embedding flow, ensuring requests and responses are logged when enabled.
  • Clean Dispatch: The refactored logic in RequestExecutor._execute_non_streaming correctly selects between litellm.aembedding and litellm.acompletion based on the request_type.

This is a great improvement to the library's embedding support.

This review was generated by an AI assistant.

Comment on lines 361 to 410
@@ -375,19 +375,39 @@ def aembedding(
                 f"Invalid model format or no credentials for provider: {model}"
             )

+        # Extract internal logging parameters (not passed to API)
+        parent_log_dir = kwargs.pop("_parent_log_dir", None)
+
+        # Resolve model ID
+        resolved_model = self._model_resolver.resolve_model_id(model, provider)
+        kwargs["model"] = resolved_model
+
+        # Create transaction logger if enabled
+        transaction_logger = None
+        if self.enable_request_logging:
+            transaction_logger = TransactionLogger(
+                provider=provider,
+                model=resolved_model,
+                enabled=True,
+                parent_dir=parent_log_dir,
+            )
+            transaction_logger.log_request(kwargs)
+
         # Build request context (embeddings are never streaming)
         context = RequestContext(
-            model=model,
+            model=resolved_model,
             provider=provider,
             kwargs=kwargs,
             streaming=False,
+            request_type="embedding",
             credentials=self.all_credentials.get(provider, []),
             deadline=time.time() + self.global_timeout,
             request=request,
             pre_request_callback=pre_request_callback,
+            transaction_logger=transaction_logger,
         )

-        return self._executor.execute(context)
+        return await self._executor.execute(context)

Fix verified: aembedding now has full feature parity with acompletion, including async support, model resolution, and transaction logging. This resolves my previous feedback.

@mirrobot-agent

@b3nw, I've completed the re-review for PR #112.

Summary

The updates successfully address all the items identified in the previous review. The embedding support is now fully integrated with the library's core features (model resolution, transaction logging, and async execution).

Key Improvements

  • Full Feature Parity: aembedding now supports model resolution and transaction logging, matching the capabilities of the completion endpoint.
  • Improved Architecture: The introduction of request_type in RequestContext allows for clean and maintainable dispatch logic in the RequestExecutor.
  • Async Consistency: The aembedding method is now correctly implemented as an async def.

I've approved the PR. Great work on these fixes!

This update was generated by an AI assistant.
