@danny-avila
Owner

Summary

I improved temporary file cleanup reliability in the document loader and enhanced logging clarity for vector store operations.

  • Fixed cleanup_temp_encoding_file to verify _temp_filepath is not None before attempting file removal
  • Added _sanitize_parameters_for_logging static method to ExtendedPgVector for truncating embeddings and large values in logs
  • Updated query logging to use this sanitization, so full embedding vectors no longer clutter log output
  • Maintained proper handling of various parameter types including dictionaries, lists, tuples, and nested structures
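
To make the sanitization idea concrete, here is a minimal sketch of how parameters could be truncated before logging. It is not the code from this PR: the function name `sanitize_for_logging_sketch` and the thresholds `MAX_ITEMS` and `MAX_CHARS` are hypothetical, chosen only for illustration.

```python
from typing import Any

MAX_ITEMS = 3    # assumed threshold, not taken from the PR
MAX_CHARS = 100  # assumed threshold, not taken from the PR


def sanitize_for_logging_sketch(value: Any) -> Any:
    """Return a copy of `value` that is short enough to log."""
    if isinstance(value, dict):
        # Recurse into dictionary values and keep the keys unchanged.
        return {key: sanitize_for_logging_sketch(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        is_numeric = all(
            isinstance(item, (int, float)) and not isinstance(item, bool)
            for item in value
        )
        if is_numeric and len(value) > MAX_ITEMS:
            # A long numeric sequence is treated as an embedding and summarized.
            head = ", ".join(str(item) for item in value[:MAX_ITEMS])
            return f"[{head}, ... ({len(value)} values)]"
        # Other sequences are sanitized item by item, preserving list vs tuple.
        items = [sanitize_for_logging_sketch(item) for item in value]
        return tuple(items) if isinstance(value, tuple) else items
    if isinstance(value, str) and len(value) > MAX_CHARS:
        return value[:MAX_CHARS] + "... (truncated)"
    return value
```

With this sketch, a 1536-element embedding is reduced to a short summary string, while small values such as `k` or a tuple of IDs pass through unchanged.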

Change Type

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Testing

Test the changes by:

  1. Loading documents with various encodings to verify temporary file cleanup works correctly
  2. Enabling DEBUG_PGVECTOR_QUERIES environment variable and performing vector store operations
  3. Checking logs to confirm embedding vectors are properly truncated while preserving useful debugging information

Test Configuration:

  • Set DEBUG_PGVECTOR_QUERIES=true for vector store logging tests
  • Test with CSV files containing non-UTF-8 encodings
  • Verify with documents that trigger temporary file creation
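
For the configuration step above, a boolean environment flag such as DEBUG_PGVECTOR_QUERIES is commonly read along the following lines. How this project actually parses the variable is not stated in this PR, so the helper name `debug_queries_enabled` and the set of accepted values are assumptions.

```python
import os
from typing import Mapping, Optional


def debug_queries_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when DEBUG_PGVECTOR_QUERIES is set to a truthy string."""
    env = os.environ if env is None else env
    raw = env.get("DEBUG_PGVECTOR_QUERIES", "")
    # Accepted spellings are an assumption made for this sketch.
    return raw.strip().lower() in {"1", "true", "yes"}
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in a test without changing the real environment.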

Checklist

  • My code adheres to this project's style guidelines
  • I have performed a self-review of my own code
  • I have commented in any complex areas of my code
  • My changes do not introduce new warnings

… avoid lengthy embedding logs

- Implemented a static method `_sanitize_parameters_for_logging` to truncate large values and embeddings for improved logging clarity.
- Updated the `setup_query_logging` method to utilize the new sanitization method, ensuring sensitive or large data is not logged directly.
- Updated the `cleanup_temp_encoding_file` function to check that `_temp_filepath` is not None before attempting to remove the file, preventing potential errors when the attribute is present but not initialized.
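
The guard described in the last item can be sketched as follows. This is an illustration under stated assumptions rather than the merged code: the helper name `cleanup_temp_file_sketch` and the `FakeLoader` class are hypothetical, and only the `_temp_filepath` attribute name comes from the PR.

```python
import os
import tempfile
from typing import Any


def cleanup_temp_file_sketch(loader: Any) -> None:
    """Remove the loader's temporary file, if one was ever created."""
    # The attribute may be missing entirely or present but still None.
    temp_path = getattr(loader, "_temp_filepath", None)
    if temp_path is None:
        return
    try:
        os.remove(temp_path)
    except FileNotFoundError:
        # The file is already gone, so there is nothing left to clean up.
        pass


class FakeLoader:
    """Hypothetical stand-in for a document loader."""

    def __init__(self) -> None:
        self._temp_filepath = None


loader = FakeLoader()
cleanup_temp_file_sketch(loader)  # attribute is None: returns without error

fd, path = tempfile.mkstemp()
os.close(fd)
loader._temp_filepath = path
cleanup_temp_file_sketch(loader)  # file exists: it is removed
```

Checking for `None` explicitly, rather than only checking that the attribute exists, is what covers the "present but not initialized" case.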
@danny-avila danny-avila merged commit 109a2d3 into main Aug 27, 2025
1 check passed
HenryNVP pushed a commit to HenryNVP/rag_api that referenced this pull request Oct 24, 2025
…a#203)

* ✨ feat: Add parameter sanitization for logging in ExtendedPgVector to avoid lengthy embedding logs


* 🔧 fix: Ensure temporary file cleanup only occurs if the filepath is set
