
Document bulk ingestion and write parallelism#193

Draft
wjones127 wants to merge 2 commits into main from docs/bulk-ingestion

Conversation

@wjones127

Summary

  • Rewrites "Use Iterators / Write Large Datasets" → "Loading Large Datasets" with subsections for file-based ingestion (pyarrow.dataset), iterator-based ingestion, and write parallelism behavior
  • Updates FAQ "How can I speed up data inserts?" with specific guidance on auto-parallelism and the create-empty-then-add pattern
  • Adds a test for passing a pyarrow.dataset to table.add()

Related: bulk-ingestion-7e70a4dab825, lancedb/lancedb#3173

Test plan

  • pytest tests/py/test_tables.py — all 48 tests pass
  • Snippets regenerated via python scripts/mdx_snippets_gen.py
  • Preview with npx mintlify dev

🤖 Generated with Claude Code

@mintlify
Contributor

mintlify bot commented Mar 20, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project | Status | Preview | Updated (UTC)
--- | --- | --- | ---
lancedb-bcbb4faf | 🟢 Ready | View Preview | Mar 20, 2026, 7:31 PM

Contributor

@prrao87 prrao87 left a comment


This requires lancedb==0.30.0 to work, right? Could you bump the versions for deps in pyproject.toml as needed? Thanks!

@prrao87
Contributor

prrao87 commented Mar 25, 2026

Hi @wjones127 , is this ready for review? Or has it not yet been officially released in the stable version?

@wjones127
Author

> Hi @wjones127 , is this ready for review? Or has it not yet been officially released in the stable version?

Hi, this is not ready for review. In general, I leave my PRs in draft until they are ready for review.

wjones127 and others added 2 commits April 2, 2026 15:34
`table.add()` now auto-parallelizes large writes, but the docs still showed
only the old iterator-based pattern. This rewrites the "Use Iterators" section
into "Loading Large Datasets" with guidance on `pyarrow.dataset` input, the
create-empty-then-add pattern, and auto-parallelism behavior. Updates the FAQ
to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>