a44b435 to e0aa813
It seems like the domain detection could break if the download stream pauses mid-query, e.g.

So we'll need some way of resuming from there – even if that way is oversimplified for now and requires going back to the initial

It would be nice to auto-detect the source site domain and mark it for rewriting, and maybe the assets domain as well if it's separate. Other than that, I'll need to take this for a spin.

Also – it would be cool to support downloading the media files loaded from external domains, but we don't strictly need it to land this feature.
Remove unused CONVERT_PREFIX and CONVERT_PREFIX_LEN constants from Base64ValueScanner. Exclude vendored mysql-query-stream/ from PHPStan analysis while keeping it in scan paths for class discovery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Update port from 8102 to 8104 (matching site-registry.json)
- Fix URL construction: use & instead of ? for query string continuation (getSiteUrl already includes ?)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The url-rewriting feature depends on wp-php-toolkit/data-liberation classes (URLInTextProcessor, WPURL) loaded via Composer autoloader. Add composer install to setup.sh (runs in both CI and Docker) and install Composer in the Dockerfile.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Domain discovery now persists to .import-domains.json alongside the periodic cursor saves during db-sync. Previously domains were only written after the full download completed, which meant a crash would lose all discovered domains, since the resumed download skips already-downloaded SQL data.

The source site domain is also auto-detected from the export URL and seeded into the domain collector, so it always appears in the domains file even before SQL scanning starts.

Other fixes: update url-rewriting E2E test port to avoid collision with file-deletions test, fix PHP 8.4 nullable parameter deprecations, update composer.lock for wp-php-toolkit dependencies.
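To make the checkpointing idea above concrete, here is a minimal sketch of a collector that is seeded from the export URL and saved to disk on every checkpoint. It is not the PR's `DomainCollector`: the class name `DomainCheckpoint`, its methods, and the flat JSON array file layout are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch: DomainCheckpoint and its methods are illustrative
// names, and the flat JSON array layout is an assumed file format.
final class DomainCheckpoint
{
    /** @var array<string, true> Domains stored as keys to deduplicate them. */
    private array $domains = [];

    public function __construct(private string $path)
    {
        // Reload domains saved before a crash so a resumed run keeps them.
        if (is_file($path)) {
            $saved = json_decode((string) file_get_contents($path), true);
            foreach (is_array($saved) ? $saved : [] as $domain) {
                if (is_string($domain)) {
                    $this->domains[$domain] = true;
                }
            }
        }
    }

    /** Seed the source site's host from the export URL. */
    public function seedFromUrl(string $url): void
    {
        $host = parse_url($url, PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $this->add($host);
        }
    }

    public function add(string $domain): void
    {
        $this->domains[strtolower($domain)] = true;
    }

    /** Write to a temp file and rename it, so a crash mid-write leaves the old file intact. */
    public function save(): void
    {
        $tmp = $this->path . '.tmp';
        file_put_contents($tmp, json_encode($this->all(), JSON_PRETTY_PRINT));
        rename($tmp, $this->path);
    }

    /** @return string[] Sorted list of collected domains. */
    public function all(): array
    {
        $list = array_keys($this->domains);
        sort($list);
        return $list;
    }
}
```

In this sketch, `save()` would be called at the same point where the download cursor is written, so the domains file never lags behind the SQL data that a resumed download will skip.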
6d16cb5 to 1c6174e
Summary
- `db-apply` command that reads `db.sql`, rewrites URLs using wp-php-toolkit structured processors (`BlockMarkupUrlProcessor`, `URLInTextProcessor`), and executes statements against a target MySQL database
- Domain discovery during `db-sync`, collecting all HTTP/HTTPS domains from decoded base64 values into `.import-domains.json`
- `WP_MySQL_Naive_Query_Stream` (vendored from WordPress/sqlite-database-integration#264) for streaming SQL parsing with cursor-based resumability

URL rewriting pipeline

For each base64-decoded string value in INSERT statements:
- Serialized PHP, as detected by `ContentClassifier` (port of `is_serialized()`) → skipped (adjusting `s:N:` length prefixes is out of scope)
- Rewritten with `URLInTextProcessor`, re-encoded
- `wp_rewrite_urls()`, which handles HTML attributes, block comment JSON, text nodes, and CSS `url()` in style attributes

No `preg_match` or `DOMDocument`: only the structured data processors from `wp-php-toolkit/data-liberation`.

New components

- `importer/lib/Base64ValueScanner.php`: `FROM_BASE64('...')` expressions, decodes values
- `importer/lib/ContentClassifier.php`
- `importer/lib/DomainCollector.php`: `URLInTextProcessor`
- `importer/lib/SqlValueUrlRewriter.php`
- `importer/lib/SqlStatementRewriter.php`
- `importer/lib/mysql-query-stream/`: `WP_MySQL_Naive_Query_Stream` + `WP_MySQL_Lexer`

Usage
Test plan
- E2E test (`import-36-url-rewriting.test.js`, scaffolded, needs implementation)

🤖 Generated with Claude Code
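As background for the pipeline rule that skips serialized PHP values: each serialized string carries its byte length in an `s:N:` prefix, so any rewrite that changes a URL's length without updating that prefix corrupts the value. The sketch below illustrates this with plain `str_replace()`, which stands in for any length-changing rewrite and is not what the PR's rewriter does; the example URLs are made up.

```php
<?php
// Illustration only: str_replace() stands in for any rewrite that changes
// a string's byte length without updating its s:N: prefix.
$original = serialize(['home' => 'http://old.example']);
// a:1:{s:4:"home";s:18:"http://old.example";}

$rewritten = str_replace('http://old.example', 'https://new-site.example', $original);
// The prefix still says s:18, but the string is now 24 bytes long.

// @ suppresses the notice/warning PHP emits for malformed input.
$intact = @unserialize($original);  // ['home' => 'http://old.example']
$broken = @unserialize($rewritten); // false: the payload no longer parses

var_dump($intact !== false, $broken === false);
```

Both `var_dump` values are `true`: the original round-trips, while the rewritten payload fails to unserialize. Skipping such values avoids handing a corrupted option or meta value to the target site.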