Enable MMU + I/D cache on ARM926 (cv300) #67
Merged
The previous cv300 startup ran uncached without MMU to keep the ARMv5 porting risk low, costing ~20% throughput vs ev300 (66 vs 83 KB/s for 256 KiB flash reads at 921600 baud).

This adds an ARMv5 short-descriptor identity-mapped MMU + write-back D-cache + I-cache for cv300, mirroring what the ARMv7 path already does for ev300/cv500/cv610.

ARM926 page-table entries differ from ARMv7: no XN bit, no TEX bits, so DEVICE = 0x00000C02 (vs ARMv7 0x00000C12) and CACHED = 0x00000C0E (vs ARMv7 0x00001C0E).

Identity layout: every section is DEVICE-uncached by default; the 128 MiB starting at RAM_BASE is overridden as CACHED write-back+bufferable.

TTBR0 takes a bare physical pointer on ARMv5 (no IRGN/RGN attribute bits, those came in ARMv7). Drain-write-buffer replaces ARMv7 `dsb`.

The 16 KiB-aligned page table in BSS is now allocated once outside the #ifdef and shared between both paths (same layout: 4096 × 1 MiB sections × 4 B = 16 KiB).

## Verification

QEMU `qemu-system-arm -M hi3516cv300`: agent boots cleanly, READY/DEFIB packet stream, no MMU faults.

Real hi3516cv300 board over /dev/uart-IVGHP203Y-AF (ether3):
  jedec=ef4018 (W25Q128), 16 MiB, ram=0x80000000, caps=0x7f, version 2
  256 KiB @ 921600: 3.02 s = 84.9 KB/s (was 65.8 KB/s, +29%)
  1 MiB sustained @ 921600: 11.50 s = 89.0 KB/s

Real hi3516ev300 regression on /dev/ttyUSB2 (ether5):
  jedec=0b4018, ram=0x40000000, caps=0x7f
  256 KiB @ 921600: 3.08 s = 83.1 KB/s
  1 MiB sustained @ 921600: 11.75 s = 87.1 KB/s

cv300 now matches ev300 within run-to-run noise: baud rate is the bottleneck on both, and the cache just keeps the CPU from being slower.

ARM926 binary grew 152 bytes (12,720 → 12,872) for the new setup code.

make test HOST_CC=gcc: 5406/5406. pytest: 402 passed. ruff & mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request on May 5, 2026
## Summary

Third agent platform after ev300 (V4) and cv300 (V3): the **cv500-family** generation (V5), a Cortex-A7 with the same memory map shared by cv500, av300, and dv300.

Most of the wiring already existed. The Cortex-A7 startup path with MMU + I/D cache from #67 covers cv500-family. PR #65 already taught `hisilicon_standard.py` the cv500-family bootrom quirks (12-byte 0xFF run zeroing, gzip SPL boundary, non-fatal SPL TAIL when prestep_data is set). Two real fixes were needed to make av300 actually run.

## Fixes

### 1. Wrong UART base in `agent/Makefile` cv500 entry

The cv500 block had `UART_BASE = 0x12100000` copy-pasted from the V3/hi3518ev200 layout. qemu-hisilicon's `hi3516cv500_soc` (which models the whole cv500 family, av300 included) puts UART0 at **`0x120A0000`**. QEMU surfaced it instantly: the agent ran silently because writes went to unmapped I/O. Fix: `0x120A0000`.

### 2. SPL boundary detected from the wrong buffer

`_send_spl()` in the agent-flash flow takes both `firmware` (the agent binary) and `spl_override` (OpenIPC u-boot used as the SPL stage), but `_detect_spl_size` was scanning `firmware`. The agent has no compressed payload, so the scan fell through to `profile_max = 0x6000`, overshooting the real SPL code (ends at `0x5000` on av300) by 0x1000 B. Those extra bytes include the **12-byte 0xFF padding at `0x52E4`** that PR #65 identified as the cv500-family bootrom RX-hang trigger. Result: SPL DATA chunk #21 stalls with no ACK, full 32-retry exhaustion.

Fix: detect the SPL boundary from `spl_override` when present. Plus defense-in-depth: apply `_zero_long_ff_runs` to `spl_data` so any ≥12-byte 0xFF run that does slip through (e.g. a non-OpenIPC SPL build with FF padding earlier in the binary) doesn't trip the same bug.

### 3. Aliasing in `chip_to_agent`

Match the existing `gk7205v300 → gk7205v200` pattern: one binary, multiple chip names route to it. Add `hi3516av300 → hi3516cv500` and `hi3516dv300 → hi3516cv500` to `get_agent_binary()`. No new Makefile entry needed.

### 4. Cosmetic profile name fix

Both `hi3516av300.json` and `hi3516dv300.json` had `"name": "hi3516cv500"` (a copy-paste artifact: the profile loader keys off the filename, so this is harmless but inconsistent).

## Verification

QEMU smoke test: `qemu-system-arm -M hi3516cv500` runs the agent cleanly, READY/DEFIB packet stream, no faults.

Real **hi3516av300** board (`/dev/uart-hi3516av300_imx415`, MikroTik `ether8`):

```
jedec=c22019 (Macronix MX25L256, 32 MiB), ram=0x80000000, caps=0x7f, version=2
256 KiB @ 921600: 3.04 s = 84.3 KB/s
1 MiB sustained @ 921600: 11.74 s = 87.2 KB/s
flash bytes match installed u-boot byte-for-byte
```

Cross-platform throughput summary (1 MiB read, 921600 baud):

| Board | CPU | Generation | KB/s |
|---|---|---|---|
| ev300 | Cortex-A7 | V4 | 87.1 |
| cv300 | ARM926 | V3 | 89.0 |
| **av300** | **Cortex-A7** | **V5 (cv500-family)** | **87.2** |

All three are within ~2 KB/s; UART baud is the bottleneck.

```
make -C agent test HOST_CC=gcc: 5406/5406
pytest tests/ -x --ignore=tests/fuzz: 402 passed, 2 skipped
ruff check src/ tests/: All checks passed
mypy src/defib/ --ignore-missing-imports: no issues found in 55 source files
```

dv300 has no test board attached; it inherits the same binary as a silent alias. av300 hardware proves the same codepath cv500/dv300 will take.

## Test plan

- [x] QEMU `-M hi3516cv500` boots agent cleanly (caught the wrong UART base)
- [x] Real av300 hardware: agent upload, info, flash read at 921600 baud, content verified
- [x] All test suites green (host C, pytest, ruff, mypy)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Dmitry Ilyin <widgetii@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
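The defense-in-depth step in fix 2 of the referenced commit (zeroing any run of 12 or more 0xFF bytes before sending SPL data) could look roughly like the sketch below. This is an illustration only: `zero_long_ff_runs_sketch` is a hypothetical name, and the real `_zero_long_ff_runs` may differ, for example in whether it zeroes the whole run or only part of it.

```python
import re

MIN_RUN = 12  # run length the commit text cites as the bootrom RX-hang trigger


def zero_long_ff_runs_sketch(data: bytes, min_run: int = MIN_RUN) -> bytes:
    """Replace every run of >= min_run 0xFF bytes with 0x00 bytes.

    Assumption: the whole run is zeroed and the length is preserved,
    so offsets inside the SPL image do not move.
    """
    pattern = re.compile(rb"\xff{%d,}" % min_run)
    return pattern.sub(lambda m: b"\x00" * len(m.group()), data)


# An 11-byte run is left alone, a 12-byte run is zeroed.
sample = b"\x01" + b"\xff" * 11 + b"\x02" + b"\xff" * 12 + b"\x03"
cleaned = zero_long_ff_runs_sketch(sample)
print(cleaned.hex())
```

Preserving the length matters here because the SPL is sent in fixed-size chunks; only the padding bytes change, not where the code sits.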
Summary
The hi3516cv300 (ARM926EJ-S) agent that landed in #66 ran with MMU and caches off to keep the ARMv5 porting risk low. That cost ~20–30% throughput vs the Cortex-A7 path: 66 KB/s vs 83 KB/s for a 256 KiB flash read at 921600 baud.
This PR brings the ARM926 path to parity by enabling MMU (identity-mapped) + write-back D-cache + I-cache, mirroring what ev300/cv500/cv610 already use under the ARMv7 branch.
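As a rough host-side model of the identity layout (not the agent's actual startup code, which is C/assembly), the table can be computed as below. The descriptor values and the 128 MiB cached window are the ones quoted in this PR; the function name and the use of `0x80000000` as `RAM_BASE` (the cv300 RAM base from the verification log) are illustrative assumptions.

```python
# ARMv5 (ARM926) 1 MiB section descriptors as quoted in this PR.
DEVICE = 0x00000C02  # section, C=0 B=0: uncached, unbuffered
CACHED = 0x00000C0E  # section, C=1 B=1: write-back cacheable + bufferable

SECTION_SHIFT = 20        # each first-level entry maps 1 MiB
NUM_SECTIONS = 4096       # 4 GiB address space / 1 MiB
CACHED_BYTES = 128 << 20  # cached window starting at RAM_BASE


def build_identity_table(ram_base: int) -> list[int]:
    """Return 4096 section descriptors with virtual == physical address."""
    first = ram_base >> SECTION_SHIFT
    last = first + (CACHED_BYTES >> SECTION_SHIFT)  # exclusive upper bound
    table = []
    for i in range(NUM_SECTIONS):
        attrs = CACHED if first <= i < last else DEVICE
        table.append((i << SECTION_SHIFT) | attrs)
    return table


table = build_identity_table(0x80000000)  # hypothetical RAM_BASE for cv300
print(hex(table[0x000]))  # 0xc02
print(hex(table[0x800]))  # 0x80000c0e
print(len(table) * 4)     # 16384 bytes = 16 KiB
```

The last line reproduces the 4096 × 4 B = 16 KiB size that lets the ARMv5 and ARMv7 paths share one table allocation.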
ARMv5 vs ARMv7 differences handled
- DEVICE: `0x00000C02` (ARMv7 was `0x00000C12` with XN)
- CACHED: `0x00000C0E` (ARMv7 was `0x00001C0E` with TEX=001 outer-WB-WA)
- `mcr p15, 0, _, c7, c10, 4` (drain write buffer) replaces ARMv7 `dsb`. Three architectural NOPs (`mov r0, r0`) flush the prefetch buffer after enabling MMU instead of `isb`.
- `MRS/BIC/ORR/MSR cpsr_c` instead of `cps #imm` (which is ARMv6+).
- The page table is allocated outside the `#ifdef CPU_ARM926` and shared between both CPU paths (same layout: 4096 × 1 MiB sections × 4 B = 16 KiB).

Verification
QEMU `qemu-system-arm -M hi3516cv300`: agent boots cleanly, no MMU faults, READY/DEFIB packet stream.

Real hi3516cv300 (`/dev/uart-IVGHP203Y-AF`, MikroTik `ether3`):
```
jedec=ef4018 (W25Q128), 16 MiB, ram=0x80000000, caps=0x7f, version=2
256 KiB @ 921600: 3.02 s = 84.9 KB/s (pre-PR: 65.8 KB/s, +29%)
1 MiB sustained @ 921600: 11.50 s = 89.0 KB/s
```
Real hi3516ev300 regression (`/dev/ttyUSB2`, MikroTik `ether5`):
```
jedec=0b4018, 16 MiB, ram=0x40000000, caps=0x7f
256 KiB @ 921600: 3.08 s = 83.1 KB/s
1 MiB sustained @ 921600: 11.75 s = 87.1 KB/s
```
cv300 now matches (slightly exceeds) ev300, well within run-to-run noise — at 921600 baud the UART is the bottleneck on both, and the cache just keeps the CPU from making things worse.
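A quick back-of-the-envelope check of the bottleneck claim, assuming 8N1 framing (10 bits on the wire per byte), which this PR does not state explicitly. The reported 1 MiB figures are consistent with "KB" meaning 1024 bytes, so the sketch uses KiB throughout.

```python
BAUD = 921600
BITS_PER_BYTE = 10  # assumption: 8N1 (1 start + 8 data + 1 stop bit)

# Raw UART line rate in KiB/s, ignoring any protocol framing overhead.
ceiling_kib_s = BAUD / BITS_PER_BYTE / 1024


def rate_kib_s(size_kib: float, seconds: float) -> float:
    """Payload throughput in KiB/s for a transfer of size_kib in seconds."""
    return size_kib / seconds


# 1 MiB sustained reads, timings taken from the logs in this PR.
runs = {"cv300": (1024, 11.50), "ev300": (1024, 11.75)}
for name, (kib, secs) in runs.items():
    r = rate_kib_s(kib, secs)
    print(f"{name}: {r:.1f} KiB/s, {100 * r / ceiling_kib_s:.0f}% of line rate")
```

Under that assumption the line rate is 90.0 KiB/s, so both boards sit at roughly 97–99% of what the UART can carry, leaving little for the CPU to gain or lose.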
```
make -C agent test HOST_CC=gcc: 5406/5406
pytest tests/ -x --ignore=tests/fuzz: 402 passed, 2 skipped
ruff check src/ tests/: All checks passed
mypy src/defib/ --ignore-missing-imports: no issues found in 55 source files
```
Binary size: cv300 grew 152 bytes (12,720 → 12,872) for the page-table fill + MMU enable. ev300 unchanged.
Test plan
🤖 Generated with Claude Code