Enable MMU + I/D cache on ARM926 (cv300) by widgetii · Pull Request #67 · OpenIPC/defib

widgetii · 2026-05-05T09:16:21Z

Summary

The hi3516cv300 (ARM926EJ-S) agent that landed in #66 ran with MMU and caches off to keep the ARMv5 porting risk low. That cost ~20% throughput vs the Cortex-A7 path: 66 KB/s vs 83 KB/s for a 256 KiB flash read at 921600 baud.

This PR brings the ARM926 path to parity by enabling MMU (identity-mapped) + write-back D-cache + I-cache, mirroring what ev300/cv500/cv610 already use under the ARMv7 branch.

ARMv5 vs ARMv7 differences handled

- Section descriptor format: no XN bit, no TEX bits.
  - DEVICE: `0x00000C02` (ARMv7 was `0x00000C12` with XN)
  - CACHED: `0x00000C0E` (ARMv7 was `0x00001C0E` with TEX=001 outer-WB-WA)
- TTBR0: bare physical pointer (no IRGN/RGN attribute bits; those are ARMv7).
- Memory barriers: `mcr p15, 0, _, c7, c10, 4` (drain write buffer) replaces ARMv7 `dsb`. Three architectural NOPs (`mov r0, r0`) flush the prefetch buffer after enabling MMU instead of `isb`.
- Mode switch: explicit MRS/BIC/ORR/MSR `cpsr_c` instead of `cps #imm` (which is ARMv6+).
- The 16 KiB-aligned page table in BSS is now allocated once outside the `#ifdef CPU_ARM926` and shared between both CPU paths (same layout: 4096 × 1 MiB sections × 4 B = 16 KiB).
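To make the descriptor differences above concrete, here is a minimal host-side sketch in Python that rebuilds the four section-descriptor constants from their bit fields. The field positions follow the ARM short-descriptor section format; the constant and function names are illustrative and not taken from the agent source.

```python
# Short-descriptor *section* entry fields (1 MiB granularity).
# Names are illustrative assumptions; only the resulting values come from the PR.
SECTION_TYPE = 0b10          # bits [1:0] = 0b10 marks a section entry
B_BIT = 1 << 2               # bufferable
C_BIT = 1 << 3               # cacheable
XN_BIT_V7 = 1 << 4           # execute-never, ARMv7 only
AP_FULL_ACCESS = 0b11 << 10  # AP[1:0] = 0b11, read/write access
TEX_001_V7 = 0b001 << 12     # TEX = 001, ARMv7 only

# ARMv5 (ARM926): no XN, no TEX.
ARMV5_DEVICE = AP_FULL_ACCESS | SECTION_TYPE
ARMV5_CACHED = AP_FULL_ACCESS | C_BIT | B_BIT | SECTION_TYPE

# ARMv7 (Cortex-A7): the same entries plus XN on device memory and TEX on RAM.
ARMV7_DEVICE = ARMV5_DEVICE | XN_BIT_V7
ARMV7_CACHED = ARMV5_CACHED | TEX_001_V7


def section_entry(index: int, attrs: int) -> int:
    """Identity-map 1 MiB section `index` with the given attribute bits."""
    return (index << 20) | attrs


if __name__ == "__main__":
    for name, value in [
        ("ARMv5 DEVICE", ARMV5_DEVICE),
        ("ARMv5 CACHED", ARMV5_CACHED),
        ("ARMv7 DEVICE", ARMV7_DEVICE),
        ("ARMv7 CACHED", ARMV7_CACHED),
    ]:
        print(f"{name}: 0x{value:08X}")
```

The ARMv7 values differ from the ARMv5 ones by exactly one bit each (bit 4 for DEVICE, bit 12 for CACHED), which is why the two paths can share one table layout.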

Verification

QEMU qemu-system-arm -M hi3516cv300: agent boots cleanly, no MMU faults, READY/DEFIB packet stream.

Real hi3516cv300 (/dev/uart-IVGHP203Y-AF, MikroTik ether3):
```
jedec=ef4018 (W25Q128), 16 MiB, ram=0x80000000, caps=0x7f, version=2
256 KiB @ 921600: 3.02 s = 84.9 KB/s (pre-PR: 65.8 KB/s, +29%)
1 MiB sustained @ 921600: 11.50 s = 89.0 KB/s
```

Real hi3516ev300 regression (/dev/ttyUSB2, MikroTik ether5):
```
jedec=0b4018, 16 MiB, ram=0x40000000, caps=0x7f
256 KiB @ 921600: 3.08 s = 83.1 KB/s
1 MiB sustained @ 921600: 11.75 s = 87.1 KB/s
```

cv300 now matches ev300: its small lead (84.9 vs 83.1 KB/s at 256 KiB, 89.0 vs 87.1 KB/s at 1 MiB) is well within run-to-run noise. At 921600 baud the UART is the bottleneck on both, and the cache just keeps the CPU from making things worse.
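The UART-bound claim can be sanity-checked with a little arithmetic. The sketch below assumes 8N1 framing (10 line bits per payload byte), which the PR does not state, and treats the reported "KB/s" as KiB/s, which is an inference from 1 MiB / 11.50 s ≈ 89.0. Sizes and timings are the ones reported above; recomputed rates can differ from the reported ones in the last digit because the timings are rounded.

```python
KIB = 1024
BAUD = 921_600
BITS_PER_BYTE = 10  # assumption: 8N1 framing (start + 8 data + stop)


def line_rate_kib_s(baud: int = BAUD, bits_per_byte: int = BITS_PER_BYTE) -> float:
    """Raw UART payload ceiling in KiB/s, ignoring protocol overhead."""
    return baud / bits_per_byte / KIB


def throughput_kib_s(size_bytes: int, seconds: float) -> float:
    """Measured throughput in KiB/s."""
    return size_bytes / seconds / KIB


# (size in bytes, elapsed seconds) as reported in the verification logs
RUNS = {
    "cv300 256 KiB": (256 * KIB, 3.02),
    "cv300 1 MiB": (1024 * KIB, 11.50),
    "ev300 256 KiB": (256 * KIB, 3.08),
    "ev300 1 MiB": (1024 * KIB, 11.75),
}

if __name__ == "__main__":
    ceiling = line_rate_kib_s()
    print(f"line-rate ceiling: {ceiling:.1f} KiB/s")
    for name, (size, secs) in RUNS.items():
        rate = throughput_kib_s(size, secs)
        print(f"{name}: {rate:.1f} KiB/s ({rate / ceiling:.0%} of ceiling)")
```

Under these assumptions the ceiling is 90 KiB/s, so the sustained 1 MiB reads on both boards sit within a few percent of the line rate, leaving little for a faster CPU path to gain.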

```
make -C agent test HOST_CC=gcc: 5406/5406
pytest tests/ -x --ignore=tests/fuzz: 402 passed, 2 skipped
ruff check src/ tests/: All checks passed
mypy src/defib/ --ignore-missing-imports: no issues found in 55 source files
```

Binary size: cv300 grew 152 bytes (12,720 → 12,872) for the page-table fill + MMU enable. ev300 unchanged.

Test plan

- QEMU smoke test (no instruction faults, agent emits READY)
- Real cv300 hardware: agent info + flash read at 921600 baud, byte-perfect
- Real ev300 hardware: regression, same throughput as before
- All test suites green (host C, pytest, ruff, mypy)

🤖 Generated with Claude Code

The previous cv300 startup ran uncached without MMU to keep the ARMv5 porting risk low, costing ~20% throughput vs ev300 (66 vs 83 KB/s for 256 KiB flash reads at 921600 baud).

This adds an ARMv5 short-descriptor identity-mapped MMU + write-back D-cache + I-cache for cv300, mirroring what the ARMv7 path already does for ev300/cv500/cv610.

ARM926 page-table entries differ from ARMv7: no XN bit, no TEX bits, so DEVICE = 0x00000C02 (vs ARMv7 0x00000C12) and CACHED = 0x00000C0E (vs ARMv7 0x00001C0E). Identity layout: every section is DEVICE-uncached by default, the 128 MiB starting at RAM_BASE is overridden as CACHED write-back+bufferable.

TTBR0 takes a bare physical pointer on ARMv5 (no IRGN/RGN attribute bits, those came in ARMv7). Drain-write-buffer replaces ARMv7 `dsb`.

The 16 KiB-aligned page table in BSS is now allocated once outside the #ifdef and shared between both paths (same layout: 4096 × 1 MiB sections × 4 B = 16 KiB).

## Verification

QEMU `qemu-system-arm -M hi3516cv300`: agent boots cleanly, READY/DEFIB packet stream, no MMU faults.

Real hi3516cv300 board over /dev/uart-IVGHP203Y-AF (ether3):

- jedec=ef4018 (W25Q128), 16 MiB, ram=0x80000000, caps=0x7f, version 2
- 256 KiB @ 921600: 3.02 s = 84.9 KB/s (was 65.8 KB/s, +29%)
- 1 MiB sustained @ 921600: 11.50 s = 89.0 KB/s

Real hi3516ev300 regression on /dev/ttyUSB2 (ether5):

- jedec=0b4018, ram=0x40000000, caps=0x7f
- 256 KiB @ 921600: 3.08 s = 83.1 KB/s
- 1 MiB sustained @ 921600: 11.75 s = 87.1 KB/s

cv300 now matches ev300 within run-to-run noise; baud rate is the bottleneck on both, the cache just keeps the CPU from being slower.

ARM926 binary grew 152 bytes (12,720 → 12,872) for the new setup code.

make test HOST_CC=gcc: 5406/5406. pytest: 402 passed. ruff & mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
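The identity layout in the commit message (every section DEVICE by default, 128 MiB at RAM_BASE overridden as CACHED) can be modelled on the host as below. This is a sketch of the table contents only, not the agent's startup code; the function name and parameters are hypothetical, and the descriptor values are the ARMv5 ones quoted above.

```python
NUM_SECTIONS = 4096  # 4096 x 1 MiB sections cover the 4 GiB address space
ENTRY_BYTES = 4      # one 32-bit descriptor per section


def build_identity_table(
    ram_base: int,
    device: int = 0x00000C02,  # ARMv5 DEVICE attributes from the PR
    cached: int = 0x00000C0E,  # ARMv5 CACHED attributes from the PR
    cached_mib: int = 128,
) -> list[int]:
    """Return the first-level table as a list of section descriptors.

    Every virtual section maps to the same physical section (identity map).
    """
    table = [(i << 20) | device for i in range(NUM_SECTIONS)]
    first = ram_base >> 20  # index of the section that contains ram_base
    for i in range(first, first + cached_mib):
        table[i] = (i << 20) | cached
    return table


if __name__ == "__main__":
    table = build_identity_table(0x80000000)  # cv300 RAM base from the logs
    print(f"table size: {len(table) * ENTRY_BYTES} bytes")
    print(f"entry for 0x80000000: 0x{table[0x800]:08X}")
    print(f"entry for 0x88000000: 0x{table[0x880]:08X}")
```

Passing `0x40000000` instead models the ev300 RAM base; on the ARMv7 path the attribute values would be the `0x00000C12` / `0x00001C0E` pair.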

## Summary

Third agent platform after ev300 (V4) and cv300 (V3): the **cv500-family** generation (V5), Cortex-A7 with the same memory map shared by cv500, av300, and dv300.

Most of the wiring already existed. The Cortex-A7 startup path with MMU + I/D cache from #67 covers cv500-family. PR #65 already taught `hisilicon_standard.py` the cv500-family bootrom quirks (12-byte 0xFF run zeroing, gzip SPL boundary, non-fatal SPL TAIL when prestep_data is set). Two real fixes were needed to make av300 actually run.

## Fixes

### 1. Wrong UART base in `agent/Makefile` cv500 entry

The cv500 block had `UART_BASE = 0x12100000` copy-pasted from the V3/hi3518ev200 layout. qemu-hisilicon's `hi3516cv500_soc` (which models the whole cv500 family, av300 included) puts UART0 at **`0x120A0000`**. QEMU surfaced it instantly: agent ran silently because writes went to unmapped I/O. Fix: `0x120A0000`.

### 2. SPL boundary detected from the wrong buffer

`_send_spl()` in the agent-flash flow takes both `firmware` (the agent binary) and `spl_override` (OpenIPC u-boot used as the SPL stage), but `_detect_spl_size` was scanning `firmware`. The agent has no compressed payload, so the scan fell through to `profile_max = 0x6000`, overshooting the real SPL code (ends at `0x5000` on av300) by 0x1000 B. Those extra bytes include the **12-byte 0xFF padding at `0x52E4`** that PR #65 identified as the cv500-family bootrom RX-hang trigger. Result: SPL DATA chunk #21 stalls with no ACK, full 32-retry exhaustion.

Fix: detect the SPL boundary from `spl_override` when present. Plus defense-in-depth: apply `_zero_long_ff_runs` to `spl_data` so any ≥12-byte 0xFF run that does slip through (e.g. a non-OpenIPC SPL build with FF padding earlier in the binary) doesn't trip the same bug.

### 3. Aliasing in `chip_to_agent`

Match the existing `gk7205v300 → gk7205v200` pattern: one binary, multiple chip names route to it. Add `hi3516av300 → hi3516cv500` and `hi3516dv300 → hi3516cv500` to `get_agent_binary()`. No new Makefile entry needed.

### 4. Cosmetic profile name fix

Both `hi3516av300.json` and `hi3516dv300.json` had `"name": "hi3516cv500"` (copy-paste artifact; the profile loader keys off filename so this is harmless, but inconsistent).

## Verification

QEMU smoke test: `qemu-system-arm -M hi3516cv500` runs the agent cleanly, READY/DEFIB packet stream, no faults.

Real **hi3516av300** board (`/dev/uart-hi3516av300_imx415`, MikroTik `ether8`):

```
jedec=c22019 (Macronix MX25L256, 32 MiB), ram=0x80000000, caps=0x7f, version=2
256 KiB @ 921600: 3.04 s = 84.3 KB/s
1 MiB sustained @ 921600: 11.74 s = 87.2 KB/s
flash bytes match installed u-boot byte-for-byte
```

Cross-platform throughput summary (1 MiB read, 921600 baud):

| SoC | CPU | Generation | KB/s |
|---|---|---|---|
| ev300 | Cortex-A7 | V4 | 87.1 |
| cv300 | ARM926 | V3 | 89.0 |
| **av300** | **Cortex-A7** | **V5 (cv500-family)** | **87.2** |

All three within ~2 KB/s: UART baud is the bottleneck.

```
make -C agent test HOST_CC=gcc: 5406/5406
pytest tests/ -x --ignore=tests/fuzz: 402 passed, 2 skipped
ruff check src/ tests/: All checks passed
mypy src/defib/ --ignore-missing-imports: no issues found in 55 source files
```

dv300 has no test board attached; it inherits the same binary as a silent alias. av300 hardware proves the same codepath cv500/dv300 will take.

## Test plan

- [x] QEMU `-M hi3516cv500` boots agent cleanly (caught the wrong UART base)
- [x] Real av300 hardware: agent upload, info, flash read at 921600 baud, content verified
- [x] All test suites green (host C, pytest, ruff, mypy)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Dmitry Ilyin <widgetii@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
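As an illustration of the defense-in-depth step in fix 2, the sketch below zeroes every run of 12 or more consecutive 0xFF bytes while leaving shorter runs and the buffer length untouched. It is a hypothetical stand-in written for this note, not the project's `_zero_long_ff_runs`; the function name, signature, and regex approach are all assumptions.

```python
import re


def zero_long_ff_runs(data: bytes, min_run: int = 12) -> bytes:
    """Replace each run of >= min_run 0xFF bytes with 0x00 bytes of equal length.

    Hypothetical helper: the real defib implementation may differ.
    """
    pattern = re.compile(rb"\xff{%d,}" % min_run)
    return pattern.sub(lambda m: b"\x00" * len(m.group()), data)


if __name__ == "__main__":
    # 11-byte run (kept) followed by a 12-byte run (zeroed)
    spl = b"\x01" + b"\xff" * 11 + b"\x02" + b"\xff" * 12 + b"\x03"
    cleaned = zero_long_ff_runs(spl)
    print(len(spl) == len(cleaned))  # offsets into the image are preserved
    print(cleaned.hex())
```

Keeping the length unchanged matters because the SPL is sent in fixed-offset chunks; only the padding bytes that would trigger the bootrom hang are altered.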

widgetii merged commit c57bb9c into master May 5, 2026
13 checks passed

widgetii deleted the agent-cv300-mmu-cache branch May 5, 2026 09:28

widgetii mentioned this pull request May 5, 2026

Add hi3516av300 + hi3516dv300 (cv500-family) to flash agent #68

Merged

3 tasks

widgetii merged 1 commit into master from agent-cv300-mmu-cache
