Enable MMU + I/D cache on ARM926 (cv300)#67

Merged: widgetii merged 1 commit into master from agent-cv300-mmu-cache on May 5, 2026

@widgetii widgetii commented May 5, 2026

Summary

The hi3516cv300 (ARM926EJ-S) agent that landed in #66 ran with the MMU and caches off to keep the ARMv5 porting risk low. That left it about 20% slower than the Cortex-A7 path: 66 KB/s vs 83 KB/s for a 256 KiB flash read at 921600 baud.

This PR brings the ARM926 path to parity by enabling MMU (identity-mapped) + write-back D-cache + I-cache, mirroring what ev300/cv500/cv610 already use under the ARMv7 branch.

ARMv5 vs ARMv7 differences handled

  • Section descriptor format: no XN bit, no TEX bits.
    • DEVICE: 0x00000C02 (ARMv7 was 0x00000C12 with XN)
    • CACHED: 0x00000C0E (ARMv7 was 0x00001C0E with TEX=001 outer-WB-WA)
  • TTBR0: bare physical pointer (no IRGN/RGN attribute bits — those are ARMv7).
  • Memory barriers: mcr p15, 0, _, c7, c10, 4 (drain write buffer) replaces ARMv7 dsb. Three architectural NOPs (mov r0, r0) flush the prefetch buffer after enabling MMU instead of isb.
  • Mode switch: explicit MRS/BIC/ORR/MSR cpsr_c instead of cps #imm (which is ARMv6+).
  • The 16 KiB-aligned page table in BSS is now allocated once outside the #ifdef CPU_ARM926 and shared between both CPU paths (same layout — 4096 × 1 MiB sections × 4 B = 16 KiB).
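The descriptor constants in the list above can be checked field by field. The following is a minimal Python sketch, not code from the agent: the helper name `decode_section` is hypothetical, and the field positions assume the standard short-descriptor section layout (bit 4 is XN only on ARMv7, and bits 14:12 are the TEX field that ARMv5 lacks).

```python
def decode_section(desc: int) -> dict:
    """Split a 32-bit first-level section descriptor into its fields."""
    return {
        "type": desc & 0x3,          # 0b10 marks a 1 MiB section
        "B": (desc >> 2) & 0x1,      # bufferable
        "C": (desc >> 3) & 0x1,      # cacheable
        "bit4": (desc >> 4) & 0x1,   # XN on ARMv7
        "domain": (desc >> 5) & 0xF,
        "AP": (desc >> 10) & 0x3,    # 0b11 = read/write access
        "TEX": (desc >> 12) & 0x7,   # ARMv7 only
        "base": desc & 0xFFF00000,   # section base address
    }


# Values quoted in the PR description.
ARMV5_DEVICE, ARMV7_DEVICE = 0x00000C02, 0x00000C12
ARMV5_CACHED, ARMV7_CACHED = 0x00000C0E, 0x00001C0E

for name, value in [
    ("ARMv5 DEVICE", ARMV5_DEVICE),
    ("ARMv7 DEVICE", ARMV7_DEVICE),
    ("ARMv5 CACHED", ARMV5_CACHED),
    ("ARMv7 CACHED", ARMV7_CACHED),
]:
    print(f"{name}: {decode_section(value)}")
```

Each ARMv5/ARMv7 pair differs in exactly one bit: bit 4 (0x10) for DEVICE and bit 12 (0x1000, TEX=001) for CACHED.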

Verification

QEMU qemu-system-arm -M hi3516cv300: agent boots cleanly, no MMU faults, READY/DEFIB packet stream.

Real hi3516cv300 (/dev/uart-IVGHP203Y-AF, MikroTik ether3):
```
jedec=ef4018 (W25Q128), 16 MiB, ram=0x80000000, caps=0x7f, version=2
256 KiB @ 921600: 3.02 s = 84.9 KB/s (pre-PR: 65.8 KB/s, +29%)
1 MiB sustained @ 921600: 11.50 s = 89.0 KB/s
```

Real hi3516ev300 regression (/dev/ttyUSB2, MikroTik ether5):
```
jedec=0b4018, 16 MiB, ram=0x40000000, caps=0x7f
256 KiB @ 921600: 3.08 s = 83.1 KB/s
1 MiB sustained @ 921600: 11.75 s = 87.1 KB/s
```

cv300 now matches (slightly exceeds) ev300, well within run-to-run noise — at 921600 baud the UART is the bottleneck on both, and the cache just keeps the CPU from making things worse.
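The bottleneck claim can be sanity-checked with simple arithmetic. The sketch below assumes 8N1 framing (10 bits on the wire per payload byte) and ignores protocol overhead; neither is stated in the PR. It also treats the logged KB/s as 1024-byte units, which is consistent with the 1 MiB timings.

```python
BAUD = 921_600
BITS_PER_BYTE = 10  # assumption: 8N1 framing (start bit + 8 data bits + stop bit)


def kib_per_s(size_kib: float, seconds: float) -> float:
    """Throughput in KiB/s for a transfer of size_kib taking `seconds`."""
    return size_kib / seconds


# Upper bound the UART can carry, in KiB/s, before any protocol overhead.
uart_ceiling = BAUD / BITS_PER_BYTE / 1024

# 1 MiB sustained reads, timings taken from the logs above.
cv300 = kib_per_s(1024, 11.50)
ev300 = kib_per_s(1024, 11.75)

# Relative gain on cv300 for the 256 KiB read (84.9 vs 65.8 KB/s).
speedup = 84.9 / 65.8 - 1

print(f"UART ceiling: {uart_ceiling:.1f} KiB/s")
print(f"cv300: {cv300:.1f} KiB/s ({cv300 / uart_ceiling:.1%} of ceiling)")
print(f"ev300: {ev300:.1f} KiB/s ({ev300 / uart_ceiling:.1%} of ceiling)")
print(f"cv300 gain over pre-PR: {speedup:.0%}")
```

Under these assumptions both boards run within a few percent of the raw line rate, so further CPU-side gains would not show up at this baud rate.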

```
make -C agent test HOST_CC=gcc: 5406/5406
pytest tests/ -x --ignore=tests/fuzz: 402 passed, 2 skipped
ruff check src/ tests/: All checks passed
mypy src/defib/ --ignore-missing-imports: no issues found in 55 source files
```

Binary size: cv300 grew 152 bytes (12,720 → 12,872) for the page-table fill + MMU enable. ev300 unchanged.

Test plan

  • QEMU smoke test (no instruction faults, agent emits READY)
  • Real cv300 hardware: agent info + flash read at 921600 baud, byte-perfect
  • Real ev300 hardware: regression — same throughput as before
  • All test suites green (host C, pytest, ruff, mypy)

🤖 Generated with Claude Code

The previous cv300 startup ran uncached without MMU to keep the ARMv5
porting risk low, costing ~20% throughput vs ev300 (66 vs 83 KB/s for
256 KiB flash reads at 921600 baud).  This adds an ARMv5 short-descriptor
identity-mapped MMU + write-back D-cache + I-cache for cv300, mirroring
what the ARMv7 path already does for ev300/cv500/cv610.

ARM926 page-table entries differ from ARMv7: no XN bit, no TEX bits, so
DEVICE = 0x00000C02 (vs ARMv7 0x00000C12) and CACHED = 0x00000C0E (vs
ARMv7 0x00001C0E).  Identity layout: every section is DEVICE-uncached
by default, the 128 MiB starting at RAM_BASE is overridden as CACHED
write-back+bufferable.  TTBR0 takes a bare physical pointer on ARMv5
(no IRGN/RGN attribute bits, those came in ARMv7).  Drain-write-buffer
replaces ARMv7 `dsb`.

The 16 KiB-aligned page table in BSS is now allocated once outside the
#ifdef and shared between both paths (same layout: 4096 × 1 MiB sections
× 4 B = 16 KiB).
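As an illustration of the layout described above, here is a minimal Python model of the identity-mapped table. It is a sketch of the layout only, not the agent's startup code: the function name is hypothetical, and `RAM_BASE = 0x80000000` is taken from the cv300 log line `ram=0x80000000`.

```python
DEVICE = 0x00000C02   # ARMv5 section, uncached, full access
CACHED = 0x00000C0E   # ARMv5 section, write-back + bufferable
SECTIONS = 4096       # 4 GiB address space / 1 MiB per section
RAM_BASE = 0x80000000 # from the cv300 log; differs per SoC
CACHED_MIB = 128      # size of the cached window, per the commit message


def build_identity_table(ram_base: int = RAM_BASE,
                         cached_mib: int = CACHED_MIB) -> list[int]:
    """Return 4096 section descriptors mapping each VA to the same PA."""
    # Default: every 1 MiB section is DEVICE (uncached).
    table = [(index << 20) | DEVICE for index in range(SECTIONS)]
    # Override the RAM window as CACHED.
    first = ram_base >> 20
    for index in range(first, first + cached_mib):
        table[index] = (index << 20) | CACHED
    return table


table = build_identity_table()
print(f"table size: {len(table) * 4} bytes")
print(f"entry for RAM_BASE: {table[RAM_BASE >> 20]:#010x}")
```

Because only the two descriptor constants differ between ARMv5 and ARMv7, the same 16 KiB buffer can back both CPU paths.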

## Verification

QEMU `qemu-system-arm -M hi3516cv300`: agent boots cleanly, READY/DEFIB
packet stream, no MMU faults.

Real hi3516cv300 board over /dev/uart-IVGHP203Y-AF (ether3):
  jedec=ef4018 (W25Q128), 16 MiB, ram=0x80000000, caps=0x7f, version 2
  256 KiB @ 921600: 3.02 s = 84.9 KB/s  (was 65.8 KB/s — +29%)
  1 MiB sustained @ 921600: 11.50 s = 89.0 KB/s

Real hi3516ev300 regression on /dev/ttyUSB2 (ether5):
  jedec=0b4018, ram=0x40000000, caps=0x7f
  256 KiB @ 921600: 3.08 s = 83.1 KB/s
  1 MiB sustained @ 921600: 11.75 s = 87.1 KB/s

cv300 now matches ev300 within run-to-run noise — baud rate is the
bottleneck on both, the cache just keeps the CPU from being slower.
ARM926 binary grew 152 bytes (12,720 → 12,872) for the new setup code.

make test HOST_CC=gcc: 5406/5406.  pytest: 402 passed.  ruff & mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@widgetii widgetii merged commit c57bb9c into master May 5, 2026
13 checks passed
@widgetii widgetii deleted the agent-cv300-mmu-cache branch May 5, 2026 09:28
widgetii added a commit that referenced this pull request May 5, 2026
## Summary

Third agent platform after ev300 (V4) and cv300 (V3): the
**cv500-family** generation (V5) — Cortex-A7 with the same memory map
shared by cv500, av300, and dv300.

Most of the wiring already existed. The Cortex-A7 startup path with MMU
+ I/D cache from #67 covers cv500-family. PR #65 already taught
`hisilicon_standard.py` the cv500-family bootrom quirks (12-byte 0xFF
run zeroing, gzip SPL boundary, non-fatal SPL TAIL when prestep_data is
set). Two real fixes (items 1 and 2 below) were needed to make av300
actually run; items 3 and 4 are routing and cosmetic cleanups.

## Fixes

### 1. Wrong UART base in `agent/Makefile` cv500 entry

The cv500 block had `UART_BASE = 0x12100000` copy-pasted from the
V3/hi3518ev200 layout. qemu-hisilicon's `hi3516cv500_soc` (which
models the whole cv500 family, av300 included) puts UART0 at
**`0x120A0000`**. QEMU surfaced it instantly: the agent ran silently
because its UART writes went to unmapped I/O. Fix: `0x120A0000`.

### 2. SPL boundary detected from the wrong buffer

`_send_spl()` in the agent-flash flow takes both `firmware` (the
agent binary) and `spl_override` (OpenIPC u-boot used as the SPL
stage), but `_detect_spl_size` was scanning `firmware`. The agent
has no compressed payload, so the scan fell through to `profile_max =
0x6000`, overshooting the real SPL code (which ends at `0x5000` on
av300) by 0x1000 bytes. Those extra bytes include the **12-byte 0xFF
padding at `0x52E4`** that PR #65 identified as the cv500-family bootrom
RX-hang trigger. Result: SPL DATA chunk #21 stalls with no ACK until all
32 retries are exhausted.
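The chunk number in the failure report lines up with the padding offset if DATA chunks are 1 KiB and numbered from 1. Both of those are assumptions made for this sketch; the PR does not state the chunk size or numbering.

```python
CHUNK_SIZE = 0x400  # assumption: 1 KiB DATA chunks, numbered starting at 1


def chunk_number(offset: int, chunk_size: int = CHUNK_SIZE) -> int:
    """1-based index of the chunk that carries the byte at `offset`."""
    return offset // chunk_size + 1


def chunks_sent(length: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed to send `length` bytes (ceiling division)."""
    return -(-length // chunk_size)


FF_PADDING_OFFSET = 0x52E4
REAL_SPL_END = 0x5000
PROFILE_MAX = 0x6000

print(f"padding lives in chunk #{chunk_number(FF_PADDING_OFFSET)}")
print(f"chunks sent with boundary 0x5000: {chunks_sent(REAL_SPL_END)}")
print(f"chunks sent with boundary 0x6000: {chunks_sent(PROFILE_MAX)}")
```

Under those assumptions a boundary of `0x5000` stops after chunk 20, so the padding is never transmitted, while `0x6000` sends four more chunks and the first of them is the one that stalls.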

Fix: detect the SPL boundary from `spl_override` when present. As
defense in depth, also apply `_zero_long_ff_runs` to `spl_data` so any
≥12-byte 0xFF run that does slip through (e.g. a non-OpenIPC SPL build
with FF padding earlier in the binary) doesn't trip the same bug.
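A minimal sketch of the two parts of this fix follows. These are illustrative stand-ins, not the repository's implementations: `pick_spl_source` and `zero_long_ff_runs` are hypothetical names, and replacing the run with zero bytes of the same length is an assumption based on the phrase "0xFF run zeroing".

```python
import re
from typing import Optional

# Any run of 12 or more consecutive 0xFF bytes.
LONG_FF_RUN = re.compile(rb"\xff{12,}")


def pick_spl_source(firmware: bytes, spl_override: Optional[bytes]) -> bytes:
    """Buffer to scan for the SPL boundary: the override wins when present."""
    return spl_override if spl_override is not None else firmware


def zero_long_ff_runs(data: bytes) -> bytes:
    """Replace each run of >= 12 0xFF bytes with zeros of the same length."""
    return LONG_FF_RUN.sub(lambda match: bytes(len(match.group())), data)


short_run = b"\x01" + b"\xff" * 11 + b"\x02"
long_run = b"\x01" + b"\xff" * 12 + b"\x02"

print(zero_long_ff_runs(short_run) == short_run)  # 11-byte run is left alone
print(zero_long_ff_runs(long_run).hex())
```

Keeping the length unchanged means offsets in the rest of the SPL image stay valid after the substitution.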

### 3. Aliasing in `chip_to_agent`

Match the existing `gk7205v300 → gk7205v200` pattern: one binary,
multiple chip names route to it. Add `hi3516av300 → hi3516cv500` and
`hi3516dv300 → hi3516cv500` to `get_agent_binary()`. No new Makefile
entry needed.
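The aliasing pattern amounts to a lookup table with a pass-through default. The sketch below is an illustration only: `AGENT_ALIASES` and `resolve_agent_chip` are hypothetical names, and the real `get_agent_binary()` may be structured differently.

```python
# Chip names that reuse another chip's agent binary (pairs named in this PR).
AGENT_ALIASES = {
    "gk7205v300": "gk7205v200",
    "hi3516av300": "hi3516cv500",
    "hi3516dv300": "hi3516cv500",
}


def resolve_agent_chip(chip: str) -> str:
    """Map a chip name to the chip whose agent binary it should load."""
    return AGENT_ALIASES.get(chip, chip)


for chip in ("hi3516av300", "hi3516dv300", "hi3516cv500", "hi3516ev300"):
    print(f"{chip} -> {resolve_agent_chip(chip)}")
```

Chips without an alias resolve to themselves, so adding a family member only needs a new table entry.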

### 4. Cosmetic profile name fix

Both `hi3516av300.json` and `hi3516dv300.json` had
`"name": "hi3516cv500"` (a copy-paste artifact; the profile loader keys
off the filename, so this was harmless but inconsistent).

## Verification

QEMU smoke test: `qemu-system-arm -M hi3516cv500` runs the agent
cleanly, READY/DEFIB packet stream, no faults.

Real **hi3516av300** board (`/dev/uart-hi3516av300_imx415`, MikroTik
`ether8`):
```
jedec=c22019 (Macronix MX25L256, 32 MiB), ram=0x80000000, caps=0x7f, version=2
256 KiB @ 921600: 3.04 s = 84.3 KB/s
1 MiB sustained @ 921600: 11.74 s = 87.2 KB/s
flash bytes match installed u-boot byte-for-byte
```

Cross-platform throughput summary (1 MiB read, 921600 baud):

| | CPU | Generation | KB/s |
|---|---|---|---|
| ev300 | Cortex-A7 | V4 | 87.1 |
| cv300 | ARM926 | V3 | 89.0 |
| **av300** | **Cortex-A7** | **V5 (cv500-family)** | **87.2** |

All three within ~2 KB/s — UART baud is the bottleneck.

```
make -C agent test HOST_CC=gcc:    5406/5406
pytest tests/ -x --ignore=tests/fuzz: 402 passed, 2 skipped
ruff check src/ tests/:             All checks passed
mypy src/defib/ --ignore-missing-imports: no issues found in 55 source files
```

dv300 has no test board attached — it inherits the same binary as a
silent alias. av300 hardware proves the same codepath cv500/dv300 will
take.

## Test plan

- [x] QEMU `-M hi3516cv500` boots agent cleanly (caught the wrong UART
base)
- [x] Real av300 hardware: agent upload, info, flash read at 921600
baud, content verified
- [x] All test suites green (host C, pytest, ruff, mypy)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Dmitry Ilyin <widgetii@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>