ARM Architecture for x86 Engineers: Chapter 1 Lab
31 registers, load/store, and what happens when you boot an ARM64 VM on QEMU for the first time. This is a lab: a true step-by-step sequence in Jupyter. Have fun extending it!
I left Intel last July after 12 years. I took a nice long break of about six months: I played around with LLMs, kayaked, spent time at home, traveled a bit, gardened, and rediscovered myself, along with my love of creating, building, and reading. Eventually I settled on what I wanted to do next. I am becoming a believer in the possibility that the Arm datacenter ecosystem is a strong contender to run the world of AI soon!
So I built a 13-chapter hands-on lab series on QEMU covering the full Arm platform stack, from scratch. This post is Chapter 1. Now, about those 31 registers.
The lab is built with Jupyter notebooks, which let you step through and understand each function and the underlying structure, all while interacting with QEMU, an incredibly versatile open-source emulator.
Code
All code for this lab and the full 13-chapter series is at GitHub: [github.com/cakesandcode/ARM_vs_x86]
Only the chapters published on Substack have fully debugged and tested code with a clean README, setup, and FAQ sections. I recommend going slow: follow the Substack series and work through each chapter in sequence.
What’s different: 31 registers and a load/store architecture
The first thing an x86 engineer notices about AArch64: 31 general-purpose registers (X0–X30) versus x86-64’s 16 (RAX through R15) [Source 1, 2]. Nearly double. The register overloading, special cases, and inconsistencies that are a constant headache in x86 firmware are largely gone. Maybe I will change my mind by the time I am done learning, but for now I say the register mess that x86 carries as technical debt largely disappears.
The second difference is deeper. AArch64 is a **load/store architecture**: arithmetic and logic instructions operate only on registers, never directly on memory [Source 3]. On x86, `ADD [mem], reg` is one instruction. On Arm, it’s three: load from memory into a register, add, store back. This sounds worse. It isn’t — it forces the compiler (and the firmware engineer) to be explicit about data movement. After 20 years of MOV being the universal Swiss Army knife, the discipline changes how you think about register allocation in boot code.
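To make the three-step sequence concrete, here is a toy Python model. It is not real assembly and not part of the lab code: registers and memory are plain dictionaries, and the hypothetical helpers `ldr`, `add`, and `str_` stand in for the A64 `LDR`, `ADD`, and `STR` instructions.

```python
# Toy model only: registers and memory are dicts, and the helper names are made up.
regs = {"x0": 0, "x1": 0, "x2": 5}
mem = {0x1000: 37}

def ldr(dst, addr):
    """Load: the only way data moves from memory into a register."""
    regs[dst] = mem[addr]

def add(dst, a, b):
    """Arithmetic: register operands only, memory is never touched."""
    regs[dst] = regs[a] + regs[b]

def str_(src, addr):
    """Store: the only way data moves from a register back to memory."""
    mem[addr] = regs[src]

# x86 can express this as one instruction: ADD [0x1000], reg
# The load/store shape is three explicit steps:
ldr("x0", 0x1000)      # x0 = mem[0x1000]
add("x1", "x0", "x2")  # x1 = x0 + x2
str_("x1", 0x1000)     # mem[0x1000] = x1

print(hex(0x1000), "now holds", mem[0x1000])
```

The point of the model is that `add` has no way to reach `mem` at all; every byte that moves between memory and registers is visible as its own step.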
And the instructions are fixed-width: every AArch64 instruction is exactly 32 bits [Source 4]. On x86, instructions range from 1 to 15 bytes [Source 5] — the decoder has to figure out where one instruction ends and the next begins, consuming die area and power at billions of instructions per second. Fixed-width decoding is simpler, cheaper, and one of the structural advantages Arm has at the power envelope.
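A small Python sketch of what fixed width buys you: finding instruction boundaries is pure arithmetic. The two words below are the real A64 encodings of `NOP` and `RET`; the helper names `split_a64` and `nth_instruction` are made up for this illustration.

```python
import struct

# A64 instructions are stored little-endian in memory:
# NOP = 0xD503201F, RET = 0xD65F03C0.
code = bytes.fromhex("1f2003d5" "c0035fd6")

def split_a64(blob):
    """Every instruction is 4 bytes, so the boundaries are offsets 0, 4, 8, ..."""
    return [word for (word,) in struct.iter_unpack("<I", blob)]

def nth_instruction(blob, n):
    """Jump straight to instruction n without decoding the ones before it."""
    return struct.unpack_from("<I", blob, 4 * n)[0]

for offset, word in zip(range(0, len(code), 4), split_a64(code)):
    print(f"+{offset:#04x}: {word:#010x}")
```

A variable-length encoding has no equivalent of `nth_instruction`: the start of instruction n is only known after the lengths of instructions 0 through n-1 have been worked out.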
The lab: boot an ARM64 VM and prove it works
This isn’t a lecture — it’s a lab. Here’s what it does:
Step 1: Boot an ARM64 VM on QEMU’s `virt` machine.
The `virt` machine is QEMU’s reference Arm platform — no physical hardware needed. On macOS Apple Silicon, it runs with HVF (Hypervisor Framework) acceleration. On Linux, KVM. The shared infrastructure I built handles the plumbing:
- `qemu_launcher.py` — manages QEMU lifecycle, port allocation, accelerator detection (HVF/KVM)
- `qmp_client.py` — JSON socket client for QEMU Machine Protocol
- `serial_console.py` — serial console interaction over pexpect
- `assert_lib.py` — test assertions with structured pass/fail reporting
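As a rough idea of what accelerator detection can look like, here is a minimal sketch. It is an assumption about the approach, not the actual contents of `qemu_launcher.py`; the function names are hypothetical, and it falls back to TCG whenever the host cannot run an AArch64 guest natively.

```python
import os
import platform

def pick_accelerator(system, machine, kvm_usable):
    """Choose a QEMU accelerator name for an AArch64 guest on the given host."""
    arm_host = machine in ("arm64", "aarch64")
    if system == "Darwin" and arm_host:
        return "hvf"   # Hypervisor Framework on Apple Silicon
    if system == "Linux" and arm_host and kvm_usable:
        return "kvm"
    return "tcg"       # software emulation: works anywhere, just slower

def detect_accelerator():
    """Probe the current host and delegate to the pure decision function."""
    kvm_usable = os.access("/dev/kvm", os.R_OK | os.W_OK)
    return pick_accelerator(platform.system(), platform.machine(), kvm_usable)

print("accelerator for this host:", detect_accelerator())
```

Keeping the decision in a pure function (`pick_accelerator`) separate from the host probing makes it straightforward to unit-test every branch without owning every kind of machine.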
Step 2: Connect via QMP and query the vCPU.
QMP (QEMU Machine Protocol) is a JSON-based protocol, carried here over a TCP socket, that lets you inspect and control a running VM [Source 6]. The lab connects to QMP, negotiates capabilities, and queries:
- `query-cpus-fast` — confirms vCPU count matches SMP configuration
- `query-version` — verifies QEMU version
- `human-monitor-command` running `info registers`, which reads register state
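The connect, negotiate, and query sequence can be sketched in a few lines of Python. This is a minimal illustration, not the lab's `qmp_client.py`: the class name `MiniQMP` is hypothetical, and it assumes QEMU's usual framing of one JSON object per line.

```python
import json
import socket

class MiniQMP:
    """Tiny QMP client: read the greeting, negotiate, then run commands."""

    def __init__(self, sock):
        self._file = sock.makefile("rwb")
        greeting = self._read()
        if "QMP" not in greeting:
            raise ConnectionError("peer did not send a QMP greeting")
        # Commands are refused until capabilities have been negotiated.
        self.execute("qmp_capabilities")

    def _read(self):
        line = self._file.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        return json.loads(line)

    def execute(self, name, arguments=None):
        message = {"execute": name}
        if arguments:
            message["arguments"] = arguments
        self._file.write(json.dumps(message).encode() + b"\n")
        self._file.flush()
        while True:
            reply = self._read()
            if "event" in reply:
                continue  # asynchronous events can arrive between replies
            if "error" in reply:
                raise RuntimeError(reply["error"].get("desc", "QMP error"))
            return reply["return"]
```

Against a running VM, usage would look like `qmp = MiniQMP(socket.create_connection(("127.0.0.1", 4444)))` (the port is only an example), followed by `qmp.execute("query-cpus-fast")` or `qmp.execute("human-monitor-command", {"command-line": "info registers"})`.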
Step 3: Connect via serial console and inspect the guest.
The serial console captures the boot log and provides guest shell access. The lab logs in via cloud-init credentials, then reads `/proc/cpuinfo` to confirm:
- The guest kernel sees an ARM64 CPU (`CPU implementer` field present and readable)
- Under HVF on Apple Silicon, the implementer reads `0x61` (Apple) because HVF runs the guest on the real M-series core. Under TCG on Linux, it reads `0x41` (Arm Ltd) since TCG emulates the requested cortex-a76.
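The implementer check boils down to a small text parse. The sketch below is an illustration with a hypothetical helper name (`cpu_implementers`) and a hand-written sample; it only names the two implementer codes discussed above.

```python
import re

# Only the two codes mentioned in this post; the real table is longer.
IMPLEMENTERS = {0x41: "Arm Ltd", 0x61: "Apple"}

def cpu_implementers(cpuinfo_text):
    """Return the set of implementer codes found in /proc/cpuinfo text."""
    matches = re.findall(
        r"^CPU implementer\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.MULTILINE
    )
    return {int(value, 16) for value in matches}

# Hand-written sample in the shape an ARM64 guest reports.
sample = "processor\t: 0\nCPU implementer\t: 0x61\nCPU architecture: 8\n"

for code in sorted(cpu_implementers(sample)):
    print(f"{code:#04x} -> {IMPLEMENTERS.get(code, 'unknown')}")
```

An empty set is a useful signal in its own right: x86 guests have no `CPU implementer` line, so the assertion fails cleanly if the wrong image booted.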
Step 4: Assert everything.
Every observation is captured as a structured assertion:
✅ PASS | QEMU process is alive
✅ PASS | query-cpus-fast returns ≥ 1 vCPU
✅ PASS | vCPU count == SMP (1)
✅ PASS | query-cpus-fast response has cpu-index field
✅ PASS | QEMU version string present
✅ PASS | HMP ‘info registers’ returns register state
✅ PASS | Guest /proc/cpuinfo shows ARM CPU implementer
These are not slides, not a course certificate. Working code with assertions you can run and verify and expand on!
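A structured assertion reporter of this kind needs very little code. The sketch below is an assumption about the general shape, not the lab's `assert_lib.py`; the `Report` class and the `❌ FAIL` marker are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Report:
    """Collects named checks and prints one PASS/FAIL line per check."""
    results: list = field(default_factory=list)

    def check(self, description, condition):
        passed = bool(condition)
        self.results.append((description, passed))
        print(f"{'✅ PASS' if passed else '❌ FAIL'} | {description}")
        return passed

    def summary(self):
        passed = sum(1 for _, ok in self.results if ok)
        return f"{passed}/{len(self.results)} passed"

report = Report()
report.check("vCPU count == SMP (1)", len([{"cpu-index": 0}]) == 1)
print(report.summary())
```

Recording every result instead of raising on the first failure means one notebook run shows the whole picture, which matters when each boot takes a while.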
QEMU fidelity: what’s real, what’s emulated
An important note that applies to every lab in this series: QEMU is functionally accurate but not cycle-accurate. The `virt` machine emulates the Arm platform correctly enough to test firmware interfaces (QMP, serial, ACPI tables, device hot-plug), but performance characteristics (cache hit rates, TLB walk latency, branch prediction behavior) differ from real Neoverse silicon.
Each lab annotates these gaps inline. For Chapter 1, the gap is minimal — the ISA fundamentals (registers, instruction encoding, exception levels) are architecturally defined and faithfully emulated.
The shared infrastructure
The 4 shared Python modules (QMP client, serial console, QEMU launcher, assertion library) are reused across all 13 labs. They have 41 unit tests of their own, plus one 12-second end-to-end integration test that spawns `qemu-system-aarch64`, waits for the guest login prompt over serial, and tears down cleanly. The unit tests exercise the shell-out and JSON-parsing logic with mocks; the integration test catches exactly the class of bug described in the debugging story at the end of this post.
This was a deliberate design decision, reinforced by the debugging evening: the infrastructure is the foundation. If the test harness is unreliable — or if it only tests mocks and never the real toolchain — every lab built on it is suspect. So the harness is tested at both tiers now, mock and real.
What’s next
**Chapter 2: Memory Model** — DRAM starts at `0x40000000` on the Arm `virt` machine, not at the bottom of the address space like x86. That one address tells you everything about how Arm platforms partition physical memory. We’ll query the memory layout via QMP and hot-add 512 MiB of RAM at runtime.
Sources
1. [Arm Architecture Reference Manual (DDI0487)], Arm (AArch64 GPR definition, §B1)
2. [Intel 64 and IA-32 Architectures SDM, Vol 1], Intel (x86-64 GPR definition, §3.4)
3. [Arm A64 Instruction Set Architecture], Arm (load/store architecture definition)
4. [Arm Learn the Architecture: AArch64 Instruction Set], Arm (fixed-width 32-bit instructions)
5. [Intel SDM Vol 2: Instruction Set Reference], Intel (variable-length 1-15 byte encoding)
6. [QEMU Machine Protocol Introduction], QEMU (JSON/TCP protocol)
Before this lab worked: six bugs and a 160 KB UEFI firmware
Before publishing this I had to run Chapter 1 myself, which should have been the easy part. I had 37 unit tests passing. I had a setup script. I had a Jupyter notebook with cell 1, cell 2, cell 3 in the expected order. What I did not have was a working boot — a fact I discovered over the course of an evening, one failure mode at a time.
The first bug was Python. `jupyter lab` exited with `bad interpreter: /opt/homebrew/Cellar/jupyterlab/4.4.7/libexec/bin/python: no such file or directory`. Homebrew had quietly upgraded Python from 3.13 to 3.14 at some point since the last time this laptop opened Jupyter, and jupyterlab’s libexec venv was built against the 3.13 binary, which no longer existed. I `brew reinstall jupyterlab`’d my way through that one without realising the reinstall had yanked Python 3.14 into the Cellar as a fresh dependency and promoted it over 3.12 as the default `python3`. My earlier `pip3 install pexpect`, which had quietly landed in 3.12’s site-packages, was now invisible to the `#!/usr/bin/env python3` shebang in the test script. Running the unit tests reported `37 PASS / 6 FAIL`. The six failures were all in `serial_console.py`. The error: `ImportError: pexpect is required`.
The right fix, obvious in hindsight, was a project-scoped venv at `~/arm_qemu_labs/.venv`, pinned explicitly to `python@3.12`, registered as a Jupyter kernel named `ARM QEMU Labs (venv)`. No more `--break-system-packages`. No more wondering which `python3` the shebang means today. If you ever want a short course on why virtual environments exist, spend an evening losing to Homebrew.
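A cheap guard against this class of problem is a preflight check at the top of the notebook or test script. The following is a minimal sketch with hypothetical helper names; it only reports which interpreter is running and whether the modules it needs are importable.

```python
import importlib.util
import sys

def in_virtualenv():
    """True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix

def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("interpreter:", sys.executable)
print("inside a venv:", in_virtualenv())
print("missing:", missing_modules(["pexpect"]))
```

Printing `sys.executable` first answers the "which `python3` is this?" question before any import error has a chance to obscure it.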
With the venv healthy, cell 2 launched QEMU and promptly exited: `Invalid CPU model: cortex-a76`. HVF on Apple Silicon is pass-through virtualization: the guest runs on the actual M-series core, so QEMU cannot pretend to be a different microarchitecture. The `virt` machine accepts exactly four `-cpu` values under HVF: `host`, `max`, `cortex-a53`, `cortex-a57`. The lab’s default `cortex-a76` is fine under TCG (it is the closest QEMU-emulatable ancestor of Neoverse N1, which is what datacenter readers will want) but HVF rejects it outright. The launcher now detects the accelerator at construction time and automatically coerces to `host` when it sees HVF, logging the swap. The notebook keeps `cortex-a76` as the stated preference. One notebook, two valid answers, no per-box edits.
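The coercion itself can be sketched as one small function. This is an illustration of the idea rather than the launcher's actual code; `resolve_cpu_model` and the logger name are hypothetical.

```python
import logging

log = logging.getLogger("qemu_launcher_sketch")

def resolve_cpu_model(requested, accelerator):
    """Return the -cpu value to pass to QEMU for the given accelerator."""
    if accelerator == "hvf" and requested != "host":
        # Pass-through virtualization: the guest sees the real core.
        log.warning("HVF cannot emulate %r; using -cpu host instead", requested)
        return "host"
    return requested

print(resolve_cpu_model("cortex-a76", "hvf"))
print(resolve_cpu_model("cortex-a76", "tcg"))
```

Logging the swap matters: a notebook that asks for `cortex-a76` and quietly gets something else would make the `/proc/cpuinfo` results confusing to read later.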
Cell 2 restarted and succeeded. Cell 3 — serial console connect, wait for login prompt — ran for 180 seconds and raised `TIMEOUT`. This was the one that cost the most time. The guest was clearly alive: QMP accepted my `query-version`, the subprocess was running, `pgrep -af qemu` returned a healthy PID. But pexpect’s serial buffer, when I finally thought to inspect it directly, was zero bytes. Not “I missed the login prompt” zero bytes. Zero bytes full stop. Nothing had come out of the serial TCP socket since QEMU started.
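The distinction between "missed the prompt" and "no output at all" is easy to surface if the wait helper returns what it actually saw. The sketch below uses a plain socket rather than pexpect, and `wait_for_prompt` is a hypothetical name, not a function from the lab.

```python
import socket
import time

def wait_for_prompt(sock, prompt=b"login:", timeout=5.0):
    """Read until prompt appears; return (found, every byte seen so far)."""
    deadline = time.monotonic() + timeout
    seen = bytearray()
    while prompt not in seen:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False, bytes(seen)
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return False, bytes(seen)
        if not chunk:  # peer closed the connection
            return False, bytes(seen)
        seen += chunk
    return True, bytes(seen)
```

On a timeout, `len(seen) == 0` points at firmware or serial wiring, while a non-empty buffer without the prompt points at a slow or stuck guest. Those are very different debugging sessions.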
At this point I left the notebook and started poking at QEMU directly from a terminal, with `-nographic` so serial would land on stdout. Blank screen. A full minute of blank. Then I noticed the firmware file. `~/arm_qemu_labs/firmware/QEMU_EFI.fd`, 160 kilobytes. A real UEFI firmware is 64 megabytes.
The setup script’s firmware-search was written when Homebrew’s QEMU formula still shipped `QEMU_EFI.fd`. It stopped shipping that a while ago — the modern layout is `edk2-aarch64-code.fd` plus `edk2-arm-vars.fd` (code and varstore as separate flash volumes, using the pflash interface). The setup script’s fallback pattern was `find ... -name efi-virtio.rom -o -name QEMU_EFI.fd`, and when it failed to find the real firmware, it happily grabbed `efi-virtio.rom` — a 160 KB option ROM that stuffs a virtio driver into a network-boot stack — and copied it to `firmware/QEMU_EFI.fd`. QEMU loaded that 160 KB network boot ROM as firmware via the legacy `-bios` path, and emitted exactly zero bytes over serial for three minutes before I ran out of patience.
Fix: switch the launcher to the modern `-drive if=pflash,unit=0,file=<code>,readonly=on` + `-drive if=pflash,unit=1,file=<vars>` pair. Add `-netdev user,id=net0 -device virtio-net-device,netdev=net0` — because the seed ISO’s cloud-init tries to `apt-get install pciutils dtc acpica-tools` on first boot, and without a network it hangs on DNS it doesn’t have. Add `os.path.isfile(path)` and `os.path.getsize(path) >= 1_000_000` assertions inside `QEMULauncher.__init__` so the next person with the wrong firmware fails at construction time, not three minutes into a blank terminal. Write an integration test that actually boots the VM to a login prompt, because it turned out the 37 unit tests were all mocks and none of them had ever run `qemu-system-aarch64`.
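The firmware guard and the pflash argument pair can be sketched as follows. The helper names are hypothetical and this is not the code inside `QEMULauncher.__init__`; the size threshold and the `-drive` strings are the ones quoted above.

```python
import os

MIN_FIRMWARE_BYTES = 1_000_000  # threshold quoted in the post

def check_firmware(path):
    """Fail fast if the firmware file is missing or implausibly small."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"firmware not found: {path}")
    size = os.path.getsize(path)
    if size < MIN_FIRMWARE_BYTES:
        raise ValueError(f"{path} is only {size} bytes; not a UEFI image")
    return path

def pflash_args(code_path, vars_path):
    """Build the two -drive arguments: read-only code, writable varstore."""
    return [
        "-drive", f"if=pflash,unit=0,file={code_path},readonly=on",
        "-drive", f"if=pflash,unit=1,file={vars_path}",
    ]

print(pflash_args("edk2-aarch64-code.fd", "edk2-arm-vars.fd"))
```

A size floor is a blunt check, but it would have rejected the 160 KB option ROM at construction time instead of after three silent minutes.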
While I was at it: add `pkill -f qemu-system-aarch64` to the top of cell 2 so re-running the launcher cell after a failed boot doesn’t die with `Failed to get "write" lock` on the qcow2. Move `import json` to the top of the notebook so cell 4’s call to `json.dumps` doesn’t raise `NameError` before cell 5 has a chance to import it. File these under “teardown discipline in Jupyter notebooks is an oxymoron; plan accordingly”.
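Where a VM's whole lifetime fits inside a single cell or test, a context manager is a complement to the `pkill` approach. This is a sketch under that assumption, with a hypothetical name (`managed_process`); it is not what the notebook does, since there the VM has to survive across cells.

```python
import contextlib
import subprocess
import sys

@contextlib.contextmanager
def managed_process(argv, grace_seconds=5.0):
    """Start a child process and guarantee it is gone when the block exits."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        if proc.poll() is None:
            proc.terminate()              # polite request first
            try:
                proc.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                proc.kill()               # then insist
                proc.wait()

# Stand-in child process; a real caller would pass the QEMU command line.
with managed_process([sys.executable, "-c", "import time; time.sleep(60)"]) as child:
    print("child running:", child.poll() is None)
print("child exited:", child.poll() is not None)
```

Because the cleanup lives in `finally`, an exception inside the block (a failed assertion, a timeout) still releases the process and with it the lock on the disk image.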
Current count: 41/41 unit tests pass, plus one 12-second end-to-end integration test that boots the real Ubuntu 24.04 ARM64 image to the real `ubuntu login:` prompt on my actual MacBook Air. The setup script is idempotent. The notebook runs top to bottom. Everything in the rest of this post was verified against the image that scrolled `ARM QEMU Lab environment ready` past the serial console at 04:09 UTC this morning, not against the spec sheet.
This is the part of firmware work that is hard to put in a résumé bullet. “Debugged QEMU boot failure” covers it in four words. What actually happened was I rediscovered, over an evening, that distribution drift shows up in the place you least expect; that every `find ... -o ...` fallback in a shell script is a future bug in the shape of a file that happened to match; that the absence of output is a worse failure mode than a crash; and that on day one of teaching anything the interesting material is not the syntax but the toolchain admitting it was never actually tested.
