Architecture: The Five Layers
Herd hides the complexity of microVM orchestration behind a five-layer delivery stack. This design lets application developers use the security and isolation of Firecracker without managing raw disk images or network configuration.
📦 Layer 1: OCI Translation
Herd treats standard Docker/OCI images as first-class citizens. When you deploy an image, Herd performs the following:
- Metadata Extraction: It parses the image manifest for CMD, WORKDIR, and ENV variables.
- Snapshotting: Using containerd and the devmapper snapshotter, it creates a Copy-on-Write (CoW) block device from the image layers.
- Thin-Pool Management: Host storage is managed via a dedicated devmapper thin-pool, ensuring microVMs share image data while having isolated write layers.
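The metadata-extraction step can be sketched as follows. This is a minimal illustration, not Herd's implementation: in the OCI image format the runtime settings live in the image configuration JSON that the manifest points to, under `config.Cmd`, `config.WorkingDir`, and `config.Env`. The function name, the returned dictionary shape, the `/` default, and the handling of `Entrypoint` are assumptions made for this example.

```python
import json


def extract_runtime_metadata(image_config_json: str) -> dict:
    """Pull CMD, WORKDIR, and ENV out of an OCI image configuration document.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    doc = json.loads(image_config_json)
    cfg = doc.get("config") or {}

    # OCI stores environment variables as a list of "KEY=value" strings.
    env = {}
    for item in cfg.get("Env") or []:
        key, _, value = item.partition("=")
        env[key] = value

    return {
        # Assumption: an Entrypoint, if present, is prepended to Cmd.
        "cmd": list(cfg.get("Entrypoint") or []) + list(cfg.get("Cmd") or []),
        # Assumption: fall back to "/" when no working directory is set.
        "workdir": cfg.get("WorkingDir") or "/",
        "env": env,
    }
```

The resulting dictionary is the kind of data a guest agent would need in order to start the workload with the right command, directory, and environment.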
⚡ Layer 2: L7 "Wake-on-Request" Proxy
The Herd Data Plane is a high-speed reverse proxy that manages session-affine routing.
- Traffic Interception: Requests arriving on the Data Plane (port 8080) are inspected for an X-Session-ID header.
- Cold Booting: If a session ID is provided but the corresponding microVM is not running, the proxy triggers a cold-boot sequence (sub-500ms) before forwarding the traffic.
- Connectivity: Once the VM is ready, traffic is tunneled through the hypervisor boundary into the guest.
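The wake-on-request decision described above can be modeled with a small routing table. The sketch below is illustrative only: `SessionRouter`, the `boot_vm` callback, and the error raised for a missing header are hypothetical names and behaviors, not Herd's API. It shows only the lookup-or-boot logic, not the actual HTTP forwarding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SessionRouter:
    """Maps session IDs to backends, booting a microVM on first use."""

    # Hypothetical callback: cold-boots a VM and returns its backend address.
    boot_vm: Callable[[str], str]
    running: Dict[str, str] = field(default_factory=dict)

    def resolve(self, headers: Dict[str, str]) -> str:
        # HTTP header names are case-insensitive, so normalize before lookup.
        normalized = {k.lower(): v for k, v in headers.items()}
        session_id = normalized.get("x-session-id")
        if not session_id:
            # Assumption: requests without a session ID are rejected.
            raise KeyError("missing X-Session-ID header")

        backend = self.running.get(session_id)
        if backend is None:
            # No VM is running for this session: cold-boot, then remember it.
            backend = self.boot_vm(session_id)
            self.running[session_id] = backend
        return backend
```

A real proxy would also need to serialize concurrent first requests for the same session so that only one boot is triggered; that locking is omitted here for brevity.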
🌐 Layer 3: Automated IPAM
Networking is isolated at the host level to prevent neighbor scanning and keep each microVM contained.
- TAP Interfaces: Each microVM gets its own unique TAP device linked to a host-side bridge.
- Point-to-Point Networking: VMs are configured with a /32 IP address, routing all traffic through a virtual gateway on the host.
- TAP Ownership: To maintain privilege separation, Herd explicitly assigns the TAP device's ownership to the unprivileged jailer user during creation, allowing the jailed Firecracker process to attach to it without CAP_NET_ADMIN.
- NAT Routing: Herd automatically manages iptables NAT (MASQUERADE) and FORWARD rules to allow internet access while preventing VM-to-VM traffic.
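To make the address-management idea concrete, here is a minimal allocator sketch. The pool range `10.200.0.0/24`, the class name `TapAllocator`, and the rule for deriving the TAP suffix from the VM ID are all assumptions chosen for illustration. It hands each VM a /32 address plus a shared gateway address and does not touch real network devices or iptables.

```python
import ipaddress


class TapAllocator:
    """Leases one /32 address and one TAP name per microVM (illustrative)."""

    def __init__(self, pool_cidr: str = "10.200.0.0/24"):
        hosts = list(ipaddress.ip_network(pool_cidr).hosts())
        # Assumption: the first usable address is the host-side virtual gateway.
        self.gateway = hosts[0]
        self._free = hosts[1:]
        self._leases = {}

    def allocate(self, vm_id: str) -> dict:
        if vm_id in self._leases:
            return self._leases[vm_id]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        ip = self._free.pop(0)
        lease = {
            # Linux interface names are limited to 15 characters, so the
            # suffix is truncated (hypothetical naming scheme).
            "tap": f"tap-{vm_id[:8]}",
            "guest_cidr": f"{ip}/32",
            "gateway": str(self.gateway),
        }
        self._leases[vm_id] = lease
        return lease

    def release(self, vm_id: str) -> None:
        lease = self._leases.pop(vm_id, None)
        if lease is not None:
            address = lease["guest_cidr"].split("/")[0]
            self._free.append(ipaddress.ip_address(address))
```

Because every guest sees only its own /32 and the gateway, any traffic to another VM has to cross the host, where the FORWARD rules can drop it.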
🤖 Layer 4: Guest Agent Execution
The herd-guest-agent is the bridge between the host daemon and the user workload.
- Initial RAM Disk (initrd): Herd injects the agent into the VM as an initrd at boot time.
- PID 1 Role: The agent runs as PID 1 inside the VM, responsible for mounting virtual filesystems (/proc, /sys), configuring networking, and chroot-ing into the user's workload filesystem.
- Vsock Communication: The host communicates with the agent over AF_VSOCK, which carries command execution, logs, and heartbeats without requiring a network listener inside the guest.
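The host-to-agent channel needs some way to delimit messages on a stream socket. The sketch below shows one common approach, length-prefixed JSON frames. The wire format, the function names, and the message fields are assumptions for illustration; the source does not specify Herd's actual protocol.

```python
import json
import socket
import struct


def send_frame(sock: socket.socket, message: dict) -> None:
    """Send one message as a 4-byte big-endian length followed by JSON."""
    payload = json.dumps(message).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return short reads."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> dict:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

These helpers work on any stream socket. On Linux, a guest-side process could create one with `socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)`, while `socket.socketpair()` is a convenient stand-in for local experiments.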
🔒 Layer 5: Dynamic UID Isolation
Herd implements a "Dynamic UID Pool" to ensure that untrusted guest code is strictly contained in a distinct security domain. This prevents lateral movement (cross-jail attacks) in multi-tenant environments.
- Per-VM UID Leasing: Every concurrent microVM is assigned a unique numeric UID from a configured pool (e.g., 300000-301000).
- Kernel-Enforced Separation: Because each VM runs under a different UID, the Linux kernel's standard permission checks stop an unprivileged process in one jail from signaling or otherwise interacting with a process in another.
- Filesystem & Device Ownership: Herd explicitly chowns the VM's chroot root, vsock sockets, and block device nodes to the leased UID.
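A UID pool of this kind can be sketched in a few lines. The class and method names below are hypothetical, and treating the upper bound of the example range as inclusive is an assumption. The point is only to show leasing, exhaustion, and reuse after release.

```python
import threading


class UidPool:
    """Leases one numeric UID per running microVM (illustrative sketch)."""

    def __init__(self, start: int = 300000, end: int = 301000):
        # Assumption: both ends of the configured range are usable.
        self._free = set(range(start, end + 1))
        self._leased = {}
        self._lock = threading.Lock()

    def lease(self, vm_id: str) -> int:
        with self._lock:
            if vm_id in self._leased:
                return self._leased[vm_id]
            if not self._free:
                raise RuntimeError("UID pool exhausted")
            # Pick the lowest free UID so allocation is deterministic.
            uid = min(self._free)
            self._free.remove(uid)
            self._leased[vm_id] = uid
            return uid

    def release(self, vm_id: str) -> None:
        with self._lock:
            uid = self._leased.pop(vm_id, None)
            if uid is not None:
                self._free.add(uid)
```

The pool size caps the number of concurrent microVMs, so a range such as the one in the example allows roughly a thousand simultaneous jails per host.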
Directory & Device Ownership Model
| Path / Object | Creator | Owner | Mode | Purpose |
|---|---|---|---|---|
| /srv/jailer/ | herd init | root:root | 0755 | Base directory for all jails. |
| .../firecracker/<vmID>/ | Spawn | root:root | 0755 | Per-VM parent directory. |
| .../root/ | Spawn | uid_N:uid_N | 0700 | Chroot Root: Only the leased UID can enter. |
| .../root/run/ | Spawn | uid_N:uid_N | 0700 | Contains vsock sockets and config. |
| .../root/dev/vda | Spawn | uid_N:uid_N | 0600 | Block Device: Mknod node for the rootfs. |
| tap-<suffix> | Spawn | uid_N:uid_N | n/a | Network: TAP device owned by the jailer UID. |
[!NOTE] Numeric UIDs: These UIDs do not require entries in /etc/passwd. The hypervisor and the jailer binary operate directly on numeric IDs for speed and to avoid host-level configuration bloat.
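The directory portion of the ownership model can be sketched with plain numeric `chown` calls, which is why no `/etc/passwd` entry is needed. The helper name `prepare_jail` and its parameters are hypothetical. The sketch covers only the per-VM parent, `root/`, and `root/run/` rows of the table; creating the block device node and the TAP device is omitted.

```python
import os


def prepare_jail(vm_dir: str, uid: int, gid: int) -> str:
    """Create the per-VM directory tree and hand the chroot to a leased UID.

    `vm_dir` is the per-VM parent directory; it stays with its creator
    (root in production) at mode 0755.
    """
    root = os.path.join(vm_dir, "root")
    run = os.path.join(root, "run")
    os.makedirs(run, exist_ok=True)

    # makedirs() modes are subject to the umask, so set them explicitly.
    os.chmod(vm_dir, 0o755)
    for path in (root, run):
        # Numeric IDs are passed straight to the kernel; no user lookup occurs.
        os.chown(path, uid, gid)
        os.chmod(path, 0o700)
    return root
```

Changing ownership to a different UID requires root privileges, so outside a privileged daemon this helper can only be exercised with the caller's own UID and GID.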