Why Ephemeral Docker Sandbox Environments Are Essential For Secure Incident Response Analysis?
Somewhere in most SOCs there’s a Windows analysis VM nobody fully trusts anymore. Analysts open suspicious attachments in it, snapshot, revert, snapshot, revert — for months. Nobody remembers exactly what ran on it last Tuesday. That VM is the opposite of what a detonation environment is supposed to be, and it’s the reason a lot of IR teams end up reinfecting their own lab.
The fix isn’t a better VM. It’s treating every analysis environment as throwaway infrastructure — built fresh for one sample, used once, and destroyed completely before the next one touches it. Docker makes that pattern cheap enough to actually run at scale, but only if you understand where a default container falls short and what you need to bolt on to close the gap.
What “ephemeral” actually means in incident response
Ephemeral doesn’t mean “I deleted the container when I was done.” It means the environment cannot outlive the task it was created for, by construction. No persistent state, no reused base image across cases, no shared filesystem between sample #1 and sample #2.
This matters because most “malware analysis sandboxes” I’ve seen in the wild are ephemeral in name only. The VM gets reverted to a snapshot, sure, but the snapshot itself ages — installed tools drift, registry artifacts accumulate, and an analyst eventually runs two unrelated samples close enough together that evidence from one contaminates the other. Revert-based isolation is cheaper than rebuild-from-scratch isolation, which is exactly why teams default to it, and exactly why it slowly stops being trustworthy.
Docker’s own sandbox architecture — built for running AI coding agents, not malware, but worth studying for the isolation model — describes the lifecycle bluntly: a sandbox persists until you explicitly remove it with sbx rm, at which point the VM and everything inside it is deleted. There’s no “mostly clean.” It’s either alive or it’s gone. That binary state is what you want for detonation, and it’s harder to get right than people assume.
Why a plain docker run isn’t a detonation chamber
The most common mistake I see is treating docker run --rm as sufficient isolation for an unknown binary. It isn’t, for two separate reasons.
First, containers share the host kernel. A container is a set of namespaces and cgroups around a process tree — there is no second kernel between the sample and your hardware. When a kernel-level container escape shows up, it doesn’t matter how disposable your container was; the sample only needed a few seconds of CPU time before you destroyed it.
This isn’t theoretical. CVE-2024-21626, the runc flaw among the four bugs Snyk disclosed together as “Leaky Vessels”, came down to runc leaking an internal file descriptor that still pointed into the host filesystem. A container whose working directory was set to that descriptor through a /proc/self/fd/ path started life with its cwd on the host, outside the container’s root. A malicious image or a malicious Dockerfile build step could ride that leak straight onto the host filesystem, no privileged flag required. It affected every runc release from 1.0.0-rc93 through 1.1.11, which means it sat in production container runtimes for years before Snyk’s researchers found it. Before that there was CVE-2019-5736, which let a process in a container overwrite the host’s runc binary, so attacker code ran on the host the next time runc was invoked. Both are patched now. Neither is the last one: raesene’s container breakout CVE list runs to over a dozen entries across runc, BuildKit, and Kubernetes volume handling, and that’s just the publicly disclosed ones.
For CI/CD that’s a manageable risk because you control what runs. For a detonation chamber, the entire point is that you’re deliberately running something you don’t trust. Betting that no unpatched runc bug exists in your fleet, on the one workload category where you’re intentionally feeding it hostile code, is not a great trade.
Second, and more subtly: malware checks if it’s in a container, and changes behavior if it is. This is the part that actually costs analysts the most time. Modern samples routinely probe for /.dockerenv or /.containerenv, inspect /proc/self/cgroup for strings like docker or kubepods, and check whether the host’s IP sits in Docker’s default 172.17.0.0/16 bridge range. MITRE catalogs this whole category as T1497 — Virtualization/Sandbox Evasion: if the environment looks instrumented, the payload either stalls, no-ops, or only partially detonates. You don’t get an alert that your sandbox was detected. You just get a sample that did nothing interesting, and you move on thinking it was benign.
So isolation strength and isolation transparency are two different problems, and a default container solves neither particularly well.
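One way to see your sandbox the way a sample does is to run the same probes yourself before trusting a quiet detonation. The sketch below covers only the tells listed above, and real samples check far more. The function names `container_tells` and `in_default_bridge` are made up for illustration, and the optional root argument exists only so the checks can be pointed at a fake directory tree.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report which common container indicators are visible.
# Pass no argument to inspect the live system, or a directory to inspect a
# fake filesystem tree (useful for testing the checks themselves).
container_tells() {
  local root="${1:-}"
  if [ -e "${root}/.dockerenv" ]; then echo "dockerenv"; fi
  if [ -e "${root}/.containerenv" ]; then echo "containerenv"; fi
  if grep -Eq 'docker|kubepods' "${root}/proc/self/cgroup" 2>/dev/null; then
    echo "cgroup"
  fi
  return 0
}

# Succeeds when an IPv4 address falls inside Docker's default 172.17.0.0/16 bridge.
in_default_bridge() {
  case "$1" in
    172.17.*.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Run `container_tells` inside the analysis container: every line it prints is something a sample can key on. An empty result does not prove the environment is invisible, only that these particular tells are gone.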
A real case where this mattered: the 3CX supply chain attack
In March 2023, a trojanized build of 3CX’s Desktop App — software used by over 600,000 businesses for voice and video calls — started shipping a downloader called SUDDENICON to every customer who updated. Mandiant, brought in to investigate, traced it back further than anyone expected: the 3CX build environment itself had been compromised through a separate, earlier supply chain attack against a discontinued trading application called X_TRADER, which had quietly carried a backdoor named VEILEDSIGNAL onto an employee’s personal machine back in 2022. It was, in Mandiant’s own words, the first time they’d seen one software supply chain compromise cause another.
The malware family tree that came out of this case is a good study in what IR analysts actually have to detonate under pressure: TAXHAUL decrypting and loading shellcode from a file disguised as a Windows registry transaction log, COLDCAT as the follow-on downloader, POOLRAT as the macOS backdoor, and IconicStealer doing the actual data theft on a small number of high-value targets — mostly cryptocurrency firms, since Mandiant and Kaspersky both linked the operation to the North Korea-nexus actor Mandiant tracks as UNC4736.
Here’s the detail that matters for this post: public write-ups on the samples noted that the dropped DLLs carried explicit anti-VM checks targeting VMware before they’d even decrypt their RC4-protected payload, and automated analysis tagged the behavior under MITRE’s virtualization/sandbox-evasion technique. The malware wasn’t hoping nobody would look at it. It was built assuming someone would, in a VM, and planned around that specific assumption. An analysis environment that looks exactly like every other VMware-based sandbox an APT has already fingerprinted a hundred times over isn’t just weaker: it can quietly fail you without telling you it failed.
That’s the actual argument for ephemeral, purpose-built Docker sandboxes over a shared “the malware VM” image: not just that disposability limits blast radius, but that a freshly built, non-default-looking environment is less likely to trip the exact evasion logic the adversary already tested against.
What good isolation looks like — borrowed from Docker’s own sandbox model
Docker had to solve a version of this problem for a different reason: running autonomous AI coding agents that execute arbitrary code, install packages, and run docker build on your behalf. The trust boundary is different from malware analysis, but the architecture they landed on maps almost one-to-one onto what an IR detonation chamber needs. Their security model breaks it into four isolation layers, and each one has a direct analogue in malware analysis tooling.
Hypervisor isolation. Every sandbox runs in its own lightweight microVM with its own Linux kernel — not a shared kernel with namespace boundaries, an actual separate kernel. That’s the difference between “a runc bug gives the sample your host” and “a runc-class bug gives the sample a kernel that was going to be destroyed in sixty seconds anyway.” For detonating anything you suspect is native code rather than a script or document macro, this is the layer that matters most, and it’s the layer a stock docker run simply does not have.
Network isolation. Docker’s sandboxes are deny-by-default: every outbound HTTP/HTTPS request has to match an explicit policy rule, raw TCP/UDP/ICMP are blocked at the network layer entirely, and traffic to loopback or private ranges never leaves. For a detonation chamber this is exactly the shape of an INetSim or FakeNet-style fake-network setup — the sample thinks it has a live C2 channel, and every request it makes actually lands in a sinkhole you control and log.
Docker Engine isolation. This is the one most homegrown malware sandboxes get wrong. If you mount the host’s Docker socket into your analysis container so you can inspect or orchestrate from inside it, you’ve just handed a hostile sample a documented path to full host control — mounting docker.sock is functionally equivalent to root on the host. Docker’s sandboxes avoid this by running an entirely separate Docker Engine inside the VM, so anything the agent (or, in our case, the sample) does with docker build or docker compose never has a path back to your real daemon. Build your detonation chamber the same way: nothing inside it should ever be able to reach /var/run/docker.sock on your host.
Credential isolation. Docker’s host-side proxy injects API credentials into outbound requests; the raw values never exist inside the sandboxed VM as environment variables or files. The IR equivalent: your VirusTotal API key, your threat-intel platform tokens, your case-management credentials should never be reachable from inside the box you’re using to detonate something designed to steal credentials.
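The Docker Engine isolation rule above is easy to state and easy to violate by accident, so it can be worth checking mechanically before a run. This is a minimal sketch, not a complete audit: `mounts_docker_sock` is a hypothetical name, and it only looks at mount source paths fed to it on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: reads one mount source path per line on
# stdin and succeeds (exit 0) if any of them is the Docker socket.
mounts_docker_sock() {
  grep -Eq '(^|/)docker\.sock$'
}

# Example wiring, assuming a container whose name is in "$name":
#   docker inspect --format '{{range .Mounts}}{{println .Source}}{{end}}' "$name" \
#     | mounts_docker_sock && echo "REFUSE: docker.sock is mounted" >&2
```

The check deliberately knows nothing about Docker itself, so the same function can also be pointed at mount paths pulled out of a compose file or a CI job definition.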
Step by step: building an ephemeral detonation chamber with Docker
None of this needs exotic tooling. Here’s a working pattern using containers hardened well past their defaults, with a Firecracker/Kata-class microVM swap-in noted where containers alone genuinely aren’t enough.
Step 1: Harden the run, don’t trust the defaults.
# Bare minimum flags for any container touching an untrusted sample.
# Every flag here removes a specific, real capability the sample could abuse.
# The flags live in a bash array because a trailing "\ # comment" breaks line continuation.
run_flags=(
  --rm                                      # destroy on exit, no leftover state
  --network=detonation_net                  # never the default bridge (see compose file below)
  --cap-drop=ALL                            # strip every Linux capability, add back nothing
  --security-opt no-new-privileges          # block setuid/setgid privilege escalation
  --security-opt seccomp=./seccomp-strict.json
  --read-only                               # root filesystem is immutable
  --tmpfs /tmp:rw,noexec,nosuid,size=128m   # writable scratch space that can't execute code
  --pids-limit=128                          # stop fork-bomb style DoS against your own host
  --memory=512m --cpus=1                    # cap resource exhaustion attacks
  --name "det-$(uuidgen)"                   # unique, unguessable name per detonation run
)
docker run "${run_flags[@]}" sandbox/detonation:latest
A minimal seccomp profile to start from — deny by default, allow only what your analysis tooling actually needs:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "open", "openat", "close", "exit", "exit_group",
                "mmap", "munmap", "brk", "fstat", "lseek", "execve", "wait4"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
That list will be too tight for real tooling — you’ll spend an annoying afternoon running strace against your own analysis container to find every syscall it legitimately needs and adding them one at a time. That afternoon is the actual cost of seccomp done properly, and it’s worth it.
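To make that afternoon shorter, the syscall names can be pulled out of an strace log mechanically. The helper below is a rough sketch: `syscalls_from_strace` is a hypothetical name, and it assumes the usual strace line shapes (a bare `name(...)`, a `1234 name(...)` line from `strace -f -o`, or a `[pid 1234] name(...)` line from `strace -f` on a terminal).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads strace output on stdin and prints the unique
# syscall names it contains, one per line, sorted. Signal and exit lines
# ("--- SIGCHLD ...", "+++ exited with 0 +++") are ignored.
syscalls_from_strace() {
  sed -nE 's/^(\[pid +[0-9]+\] +|[0-9]+ +)?([a-z][a-z0-9_]*)\(.*/\2/p' | sort -u
}

# Example wiring (the tool name is a placeholder):
#   strace -f -o trace.log ./your-analysis-tool /sample
#   syscalls_from_strace < trace.log
```

The output is a candidate list, not a finished allowlist. Anything the tooling only calls on rare code paths will not show up in a single trace, so expect a few more rounds.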
Step 2: Give the sample a network that lies to it convincingly.
# docker-compose.detonation.yml
# Two services on an internal-only network. The sample believes it has
# a live C2 channel; fakenet actually sinkholes every request and logs it.
services:
  fakenet:
    image: remnux/fakenet-ng:latest
    networks:
      detonation_net:
        ipv4_address: 10.10.10.2
    cap_add: [NET_ADMIN]
  sandbox:
    image: sandbox/detonation:latest
    networks:
      detonation_net:
        ipv4_address: 10.10.10.10
    dns: [10.10.10.2] # force all DNS through fakenet too, not just HTTP
    cap_drop: [ALL]
    security_opt:
      - no-new-privileges
      - seccomp=./seccomp-strict.json
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=128m
    pids_limit: 128
    mem_limit: 512m
    volumes:
      - ./samples/${SAMPLE_HASH}:/sample:ro # mount exactly one sample, read-only
    depends_on: [fakenet]
networks:
  detonation_net:
    driver: bridge
    internal: true # the line that actually matters: no route to the real internet
    ipam:
      config:
        - subnet: 10.10.10.0/24 # static ipv4_address values need an explicit subnet
The internal: true flag is doing the real work here. Without it, a misconfigured fakenet or a container that finds a way to bypass its DNS override still has a path to a live gateway. With it, there is no gateway to find.
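Since so much hangs on that one flag, a launcher script can refuse to start if the network it is about to use is not actually internal. A minimal sketch, assuming the Docker CLI is on the PATH: `docker network inspect --format '{{.Internal}}'` is a real call, while the function name `network_is_internal` and the refuse-to-run policy are my own.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeeds only if the named Docker network exists and
# reports Internal=true. Any inspect failure is treated as "not internal".
network_is_internal() {
  local net="$1" internal
  internal="$(docker network inspect --format '{{.Internal}}' "$net" 2>/dev/null)" || return 1
  [ "$internal" = "true" ]
}

# Example wiring:
#   network_is_internal "$net" || { echo "refusing: $net can reach a gateway" >&2; exit 1; }
```

Note that Compose prefixes network names with the project name, so a stack started with `-p "$RUN_ID"` creates `${RUN_ID}_detonation_net` rather than a bare `detonation_net`.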
Step 3: Automate the entire lifecycle, including evidence capture, so nobody has to remember to do it by hand.
#!/usr/bin/env bash
# detonate.sh — one-shot ephemeral analysis run
# Usage: ./detonate.sh <sample_sha256>
set -euo pipefail
SAMPLE="$1"
RUN_ID="det-$(date +%s)-$(openssl rand -hex 4)"
EVIDENCE_DIR="./evidence/${RUN_ID}"
mkdir -p "${EVIDENCE_DIR}"
# Step 1: stand up a fresh stack — never reuse a running sandbox across samples
SAMPLE_HASH="${SAMPLE}" docker compose -f docker-compose.detonation.yml \
-p "${RUN_ID}" up -d
# Step 2: hard wall-clock limit regardless of what the sample does
timeout 300 docker wait "${RUN_ID}-sandbox-1" || true
# Step 3: pull evidence out BEFORE anything gets destroyed
docker logs "${RUN_ID}-sandbox-1" > "${EVIDENCE_DIR}/stdout.log" 2>&1
docker diff "${RUN_ID}-sandbox-1" > "${EVIDENCE_DIR}/filesystem-diff.txt"
docker export "${RUN_ID}-sandbox-1" -o "${EVIDENCE_DIR}/container-fs.tar"
docker cp "${RUN_ID}-fakenet-1:/var/log/fakenet/traffic.pcap" \
"${EVIDENCE_DIR}/network-capture.pcap" 2>/dev/null || true
# Step 4: burn it all down — containers, network, and the writable layer
docker compose -f docker-compose.detonation.yml -p "${RUN_ID}" down \
--volumes --remove-orphans
echo "Evidence preserved at ${EVIDENCE_DIR}. Sandbox ${RUN_ID} no longer exists."
That ordering — evidence out, then teardown — is the part that gets skipped when a tired analyst is doing this by hand at 2am. Scripting it removes the judgment call.
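Scripting the run also makes it cheap to confirm that the file being mounted really is the sample the case notes refer to. The sketch below assumes each sample is stored as a single file named after its SHA-256, which the compose file's `./samples/${SAMPLE_HASH}` path suggests but does not guarantee. `sample_matches_hash` is a hypothetical helper, and it relies on `sha256sum` from GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeeds only if FILE's SHA-256 digest equals EXPECTED.
# The comparison is case-insensitive; a missing or unreadable file fails.
sample_matches_hash() {
  local file="$1" expected actual
  expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  actual="$(sha256sum "$file" 2>/dev/null | cut -d' ' -f1)"
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example wiring near the top of a launcher script:
#   sample_matches_hash "./samples/${SAMPLE}" "${SAMPLE}" || { echo "hash mismatch" >&2; exit 1; }
```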
Step 4: Know when containers alone aren’t enough, and swap the runtime instead of the discipline.
# gVisor intercepts syscalls in userspace before they reach the host kernel,
# closing most of the attack surface that runc-class escapes like
# CVE-2024-21626 depend on. Slower disk and network I/O, much smaller blast radius.
docker run --runtime=runsc --rm --network=detonation_net sandbox/detonation:latest
If gVisor’s syscall interception still isn’t enough — you’re looking at a native loader, a suspected kernel exploit, or anything ransomware-adjacent — reach for Kata Containers or Firecracker microVMs instead. That’s not a hypothetical upgrade path; it’s the exact same hypervisor-isolation model Docker built sbx run on top of, just pointed at a malware sample instead of an AI agent.
Comparing your options honestly
| Isolation approach | Kernel exposure | Survives CVE-2024-21626-class bugs | Example tooling | Best for |
|---|---|---|---|---|
| docker run (runc, default) | Shared host kernel | No | Vanilla Docker Engine | Static triage of low-risk scripts and documents only |
| docker run --runtime=runsc (gVisor) | Userspace syscall proxy | Mostly | Google gVisor | Unknown binaries, moderate-confidence risk |
| Kata Containers / Firecracker microVM | Dedicated guest kernel | Yes | Kata Containers, AWS Firecracker | Native malware, ransomware, unknown loaders |
| Docker Sandboxes (sbx) model | Dedicated guest kernel + proxied network and credentials | Yes | Docker sbx CLI | Reference architecture: built for AI agents, but the bar IR tooling should be matching |
Bottom line: if you’re only ever opening Office documents and shell scripts, a hardened container with seccomp and an internal-only network covers you. The moment you’re detonating an unknown native binary, treat the lack of a second kernel as a missing control, not an acceptable risk.
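If the choice of runtime is left to whoever is on shift, it will drift toward whatever is fastest. One option is to encode the rule of thumb above in the launcher itself. This is only a sketch of that idea: the category labels are invented here, and `microvm` is a placeholder for whatever Kata or Firecracker runtime name is actually registered with your Docker daemon.

```shell
#!/usr/bin/env bash
# Hypothetical policy helper: maps a coarse sample category to a runtime tier.
# "runc" and "runsc" are the runtime names used earlier in this post;
# "microvm" stands in for a locally registered Kata/Firecracker runtime.
runtime_for_sample() {
  case "$1" in
    script|macro|document)    echo "runc" ;;     # hardened container is enough
    unknown-binary)           echo "runsc" ;;    # gVisor as the floor
    native|loader|ransomware) echo "microvm" ;;  # dedicated guest kernel
    *)                        echo "microvm" ;;  # unclassified: assume the worst
  esac
}

# Example wiring:
#   docker run --runtime="$(runtime_for_sample "$category")" ...
```

The fallthrough case matters most: anything nobody has classified yet gets the strongest isolation, not the cheapest.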
What breaks, and what to actually watch for
gVisor isn’t free. Some syscalls it doesn’t support at all, and malware that hits one of those gaps can crash or behave differently than it would on bare metal — that’s a false negative dressed up as a clean run, and you won’t know which one you got unless you’re logging syscall denials, not just exit codes.
Time-based evasion still works against ephemeral sandboxes exactly as well as it works against persistent ones. A sample that sleeps for six hours before doing anything interesting will outlast your timeout 300 without raising any flags. Ephemeral infrastructure fixes contamination and fingerprinting; it doesn’t fix patience.
And the most common operational failure I’ve seen isn’t technical at all — it’s rebuilding the same sandbox/detonation:latest image once and then reusing that exact image, with the exact same installed tool versions and the exact same fake hostname, across forty unrelated cases. That’s a “golden image” wearing ephemeral container clothing. Rebuild the image periodically, vary what’s inside it, and don’t let “disposable per run” quietly become “identical every run.”
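Varying the image takes real work, but the hostname at least can change on every run for free. A small sketch, with the caveat that the `ws-` prefix and eight-hex-digit suffix are arbitrary choices and `random_hostname` is a hypothetical helper. A naming scheme that resembles your real fleet would be more convincing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a fresh hostname of the form "ws-" followed by
# eight lowercase hex digits, drawn from /dev/urandom on each call.
random_hostname() {
  local suffix
  suffix="$(head -c 4 /dev/urandom | od -An -tx1 | tr -d ' \n')"
  printf 'ws-%s\n' "$suffix"
}

# Example wiring (--hostname is a standard docker run flag):
#   docker run --hostname "$(random_hostname)" ...
```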
Quick reference
# The flags that should be on every detonation container, no exceptions:
--rm --network=<internal-only-network> --cap-drop=ALL \
--security-opt no-new-privileges --security-opt seccomp=<strict-profile> \
--read-only --tmpfs /tmp:rw,noexec,nosuid --pids-limit=128 \
--memory=<cap> --cpus=<cap> --name "det-$(uuidgen)"
If the sample is a script, macro, or document: that’s probably enough. If it’s a native binary you can’t yet classify, add --runtime=runsc as your floor, not your ceiling, and have Kata or Firecracker ready for anything that looks like it might fight back.
Further reading: Docker Sandboxes architecture · Docker Sandboxes isolation layers · Docker Sandboxes default security posture · Snyk’s CVE-2024-21626 writeup · Mandiant’s 3CX supply chain analysis