Assembly is the language between high-level code and raw machine bytes. When you open a sample in Ghidra or Binary Ninja, what you see is the disassembler’s reconstruction of that language. This guide teaches you to read it fluently — not just to understand each instruction, but to recognize the patterns that malware authors use: API resolvers, XOR decryptors, persistence loops, and anti-debug tricks.
Coverage spans x86 (32-bit), x64 (64-bit), and ARM/AArch64 (embedded, mobile, and modern Windows on Arm targets).
- Why Assembly for Malware Analysis?
- Getting Started — What to Expect
- VSCode Setup for Assembly Practice
- Assembly 101 — How to Read Assembler Code
- CPU Registers — The Fast Lane
- Memory and Addressing Modes
- The Stack — How Functions Think
- Core Instructions in Depth
- Calling Conventions
- ARM Assembly
- Reading Disassembly in Ghidra and Binary Ninja
- Common Malware Patterns
- Quick Reference Tables
Why Assembly for Malware Analysis?
Modern malware arrives stripped of symbols, packed, and obfuscated. Decompilers help, but they can mislead: they reconstruct intent from behavior, and when the behavior is adversarial, the reconstruction drifts. Raw disassembly sits much closer to ground truth, because the bytes on screen are the bytes the CPU executes, although anti-disassembly tricks can still make a disassembler decode them at the wrong offset.
Specifically, assembly literacy lets you:
- Identify API calls even when the import table is empty (PEB walking, GetProcAddress chains)
- Recognize crypto primitives by their bitwise patterns (XOR loops, XTEA key schedules, Salsa20 quarter-rounds)
- Spot anti-analysis tricks before they fire (RDTSC timing, IsDebuggerPresent checks, NtQueryInformationProcess calls)
- Understand shellcode that decompilers struggle with: it has no PE header, no sections, no symbols
Getting Started — What to Expect
Learning assembly for the first time feels like having the rug pulled out: no types, no function names, no meaningful variable names — everything is registers, offsets, and flags. The cognitive load is real, but it drops fast once the patterns click.
The Mental Model Shift
In C you write x = a + b. In assembly you first load a into a register, add b to it, and the result sits in the same register. The instruction stream is completely flat — there is no notion of scope, type, or lifetime beyond what the calling convention imposes.
The most important shift: think in state, not abstractions. At any point in a function you can ask: what is in EAX right now? What does [EBP-8] hold? Where did ESP go? Building this running state machine in your head is the core skill the job requires.
What Is Actually Hard
- Registers carry context that changes line-by-line. A register can hold a loop counter on one line and a pointer on the next. There is no IDE tooltip to tell you which it is right now.
- Flags are invisible shared state. CMP EAX, EBX sets flags, and then ten instructions later a JL reads them. Other instructions between the compare and the branch can also modify flags; beginners miss this constantly.
- Obfuscation looks syntactically identical to normal code. A dead XOR, a fake loop, a JMP to the very next instruction: nothing in the syntax signals “this is junk.”
- Calling conventions are implicit. Nothing in the binary says “this is cdecl.” You have to infer it from how the caller prepares arguments and how the callee tears down.
- Pointer arithmetic and integer arithmetic are indistinguishable. ADD EAX, 4 could be advancing a pointer by one int or incrementing a counter by four. Only context tells you which.
What Clicks Surprisingly Quickly
- Most real malware is dominated by about 20 instructions: MOV, PUSH/POP, CMP/TEST, JE/JNE/JL/JG, CALL/RET, XOR/AND/OR, ADD/SUB, LEA, INC/DEC. Master these and you can read around 80 % of what you will encounter.
- Prologues and epilogues are boilerplate. After a few sessions you will recognise push ebp / mov ebp, esp / sub esp, N in under a second and jump straight to the logic that follows.
- CFG loops are always the same shape. A back-edge in the control-flow graph is a loop, full stop. Train your eye on the graph view and you stop reading instructions linearly and start reading structure.
- XOR decryptors look identical everywhere. Load byte, XOR, store byte, increment counter, compare to length, branch back. Once you recognise the shape you will spot it in any binary within seconds.
- The PEB walk is copy-pasted across malware families. FS:[0x30] (x86) or GS:[0x60] (x64) followed by three or four chained dereferences is the same code in hundreds of samples.
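The XOR-decryptor shape described in the list above can be sketched in Python. This is a minimal illustration, not code from any particular sample: the function name `xor_decrypt` and the key `0x5A` are made up for the example, and each line is commented with the assembly step it stands for.

```python
def xor_decrypt(buf: bytes, key: int) -> bytes:
    """Single-byte XOR over a buffer, written to mirror the assembly loop."""
    out = bytearray(buf)        # working copy of the encoded buffer
    i = 0                       # the counter register (often ECX)
    while i < len(out):         # compare counter to length, branch back
        b = out[i]              # load byte
        b ^= key & 0xFF         # XOR with the key
        out[i] = b              # store byte
        i += 1                  # increment counter
    return bytes(out)


# XOR is its own inverse, so the same routine encodes and decodes.
encoded = xor_decrypt(b"kernel32.dll", 0x5A)
print(xor_decrypt(encoded, 0x5A))   # b'kernel32.dll'
```

Real samples vary the key width or roll the key per byte, but the load / XOR / store / increment / compare skeleton stays recognisable.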
Recommended Learning Path
| Stage | Focus | Suggested exercise |
|---|---|---|
| 1: Foundation | x86 registers, the stack, MOV, PUSH/POP, CALL/RET | Hand-trace a cdecl “hello world” step-by-step in Ghidra’s listing view |
| 2: Control flow | CMP, TEST, Jcc, loops, switch-jump tables | Find a counted loop in any open-source binary; label the counter, body, and exit |
| 3: Conventions | cdecl vs stdcall vs x64 ABI; argument location rules | Identify argument-passing in five Win32 API calls (CreateFile, VirtualAlloc, etc.) |
| 4: Patterns | XOR decryptors, PEB walks, anti-debug idioms | Analyse a CTF reversing challenge from pwn.college or crackmes.one |
| 5: x64 | Shadow space, RIP-relative addressing, R8–R15 | Repeat stages 1–4 on a 64-bit Windows binary |
| 6: ARM | RISC philosophy, conditional execution suffix, Thumb | Analyse a simple Android .so from an open-source APK |
Tools to Have Ready Before You Open a Sample
| Tool | Purpose | Free? |
|---|---|---|
| Ghidra (NSA) | Full disassembler + decompiler; the best free starting point | Yes |
| Binary Ninja | Fast UI, excellent MLIL/HLIL layers, great scripting API | Trial / paid |
| x64dbg | Dynamic debugger for Windows x86/x64; pairs with Ghidra for static+dynamic | Yes |
| PE-bear | PE header inspector — understand the binary’s imports and sections before loading it | Yes |
| CFF Explorer | Import table, overlay, and resource inspector | Yes |
| FLOSS (Mandiant) | Extracts obfuscated and stack-built strings without executing the binary | Yes |
| Detect-It-Easy | Packer and compiler fingerprinting — tells you what unpacking you need first | Yes |
Beginner trap to avoid: Do not start dynamic analysis (running the sample in a debugger) before you have done at least a pass of static analysis (Ghidra/Binary Ninja). Dynamic analysis is powerful but dangerous — malware can detect the debugger and feed you a decoy execution path. Static first, dynamic second.
VSCode Setup for Assembly Practice
Reading assembly in a disassembler is one skill; writing it to build intuition is another. VSCode with NASM gives you a lightweight environment to experiment with snippets without spinning up a full VM.
Essential Extensions
Install these four extensions from the VSCode Marketplace (Ctrl+Shift+X):
| Extension ID | What it does |
|---|---|
| 13xforever.language-x86-64-assembly | Syntax highlighting for x86/x64 NASM, MASM, GAS, and AT&T syntax |
| OrangeX4.vscode-masm-run | Adds run/build buttons for MASM/NASM files directly in the editor |
| usernamehw.errorlens | Inline error display, useful when nasm outputs errors with line numbers |
| streetsidesoftware.code-spell-checker | Optional but saves you from typo-driven bugs in label names |
Install all four in one shot from the terminal:
code --install-extension 13xforever.language-x86-64-assembly
code --install-extension OrangeX4.vscode-masm-run
code --install-extension usernamehw.errorlens
code --install-extension streetsidesoftware.code-spell-checker
Installing NASM
Windows:
- Download the NASM installer from nasm.us and pick the latest win64 .exe
- Run the installer; tick “Add to PATH”
- Verify in a new terminal:
nasm --version
You also need a linker. The easiest options on Windows are the free GoLink linker or the GNU ld from a MinGW-w64 toolchain:
# Check both are on PATH
nasm --version # e.g. NASM version 2.16.x
ld --version # GNU ld (part of MinGW / binutils)
Linux / WSL:
sudo apt install nasm build-essential # Debian / Ubuntu
sudo dnf install nasm gcc # Fedora / RHEL
Your First Assembly File
Create hello.asm and paste this x64 Linux snippet (works in WSL):
; hello.asm — x64 Linux, NASM syntax
; Assemble: nasm -f elf64 hello.asm && ld -o hello hello.o && ./hello
section .data
msg db "hello, asm", 10 ; 10 = newline
len equ $ - msg
section .text
global _start
_start:
mov rax, 1 ; syscall: write
mov rdi, 1 ; fd: stdout
mov rsi, msg ; buffer address
mov rdx, len ; byte count
syscall
mov rax, 60 ; syscall: exit
xor rdi, rdi ; status: 0
syscall
For Windows (x64, NASM syntax, calling the Windows API), create hello_win.asm:
; hello_win.asm — x64 Windows, NASM syntax, links against kernel32
; Assemble+link:
; nasm -f win64 hello_win.asm -o hello_win.obj
; link /subsystem:console /entry:main hello_win.obj kernel32.lib
extern ExitProcess
extern GetStdHandle
extern WriteConsoleA
section .data
msg db "hello, asm", 13, 10
msglen equ $ - msg
written dq 0
section .text
global main
main:
sub rsp, 40 ; 32 bytes shadow space + 8 for the 5th argument slot; also restores 16-byte alignment
mov rcx, -11 ; STD_OUTPUT_HANDLE
call GetStdHandle
mov rcx, rax ; hConsole
lea rdx, [rel msg] ; lpBuffer
mov r8d, msglen ; nNumberOfCharsToWrite
lea r9, [rel written] ; lpNumberOfCharsWritten
mov qword [rsp + 32], 0 ; lpReserved (5th arg: first stack slot above the shadow space)
call WriteConsoleA
xor rcx, rcx
call ExitProcess
Tip for analysts: The Windows snippet demonstrates the x64 Microsoft ABI in action — shadow space, register arguments in RCX/RDX/R8/R9, and a stack-passed fifth argument. It is more instructive than the Linux version if your target is Windows malware.
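The `sub rsp, 40` in that snippet is not arbitrary. A rough way to see where it comes from is to compute it: the Microsoft x64 ABI wants 32 bytes of shadow space, one 8-byte slot per stack-passed argument, and RSP 16-byte aligned at every CALL. The sketch below assumes a function that pushes no registers of its own; `win64_frame_size` and `stack_arg_offset` are hypothetical helper names introduced only for this illustration.

```python
def win64_frame_size(max_call_args: int, local_bytes: int = 0) -> int:
    """Bytes for `sub rsp, N` in a function that pushes nothing else.

    Assumes RSP % 16 == 8 on entry (the CALL just pushed a return address).
    """
    slots = max(max_call_args, 4)                 # shadow space always covers 4 slots
    size = slots * 8 + (local_bytes + 7) // 8 * 8  # locals rounded up to 8 bytes
    if (size + 8) % 16 != 0:                      # RSP must be 16-byte aligned at each CALL
        size += 8
    return size


def stack_arg_offset(arg_number: int) -> int:
    """Offset from RSP (at the CALL) of argument 5, 6, ... (1-based)."""
    if arg_number < 5:
        raise ValueError("arguments 1-4 travel in RCX, RDX, R8, R9")
    return 0x20 + (arg_number - 5) * 8


print(win64_frame_size(5), hex(stack_arg_offset(5)))   # 40 0x20
```

A five-argument call such as WriteConsoleA therefore needs 40 bytes, with the fifth argument stored at [rsp + 0x20] before the CALL.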
Build Task (tasks.json)
Create .vscode/tasks.json in your project root so Ctrl+Shift+B assembles and links automatically:
{
"version": "2.0.0",
"tasks": [
{
"label": "NASM — build (Linux/WSL elf64)",
"type": "shell",
"command": "nasm -f elf64 ${file} -o ${fileDirname}/${fileBasenameNoExtension}.o && ld -o ${fileDirname}/${fileBasenameNoExtension} ${fileDirname}/${fileBasenameNoExtension}.o",
"group": { "kind": "build", "isDefault": true },
"presentation": { "reveal": "always", "panel": "shared" },
"problemMatcher": {
"owner": "nasm",
"fileLocation": ["absolute"],
"pattern": {
"regexp": "^(.+):(\\d+):\\s+(.+)$",
"file": 1, "line": 2, "message": 3
}
}
},
{
"label": "Run assembled binary",
"type": "shell",
"command": "${fileDirname}/${fileBasenameNoExtension}",
"group": "test",
"dependsOn": "NASM — build (Linux/WSL elf64)",
"presentation": { "reveal": "always", "panel": "shared" }
}
]
}
After saving, press Ctrl+Shift+B while any .asm file is active to assemble it. NASM errors appear inline in the editor via ErrorLens.
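If the problem matcher does not pick up errors, it helps to test the pattern outside VSCode. The sketch below applies the same regular expression from tasks.json to a diagnostic line. The sample line is invented for illustration and only follows NASM's usual `file:line: severity: message` layout; `parse_diagnostic` is a hypothetical helper name.

```python
import re

# Same pattern as "regexp" in tasks.json (JSON's \\d becomes the regex \d).
NASM_DIAGNOSTIC = re.compile(r"^(.+):(\d+):\s+(.+)$")


def parse_diagnostic(line: str):
    """Return (file, line_number, message), or None if the line does not match."""
    m = NASM_DIAGNOSTIC.match(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


# Illustrative line only; the exact wording of NASM messages varies by version.
sample = "/home/user/asm/hello.asm:12: error: symbol `mgs' not defined"
print(parse_diagnostic(sample))
```

Because the first group is greedy, a Windows path with a drive-letter colon still parses correctly: the match backtracks to the last `:digits:` sequence.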
Debugging with x64dbg
x64dbg is the go-to Windows debugger for malware analysis and also the best way to step through your hand-written assembly:
- Download x64dbg and extract it — no install needed
- Right-click the .exe your NASM build produced → Open with x64dbg
- The binary breaks at the entry point automatically (_start / main)
- Use F7 (step into) and F8 (step over) to trace execution
- Watch the Registers panel on the right; every instruction updates it live
Workflow for learning: Write a small snippet in VSCode, build it, open the output in x64dbg, and step through it. Watching RSP change on every PUSH/POP and seeing RAX set to your expected value after a calculation is the fastest way to build register intuition.
VSCode + x64dbg shortcut: Add an x64dbg open task to tasks.json so pressing a keybinding launches the debugger directly on the built binary, saving the manual drag-and-drop step.
Assembly 101 — How to Read Assembler Code
Before drilling into registers and instructions, you need to parse the notation. This section teaches you to decode any line the disassembler shows you.
Anatomy of One Line
Every assembly line has up to four parts:
[label:] mnemonic [operand1[, operand2[, operand3]]] [; comment]
| Part | Optional? | Example | Meaning |
|---|---|---|---|
| Label | Yes | loop_start: | Named address; target for jumps and calls |
| Mnemonic | No | MOV | The operation the CPU performs |
| Operands | Most mnemonics need 1–2 | EAX, 5 | What the operation acts on |
| Comment | Yes | ; i = 0 | Human annotation, ignored by assembler |
xor_loop:  MOV   EAX, [ESI + ECX]   ; load 4 bytes (a dword) from the buffer
; ↑ label  ↑mnem ↑dst  ↑src           ↑ comment
Intel syntax rule (used by Ghidra, Binary Ninja, and NASM):
Destination is always the left operand.
MOV EAX, 5 means “put 5 into EAX”, not “put EAX into 5”. Every instruction follows this convention: left = where the result lands, right = the source.
Intel vs AT&T Syntax
You will encounter both in the wild. Ghidra and Binary Ninja default to Intel; GDB and older GNU tools default to AT&T.
| Feature | Intel (NASM / MASM) | AT&T (GAS / GDB) |
|---|---|---|
| Operand order | dst, src | src, dst (reversed) |
| Register names | EAX | %eax (prefixed with %) |
| Immediates | 5 | $5 (prefixed with $) |
| Memory reference | [EAX] | (%eax) (uses parentheses) |
| Size suffix | DWORD PTR [EAX] | movl (%eax): letter suffix on mnemonic (b=byte, w=word, l=long/dword, q=qword) |
| Example | mov eax, [ebx + 8] | movl 8(%ebx), %eax |
If you see % before register names and $ before numbers, you are reading AT&T — flip the operand order mentally.
Practical tip: GNU tools can be switched to Intel syntax: use set disassembly-flavor intel in GDB, or objdump -d -M intel on the command line. Most analysts stay on Intel.
Parsing a Memory Reference
Square brackets in Intel syntax mean “dereference this address” — the same as *ptr in C.
[ base + index * scale + displacement ]
| Component | What it is | Example |
|---|---|---|
| base | A register holding the start address | EBX |
| index | An optional register acting as offset | ECX |
| scale | Multiplier for index: 1, 2, 4, or 8 | 4 (size of int) |
| displacement | A constant byte offset | 8 |
Decode each piece of [EBX + ECX*4 + 8] in English:
EBX → base address (start of an array)
ECX * 4 → index × sizeof(int) — the Nth element
+ 8 → skip 8 bytes past the start (e.g., past a struct header)
Result → array[N].field where field is at offset 8
Common patterns you will see constantly:
[EBP - 4] ; local variable #1 (4 bytes below frame pointer)
[EBP + 8] ; first function argument (cdecl / stdcall)
[EAX] ; *(ptr) — simple dereference
[EAX + 0x3C] ; ptr->field_at_offset_0x3C (e.g. PE header offset)
[EAX + ECX] ; ptr[i] — byte array element
[EAX + ECX*4] ; ptr[i] — int array element (4 bytes each)
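To check your reading of a memory operand, you can compute the effective address by hand. The helper below is a small sketch of the base + index * scale + displacement formula with 32-bit wraparound. The name `effective_address` and the register values are made up for the example.

```python
def effective_address(base=0, index=0, scale=1, disp=0, bits=32):
    """Address computed by [base + index*scale + disp], wrapped to `bits` bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scales of 1, 2, 4 and 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)


# [EBX + ECX*4 + 8] with EBX = 0x00403000 and ECX = 3 (hypothetical values)
print(hex(effective_address(base=0x00403000, index=3, scale=4, disp=8)))   # 0x403014

# [EBP - 4] with EBP = 0x0019FF70: a negative displacement
print(hex(effective_address(base=0x0019FF70, disp=-4)))                    # 0x19ff6c
```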
Reading a Sequence — Building Mental State
Assembly has no scope, no types, no variable names. Reading it means running a tiny virtual machine in your head. For every line, ask three questions:
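- Which registers change? Usually only the destination operand, plus any implicit ones (MUL writes EDX:EAX; PUSH and POP move ESP)
- Which flags change? Arithmetic and compare instructions update flags; MOV and LEA do not
- Does memory get read or written? Any operand in [ ] touches memory (LEA is the exception: it only computes the address)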
Work through a sequence by tracking register values as a table:
; Trace these five instructions top-to-bottom
mov eax, 10 ; 1
mov ecx, 3 ; 2
mul ecx ; 3 — EDX:EAX = EAX * ECX
sub eax, 2 ; 4
push eax ; 5
| Step | Instruction | EAX | ECX | EDX | ESP | Memory |
|---|---|---|---|---|---|---|
| start | - | ? | ? | ? | 0xFF | - |
| 1 | mov eax, 10 | 10 | ? | ? | 0xFF | - |
| 2 | mov ecx, 3 | 10 | 3 | ? | 0xFF | - |
| 3 | mul ecx | 30 | 3 | 0 | 0xFF | - |
| 4 | sub eax, 2 | 28 | 3 | 0 | 0xFF | - |
| 5 | push eax | 28 | 3 | 0 | 0xFB | [0xFB] = 28 |
The table discipline forces you to track exactly what each instruction does without skipping ahead — the most common beginner mistake.
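The same table can be generated mechanically, which is a handy way to check a hand trace. The sketch below hard-codes the five instructions rather than parsing assembly. `trace` is a hypothetical helper, and the starting ESP of 0xFF is the toy value from the table, not a realistic stack address.

```python
def trace():
    """Replay the five-instruction example and record the state after each step."""
    regs = {"eax": None, "ecx": None, "edx": None, "esp": 0xFF}
    mem = {}
    rows = []

    def snap(text):
        rows.append((text, regs["eax"], regs["ecx"], regs["edx"], regs["esp"]))

    regs["eax"] = 10
    snap("mov eax, 10")
    regs["ecx"] = 3
    snap("mov ecx, 3")
    product = regs["eax"] * regs["ecx"]              # MUL writes EDX:EAX implicitly
    regs["eax"] = product & 0xFFFFFFFF
    regs["edx"] = (product >> 32) & 0xFFFFFFFF
    snap("mul ecx")
    regs["eax"] = (regs["eax"] - 2) & 0xFFFFFFFF
    snap("sub eax, 2")
    regs["esp"] -= 4                                 # PUSH: decrement ESP, then store
    mem[regs["esp"]] = regs["eax"]
    snap("push eax")
    return rows, mem


rows, mem = trace()
for row in rows:
    print(row)
print({hex(addr): value for addr, value in mem.items()})
```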
Worked Example — Trace Five Lines
Here is a real-world snippet from a malware loader. Read it cold, then check the annotations:
00401020 mov eax, [ebp + 8] ; (1)
00401023 test eax, eax ; (2)
00401025 jz 00401040 ; (3)
00401027 mov ecx, [eax + 0x3C] ; (4)
0040102A add ecx, eax ; (5)
Line by line:
| # | Instruction | What it does | Mental note |
|---|---|---|---|
| 1 | mov eax, [ebp+8] | Load the first argument into EAX | EAX = arg1 (likely a pointer) |
| 2 | test eax, eax | AND EAX with itself; sets ZF if EAX is zero, writes no register | null-check on the pointer |
| 3 | jz 00401040 | Jump to 0x401040 if ZF=1 (EAX was zero) | if (arg1 == NULL) goto error |
| 4 | mov ecx, [eax + 0x3C] | Read a DWORD 60 bytes into the struct EAX points at | 0x3C is the e_lfanew field of a DOS header; this reads the PE offset |
| 5 | add ecx, eax | ECX = ECX + EAX (base + offset) | ECX now points to the PE signature / IMAGE_NT_HEADERS |
The five lines implement IMAGE_NT_HEADERS *nt = (IMAGE_NT_HEADERS*)(base + base->e_lfanew) — a pattern found in virtually every PE parser and loader you will encounter in malware analysis.
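For comparison, here is roughly the same logic in Python, operating on a byte buffer instead of a mapped image. It is a sketch under stated assumptions: `nt_headers_offset` is a hypothetical name, and the buffer is a synthetic stub built for the example (only the MZ magic, e_lfanew, and the PE signature are filled in), not a real executable.

```python
import struct


def nt_headers_offset(image: bytes):
    """Mirror of the five instructions: null check, read e_lfanew at 0x3C."""
    if not image or len(image) < 0x40:      # test eax, eax / jz (plus a bounds check)
        return None
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)   # mov ecx, [eax + 0x3C]
    return e_lfanew                         # add ecx, eax: here the base is index 0


# Synthetic stub, just enough structure for the lookup to work.
stub = bytearray(0x100)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x80)    # e_lfanew = 0x80
stub[0x80:0x84] = b"PE\0\0"

offset = nt_headers_offset(bytes(stub))
print(hex(offset), bytes(stub[offset:offset + 4]) == b"PE\0\0")   # 0x80 True
```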
Key takeaway: You do not need to know every instruction before you start reading. You need the three questions (what changes, what flags, what memory?) and the habit of building the register table as you go. The patterns — null checks, struct field access, PE walking — repeat endlessly once you recognise them the first time.
CPU Registers — The Fast Lane
Registers are the CPU’s own ultra-fast memory — typically 8 to 32 slots, each holding one word of data. Every computation happens in registers; RAM is just slow storage the CPU ferries values to and from.
x86 General-Purpose Registers
On x86 (32-bit), eight general-purpose registers each hold a 32-bit (4-byte) value. Each register also exposes sub-word aliases that address smaller portions without extra instructions:
| Full (32-bit) | Low 16-bit | High byte (bits 8–15) | Low byte (bits 0–7) | Primary convention |
|---|---|---|---|---|
| EAX | AX | AH | AL | Return value; arithmetic accumulator |
| EBX | BX | BH | BL | Base register (not the frame pointer, which is EBP); callee-saved |
| ECX | CX | CH | CL | Loop counter; LOOP, REP, SHIFT |
| EDX | DX | DH | DL | Extended return (EDX:EAX); I/O port |
| ESI | SI | — | — | Source index for string ops |
| EDI | DI | — | — | Destination index for string ops |
| ESP | SP | — | — | Stack pointer — always points to TOS |
| EBP | BP | — | — | Frame pointer — anchors local variable base |
Ghidra / Binary Ninja tip: When you see [EBP - 0x8], that is a local variable 8 bytes below the frame pointer. When you see [EBP + 0x8], that is the first function argument (cdecl convention).
x64 Extensions
x64 extends every 32-bit register to 64 bits and adds eight new registers. The naming convention prefixes R for the full 64-bit form:
| x64 (64-bit) | x86 alias (low 32) | Low 16 | Low 8 | Convention |
|---|---|---|---|---|
| RAX | EAX | AX | AL | Return value |
| RBX | EBX | BX | BL | Callee-saved |
| RCX | ECX | CX | CL | Arg 1 (Windows); arg 4 (Linux) |
| RDX | EDX | DX | DL | Arg 2 (Windows); arg 3 (Linux) |
| RSI | ESI | SI | SIL | Arg 2 (Linux); callee-saved (Windows) |
| RDI | EDI | DI | DIL | Arg 1 (Linux); callee-saved (Windows) |
| RSP | ESP | SP | SPL | Stack pointer |
| RBP | EBP | BP | BPL | Frame pointer (optional in x64) |
| R8–R9 | R8D–R9D | R8W–R9W | R8B–R9B | Args 3–4 (Windows); args 5–6 (Linux); caller-saved |
| R10–R11 | R10D–R11D | R10W–R11W | R10B–R11B | Caller-saved scratch |
| R12–R15 | R12D–R15D | R12W–R15W | R12B–R15B | Callee-saved |
Critical x64 gotcha: Writing to a 32-bit sub-register (e.g. EAX) zero-extends into the 64-bit register (RAX). Writing to a 16-bit or 8-bit sub-register does not. This catches many analysts off-guard when reading decompiler output.
mov eax, 1 ; RAX = 0x0000000000000001 (upper 32 bits zeroed!)
mov ax, 1 ; RAX unchanged except low 16 bits
mov al, 1 ; RAX unchanged except low 8 bits
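The zero-extension rule is easy to model, which makes it a useful sanity check when a decompiler shows a surprising cast. The sketch below treats RAX as a Python integer. `write_subreg` is a hypothetical helper, and it covers only the low-byte and low-word aliases (AL, AX), not AH.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def write_subreg(rax: int, value: int, width: int) -> int:
    """Return the new RAX after writing `value` to its `width`-bit alias."""
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & 0xFFFFFFFF                   # EAX write: upper 32 bits are zeroed
    if width in (16, 8):
        mask = (1 << width) - 1
        return (rax & ~mask & MASK64) | (value & mask)   # AX/AL write: upper bits kept
    raise ValueError("width must be 64, 32, 16 or 8")


rax = 0x1122334455667788
print(hex(write_subreg(rax, 1, 32)))   # 0x1
print(hex(write_subreg(rax, 1, 16)))   # 0x1122334455660001
print(hex(write_subreg(rax, 1, 8)))    # 0x1122334455667701
```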
ARM / AArch64 Registers
ARM uses a load-store architecture: unlike x86, arithmetic instructions can only operate on registers, never directly on memory. Data must be explicitly loaded into a register first.
ARM (32-bit) registers:
| Register | Alias | Role |
|---|---|---|
| R0–R3 | — | Function arguments 1–4; return value in R0 |
| R4–R11 | — | General purpose; callee-saved |
| R12 | IP | Intra-procedure-call scratch register |
| R13 | SP | Stack pointer |
| R14 | LR | Link Register — holds return address |
| R15 | PC | Program Counter — current instruction address |
| — | CPSR | Current Program Status Register (flags) |
AArch64 (64-bit) registers:
| Register | Width | Role |
|---|---|---|
| X0–X7 | 64-bit | Function arguments 1–8; return in X0 |
| X8 | 64-bit | Indirect result location / syscall number (Linux) |
| X9–X15 | 64-bit | Caller-saved temporaries |
| X16–X17 | 64-bit | Intra-procedure-call scratch |
| X18 | 64-bit | Platform reserved (TEB on Windows ARM64) |
| X19–X28 | 64-bit | Callee-saved |
| X29 | 64-bit | Frame pointer (FP) |
| X30 | 64-bit | Link register (LR) |
| SP | 64-bit | Stack pointer (not a general register) |
| PC | 64-bit | Program counter (not directly writeable) |
| W0–W30 | 32-bit | Aliases for the low 32 bits of X0–X30 |
The Instruction Pointer
The instruction pointer is the CPU’s “current position” register:
| Architecture | Register | Notes |
|---|---|---|
| x86 | EIP | Cannot be read directly; modified by JMP, CALL, RET. Position-independent code recovers it with CALL $+5; POP EAX |
| x64 | RIP | Readable with lea rax, [rip]; the base for RIP-relative addressing |
| ARM32 | PC (R15) | Readable and writable — writing to PC is a branch |
| AArch64 | PC | Not directly writeable; only modified by branch instructions |
Reversing tip: In x64 binaries, you will constantly see patterns like lea rax, [rip + 0x1234]. This is RIP-relative addressing: the displacement is added to the address of the next instruction. Ghidra and Binary Ninja both resolve these to absolute addresses automatically.
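When a tool does not resolve the target for you (raw shellcode in a hex editor, for instance), the arithmetic is simple enough to do by hand. The sketch below assumes a 32-bit signed displacement. `rip_relative_target` is a hypothetical helper, and the address 0x140001000 is an invented example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def rip_relative_target(insn_addr: int, insn_len: int, disp32: int) -> int:
    """Absolute address referenced by a RIP-relative operand."""
    disp32 &= 0xFFFFFFFF
    if disp32 & 0x80000000:                 # sign-extend the 32-bit displacement
        disp32 -= 1 << 32
    return (insn_addr + insn_len + disp32) & MASK64   # relative to the NEXT instruction


# lea rax, [rip + 0x1234] encodes as 48 8D 05 34 12 00 00 (7 bytes)
print(hex(rip_relative_target(0x140001000, 7, 0x1234)))   # 0x14000223b
```

The usual mistake is to add the displacement to the address of the instruction itself and forget the instruction length.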
EFLAGS / RFLAGS — The Status Word
Every comparison and arithmetic operation updates individual bits in the flags register. Conditional jumps then branch based on these bits.
| Flag | Bit | Set when… | Common instructions that set it | Jump / branch that reads it |
|---|---|---|---|---|
| CF | 0 | Carry/borrow out of the MSB (unsigned overflow) | ADD, SUB, SHL/SHR, CLC/STC, MUL | JB/JNAE (CF=1 → unsigned below); JAE/JNB (CF=0 → unsigned above-or-equal) |
| PF | 2 | Low byte of result has even parity | Most arithmetic and logic ops | JP/JPE (PF=1); JNP/JPO (PF=0); rare in integer code, mostly seen after floating-point compares to catch unordered (NaN) results |
| AF | 4 | Carry from bit 3 to bit 4 (BCD arithmetic) | ADD, SUB, INC, DEC | Not tested by Jcc; consumed by DAA/DAS, almost never seen outside legacy x86 |
| ZF | 6 | Result is zero | CMP, TEST, AND, OR, XOR, ADD, SUB, INC, DEC | JE/JZ (ZF=1 → equal/zero); JNE/JNZ (ZF=0 → not equal); the most-used flag in disassembly |
| SF | 7 | Result is negative (sign bit is 1) | Most arithmetic and logic ops | JS (SF=1); JNS (SF=0); combined with OF for JL/JG |
| OF | 11 | Signed overflow: result too large for the signed type | ADD, SUB, IMUL, NEG, INC, DEC | JO (OF=1); JNO (OF=0); paired with SF for JL (SF≠OF) and JGE (SF=OF) |
| DF | 10 | Direction for string ops (0 = forward / increment, 1 = backward / decrement) | CLD clears it; STD sets it | Not a Jcc flag; implicitly consumed by REP MOVS, REP STOS, SCAS. Code that sets DF=1 before REP STOSD fills memory backwards |
| IF | 9 | Interrupts enabled | STI sets; CLI clears | Not a Jcc flag; CLI/STI are privileged, so expect them only in kernel/driver code |
Analyst tip (ZF is king): In practice, ZF is the flag you will track most often. TEST EAX, EAX / JNZ is the universal “is this value non-null?” idiom. CMP EAX, 0 / JE is “did this function return 0 (error/false)?”. If you can only track one flag, track ZF.
The ARM equivalent is the CPSR (Current Program Status Register) / NZCV flags in AArch64:
| ARM Flag | x86 Equivalent | Meaning |
|---|---|---|
| N | SF | Negative result |
| Z | ZF | Zero result |
| C | CF | Carry (after SUB or CMP, C=1 means no borrow, the inverse of x86 CF) |
| V | OF | Overflow |
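To make the flag definitions concrete, the sketch below computes the flags a CMP would leave behind. It is an illustrative model rather than a full emulator: `cmp_flags` is a hypothetical name, PF and AF are left out, and the extra ARM_C entry shows that ARM's carry after a subtraction is the inverse of x86's borrow.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Flags after `cmp a, b` (computes a - b and discards the result)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    cf = a < b                                        # unsigned borrow
    of = bool((a ^ b) & (a ^ result) & sign)          # signed overflow
    return {
        "ZF": result == 0,
        "SF": bool(result & sign),
        "CF": cf,
        "OF": of,
        "ARM_C": not cf,                              # ARM sets C when there is NO borrow
    }


print(cmp_flags(5, 5))
print(cmp_flags(1, 2))
print(cmp_flags(0x80000000, 1))   # most negative int minus 1 overflows: OF is set
```

Reading the last line as signed values, -2147483648 is less than 1, and indeed SF != OF, which is the condition JL tests.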
Memory and Addressing Modes
The Memory Map
A typical 32-bit Windows user-mode process looks like this:
On 64-bit Windows the user-mode range extends to 0x00007FFFFFFFFFFF. The structure is the same but the addresses are much larger. The kernel occupies the upper half of the virtual address space.
Addressing Mode Syntax
Intel syntax (used by Ghidra and Binary Ninja by default) wraps memory references in square brackets:
; Direct (absolute address)
mov eax, [0x402000] ; load 4 bytes from address 0x402000
; Register indirect
mov eax, [ebx] ; load 4 bytes from address stored in EBX
; Base + displacement
mov eax, [ebp + 8] ; first argument in a cdecl frame
mov eax, [ebp - 4] ; first local variable
; Base + Index * Scale + Displacement (SIB byte)
mov eax, [ebx + ecx*4 + 8] ; array element: base + index*sizeof(int) + offset
ARM uses different syntax but the concept is identical:
; ARM32 — load/store
LDR R0, [R1] ; R0 = *(R1)
LDR R0, [R1, #8] ; R0 = *(R1 + 8)
LDR R0, [R1, R2] ; R0 = *(R1 + R2)
LDR R0, [R1, R2, LSL #2] ; R0 = *(R1 + R2<<2) — array index
STR R0, [R1] ; *(R1) = R0
STMFD SP!, {R4-R7, LR} ; push multiple registers onto stack (PUSH equivalent)
LDMFD SP!, {R4-R7, PC} ; pop registers; the saved LR lands in PC, so this also returns (the ARM function return idiom)
The Stack — How Functions Think
The stack grows downward on all common architectures: pushing a value decrements the stack pointer and writes the value at the new address.
x86 Stack Layout
Key rules:
- EBP is the stable anchor: once the prologue has set it, it does not move for the rest of the function. All local variables and arguments are addressed relative to it.
- ESP moves freely as values are pushed/popped. Compilers often omit EBP in optimized code (frame-pointer omission / -fomit-frame-pointer) and use ESP-relative addressing instead.
Function Prologue and Epilogue
Most functions you see in a disassembler begin and end with boilerplate code to set up and tear down the stack frame. Small leaf functions and heavily optimized code may skip part or all of it.
x86 standard prologue:
push ebp ; save caller's frame pointer
mov ebp, esp ; establish new frame pointer
sub esp, 0x28 ; reserve 0x28 (40) bytes for local variables
push ebx ; callee-saved registers that this function uses
push esi
push edi
x86 standard epilogue:
pop edi ; restore callee-saved registers (reverse order)
pop esi
pop ebx
mov esp, ebp ; collapse stack frame
pop ebp ; restore caller's frame pointer
ret ; pop return address into EIP
The leave instruction is shorthand for mov esp, ebp; pop ebp. You will see it often in GCC output:
leave ; equivalent: mov esp, ebp; pop ebp
ret
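The sketch below replays a cdecl call, the standard prologue, and `leave` / `ret` on a toy stack, to show why [EBP + 8] is the first argument and [EBP + 4] the return address. All addresses and the `Stack32` class are invented for the example.

```python
class Stack32:
    """Toy 32-bit stack: a dict for memory plus ESP and EBP."""

    def __init__(self, esp, ebp):
        self.esp, self.ebp, self.mem = esp, ebp, {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


s = Stack32(esp=0x0019FF80, ebp=0x0019FFA0)   # caller's state (hypothetical)
s.push(7)                    # caller: push second argument
s.push(3)                    # caller: push first argument
s.push(0x00401234)           # CALL pushes the return address

s.push(s.ebp)                # prologue: push ebp
s.ebp = s.esp                #           mov ebp, esp
s.esp -= 0x28                #           sub esp, 0x28

print(s.mem[s.ebp + 8], s.mem[s.ebp + 12], hex(s.mem[s.ebp + 4]))   # 3 7 0x401234

s.esp = s.ebp                # leave: mov esp, ebp
s.ebp = s.pop()              #        pop ebp
return_address = s.pop()     # ret
print(hex(return_address), hex(s.ebp), hex(s.esp))   # 0x401234 0x19ffa0 0x19ff78
```

After `ret`, ESP still sits 8 bytes below its starting value: those are the two arguments, which the cdecl caller removes with `add esp, 8`.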
x64 prologue (Windows):
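push rbp ; save caller's frame pointer
push rbx ; callee-saved registers are pushed before the allocation
push r12
push r13
push r14
sub rsp, 0x40 ; shadow space (0x20) for callees + locals; RSP stays 16-byte aligned
lea rbp, [rsp + 0x20] ; optional frame pointer (many builds skip it)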
In x64, many compilers omit the frame pointer entirely and address locals relative to RSP:
sub rsp, 0x58 ; allocate stack space for locals + shadow space
; on Windows the low 0x20 bytes are shadow space for callees; locals sit at [rsp+0x20] and up
add rsp, 0x58 ; epilogue: collapse frame
ret
ARM32 prologue/epilogue:
; Prologue — push callee-saved regs and LR onto stack
PUSH {R4, R5, R6, R7, LR}
SUB SP, SP, #0x10 ; allocate 16 bytes for locals
; Epilogue — restore and return (loading LR into PC branches back)
ADD SP, SP, #0x10
POP {R4, R5, R6, R7, PC}
Writing PC from a pop is ARM’s atomic “restore and return” — it simultaneously restores registers and jumps to the saved LR value.
Core Instructions in Depth
Data Movement
| Instruction | Example | Effect |
|---|---|---|
| MOV | mov eax, 5 | EAX ← 5 |
| MOV | mov eax, [ebx] | EAX ← memory at EBX |
| MOV | mov [eax], ebx | memory at EAX ← EBX |
| LEA | lea eax, [ebx+4] | EAX ← address EBX+4 (no memory read) |
| MOVZX | movzx eax, byte [ebx] | Load byte, zero-extend to 32 bits |
| MOVSX | movsx eax, byte [ebx] | Load byte, sign-extend to 32 bits |
| XCHG | xchg eax, ebx | Swap EAX ↔ EBX (with a memory operand, XCHG is implicitly atomic) |
| PUSH | push eax | ESP -= 4; [ESP] ← EAX |
| POP | pop eax | EAX ← [ESP]; ESP += 4 |
LEA trick: Compilers routinely abuse LEA for fast arithmetic. lea eax, [eax + eax*4] computes EAX * 5 without a multiply instruction. When you see LEA with no obvious pointer, think “fast multiply or multi-operand add.”
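The MOVZX and MOVSX rows in the table are where signedness first becomes visible in a listing, so it is worth seeing what each does to the same byte. A minimal sketch, with `movzx` and `movsx` as hypothetical Python stand-ins for the instructions:

```python
def movzx(value: int, src_bits: int = 8) -> int:
    """Zero-extend: keep the low src_bits, upper bits become 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value: int, src_bits: int = 8, dst_bits: int = 32) -> int:
    """Sign-extend: copy the source sign bit into all upper bits."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):       # sign bit set: the value is negative
        value -= 1 << src_bits
    return value & ((1 << dst_bits) - 1)


print(hex(movzx(0x80)), hex(movsx(0x80)))   # 0x80 0xffffff80
print(hex(movzx(0x7F)), hex(movsx(0x7F)))   # 0x7f 0x7f
```

A MOVSX on a loaded byte suggests the source variable was a signed char; MOVZX suggests unsigned char or raw bytes, which is the usual case in decryption loops.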
ARM equivalents:
; ARM32
MOV R0, #5 ; R0 = 5 (immediate)
MOV R0, R1 ; R0 = R1
LDR R0, [R1] ; R0 = *(R1) — equivalent to x86 MOV reg, [reg]
STR R0, [R1] ; *(R1) = R0 — equivalent to x86 MOV [reg], reg
LDRB R0, [R1] ; load byte (zero-extended)
LDRSB R0, [R1] ; load byte (sign-extended)
ADR R0, label ; R0 = address of label (LEA equivalent)
Arithmetic
add eax, 5 ; EAX += 5
sub eax, ebx ; EAX -= EBX
imul eax, ecx, 7 ; EAX = ECX * 7 (signed multiply, 3-operand form)
mul ecx ; EDX:EAX = EAX * ECX (unsigned; high bits in EDX!)
idiv ecx ; EAX = EDX:EAX / ECX quotient; EDX = remainder (signed; sign-extend EAX into EDX with CDQ first)
inc eax ; EAX++ (does NOT set CF — common gotcha)
dec eax ; EAX--
neg eax ; EAX = -EAX (two's complement negation)
Malware pattern — mul for obfuscation: Malware authors sometimes use MUL or IMUL with unusual constants as a cheap hash function or address offset calculation. If you see a multiply followed by an add and then a memory dereference, you are likely looking at a hash-table lookup.
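Two details of MUL and IDIV trip people up when re-implementing a routine in a script: the product spills into EDX, and signed division truncates toward zero, whereas Python's `//` rounds toward negative infinity. The sketch below models both; `mul32` and `idiv32` are hypothetical helper names.

```python
M32 = 0xFFFFFFFF


def to_signed(value: int, bits: int) -> int:
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mul32(eax: int, src: int):
    """mul src: unsigned 32x32 -> 64; returns (EAX, EDX)."""
    product = (eax & M32) * (src & M32)
    return product & M32, product >> 32


def idiv32(edx: int, eax: int, src: int):
    """idiv src: signed EDX:EAX / src; returns (EAX=quotient, EDX=remainder)."""
    dividend = to_signed(((edx & M32) << 32) | (eax & M32), 64)
    divisor = to_signed(src, 32)
    if divisor == 0:
        raise ZeroDivisionError("the CPU raises #DE here")
    quotient = abs(dividend) // abs(divisor)          # truncate toward zero
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor         # takes the sign of the dividend
    if not -2**31 <= quotient <= 2**31 - 1:
        raise OverflowError("quotient does not fit in EAX (also #DE)")
    return quotient & M32, remainder & M32


print([hex(x) for x in mul32(0xFFFFFFFF, 2)])                 # ['0xfffffffe', '0x1']
print([hex(x) for x in idiv32(0xFFFFFFFF, 0xFFFFFFF9, 2)])    # -7 / 2 -> -3 remainder -1
```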
ARM32 arithmetic:
ADD R0, R1, R2 ; R0 = R1 + R2
ADD R0, R0, #4 ; R0 += 4
SUB R0, R1, R2 ; R0 = R1 - R2
MUL R0, R1, R2 ; R0 = R1 * R2 (low 32 bits)
UMULL R0, R1, R2, R3 ; R1:R0 = R2 * R3 (64-bit unsigned result)
RSB R0, R1, #0 ; R0 = 0 - R1 (negate; ARM has no NEG instruction)
Bitwise & Shift
and eax, 0xFF ; mask — keep only low byte
or eax, 0x04 ; set bit 2
xor eax, eax ; EAX = 0 (fastest zero idiom; also clears CF/OF)
xor eax, key ; encrypt/decrypt the value in EAX with key (most common malware op)
not eax ; bitwise complement
shl eax, 3 ; logical shift left 3 ≡ multiply by 8
shr eax, 1 ; logical shift right 1 ≡ unsigned divide by 2
sar eax, 1 ; arithmetic shift right (preserves sign bit)
rol eax, 4 ; rotate left 4 bits (used in hash functions / crypto)
ror eax, 4 ; rotate right 4 bits
bswap eax ; reverse byte order (endian swap)
xor reg, reg is the canonical “zero a register” idiom. It generates a 2-byte encoding versus the 5-byte mov eax, 0. You will see it at the start of many functions to zero out a return value or loop counter.
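When you port a hash or decoding routine out of a sample, Python has no rotate operator and its integers never wrap, so the shift and rotate instructions need small helpers. A sketch with hypothetical names, fixed at 32 bits:

```python
M32 = 0xFFFFFFFF


def rol32(value: int, count: int) -> int:
    count %= 32
    value &= M32
    return ((value << count) | (value >> (32 - count))) & M32


def ror32(value: int, count: int) -> int:
    return rol32(value, 32 - (count % 32))


def sar32(value: int, count: int) -> int:
    """Arithmetic shift right: the sign bit is replicated."""
    value &= M32
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> count) & M32


def bswap32(value: int) -> int:
    return int.from_bytes((value & M32).to_bytes(4, "little"), "big")


print(hex(rol32(0x12345678, 4)), hex(ror32(0x12345678, 4)))   # 0x23456781 0x81234567
print(hex(sar32(0x80000000, 1)), hex(0x80000000 >> 1))        # 0xc0000000 0x40000000
print(hex(bswap32(0x12345678)))                               # 0x78563412
```

The second line shows the SAR versus SHR difference: the arithmetic shift keeps the sign bit, the logical shift does not.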
ARM32 bitwise:
AND R0, R1, R2 ; R0 = R1 & R2
ORR R0, R1, R2 ; R0 = R1 | R2 (note: ORR not OR)
EOR R0, R1, R2 ; R0 = R1 ^ R2 (XOR)
MVN R0, R1 ; R0 = ~R1 (NOT + move)
LSL R0, R1, #3 ; R0 = R1 << 3
LSR R0, R1, #1 ; R0 = R1 >> 1 (logical)
ASR R0, R1, #1 ; R0 = R1 >> 1 (arithmetic)
ROR R0, R1, #4 ; R0 = rotate_right(R1, 4)
; ARM's barrel shifter lets you combine shift with any data op:
ADD R0, R1, R2, LSL #2 ; R0 = R1 + (R2 << 2) — all in one instruction!
Comparison and Flags
CMP subtracts two values and discards the result — only flags are updated. TEST ANDs two values and discards the result. Neither instruction writes a register.
cmp eax, 0 ; sets ZF if EAX==0, SF if EAX<0
test eax, eax ; exactly like `cmp eax, 0` but 1 byte shorter
cmp eax, ebx
jl less_label ; signed: jump if EAX < EBX (SF != OF)
jb below_label ; unsigned: jump if EAX < EBX (CF=1)
test eax, 0x01 ; test bit 0
jnz odd_label ; jump if bit 0 was set
ARM32 comparisons:
CMP R0, R1 ; flags = R0 - R1 (discards result)
TST R0, #0x01 ; flags = R0 & 0x01
CMN R0, R1 ; flags = R0 + R1 (compare negative)
; In ARM (A32) state almost any instruction can be conditional
MOVEQ R0, #1 ; R0 = 1 ONLY if Z flag is set (x86 needs a Jcc)
ADDNE R2, R2, #4 ; R2 += 4 ONLY if Z flag is clear
This conditional execution is a key ARM32 differentiator: instead of a cmp + jcc + branch target, a short if-else can be two unconditional + two conditional instructions with no branch at all. Thumb-2 needs an IT block to do the same, and AArch64 dropped general conditional execution in favour of conditional-select instructions such as CSEL.
Control Flow
; Unconditional
jmp label ; EIP = label
call label ; push return address (address of the next instruction); EIP = label
ret ; EIP = [ESP]; ESP += 4
ret 8 ; EIP = [ESP]; ESP += 12 (stdcall — also pops 2 dwords of args)
; Conditional jumps (check after CMP/TEST)
je / jz label ; jump if ZF=1 (equal / zero)
jne / jnz label ; jump if ZF=0 (not equal / not zero)
jl / jnge label ; signed less than (SF!=OF)
jle / jng label ; signed less-than-or-equal (ZF=1 or SF!=OF)
jg / jnle label ; signed greater than (ZF=0 and SF=OF)
jge / jnl label ; signed greater-than-or-equal (SF=OF)
jb / jnae label ; unsigned below (CF=1)
ja / jnbe label ; unsigned above (CF=0 and ZF=0)
; Loop
loop label ; ECX--; jump if ECX!=0
loope label ; ECX--; jump if ECX!=0 AND ZF=1
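The flag conditions in the comments above translate directly into predicates, which can help when you are unsure whether a branch is taken for a given pair of operands. The sketch below takes the flags as input (worked out by hand here) and covers only the jumps listed above; `branch_taken` and the table names are hypothetical.

```python
# Each predicate takes a dict with boolean ZF, SF, CF and OF entries.
JCC = {
    "je":  lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jl":  lambda f: f["SF"] != f["OF"],
    "jle": lambda f: f["ZF"] or f["SF"] != f["OF"],
    "jg":  lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jb":  lambda f: f["CF"],
    "ja":  lambda f: not f["CF"] and not f["ZF"],
}
ALIASES = {"jz": "je", "jnz": "jne", "jnge": "jl", "jng": "jle",
           "jnle": "jg", "jnl": "jge", "jnae": "jb", "jnbe": "ja"}


def branch_taken(mnemonic: str, flags: dict) -> bool:
    name = mnemonic.lower()
    name = ALIASES.get(name, name)
    return bool(JCC[name](flags))


# Flags left by `cmp eax, ebx` with EAX = 0xFFFFFFFF and EBX = 1.
flags = {"ZF": False, "SF": True, "CF": False, "OF": False}
print(branch_taken("jl", flags), branch_taken("jb", flags))   # True False
```

The same compare is "less than" when read as signed (-1 < 1) and "above" when read as unsigned (0xFFFFFFFF > 1), which is why the choice between JL and JB tells you the type the compiler saw.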
ARM32 branches:
B label ; unconditional branch (x86 JMP)
BL label ; branch with link: saves the return address (the instruction after BL) into LR (x86 CALL)
BX LR ; branch to address in LR — function return (x86 RET)
BLX R0 ; branch-with-link to address in R0 — indirect call
; Conditional branches
BEQ label ; branch if Z=1
BNE label ; branch if Z=0
BLT label ; branch if N!=V (signed less than)
BGT label ; branch if Z=0 and N=V (signed greater than)
BLO label ; branch if C=0 (unsigned lower)
BHI label ; branch if C=1 and Z=0 (unsigned higher)
String Operations
x86 has a family of bulk-memory instructions that operate on ESI/EDI and auto-increment/decrement them based on the DF flag. Combined with the REP prefix they form efficient memory loops.
cld ; clear DF — direction = forward (ESI/EDI increment)
std ; set DF — direction = backward (decrement)
rep movsb ; copy ECX bytes from [ESI] to [EDI]
rep stosd ; fill ECX dwords at [EDI] with EAX (memset-like)
repe cmpsb ; compare up to ECX bytes at [ESI] vs [EDI], stop at first mismatch (memcmp-like)
repne scasb ; scan [EDI] for the byte in AL, stop at first match; ECX counts down
; Common Ghidra/BN patterns for these:
; rep movsd -> memcpy(edi, esi, ecx*4)
; rep stosd -> memset(edi, eax, ecx*4) (EAX is usually 0 = bzero)
Shellcode pattern: REP MOVSD/STOSD shows up in PE loaders embedded in shellcode — copying sections into allocated memory or zeroing the BSS.
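The register choreography of the string instructions is easier to remember once you have stepped through it. The sketch below models three of them over a Python `bytearray` standing in for memory; the function names are invented for this example, and segment registers and the 16-bit forms are ignored.

```python
def rep_movsb(mem: bytearray, esi: int, edi: int, ecx: int, df: int = 0):
    """Copy ECX bytes from [ESI] to [EDI]; DF=0 walks forward, DF=1 backward."""
    step = -1 if df else 1
    while ecx:
        mem[edi] = mem[esi]
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx

def rep_stosd(mem: bytearray, edi: int, ecx: int, eax: int):
    """Store EAX as a little-endian dword ECX times from [EDI] (DF assumed 0)."""
    while ecx:
        mem[edi:edi + 4] = (eax & 0xFFFFFFFF).to_bytes(4, "little")
        edi += 4
        ecx -= 1
    return edi, ecx

def repne_scasb(mem: bytearray, edi: int, ecx: int, al: int):
    """Scan forward until [EDI] == AL or ECX reaches 0; returns (edi, ecx, zf)."""
    zf = 0
    while ecx:
        zf = int(mem[edi] == al)
        edi += 1
        ecx = (ecx - 1) & 0xFFFFFFFF
        if zf:
            break
    return edi, ecx, zf

def strlen_via_scasb(mem: bytearray, s: int) -> int:
    """The classic idiom: ECX = -1, AL = 0, repne scasb, not ecx, dec ecx."""
    _, ecx, _ = repne_scasb(mem, s, 0xFFFFFFFF, 0)
    return ((~ecx) & 0xFFFFFFFF) - 1

mem = bytearray(b"hello\x00" + bytes(10))
rep_movsb(mem, 0, 8, 6)                 # copy "hello\0" to offset 8
print(bytes(mem[8:14]), strlen_via_scasb(mem, 8))
```

Note that ECX counts elements, not bytes: `rep stosd` with ECX = 2 writes eight bytes.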
Calling Conventions
Calling conventions define: where arguments go, who cleans the stack, and which registers must be preserved across a call.
x86 cdecl and stdcall
| | cdecl | stdcall |
|---|---|---|
| args | pushed right to left | pushed right to left |
| cleanup | CALLER cleans stack | CALLEE cleans (RET n) |
| return | EAX (small values), EDX:EAX (64-bit) | EAX, EDX:EAX (64-bit) |
| saved | EBX, ESI, EDI, EBP | EBX, ESI, EDI, EBP |
Spotting cdecl vs stdcall in a disassembler:
- cdecl: after CALL, you see add esp, N (the caller cleaning up N bytes of arguments)
- stdcall: the CALL target ends with RET N (the callee cleans its own arguments)
; cdecl call: add(3, 7)
push 7
push 3
call _add
add esp, 8 ; caller pops 2 x 4-byte args
; stdcall call: MessageBoxA(NULL, "hi", "cap", 0)
push 0
push offset caption
push offset text
push 0
call MessageBoxA ; MessageBoxA does: ret 0x10 (cleans 16 bytes itself)
x64 Microsoft ABI
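On 64-bit Windows, the first four integer/pointer arguments go in registers. The callee never pops arguments (there is no RET N); the caller owns the argument area and usually reserves it once in its prologue, so you rarely see a per-call ADD RSP.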
Arg 1 -> RCX (or XMM0 if float)
Arg 2 -> RDX (or XMM1)
Arg 3 -> R8 (or XMM2)
Arg 4 -> R9 (or XMM3)
Arg 5+ -> stack (above shadow space)
The shadow space (also called “home space”) is 32 bytes (4 x 8) that the caller must always allocate on the stack before a call, even if the function takes fewer than 4 arguments. The callee may spill its register arguments into this space.
; x64 call: CreateFileA(name, GENERIC_READ, ...)
sub rsp, 0x38 ; shadow space (0x20) + 3 stack args (0x18); keeps RSP 16-byte aligned
mov rcx, rax ; arg1 = filename
mov edx, 0x80000000 ; arg2 = GENERIC_READ
xor r8d, r8d ; arg3 = 0 (share mode)
xor r9d, r9d ; arg4 = NULL (security attrs)
; arg5-arg7 go on stack at [rsp+0x20], [rsp+0x28], [rsp+0x30]
mov dword [rsp+0x20], 3 ; arg5 = OPEN_EXISTING
mov dword [rsp+0x28], 0x80 ; arg6 = FILE_ATTRIBUTE_NORMAL
mov qword [rsp+0x30], 0 ; arg7 = NULL (template file)
call CreateFileA
add rsp, 0x38
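The immediate in `sub rsp, N` is not arbitrary: it has to cover the shadow space, one 8-byte slot per stack argument, and whatever padding keeps RSP 16-byte aligned at the CALL. A minimal sketch of that arithmetic, assuming the function allocates nothing else and that RSP is 8 mod 16 on entry (the return address was just pushed); `win64_call_frame` is a made-up helper name:

```python
def win64_call_frame(n_args: int) -> int:
    """Bytes to subtract from RSP before calling a function with n_args integer args."""
    shadow = 0x20                            # home space for RCX, RDX, R8, R9
    stack_args = max(0, n_args - 4) * 8      # arg5 and later take one 8-byte slot each
    needed = shadow + stack_args
    # Round up so that needed + 8 (the pushed return address) is a multiple of 16.
    return ((needed + 8 + 15) // 16) * 16 - 8

for n in (0, 4, 5, 7):
    print(n, hex(win64_call_frame(n)))
```

This is why `sub rsp, 0x28` is so common in small x64 functions: it is the minimum for any callee taking up to five arguments.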
x86 fastcall and thiscall
Two more conventions appear constantly in Windows binaries — especially those compiled with MSVC.
__fastcall passes the first two integer arguments in ECX and EDX (skipping the stack for them), with the rest pushed right-to-left. The callee cleans the stack.
; __fastcall: myfunc(3, 7, 99)
mov ecx, 3 ; arg1 → ECX
mov edx, 7 ; arg2 → EDX
push 99 ; arg3 on stack (right-to-left)
call myfunc ; callee does: ret 4 (cleans only arg3)
Recognition tip: if you see MOV ECX, value and MOV EDX, value before a CALL and there is no ADD ESP, N after it, you are likely in __fastcall.
__thiscall is MSVC’s calling convention for C++ member functions. The hidden this pointer goes in ECX; remaining arguments are pushed right-to-left; the callee cleans.
; C++: obj->method(42)
mov ecx, obj_ptr ; ECX = this ← the telltale sign
push 42 ; first explicit arg
call MyClass_method ; callee does: ret 4
C++ recognition shortcut: When you see MOV ECX, [some_ptr] immediately before a CALL, you are almost certainly looking at a C++ virtual or non-virtual method call. If the call goes through a pointer loaded from the object (MOV EAX, [ECX] followed by CALL [EAX] or CALL [EAX + N]), it is a vtable dispatch: follow the pointer to find the virtual function table.
x64 System V (Linux)
Linux and macOS use a different ABI:
Arg 1 -> RDI
Arg 2 -> RSI
Arg 3 -> RDX
Arg 4 -> RCX
Arg 5 -> R8
Arg 6 -> R9
Arg 7+ -> stack
Callee-saved: RBX, R12-R15, RBP
No shadow space required
Linux syscalls: number -> RAX; args -> RDI, RSI, RDX, R10, R8, R9 (R10 replaces RCX); invoke with SYSCALL
Calling Convention Comparison — At a Glance
Use this table when you need to quickly identify which convention a binary uses and reconstruct the argument list from the disassembly:
| Convention | Arg 1 | Arg 2 | Arg 3 | Arg 4 | Arg 5+ | Stack cleanup | Callee must preserve | Common context |
|---|---|---|---|---|---|---|---|---|
| cdecl | stack | stack | stack | stack | stack | Caller (ADD ESP, N after CALL) | EBX, ESI, EDI, EBP | C functions, GCC x86 default, printf-style varargs |
| stdcall | stack | stack | stack | stack | stack | Callee (RET N) | EBX, ESI, EDI, EBP | Win32 API (WINAPI / PASCAL macros) |
| fastcall | ECX | EDX | stack | stack | stack | Callee (RET N) | EBX, ESI, EDI, EBP | MSVC /Gr flag, Windows kernel internal functions |
| thiscall | ECX (this) | stack | stack | stack | stack | Callee (RET N) | EBX, ESI, EDI, EBP | MSVC C++ non-virtual & virtual methods |
| x64 Windows | RCX | RDX | R8 | R9 | stack (above shadow) | Caller | RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15 | All 64-bit Windows code; RCX = this in C++ |
| x64 System V | RDI | RSI | RDX | RCX | R8, R9, then stack | Caller | RBX, RBP, R12–R15 | Linux, macOS x64; RDI = this in C++ |
| ARM32 AAPCS | R0 | R1 | R2 | R3 | stack (right-to-left) | Caller | R4–R11, SP | Android NDK, iOS (older), embedded ARM |
| AArch64 AAPCS64 | X0 | X1 | X2 | X3 | X4–X7, then stack | Caller | X19–X28, X29 (FP), X30 (LR) | Apple Silicon, Android ARM64, Windows on Arm |
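When reconstructing an argument list by hand, the table boils down to a lookup. The sketch below encodes only the integer/pointer argument registers from the table; the convention keys (`win64`, `sysv64`, and so on) and the function name are labels chosen for this example, and floating-point or struct arguments are not modelled.

```python
INT_ARG_REGS = {
    "cdecl":    [],
    "stdcall":  [],
    "fastcall": ["ECX", "EDX"],
    "thiscall": ["ECX"],                                  # ECX carries `this`
    "win64":    ["RCX", "RDX", "R8", "R9"],
    "sysv64":   ["RDI", "RSI", "RDX", "RCX", "R8", "R9"],
    "aapcs32":  ["R0", "R1", "R2", "R3"],
    "aapcs64":  ["X0", "X1", "X2", "X3", "X4", "X5", "X6", "X7"],
}

def arg_location(convention: str, index: int) -> str:
    """Where the index-th (1-based) integer/pointer argument lives at the call site."""
    regs = INT_ARG_REGS[convention]
    if index <= len(regs):
        return regs[index - 1]
    # Anything past the register set is passed on the stack, in order.
    return f"stack[{index - len(regs) - 1}]"

print(arg_location("win64", 3), arg_location("sysv64", 4), arg_location("cdecl", 1))
```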
Identifying the convention from disassembly:
| Clue | Convention |
|---|---|
| ADD ESP, N immediately after CALL | cdecl: caller cleaning N bytes |
| RET N inside the callee | stdcall, fastcall, or thiscall: callee cleaning N bytes |
| MOV ECX, ptr then CALL with no ADD ESP after | thiscall: ECX is this |
| MOV ECX, val; MOV EDX, val before a CALL | fastcall: first two args in registers |
| MOV RCX, …; MOV RDX, …; MOV R8D, … before CALL | x64 Windows ABI |
| MOV RDI, …; MOV RSI, …; MOV RDX, … before CALL | x64 System V (Linux/macOS) |
| MOV R0, …; MOV R1, …; BL func | ARM32 AAPCS |
ARM Assembly
ARM vs x86 Philosophy
| Aspect | x86 / x64 | ARM |
|---|---|---|
| Architecture | CISC — complex, variable-length instructions | RISC — uniform 32-bit instructions (mostly) |
| Memory operands | Allowed in arithmetic: ADD EAX, [EBX] | Never: only load/store instructions (LDR/STR, LDM/STM) touch memory |
| Instruction size | 1–15 bytes | 4 bytes (ARM) / 2 or 4 bytes (Thumb) |
| Condition codes | Jcc, SETcc, and CMOVcc only | Almost any ARM-mode instruction can be conditional |
| Barrel shifter | Separate shift instructions | Built-in: ADD R0, R1, R2, LSL #2 |
| Endianness | Always little-endian | Configurable (usually little-endian) |
ARM Registers Deep Dive
The ARM calling convention (AAPCS) assigns specific roles to registers that the disassembler will display without aliases. You must know them:
Saved registers (must be preserved):
R4 R5 R6 R7 R8 R9 R10 R11(FP)
Scratch / argument registers (caller-saved):
R0 R1 R2 R3
Special:
R12 = IP (intra-procedure scratch; used by PLT stubs on Linux)
R13 = SP (stack pointer — never use for anything else)
R14 = LR (link register — holds return address after BL)
R15 = PC (program counter — read is PC+8 in ARM mode, PC+4 in Thumb)
ARM PC offset gotcha: In ARM32 mode, reading PC gives the address of the current instruction +8 (not +4). This is a pipeline artifact from ARM’s 3-stage pipeline. Ghidra and Binary Ninja compensate automatically, but if you calculate addresses manually, remember the offset.
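If you do need to resolve a PC-relative load by hand, the calculation is short. A minimal sketch, assuming a plain `LDR Rd, [PC, #offset]` literal load; `pc_relative_target` is a name made up here. In Thumb state the PC value is additionally rounded down to a 4-byte boundary before the offset is added.

```python
def pc_relative_target(insn_addr: int, offset: int, thumb: bool = False) -> int:
    """Address accessed by `LDR Rd, [PC, #offset]` located at insn_addr."""
    if thumb:
        pc = (insn_addr + 4) & ~3   # Thumb: PC reads as +4, word-aligned for literal loads
    else:
        pc = insn_addr + 8          # ARM: PC reads as +8
    return pc + offset

print(hex(pc_relative_target(0x8000, 0x10)))             # ARM mode
print(hex(pc_relative_target(0x8002, 4, thumb=True)))    # Thumb mode
```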
Key ARM Instructions
; ── Load / Store ─────────────────────────────────────────────
LDR R0, [R1] ; 32-bit load
LDRH R0, [R1] ; 16-bit load, zero-extend
LDRB R0, [R1] ; 8-bit load, zero-extend
LDRSB R0, [R1] ; 8-bit load, sign-extend
STR R0, [R1] ; 32-bit store
STRB R0, [R1, #3] ; byte store with offset
; Pre-indexing (update base before access)
LDR R0, [R1, #4]! ; R0 = *(R1+4); R1 += 4
; Post-indexing (update base after access)
LDR R0, [R1], #4 ; R0 = *R1; R1 += 4 -- very common in loops
; Multiple-register transfer (callee save/restore)
STMFD SP!, {R4-R11, LR} ; push R4..R11 and LR
LDMFD SP!, {R4-R11, PC} ; pop R4..R11 and jump to saved LR
; ── Branching ────────────────────────────────────────────────
B func ; jump
BL func ; call (saves the return address, i.e. the next instruction, in LR)
BX LR ; return (branch to address in LR)
BLX R0 ; indirect call (bit 0 of R0 selects ARM or Thumb mode)
; ── Data Processing ──────────────────────────────────────────
MOV R0, #0xFF ; R0 = 255
MOVW R0, #0x1234 ; R0 = 0x1234 (16-bit immediate, ARMv6T2+)
MOVT R0, #0x5678 ; R0[31:16] = 0x5678 (upper 16 bits)
; Together: MOVW/MOVT pair loads a full 32-bit constant
; This is the ARM equivalent of x86 `mov eax, imm32`
MRS R0, CPSR ; read flags/mode register
MSR CPSR_f, R0 ; write flags field of CPSR
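Obfuscated ARM code often splits constants (API hashes, keys, addresses) across a MOVW/MOVT pair, so it helps to be able to recombine them quickly. A tiny sketch of the register effect of each instruction; the function names simply mirror the mnemonics:

```python
def movw(imm16: int) -> int:
    """MOVW writes the low 16 bits and zeroes the upper 16."""
    return imm16 & 0xFFFF

def movt(reg: int, imm16: int) -> int:
    """MOVT replaces the upper 16 bits and leaves the lower 16 untouched."""
    return (reg & 0xFFFF) | ((imm16 & 0xFFFF) << 16)

r0 = movt(movw(0x1234), 0x5678)
print(hex(r0))   # 0x56781234
```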
Thumb and Thumb-2 Mode
ARM processors can switch between ARM mode (4-byte instructions) and Thumb mode (2-byte instructions). This halves code size at a small performance cost — critical for embedded/mobile malware.
Detection in disassemblers:
- Thumb mode functions have their symbol address OR'd with 1 (e.g., 0x00008001 instead of 0x00008000)
- Ghidra and Binary Ninja auto-detect and display the right instruction set
- BX Rn with the LSB of Rn set = switch to Thumb; clear = switch to ARM
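The low-bit convention can be captured in two lines, which is handy when a function pointer table from a dump mixes ARM and Thumb targets. The helper name below is invented for this sketch:

```python
def split_interworking_address(value: int):
    """Split a BX/BLX target or symbol value into (real_address, mode)."""
    mode = "thumb" if value & 1 else "arm"
    return value & ~1, mode

for target in (0x00008001, 0x00008000):
    address, mode = split_interworking_address(target)
    print(hex(address), mode)
```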
Thumb-2 (introduced with ARMv6T2, standard from ARMv7 on) extends Thumb with 32-bit instructions, giving near-ARM performance with compact encoding. Most 32-bit ARM code on Android and iOS, malware included, is compiled to Thumb-2.
; Thumb (16-bit) — notice missing base register in 2-reg ops
PUSH {R4, LR} ; save
MOV R0, #5
ADD R0, R1 ; R0 += R1 (Thumb: only 2-register form)
POP {R4, PC} ; restore and return
; Thumb-2 (32-bit prefix: 0xE8xx, 0xF0xx, 0xF8xx...)
MOVW R0, #0xABCD ; 32-bit immediate in Thumb-2
MOVT R0, #0x1234
AArch64 (ARM64)
AArch64 is a complete redesign — not backward compatible with ARM32. Used in Apple Silicon, Raspberry Pi 4+, and Windows on Arm.
; Registers: X0-X30 (64-bit), W0-W30 (low 32 bits), SP, PC
; No condition codes on most instructions (unlike ARM32)
; Shifted-register operands still exist (ADD X0, X1, X2, LSL #2), but only with immediate shift amounts
; Load / store
LDR X0, [X1] ; 64-bit load
LDR W0, [X1] ; 32-bit load (zero-extends into X0)
LDRB W0, [X1] ; byte load
STP X29, X30, [SP, #-16]! ; store pair (typical frame setup)
LDP X29, X30, [SP], #16 ; load pair (typical frame teardown)
; Arithmetic
ADD X0, X1, X2 ; X0 = X1 + X2
ADD X0, X1, #8 ; X0 = X1 + 8
MUL X0, X1, X2 ; X0 = X1 * X2 (low 64 bits)
; Branching
BL func ; call (saves PC+4 to X30/LR)
RET ; return via X30/LR (NOT ret like x86 — no stack pop)
BR X0 ; indirect branch (x86: jmp rax)
BLR X0 ; indirect call (x86: call rax)
; Conditionals (separate compare-and-branch)
CBZ X0, label ; branch if X0 == 0 (no CMP needed)
CBNZ X0, label ; branch if X0 != 0
TBZ X0, #3, label ; branch if bit 3 of X0 == 0
ARM Calling Convention (AAPCS)
ARM32 (AAPCS):
Arguments 1-4 : R0 R1 R2 R3
Arguments 5+ : stack (pushed right-to-left)
Return value : R0 (64-bit: R1:R0)
Callee-saved : R4-R11, SP
Caller-saved : R0-R3, R12, LR
Stack : 8-byte aligned at public interfaces
AArch64 (AAPCS64):
Arguments 1-8 : X0-X7
Arguments 9+ : stack
Return value : X0 (128-bit: X1:X0)
Callee-saved : X19-X28, X29(FP), X30(LR), SP
Caller-saved : X0-X18
Stack : 16-byte aligned always
Reading Disassembly in Ghidra and Binary Ninja
Function Prologue Recognition
When you open a binary in Ghidra or Binary Ninja, every function begins with a recognizable setup sequence. Train your eye to skip past it instantly:
; Classic x86 frame setup — skim past this
55 push ebp
89 E5 mov ebp, esp
83 EC 20 sub esp, 0x20
53 push ebx
56 push esi
57 push edi
; <- HERE is where the actual logic starts
In Ghidra the Decompiler window (Window → Decompiler, or Ctrl+E) collapses this to nothing: you see only declarations such as int local_24; int local_20;. In Binary Ninja, the HLIL (High-Level IL) also hides the prologue, but the MLIL and disassembly view show it raw.
Key takeaway: The highlighted row at 00401084 is where the prologue ends and the real function body begins. Everything above it is bookkeeping.
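The same byte patterns your eye learns to skip can also be searched for, which is a quick way to find candidate function starts in a raw dump with no symbols. A minimal sketch, assuming 32-bit x86 code and the two usual encodings of `mov ebp, esp`; the function name is made up, and hits are only candidates because the bytes can also occur inside data or in the middle of other instructions.

```python
FRAME_PROLOGUES = (
    bytes.fromhex("5589e5"),   # push ebp; mov ebp, esp (89 /r form, typical of GCC/Clang)
    bytes.fromhex("558bec"),   # push ebp; mov ebp, esp (8B /r form, typical of MSVC)
)

def find_frame_prologues(code: bytes, base: int = 0) -> list:
    """Return the addresses where a classic x86 frame setup starts (heuristic only)."""
    hits = []
    for pattern in FRAME_PROLOGUES:
        start = 0
        while (pos := code.find(pattern, start)) != -1:
            hits.append(base + pos)
            start = pos + 1
    return sorted(hits)

blob = bytes.fromhex("90905589e583ec20c3cc558bec5dc3")
print([hex(addr) for addr in find_frame_prologues(blob, base=0x401000)])
```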
x64 prologue without frame pointer (common in MSVC /O2):
48 83 EC 58 sub rsp, 0x58 ; allocate 88 bytes
; NO push rbp — rbp may be used as a general register!
; Shadow space for callees at [rsp]..[rsp+0x1F]; outgoing stack args from [rsp+0x20]; locals above those
Recognizing Loops
Every loop in high-level code becomes one of two patterns in assembly:
Top-test loop (while / for):
loop_start:
cmp ecx, 0
je loop_end ; exit if done
; body
dec ecx
jmp loop_start
loop_end:
Bottom-test loop (do-while — optimized form):
loop_body:
; body (always executes at least once)
dec ecx
jnz loop_body ; jump back while ECX != 0
Tip: In Ghidra graph view, a loop appears as a node with a back-edge arrow pointing upward. In Binary Ninja's graph the cycle is drawn with the usual edge colors (green and red for the two sides of a conditional branch, blue for an unconditional jump). Any upward-pointing edge is a loop candidate.
ARM32 loop pattern:
; Classic counted loop: for (i=10; i>0; i--)
MOV R2, #10 ; counter
loop_top:
; ... body using R0, R1 ...
SUBS R2, R2, #1 ; R2 -= 1; update flags (S suffix)
BNE loop_top ; branch while R2 != 0
The S suffix on ARM instructions causes them to update NZCV flags — this is how ARM avoids a separate CMP before every branch.
Recognizing Conditionals
if / else:
; if (eax == 0) { A } else { B }
test eax, eax
jnz else_branch ; if eax != 0, skip the 'if' body
; --- true branch (A) ---
; ...
jmp end_if
else_branch:
; --- false branch (B) ---
; ...
end_if:
In Ghidra graph view: two outgoing edges from a diamond shape — one labelled T (true) and one F (false). The merge point is where both paths reconverge.
switch statement:
cmp eax, 5
ja default_case ; value > 5 (unsigned): jump to default
jmp [eax*4 + jump_table] ; indirect jump through table
jump_table:
dd case_0, case_1, case_2, case_3, case_4, case_5
Ghidra recognizes jump tables and labels each case. Binary Ninja uses MLIL’s switch construct. If neither tool resolves a JMP [EAX*4 + addr], you are dealing with an obfuscated or dynamically computed jump table.
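The bounds check in front of a jump table is always unsigned, which quietly handles negative inputs as well. A short model of the dispatch makes that visible; `switch_dispatch` and the case labels are placeholders for this sketch.

```python
def switch_dispatch(eax: int, jump_table: list, default: str) -> str:
    """Model `cmp eax, N-1` / `ja default` / `jmp [eax*4 + jump_table]`."""
    index = eax & 0xFFFFFFFF            # JA compares the value as unsigned
    if index > len(jump_table) - 1:     # negative inputs become huge unsigned values
        return default
    return jump_table[index]

table = ["case_0", "case_1", "case_2", "case_3", "case_4", "case_5"]
print(switch_dispatch(2, table, "default_case"))
print(switch_dispatch(-1, table, "default_case"))
```

One unsigned comparison therefore covers both "too large" and "negative", which is why you see JA here even when the switch variable is a signed int.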
The Decompiler View
Ghidra and Binary Ninja both ship decompilers that convert disassembly to C-like pseudo-code. This is a reconstruction — the type information is guessed. Common pitfalls:
| What you see in decompiler | What it really means |
|---|---|
| *(int *)(param_1 + 0x3c) | Structure field access: the decompiler doesn't know the struct |
| uVar1 = uVar2 ^ uVar3 | Likely XOR cipher: look at the key value |
| do { ... } while (iVar1 != 0) | A bottom-test loop (do-while) |
| FUN_00401234(...) | Unnamed function: rename it after analysis |
| DAT_00403000 | A global variable: check cross-references |
| (code *)DAT_... | Indirect function call: possible shellcode dispatch table |
Common Malware Patterns
XOR Decryption Loop
The simplest and most common obfuscation. Spot it by a loop with an XOR instruction and a byte-size memory reference:
; x86: XOR decrypt: for (i=0; i<len; i++) buf[i] ^= key[i] (key stream as long as the buffer; a repeating key adds a modulo on the key index)
xor ecx, ecx ; i = 0
xor_loop:
movzx eax, byte [esi + ecx] ; load ciphertext byte
movzx edx, byte [edi + ecx] ; load key byte
xor eax, edx ; decrypt
mov [esi + ecx], al ; store plaintext
inc ecx
cmp ecx, dword [ebp - 4] ; compare to length
jl xor_loop
; ARM32 equivalent
MOV R2, #0 ; i = 0
xor_loop:
LDRB R0, [R4, R2] ; load ciphertext byte
LDRB R1, [R5, R2] ; load key byte
EOR R0, R0, R1 ; XOR
STRB R0, [R4, R2] ; store
ADD R2, R2, #1
CMP R2, R6 ; compare to length
BLT xor_loop
In Ghidra: Look for the Decompile window showing something like bVar = *(byte *)(buf + i) ^ *(byte *)(key + i % keyLen) inside a for-loop (the % keyLen term appears only when a short key repeats). Cross-reference the buffer to see where the decrypted data is used next, since that reveals the payload type.
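Once you have recovered the key and the buffer from a sample, reproducing the loop outside the debugger takes a few lines. A minimal sketch of the repeating-key variant; the key and ciphertext below are invented test values, not taken from any real sample.

```python
def xor_decrypt(buf: bytes, key: bytes) -> bytes:
    """Compute buf[i] ^ key[i % len(key)] for every byte.

    XOR is its own inverse, so the same function encrypts and decrypts.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(buf))

key = b"\x13\x37"                              # hypothetical 2-byte key
ciphertext = xor_decrypt(b"cmd.exe /c whoami", key)
print(ciphertext.hex())
print(xor_decrypt(ciphertext, key))
```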
PEB Walking — API Resolution Without Imports
Shellcode and injected payloads cannot have an import table. They must resolve Win32 API addresses at runtime by walking the Process Environment Block (PEB):
; x86 PEB walk to find kernel32.dll base address
mov eax, fs:[0x30] ; EAX = &PEB (FS segment always points to TEB)
mov eax, [eax + 0x0C] ; EAX = PEB->Ldr
mov eax, [eax + 0x14] ; EAX = Ldr->InMemoryOrderModuleList.Flink
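mov eax, [eax] ; the first entry is the EXE itself; follow Flink to the second (ntdll)
mov eax, [eax] ; follow Flink to the third entry (kernel32)
mov eax, [eax + 0x10] ; EAX = kernel32 base (DllBase at 0x18, minus the 0x08 offset of InMemoryOrderLinks)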
On x64, the PEB is at gs:[0x60] (not fs):
; x64 PEB walk
mov rax, gs:[0x60] ; RAX = &PEB64
mov rax, [rax + 0x18] ; PEB->Ldr
mov rax, [rax + 0x20] ; Ldr->InMemoryOrderModuleList.Flink
...
In Ghidra: You will see PTR fs:0x30 highlighted in blue (a special segment reference). Binary Ninja annotates the PEB structure fields automatically with the Windows types plugin loaded.
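The pointer chasing is easier to follow when you can run it against a toy memory image. The sketch below builds a fake 32-bit PEB, loader data, and three module entries, then repeats the walk. It assumes the usual 32-bit layout (PEB->Ldr at +0x0C, InMemoryOrderModuleList at +0x14, InMemoryOrderLinks at +0x08 and DllBase at +0x18 of each entry); every address, base, and class name here is made up for the demo.

```python
class Mem32:
    """Sparse memory holding 32-bit values, just enough to chase pointers."""
    def __init__(self):
        self.cells = {}

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & 0xFFFFFFFF

    def read(self, addr: int) -> int:
        return self.cells[addr]

def build_fake_process(mem: Mem32, modules: list) -> int:
    """Lay out a PEB, loader data, and one entry per (name, base) pair."""
    peb, ldr, first_entry = 0x1000, 0x2000, 0x3000
    mem.write(peb + 0x0C, ldr)                      # PEB->Ldr
    links = [first_entry + i * 0x100 + 0x08 for i in range(len(modules))]
    head = ldr + 0x14                               # InMemoryOrderModuleList
    mem.write(head, links[0])                       # head.Flink -> first entry
    for i, (_name, base) in enumerate(modules):
        following = links[i + 1] if i + 1 < len(links) else head
        mem.write(links[i], following)              # InMemoryOrderLinks.Flink
        mem.write(links[i] + 0x10, base)            # DllBase (entry + 0x18)
    return peb

def third_module_base(mem: Mem32, peb: int) -> int:
    link = mem.read(mem.read(peb + 0x0C) + 0x14)    # first entry: the EXE
    link = mem.read(link)                           # second entry
    link = mem.read(link)                           # third entry
    return mem.read(link + 0x10)                    # DllBase relative to the link field

mem = Mem32()
peb = build_fake_process(mem, [("sample.exe", 0x00400000),
                               ("ntdll.dll", 0x77000000),
                               ("kernel32.dll", 0x76000000)])
print(hex(third_module_base(mem, peb)))
```

Because the walk hard-codes "third entry", it breaks if the load order differs; more careful shellcode compares module names or hashes instead of counting.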
Anti-Debugging via RDTSC
RDTSC reads the CPU's 64-bit timestamp counter (a tick count, not a nanosecond clock) into EDX:EAX. Malware calls it twice and checks whether the delta is larger than expected: single-stepping or a breakpoint between the two reads inflates it by orders of magnitude.
rdtsc ; first read: EDX:EAX = TSC
mov esi, eax ; save low 32 bits
; ... some code ...
rdtsc ; second read
sub eax, esi ; delta in EAX
cmp eax, 0x10000 ; threshold
ja debugger_detected ; unsigned compare: the delta is never negative
ARM equivalent: Uses MRC p15, 0, Rt, c9, c13, 0 (Performance Monitors Cycle Count Register, PMCCNTR) on privileged ARM cores, or CNTVCT_EL0 on AArch64.
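Because the check usually keeps only the low 32 bits of the counter, the subtraction wraps modulo 2^32, and the comparison has to be read as unsigned. A small sketch of that arithmetic; the function names are invented and the threshold is just the example value 0x10000.

```python
def tsc_delta32(first_low: int, second_low: int) -> int:
    """What `sub eax, esi` leaves in EAX: the difference modulo 2**32."""
    return (second_low - first_low) & 0xFFFFFFFF

def looks_debugged(first_low: int, second_low: int, threshold: int = 0x10000) -> bool:
    """Unsigned comparison (JA). A signed JG would miss deltas >= 0x80000000."""
    return tsc_delta32(first_low, second_low) > threshold

print(hex(tsc_delta32(0xFFFFFF00, 0x00000100)))   # counter wrapped between reads
print(looks_debugged(100, 600), looks_debugged(0, 0x90000000))
```

To defeat the check while debugging, patch the conditional jump or set EAX below the threshold after the SUB.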
Shellcode Stub Pattern
Position-independent shellcode must locate its own base address (since it doesn’t know where it will be injected). The classic x86 technique:
call get_eip ; push EIP of next instruction
get_eip:
pop ebx ; EBX = address of get_eip label
sub ebx, 5 ; EBX = shellcode base (account for CALL encoding)
; Now EBX + offset = address of any data/code inside the shellcode
x64 version using RIP-relative LEA:
lea rbx, [rip] ; RBX = address of NEXT instruction
; Or more commonly just use [RIP + offset] directly for data references
ARM32: PC-relative loads are the native mechanism:
LDR R0, [PC, #offset] ; load value from (PC+8) + offset
; The assembler resolves `offset` so the literal pool is addressed correctly
Quick Reference Tables
x86 / x64 Jump Instructions
| Signed | Unsigned / flag test | Condition | Flags (signed / unsigned) |
|---|---|---|---|
| JE / JZ | JE / JZ | Equal / Zero | ZF=1 |
| JNE / JNZ | JNE / JNZ | Not equal | ZF=0 |
| JL / JNGE | JB / JNAE | Less / Below | SF!=OF / CF=1 |
| JLE / JNG | JBE / JNA | Less-or-equal / Below-or-equal | ZF=1 or SF!=OF / CF=1 or ZF=1 |
| JG / JNLE | JA / JNBE | Greater / Above | ZF=0 and SF=OF / CF=0 and ZF=0 |
| JGE / JNL | JAE / JNB | Greater-or-equal / Above-or-equal | SF=OF / CF=0 |
| - | JS | Sign | SF=1 |
| - | JO | Overflow | OF=1 |
| - | JP / JPE | Parity even | PF=1 |
ARM Condition Codes (suffix on any instruction)
| Suffix | Meaning | Flags |
|---|---|---|
| EQ | Equal | Z=1 |
| NE | Not equal | Z=0 |
| GT | Signed greater than | Z=0 and N=V |
| LT | Signed less than | N!=V |
| GE | Signed greater-or-equal | N=V |
| LE | Signed less-or-equal | Z=1 or N!=V |
| HI | Unsigned higher | C=1 and Z=0 |
| LO | Unsigned lower | C=0 |
| HS | Unsigned higher-or-same | C=1 |
| LS | Unsigned lower-or-same | C=0 or Z=1 |
| MI | Minus / negative | N=1 |
| PL | Plus / positive | N=0 |
| VS | Overflow set | V=1 |
| AL | Always (default) | - |
Register Cheatsheet — What You See in the Disassembler
| Color in this article | x86 | x64 | ARM32 | AArch64 |
|---|---|---|---|---|
| Green — general purpose | EAX EBX ECX EDX ESI EDI | RAX RBX RCX RDX RSI RDI R8-R11 | R0–R11 | X0-X18 |
| Orange — stack/frame/PC | ESP EBP EIP | RSP RBP RIP | SP LR PC | SP LR PC FP |
| Blue — ARM-specific | — | — | R0-R15 CPSR | X0-X30 NZCV |
| Amber — flags | ZF CF SF OF PF AF DF | ZF CF SF OF PF AF DF | N Z C V | N Z C V |
Size Specifiers — Operand Widths
When a memory operand is ambiguous, the assembler requires an explicit size keyword. These appear constantly in disassembly and tell you the width of the data being read or written:
| Keyword | Bits | Bytes | x86 register aliases | Typical use in disassembly |
|---|---|---|---|---|
| BYTE PTR | 8 | 1 | AL, BL, CL, DL (and R8B–R15B in x64) | Character data, byte flags, single-byte XOR: mov byte ptr [eax], 0 |
| WORD PTR | 16 | 2 | AX, BX, CX, DX (and R8W–R15W in x64) | UTF-16 code units, port I/O, 16-bit struct fields: mov ax, word ptr [ebx+2] |
| DWORD PTR | 32 | 4 | EAX, EBX, ECX, EDX, ESI, EDI | Local int, pointer on x86, HANDLE: cmp dword ptr [ebp-4], 0 |
| QWORD PTR | 64 | 8 | RAX, RBX, RCX, … (x64 only) | Pointer on x64, LONGLONG, timestamp: mov rax, qword ptr [rsi+8] |
| XMMWORD PTR | 128 | 16 | XMM0–XMM15 | SIMD data, AES-NI cipher rounds, fast memcpy: movdqu xmm0, xmmword ptr [rdi] |
| YMMWORD PTR | 256 | 32 | YMM0–YMM15 | AVX bulk operations; rare in malware but present in optimised crypto libraries |
Width as a clue: A loop that accesses BYTE PTR [ESI + ECX] is processing raw bytes, likely a string, shellcode buffer, or cipher stream. Switch to DWORD PTR and you are almost certainly processing an array of integers, pointers, or 32-bit hash values.
C ↔ Assembly Equivalents
Use this table to mentally “lift” disassembly back to high-level intent before reaching for the decompiler:
| C / C++ pattern | x86 / x64 assembly | Notes |
|---|---|---|
| int x = 0; | xor eax, eax → mov [ebp-4], eax | XOR zero is 2 bytes; mov eax, 0 is 5 bytes, so optimizing compilers almost always pick XOR |
| if (x == 0) | test eax, eax → je label | TEST is shorter than cmp eax, 0; identical flag result |
| if (x != 0) | test eax, eax → jnz label | The "is this pointer null?" pattern |
| if (x < 0) | test eax, eax → js label | Testing sign flag directly; no CMP needed |
| x++ | inc eax | INC does not set CF, a subtle source of bugs in flag-dependent code |
| x-- | dec eax | Same: DEC does not set CF |
| x += n | add eax, n | |
| x -= n | sub eax, n | |
| x *= 2 | shl eax, 1 or add eax, eax | |
| x *= 4 | shl eax, 2 or lea eax, [eax*4] | LEA form does not set flags |
| x *= 5 | lea eax, [eax + eax*4] | Classic LEA-abuse for non-power-of-two multiply; no MUL instruction |
| x *= 9 | lea eax, [eax + eax*8] | Same pattern; watch for these in hash functions |
| x /= 2 (unsigned) | shr eax, 1 | Signed: sar eax, 1 rounds toward negative infinity, so compilers add a sign-bias fix-up before it |
| return x; | mov eax, <value> → ret | Integer and pointer return values are in EAX (x86) / RAX (x64) |
| return (struct large) | Write through the hidden pointer, then ret | Large structs are returned via a hidden pointer passed as an extra first arg (stack on x86, RCX on x64 Windows, RDI on System V) |
| *ptr | [eax], e.g. mov eax, [eax] | Dereference: the pointer value is already in the register |
| ptr->field | [eax + offset] | offset is the struct field's byte offset from the base |
| array[i] | [base + index*scale] | SIB addressing: scale matches element size (4 for int[]) |
| array[i].field | [base + index*scale + field_offset] | Full SIB + displacement |
| memset(p, 0, n) | xor eax, eax + rep stosd | ECX = count in dwords; EDI = destination |
| memcpy(dst, src, n) | rep movsd (or rep movsb) | ECX = count; ESI = source; EDI = destination |
| strlen(s) | repne scasb with AL=0 | Scans [EDI] for the null byte starting with ECX = -1; afterwards NOT ECX then DEC ECX gives the length |
| x & mask | and eax, mask | Also tests a single bit when mask is a power of two |
| x \| flag | or eax, flag | Setting a bit without clearing others |
| x ^ key | xor eax, key | The single most common malware operation: encryption, decryption, hash mixing |
| ~x | not eax | Bitwise complement; also seen as neg eax; dec eax |
| (int)(char)x | movsx eax, byte ptr [ebx] | Sign-extend byte to 32-bit int |
| (unsigned int)(unsigned char)x | movzx eax, byte ptr [ebx] | Zero-extend byte: the safe widening idiom |
| switch (x) | jmp [eax*4 + table_addr] | Indirect jump through a jump table |
| virtual->method() | mov ecx, this → mov eax, [ecx] → call [eax + N] | vtable dispatch: first dereference gets the vtable, second gets the slot |
| GetProcAddress(…) reimplemented | hash loop over export names → call [eax] | No import table entry; common in shellcode and packer stubs |
Common Malware Indicator Patterns
Memorise these. When you see one in a binary, treat it as a high-confidence signal of a specific technique:
| What you see in disassembly | What it almost certainly means | How to confirm |
|---|---|---|
| MOV EAX, FS:[0x30] or MOV RAX, GS:[0x60] | PEB access: reading the process environment block for module list, image base, or heap | Followed by chained dereferences ([EAX+0x0C], [EAX+0x14], …) into the Ldr module list |
| XOR on a byte-granularity loop over a buffer | XOR decryption / encryption of embedded payload or config blob | The decrypted buffer is subsequently called into or passed to a second function |
| CALL EAX / JMP EAX or CALL [EAX + N] after a hash-compare loop | Dynamically resolved API call; import table is empty or missing | Trace EAX backwards to a GetProcAddress re-implementation walking the export table |
| RDTSC … work … RDTSC … CMP EAX, threshold … JA / JG | Timing-based anti-debug check | Two RDTSC with same-register subtract; threshold is usually 0x10000–0x100000 |
| CALL $+5; POP EBX; SUB EBX, 5 (x86) or LEA RBX, [RIP] (x64) | PIC self-location: shellcode finding its own load address | Followed by EBX + offset references to embedded data/code within the shellcode |
| PUSH 0x40; PUSH 0x3000; PUSH size; PUSH 0; CALL VirtualAlloc (or the NtAllocateVirtualMemory equivalent) | RWX memory allocation: staging area for injected shellcode | 0x40 = PAGE_EXECUTE_READWRITE, 0x3000 = MEM_COMMIT \| MEM_RESERVE; look for a subsequent write then transfer of control |
| MOV EAX, [EAX + 0x3C] → ADD EAX, [EAX + 0x78] | PE export directory walk: hand-rolled GetProcAddress | Classic shellcode technique; 0x3C = PE offset field (e_lfanew) in DOS header, 0x78 = export directory RVA in a 32-bit PE (0x88 in PE32+) |
| CPUID → check vendor string or bit 31 of ECX (leaf 1) | Hypervisor / VM detection | Malware aborts or switches to benign path when it detects a sandbox |
| MOV EAX, 0x564D5868 ('VMXh'); MOV DX, 0x5658; IN EAX, DX | VMware backdoor I/O port detection | Often wrapped in an SEH try/except: exception means no VMware; success means VM |
| MOV EAX, LARGE FS:[0x0] (x86 SEH chain head) | SEH chain manipulation: installing a custom exception handler | Malware uses SEH to catch intentional exceptions and redirect control flow |
| INT 3 blocks or 0xCC byte padding inside function body | Debugger trap or anti-attach bait | Malware scans its own code pages for 0xCC bytes inserted by software breakpoints |
| REP STOSD zeroing a region → MOV of bytes → CALL into it | Self-copying / decrypting shellcode followed by execution | The classic stager pattern: payload written to zeroed RWX memory, then jumped into |
| MOV ECX, [ESI] immediately before CALL | C++ method call (this in ECX): thiscall convention | Trace ESI back to a heap allocation or a global object to identify the class |
| MOV EAX, [EAX + 0x20] then name-hash loop | Kernel32 export hash walk (0x20 = AddressOfNames in the export directory) | Compare the hash constant against known hash lists (e.g., under the common ROR-13 hash, 0x7C0DFCAA = GetProcAddress and 0xEC0E4E8E = LoadLibraryA) |
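To match a hash constant from a sample against API names, you need the hash function itself. The sketch below implements the widely seen ROR-13 additive hash (rotate the accumulator right by 13, then add the next character). Treat it as one common variant rather than the algorithm of any particular sample: real loaders differ in rotation count, case folding, and whether the module name or terminating null is mixed in.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name: str) -> int:
    """ROR-13 additive hash over the ASCII bytes of an export name."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h

def build_lookup(names: list) -> dict:
    """Map hash -> name so constants found in disassembly can be reversed."""
    return {ror13_hash(n): n for n in names}

candidates = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "ExitProcess"]
for value, name in sorted(build_lookup(candidates).items()):
    print(f"0x{value:08X}  {name}")
```

In practice, feed `build_lookup` the full export list of kernel32.dll and ntdll.dll, then look up each constant the sample compares against.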