Assembly for Malware Analysts: x86, x64 & ARM in Ghidra and Binary Ninja

Assembly is the language between high-level code and raw machine bytes. When you open a sample in Ghidra or Binary Ninja, what you see is the disassembler’s reconstruction of that language. This guide teaches you to read it fluently — not just to understand each instruction, but to recognize the patterns that malware authors use: API resolvers, XOR decryptors, persistence loops, and anti-debug tricks.

Coverage spans x86 (32-bit), x64 (64-bit), and ARM/AArch64 (embedded, mobile, and modern Windows on Arm targets).

Related posts in this blog: Understanding and Attacking EDRs, EDR Bypass Roadmap, Anti-Debugging Techniques, Windows API Attack Surface

Why Assembly for Malware Analysis?
Getting Started — What to Expect
VSCode Setup for Assembly Practice
Assembly 101 — How to Read Assembler Code
CPU Registers — The Fast Lane
Memory and Addressing Modes
- The Memory Map
- Addressing Mode Syntax
The Stack — How Functions Think
- x86 Stack Layout
- Function Prologue and Epilogue
Core Instructions in Depth
Calling Conventions
ARM Assembly
Reading Disassembly in Ghidra and Binary Ninja
Common Malware Patterns
Quick Reference Tables

Why Assembly for Malware Analysis?

Modern malware arrives stripped of symbols, packed, and obfuscated. Decompilers help, but they can mislead: they reconstruct intent from behavior, and when the behavior is adversarial, the reconstruction drifts. Raw disassembly is much closer to ground truth, because it shows the instructions the CPU will actually execute, provided the disassembler decoded the bytes at the right boundaries.

Specifically, assembly literacy lets you:

Identify API calls even when the import table is empty (PEB walking, GetProcAddress chains)
Recognize crypto primitives by their bitwise patterns (XOR loops, XTEA key schedules, Salsa20 quarter-rounds)
Spot anti-analysis tricks before they fire (RDTSC timing, IsDebuggerPresent checks, NtQueryInformationProcess calls)
Understand shellcode that can never be decompiled — it has no PE header, no sections, no symbols

Getting Started — What to Expect

Learning assembly for the first time feels like having the rug pulled out: no types, no function names, no meaningful variable names — everything is registers, offsets, and flags. The cognitive load is real, but it drops fast once the patterns click.

The Mental Model Shift

In C you write x = a + b. In assembly you first load a into a register, add b to it, and the result sits in the same register. The instruction stream is completely flat — there is no notion of scope, type, or lifetime beyond what the calling convention imposes.

The most important shift: think in state, not abstractions. At any point in a function you can ask: what is in EAX right now? What does [EBP-8] hold? Where did ESP go? Building this running state machine in your head is the core skill the job requires.

What Is Actually Hard

Registers carry context that changes line-by-line. A register can hold a loop counter on one line and a pointer on the next. There is no IDE tooltip to tell you which it is right now.
Flags are invisible shared state. CMP EAX, EBX sets flags, and then ten instructions later a JL reads them. Other instructions between the compare and the branch can also modify flags — beginners miss this constantly.
Obfuscation looks syntactically identical to normal code. A dead XOR, a fake loop, a JMP to the very next instruction — nothing in the syntax signals “this is junk.”
Calling conventions are implicit. Nothing in the binary says “this is cdecl.” You have to infer it from how the caller prepares arguments and how the callee tears down.
Pointer arithmetic and integer arithmetic are indistinguishable. ADD EAX, 4 could be advancing a pointer by one int or incrementing a counter by four. Only context tells you which.

What Clicks Surprisingly Quickly

Most real malware uses fewer than 20 distinct instructions. MOV, PUSH/POP, CMP/TEST, JE/JNE/JL/JG, CALL/RET, XOR/AND/OR, ADD/SUB, LEA, INC/DEC. Master these and you can read around 80 % of what you will encounter.
Prologues and epilogues are boilerplate. After a few sessions you will recognise push ebp / mov ebp, esp / sub esp, N in under a second and jump straight to the logic that follows.
CFG loops are always the same shape. A back-edge in the control-flow graph is a loop — full stop. Train your eye on the graph view and you stop reading instructions linearly and start reading structure.
XOR decryptors look identical everywhere. Load byte, XOR, store byte, increment counter, compare to length, branch back. Once you recognise the shape you will spot it in any binary within seconds.
The PEB walk is copy-pasted across malware families. FS:[0x30] (x86) or GS:[0x60] (x64) followed by three or four chained dereferences is the same code in hundreds of samples.
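
The XOR decryptor shape described in the list above can be sketched in Python. This is a minimal illustration rather than code from any sample: the function name `xor_decrypt`, the key `0x5A`, and the test string are assumptions chosen for the example, and the comments map each step to the kind of instruction you would typically see.

```python
def xor_decrypt(buf: bytes, key: int) -> bytes:
    """Single-byte XOR loop, written to mirror the assembly shape (hypothetical helper)."""
    out = bytearray(buf)
    i = 0                      # counter register, e.g. ECX
    while i < len(out):        # compare counter to length, branch back
        b = out[i]             # load byte:  mov al, [esi + ecx]
        b ^= key               # xor al, key
        out[i] = b             # store byte: mov [esi + ecx], al
        i += 1                 # inc ecx
    return bytes(out)


encoded = xor_decrypt(b"kernel32.dll", 0x5A)   # "encrypting" is the same operation
print(encoded.hex())
print(xor_decrypt(encoded, 0x5A))              # applying the key again restores the input
```

Because XOR is its own inverse, one routine serves as both encoder and decoder, which is why the same loop shape turns up in builders and in the samples they produce.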

Recommended Learning Path

Stage	Focus	Suggested exercise
1 — Foundation	x86 registers, the stack, `MOV`, `PUSH`/`POP`, `CALL`/`RET`	Hand-trace a cdecl “hello world” step-by-step in Ghidra’s listing view
2 — Control flow	`CMP`, `TEST`, `Jcc`, loops, switch-jump tables	Find a counted loop in any open-source binary; label the counter, body, and exit
3 — Conventions	cdecl vs stdcall vs x64 ABI; argument location rules	Identify argument-passing in five Win32 API calls (`CreateFile`, `VirtualAlloc`, etc.)
4 — Patterns	XOR decryptors, PEB walks, anti-debug idioms	Analyse a CTF reversing challenge from pwn.college or crackmes.one
5 — x64	Shadow space, RIP-relative addressing, R8–R15	Repeat stages 1–4 on a 64-bit Windows binary
6 — ARM	RISC philosophy, conditional execution suffix, Thumb	Analyse a simple Android `.so` from an open-source APK

Tools to Have Ready Before You Open a Sample

Tool	Purpose	Free?
Ghidra (NSA)	Full disassembler + decompiler; the best free starting point	Yes
Binary Ninja	Fast UI, excellent MLIL/HLIL layers, great scripting API	Trial / paid
x64dbg	Dynamic debugger for Windows x86/x64; pairs with Ghidra for static+dynamic	Yes
PE-bear	PE header inspector — understand the binary’s imports and sections before loading it	Yes
CFF Explorer	Import table, overlay, and resource inspector	Yes
FLOSS (Mandiant)	Extracts obfuscated and stack-built strings without executing the binary	Yes
Detect-It-Easy	Packer and compiler fingerprinting — tells you what unpacking you need first	Yes

Beginner trap to avoid: Do not start dynamic analysis (running the sample in a debugger) before you have done at least a pass of static analysis (Ghidra/Binary Ninja). Dynamic analysis is powerful but dangerous — malware can detect the debugger and feed you a decoy execution path. Static first, dynamic second.

VSCode Setup for Assembly Practice

Reading assembly in a disassembler is one skill; writing it to build intuition is another. VSCode with NASM gives you a lightweight environment to experiment with snippets without spinning up a full VM.

Essential Extensions

Install these four extensions from the VSCode Marketplace (Ctrl+Shift+X):

Extension ID	What it does
`13xforever.language-x86-64-assembly`	Syntax highlighting for x86/x64 NASM, MASM, GAS, and AT&T syntax
`OrangeX4.vscode-masm-run`	Adds run/build buttons for MASM/NASM files directly in the editor
`usernamehw.errorlens`	Inline error display — useful when nasm outputs errors with line numbers
`streetsidesoftware.code-spell-checker`	Optional but saves you from typo-driven bugs in label names

Install all four in one shot from the terminal:

code --install-extension 13xforever.language-x86-64-assembly
code --install-extension OrangeX4.vscode-masm-run
code --install-extension usernamehw.errorlens
code --install-extension streetsidesoftware.code-spell-checker

Installing NASM

Windows:

Download the NASM installer from nasm.us — pick the latest win64 .exe
Run the installer; tick “Add to PATH”
Verify in a new terminal: nasm --version

You also need a linker. On Windows the options are Microsoft's link.exe (installed with the Visual Studio Build Tools and used by the hello_win example below), the free GoLink linker, or the MinGW-w64 ld from binutils (for example via MSYS2):

# Check both are on PATH
nasm --version   # e.g. NASM version 2.16.x
ld   --version   # GNU ld (part of MinGW / binutils)

Linux / WSL:

sudo apt install nasm build-essential   # Debian / Ubuntu
sudo dnf install nasm gcc               # Fedora / RHEL

Your First Assembly File

Create hello.asm and paste this x64 Linux snippet (works in WSL):

; hello.asm — x64 Linux, NASM syntax
; Assemble: nasm -f elf64 hello.asm && ld -o hello hello.o && ./hello

section .data
    msg  db "hello, asm", 10   ; 10 = newline
    len  equ $ - msg

section .text
    global _start

_start:
    mov rax, 1          ; syscall: write
    mov rdi, 1          ; fd: stdout
    mov rsi, msg        ; buffer address
    mov rdx, len        ; byte count
    syscall

    mov rax, 60         ; syscall: exit
    xor rdi, rdi        ; status: 0
    syscall

For Windows (x64, NASM syntax, calling the Windows API), create hello_win.asm:

; hello_win.asm — x64 Windows, NASM syntax, links against kernel32
; Assemble+link:
;   nasm -f win64 hello_win.asm -o hello_win.obj
;   link /subsystem:console /entry:main hello_win.obj kernel32.lib

extern  ExitProcess
extern  GetStdHandle
extern  WriteConsoleA

section .data
    msg     db "hello, asm", 13, 10
    msglen  equ $ - msg
    written dq 0

section .text
    global main

main:
    sub     rsp, 40                 ; 32 bytes shadow space + 8 for the 5th arg; keeps RSP 16-byte aligned

    mov     rcx, -11                ; STD_OUTPUT_HANDLE
    call    GetStdHandle
    mov     rcx, rax                ; hConsole

    lea     rdx, [rel msg]          ; lpBuffer
    mov     r8d, msglen             ; nNumberOfCharsToWrite
    lea     r9,  [rel written]      ; lpNumberOfCharsWritten
    mov     qword [rsp + 32], 0     ; lpReserved (5th arg lives just above the shadow space)
    call    WriteConsoleA

    xor     rcx, rcx
    call    ExitProcess

Tip for analysts: The Windows snippet demonstrates the x64 Microsoft ABI in action — shadow space, register arguments in RCX/RDX/R8/R9, and a stack-passed fifth argument. It is more instructive than the Linux version if your target is Windows malware.
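
The arithmetic behind a frame size such as `sub rsp, 40` can be checked with a short Python sketch. The helper names `frame_size` and `stack_arg_offset` are hypothetical, and the sketch assumes a prologue that pushes no registers before the `sub`; it only models the Microsoft x64 rules mentioned in the tip (32 bytes of shadow space, stack arguments from the fifth onward, 16-byte alignment at each call).

```python
SHADOW_SPACE = 32  # bytes the caller reserves for the callee to spill RCX, RDX, R8, R9


def frame_size(num_args: int, local_bytes: int = 0) -> int:
    """Bytes to subtract from RSP, assuming nothing was pushed before the SUB."""
    stack_args = max(0, num_args - 4) * 8
    size = SHADOW_SPACE + stack_args + local_bytes
    # On entry RSP % 16 == 8, because CALL pushed an 8-byte return address.
    # Round up so that (8 + size) is a multiple of 16 before the next CALL.
    return (size + 8 + 15) // 16 * 16 - 8


def stack_arg_offset(arg_index: int) -> int:
    """Offset from RSP, at the CALL, of 1-based argument arg_index (5 or higher)."""
    if arg_index < 5:
        raise ValueError("arguments 1-4 travel in RCX, RDX, R8, R9")
    return SHADOW_SPACE + (arg_index - 5) * 8


print(frame_size(5))          # a five-argument call such as WriteConsoleA
print(frame_size(4))          # four register arguments still need padding
print(stack_arg_offset(5))    # where the fifth argument is stored
```

Under these assumptions both a four-argument and a five-argument call end up with a 40-byte frame, which is why `sub rsp, 0x28` is such a common sight in small Windows x64 functions.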

Build Task (tasks.json)

Create .vscode/tasks.json in your project root so Ctrl+Shift+B assembles and links automatically:

{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "NASM — build (Linux/WSL elf64)",
      "type": "shell",
      "command": "nasm -f elf64 ${file} -o ${fileDirname}/${fileBasenameNoExtension}.o && ld -o ${fileDirname}/${fileBasenameNoExtension} ${fileDirname}/${fileBasenameNoExtension}.o",
      "group": { "kind": "build", "isDefault": true },
      "presentation": { "reveal": "always", "panel": "shared" },
      "problemMatcher": {
        "owner": "nasm",
        "fileLocation": ["absolute"],
        "pattern": {
          "regexp": "^(.+):(\\d+):\\s+(.+)$",
          "file": 1, "line": 2, "message": 3
        }
      }
    },
    {
      "label": "Run assembled binary",
      "type": "shell",
      "command": "${fileDirname}/${fileBasenameNoExtension}",
      "group": "test",
      "dependsOn": "NASM — build (Linux/WSL elf64)",
      "presentation": { "reveal": "always", "panel": "shared" }
    }
  ]
}

After saving, press Ctrl+Shift+B while any .asm file is active to assemble it. NASM errors appear inline in the editor via ErrorLens.

Debugging with x64dbg

x64dbg is the go-to Windows debugger for malware analysis and also the best way to step through your hand-written assembly:

Download x64dbg and extract it — no install needed
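Right-click the .exe your Windows build produced → Open with x64dbg (x64dbg loads Windows PE files only, so use the hello_win build, not the Linux ELF)
With default settings, execution pauses at the system breakpoint first and then at the entry point (main in the example above)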
Use F7 (step into) and F8 (step over) to trace execution
Watch the Registers panel on the right — every instruction updates it live

Workflow for learning: Write a small snippet in VSCode, build it, open the output in x64dbg, and step through it. Watching RSP change on every PUSH/POP and seeing RAX set to your expected value after a calculation is the fastest way to build register intuition.

VSCode + x64dbg shortcut: Add an x64dbg open task to tasks.json so pressing a keybinding launches the debugger directly on the built binary, saving the manual drag-and-drop step.

Assembly 101 — How to Read Assembler Code

Before drilling into registers and instructions, you need to parse the notation. This section teaches you to decode any line the disassembler shows you.

Anatomy of One Line

Every assembly line has up to four parts:

[label:]   mnemonic   [operand1[, operand2[, operand3]]]   [; comment]

Part	Optional?	Example	Meaning
Label	Yes	`loop_start:`	Named address — targets for jumps and calls
Mnemonic	No	`MOV`	The operation the CPU performs
Operands	Most mnemonics need 1–2	`EAX, 5`	What the operation acts on
Comment	Yes	`; i = 0`	Human annotation, ignored by assembler

xor_loop:           MOV   AL, [ESI + ECX]    ; load byte from buffer
; ↑ label          ↑mnem  ↑dst  ↑src          ↑ comment

Intel syntax rule (used by Ghidra, Binary Ninja, and NASM):

Destination is always the left operand.

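MOV EAX, 5 means “put 5 into EAX”, not “put EAX into 5”. Nearly every two-operand instruction follows this convention: left = where the result lands, right = the source. The exceptions are instructions that produce no result (CMP, TEST) or write both operands (XCHG).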

Intel vs AT&T Syntax

You will encounter both in the wild. Ghidra and Binary Ninja default to Intel; GDB and older GNU tools default to AT&T.

Feature	Intel (NASM / MASM)	AT&T (GAS / GDB)
Operand order	`dst, src`	`src, dst` — reversed
Register names	`EAX`	`%eax` — prefixed with `%`
Immediates	`5`	`$5` — prefixed with `$`
Memory reference	`[EAX]`	`(%eax)` — uses parentheses
Size suffix	`DWORD PTR [EAX]`	`movl (%eax)` — letter suffix on mnemonic (`b`=byte, `w`=word, `l`=long/dword, `q`=qword)
Example	`mov eax, [ebx + 8]`	`movl 8(%ebx), %eax`

If you see % before register names and $ before numbers, you are reading AT&T — flip the operand order mentally.

Practical tip: Ghidra and Binary Ninja both render x86 in Intel-style syntax, and most analysts stay there. AT&T usually shows up in GDB and objdump output; `set disassembly-flavor intel` in GDB or `objdump -M intel` switches those tools to Intel.

Parsing a Memory Reference

Square brackets in Intel syntax mean “dereference this address” — the same as *ptr in C.

[  base  +  index * scale  +  displacement  ]

Component	What it is	Example
base	A register holding the start address	`EBX`
index	An optional register acting as offset	`ECX`
scale	Multiplier for index: 1, 2, 4, or 8	`4` (size of `int`)
displacement	A constant byte offset	`8`

Decode each piece of [EBX + ECX*4 + 8] in English:

EBX          →  base address (start of an array)
ECX * 4      →  index × sizeof(int) — the Nth element
+ 8          →  skip 8 bytes past the start (e.g., past a struct header)
Result       →  items[N], where items is an int array that starts 8 bytes into the structure at EBX

Common patterns you will see constantly:

[EBP - 4]          ; local variable #1 (4 bytes below frame pointer)
[EBP + 8]          ; first function argument (cdecl / stdcall)
[EAX]              ; *(ptr)  — simple dereference
[EAX + 0x3C]       ; ptr->field_at_offset_0x3C  (e.g. PE header offset)
[EAX + ECX]        ; ptr[i]  — byte array element
[EAX + ECX*4]      ; ptr[i]  — int array element (4 bytes each)
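
To check a decoded operand by hand, the `[base + index * scale + displacement]` form can be evaluated directly. The sketch below is illustrative only: `effective_address` is a hypothetical helper, and the register values are made up for the example.

```python
def effective_address(base=0, index=0, scale=1, disp=0, bits=32):
    """Compute base + index*scale + disp, wrapped to the register width."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)


# [EBX + ECX*4 + 8] with EBX = 0x00403000 and ECX = 3 (assumed values)
print(hex(effective_address(base=0x00403000, index=3, scale=4, disp=8)))

# [EBP - 4] with EBP = 0x0019FF70 (assumed value): the displacement is negative
print(hex(effective_address(base=0x0019FF70, disp=-4)))

# LEA uses the same formula without touching memory: lea eax, [eax + eax*4] is x * 5
x = 7
print(effective_address(base=x, index=x, scale=4))
```

The same function covers both memory operands and LEA, because LEA simply returns the computed address instead of dereferencing it.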

Reading a Sequence — Building Mental State

Assembly has no scope, no types, no variable names. Reading it means running a tiny virtual machine in your head. For every line, ask three questions:

Which registers change? Usually just the destination operand, but watch for implicit writes: MUL and DIV write EDX:EAX, and PUSH, POP, CALL, and RET all move ESP
Which flags change? — arithmetic and compare instructions update flags; MOV and LEA do not
Does memory get read or written? — any operand in [ ] touches memory

Work through a sequence by tracking register values as a table:

; Trace these five instructions top-to-bottom
mov  eax, 10        ; 1
mov  ecx, 3         ; 2
mul  ecx            ; 3 — EDX:EAX = EAX * ECX
sub  eax, 2         ; 4
push eax            ; 5

Step	Instruction	EAX	ECX	EDX	ESP	Memory
start	—	?	?	?	0xFF	—
1	`mov eax, 10`	10	?	?	0xFF	—
2	`mov ecx, 3`	10	3	?	0xFF	—
3	`mul ecx`	30	3	0	0xFF	—
4	`sub eax, 2`	28	3	0	0xFF	—
5	`push eax`	28	3	0	0xFB	`[0xFB]` = 28

The table discipline forces you to track exactly what each instruction does without skipping ahead — the most common beginner mistake.
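
The table can also be generated mechanically. The following Python sketch interprets only the four mnemonics used in the trace; the function name `run`, the starting ESP of `0xFF`, and the choice to model unknown (`?`) registers as 0 are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def run(program, esp=0xFF):
    """Execute a tiny subset of x86 (mov, sub, mul, push) and print the state after each line."""
    regs = {"eax": 0, "ecx": 0, "edx": 0, "esp": esp}   # unknown '?' values modeled as 0
    mem = {}

    def value(operand):
        # An operand is either a register name or an immediate such as 10 or 0x3C.
        return regs[operand] if operand in regs else int(operand, 0)

    for line in program:
        parts = line.split(None, 1)
        mnemonic = parts[0]
        ops = [o.strip() for o in parts[1].split(",")] if len(parts) > 1 else []
        if mnemonic == "mov":
            regs[ops[0]] = value(ops[1]) & MASK32
        elif mnemonic == "sub":
            regs[ops[0]] = (regs[ops[0]] - value(ops[1])) & MASK32
        elif mnemonic == "mul":                      # implicit operands: EDX:EAX = EAX * src
            product = regs["eax"] * value(ops[0])
            regs["eax"] = product & MASK32
            regs["edx"] = product >> 32
        elif mnemonic == "push":                     # implicit write to ESP and to memory
            regs["esp"] = (regs["esp"] - 4) & MASK32
            mem[regs["esp"]] = value(ops[0])
        else:
            raise ValueError(f"unsupported instruction: {line}")
        state = " ".join(f"{name}={val:#x}" for name, val in regs.items())
        print(f"{line:<14}{state}")
    return regs, mem


regs, mem = run(["mov eax, 10", "mov ecx, 3", "mul ecx", "sub eax, 2", "push eax"])
```

Note how `mul` and `push` touch registers that never appear in their operand list; that is exactly the kind of implicit state the table forces you to write down.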

Worked Example — Trace Five Lines

Here is a real-world snippet from a malware loader. Read it cold, then check the annotations:

00401020  mov  eax, [ebp + 8]    ; (1)
00401023  test eax, eax          ; (2)
00401025  jz   00401040          ; (3)
00401027  mov  ecx, [eax + 0x3C] ; (4)
0040102A  add  ecx, eax          ; (5)

Line by line:

#	Instruction	What it does	Mental note
1	`mov eax, [ebp+8]`	Load the first argument into EAX	EAX = arg1 (likely a pointer)
2	`test eax, eax`	AND EAX with itself — sets ZF if EAX is zero, no write	null-check on the pointer
3	`jz 00401040`	Jump to 0x401040 if ZF=1 (EAX was zero)	if (arg1 == NULL) goto error
4	`mov ecx, [eax + 0x3C]`	Read a DWORD 60 bytes into the struct EAX points at	0x3C is the `e_lfanew` field of a DOS header — this is reading the PE offset
5	`add ecx, eax`	ECX = ECX + EAX (base + offset)	ECX now points to the PE signature / IMAGE_NT_HEADERS

The five lines implement IMAGE_NT_HEADERS *nt = (IMAGE_NT_HEADERS*)(base + base->e_lfanew) — a pattern found in virtually every PE parser and loader you will encounter in malware analysis.
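
The same lookup can be reproduced offline when triaging a dumped buffer. This is a small sketch, not a full PE parser: `nt_headers_offset` is a hypothetical helper, and the 256-byte buffer is synthetic, built only to exercise the two reads the assembly performs.

```python
import struct


def nt_headers_offset(image: bytes) -> int:
    """Return the file offset of the PE signature, following e_lfanew at offset 0x3C."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # mov ecx, [eax + 0x3C]: read the little-endian DWORD at offset 0x3C
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    # add ecx, eax: base + e_lfanew is where the PE signature should sit
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return e_lfanew


# Synthetic buffer (an assumption for the example): MZ header with e_lfanew = 0x80
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"

print(hex(nt_headers_offset(bytes(fake))))
```

The extra signature checks are a convenience for the analyst; the five-instruction snippet above performs only the null check and trusts the header.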

Key takeaway: You do not need to know every instruction before you start reading. You need the three questions (what changes, what flags, what memory?) and the habit of building the register table as you go. The patterns — null checks, struct field access, PE walking — repeat endlessly once you recognise them the first time.

CPU Registers — The Fast Lane

Registers are the CPU’s own ultra-fast memory — typically 8 to 32 slots, each holding one word of data. Every computation happens in registers; RAM is just slow storage the CPU ferries values to and from.

x86 General-Purpose Registers

On x86 (32-bit), eight general-purpose registers each hold a 32-bit (4-byte) value. Each register also exposes sub-word aliases that address smaller portions without extra instructions:

Full (32-bit)	Low 16-bit	High byte (bits 8–15)	Low byte (bits 0–7)	Primary convention
EAX	AX	AH	AL	Return value; arithmetic accumulator
EBX	BX	BH	BL	Base register (historical); callee-saved
ECX	CX	CH	CL	Loop counter; `LOOP`, `REP`, shift counts in `CL`
EDX	DX	DH	DL	Extended return (`EDX:EAX`); I/O port
ESI	SI	—	—	Source index for string ops
EDI	DI	—	—	Destination index for string ops
ESP	SP	—	—	Stack pointer, always points to the top of the stack
EBP	BP	—	—	Frame pointer — anchors local variable base

Ghidra / Binary Ninja tip: When you see [EBP - 0x8], that is a local variable 8 bytes below the frame pointer. When you see [EBP + 0x8], that is the first function argument (cdecl convention).

Figure: EAX sub-register layout. EAX spans bits 31–0; AX is the low 16 bits (15–0), which split into AH (bits 15–8) and AL (bits 7–0). Bits 31–16 have no separate name.

x64 Extensions

x64 extends every 32-bit register to 64 bits and adds eight new registers. The naming convention prefixes R for the full 64-bit form:

x64 (64-bit)	x86 alias (low 32)	Low 16	Low 8	Convention
RAX	EAX	AX	AL	Return value
RBX	EBX	BX	BL	Callee-saved
RCX	ECX	CX	CL	Arg 1 (Windows); Arg 4 (Linux)
RDX	EDX	DX	DL	Arg 2 (Windows); Arg 3 (Linux)
RSI	ESI	SI	SIL	Arg 2 (Linux); callee-saved (Windows)
RDI	EDI	DI	DIL	Arg 1 (Linux); callee-saved (Windows)
RSP	ESP	SP	SPL	Stack pointer
RBP	EBP	BP	BPL	Frame pointer (optional in x64)
R8–R9	R8D–R9D	R8W–R9W	R8B–R9B	Arg 3–4 (Windows); Arg 5–6 (Linux); caller-saved
R10–R11	R10D–R11D	R10W–R11W	R10B–R11B	Caller-saved scratch
R12–R15	R12D–R15D	R12W–R15W	R12B–R15B	Callee-saved

Critical x64 gotcha: Writing to a 32-bit sub-register (e.g. EAX) zero-extends into the 64-bit register (RAX). Writing to a 16-bit or 8-bit sub-register does not. This catches many analysts off-guard when reading decompiler output.

mov eax, 1      ; RAX = 0x0000000000000001  (upper 32 bits zeroed!)
mov ax,  1      ; RAX unchanged except low 16 bits
mov al,  1      ; RAX unchanged except low  8 bits
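
The rule is easy to model. In this Python sketch, `write_reg` is a hypothetical helper that returns the new value of RAX after a write of the given width; it covers the low aliases (EAX, AX, AL) and deliberately leaves out AH.

```python
MASK64 = (1 << 64) - 1


def write_reg(rax: int, value: int, width: int) -> int:
    """Return RAX after writing `value` to its low `width` bits (64, 32, 16, or 8)."""
    mask = (1 << width) - 1
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & mask                              # upper 32 bits are zeroed
    if width in (8, 16):
        return (rax & ~mask & MASK64) | (value & mask)   # upper bits are preserved
    raise ValueError("width must be 64, 32, 16, or 8")


rax = 0xFFFFFFFFFFFFFFFF
print(hex(write_reg(rax, 1, 32)))   # mov eax, 1
print(hex(write_reg(rax, 1, 16)))   # mov ax, 1
print(hex(write_reg(rax, 1, 8)))    # mov al, 1
```

Starting from an all-ones RAX makes the difference visible: only the 32-bit write clears the stale upper half.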

ARM / AArch64 Registers

ARM uses a load-store architecture: unlike x86, arithmetic instructions can only operate on registers, never directly on memory. Data must be explicitly loaded into a register first.

ARM (32-bit) registers:

Register	Alias	Role
R0–R3	—	Function arguments 1–4; return value in R0
R4–R11	—	General purpose; callee-saved
R12	IP	Intra-procedure-call scratch register
R13	SP	Stack pointer
R14	LR	Link Register — holds return address
R15	PC	Program Counter — current instruction address
—	CPSR	Current Program Status Register (flags)

AArch64 (64-bit) registers:

Register	Width	Role
X0–X7	64-bit	Function arguments 1–8; return in X0
X8	64-bit	Indirect result location / syscall number (Linux)
X9–X15	64-bit	Caller-saved temporaries
X16–X17	64-bit	Intra-procedure-call scratch
X18	64-bit	Platform reserved (TEB on Windows ARM64)
X19–X28	64-bit	Callee-saved
X29	64-bit	Frame pointer (FP)
X30	64-bit	Link register (LR)
SP	64-bit	Stack pointer (not a general register)
PC	64-bit	Program counter (not directly writeable)
—	32-bit each	W0–W30 — 32-bit aliases of X registers

The Instruction Pointer

The instruction pointer is the CPU’s “current position” register:

Architecture	Register	Notes
x86	EIP	Cannot be read directly; modified by JMP, CALL, RET. Position-independent shellcode recovers it with `CALL $+5; POP EAX`
x64	RIP	Readable with `LEA RAX, [RIP]`; used for RIP-relative addressing
ARM32	PC (R15)	Readable and writable — writing to PC is a branch
AArch64	PC	Not directly writeable; only modified by branch instructions

Reversing tip: In x64 binaries, you will constantly see patterns like lea rax, [rip + 0x1234]. This is RIP-relative addressing — the operand is relative to the next instruction’s address. Ghidra and Binary Ninja both resolve these to absolute addresses automatically.

EFLAGS / RFLAGS — The Status Word

Every comparison and arithmetic operation updates individual bits in the flags register. Conditional jumps then branch based on these bits.

Flag	Bit	Set when…	Common instructions that set it	Jump / branch that reads it
CF	0	Carry/borrow out of the MSB (unsigned overflow)	`ADD`, `SUB`, `SHL`/`SHR`, `CLC`/`STC`, `MUL`	`JB`/`JNAE` (CF=1 → unsigned below); `JAE`/`JNB` (CF=0 → unsigned above-or-equal)
PF	2	Low byte of result has even parity	Most arithmetic and logic ops	`JP`/`JPE` (PF=1); `JNP`/`JPO` (PF=0). Rare in modern integer code; mostly seen after floating-point compares, where `JP` catches unordered (NaN) results
AF	4	Carry from bit 3 to bit 4 (BCD arithmetic)	`ADD`, `SUB`, `INC`, `DEC`	Not tested by `Jcc`; consumed by `DAA`/`DAS` — almost never seen outside legacy x86
ZF	6	Result is zero	`CMP`, `TEST`, `AND`, `OR`, `XOR`, `ADD`, `SUB`, `INC`, `DEC`	`JE`/`JZ` (ZF=1 → equal/zero); `JNE`/`JNZ` (ZF=0 → not equal) — the most-used flag in disassembly
SF	7	Result is negative (sign bit is 1)	Most arithmetic and logic ops	`JS` (SF=1); `JNS` (SF=0); combined with OF for `JL`/`JG`
OF	11	Signed overflow — result too large for the signed type	`ADD`, `SUB`, `IMUL`, `NEG`, `INC`, `DEC`	`JO` (OF=1); `JNO` (OF=0); paired with SF for `JL` (SF≠OF) and `JGE` (SF=OF)
DF	10	Direction for string ops (0 = forward / increment, 1 = backward / decrement)	`CLD` clears it; `STD` sets it	Not a `Jcc` flag — implicitly consumed by `REP MOVS`, `REP STOS`, `SCAS`. Malware sets DF=1 before `REP STOSD` to wipe memory backwards
IF	9	Interrupts enabled	`STI` sets; `CLI` clears	Not testable from user mode — kernel/driver context only

Analyst tip — ZF is king: In practice, ZF is the flag you will track most often. TEST EAX, EAX / JNZ is the universal “is this value non-null?” idiom. CMP EAX, 0 / JE is “did this function return 0 (error/false)?”. If you can only track one flag, track ZF.
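
To see how one subtraction feeds several different branches, the flag rules from the table can be written out in Python. This is a sketch under stated assumptions: `cmp_flags` and the `jl`/`jb`/`je` helpers are hypothetical names, only ZF, SF, CF, and OF are modeled, and the operands are 32-bit by default.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Flags produced by CMP a, b (a subtraction whose result is discarded)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "CF": int(a < b),                                  # unsigned borrow
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),    # signed overflow
    }


def je(flags):  # equal
    return flags["ZF"] == 1


def jl(flags):  # signed less-than
    return flags["SF"] != flags["OF"]


def jb(flags):  # unsigned below
    return flags["CF"] == 1


flags = cmp_flags(0xFFFFFFFF, 1)   # -1 if signed, 4294967295 if unsigned
print(flags, "JL taken:", jl(flags), "JB taken:", jb(flags))
```

The same bit pattern takes the signed branch but not the unsigned one, which is why noting whether a comparison is followed by `JL`/`JG` or `JB`/`JA` tells you how the original code typed the variable.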

The ARM equivalent is the CPSR (Current Program Status Register) / NZCV flags in AArch64:

ARM Flag	x86 Equivalent	Meaning
N	SF	Negative result
Z	ZF	Zero result
C	CF	Carry (after a subtraction or `CMP`, ARM sets C when there was no borrow, the inverse of x86 CF)
V	OF	Overflow

Memory and Addressing Modes

The Memory Map

A typical 32-bit Windows user-mode process looks like this:

Windows 32-bit process, virtual address space (lowest address first):

Address	Region
0x00000000	NULL guard / unmapped
0x00010000	Image sections: .text (code, RX); .data (initialised globals, RW); .rdata (read-only data, const strings); .bss (uninitialised globals)
↑	Heap (malloc / HeapAlloc), grows upward
· · ·	
↓	Stack (local vars, return addresses), grows downward
0x7FFFFFFF	End of user space (user / kernel boundary)
above the boundary	Kernel space, not accessible from user mode
0xFFFFFFFF	Top of addressable space

On 64-bit Windows the user-mode range extends to 0x00007FFFFFFFFFFF. The structure is the same but the addresses are much larger. The kernel occupies the upper half of the virtual address space.

Addressing Mode Syntax

Intel syntax (used by Ghidra and Binary Ninja by default) wraps memory references in square brackets:

; Direct (absolute address)
mov eax, [0x402000]          ; load 4 bytes from address 0x402000

; Register indirect
mov eax, [ebx]               ; load 4 bytes from address stored in EBX

; Base + displacement
mov eax, [ebp + 8]           ; first argument in a cdecl frame
mov eax, [ebp - 4]           ; first local variable

; Base + Index * Scale + Displacement  (SIB byte)
mov eax, [ebx + ecx*4 + 8]  ; array element: base + index*sizeof(int) + offset

ARM uses different syntax but the concept is identical:

; ARM32 — load/store
LDR  R0, [R1]          ; R0 = *(R1)
LDR  R0, [R1, #8]      ; R0 = *(R1 + 8)
LDR  R0, [R1, R2]      ; R0 = *(R1 + R2)
LDR  R0, [R1, R2, LSL #2] ; R0 = *(R1 + R2<<2)  — array index
STR  R0, [R1]          ; *(R1) = R0
STMFD SP!, {R4-R7, LR} ; push multiple registers onto stack (PUSH equivalent)
LDMFD SP!, {R4-R7, PC} ; pop and return: the saved LR value is loaded straight into PC

The Stack — How Functions Think

The stack grows downward on all common architectures: pushing a value decrements the stack pointer and writes the value at the new address.

x86 Stack Layout

x86 Stack Layout (cdecl frame), highest address first:

Offset from EBP	Contents
(higher addresses)	previous frames
EBP + 4·(N+1)	arg N
· · ·	· · ·
EBP + 0x0C	arg 2
EBP + 0x08	arg 1
EBP + 0x04	return address
EBP + 0x00	saved EBP (the caller's EBP, pushed by the callee) ← EBP points here
EBP − 0x04	local var 1
EBP − 0x08	local var 2
· · ·	· · ·
EBP − 4·N	local var N ← ESP (grows downward)
(lower addresses)	unallocated stack space

Key rules:

EBP is the stable anchor: once the prologue has set it, it stays fixed for the rest of the function body. All local variables and arguments are addressed relative to it.
ESP moves freely as values are pushed/popped. Compilers often omit EBP in optimized code (frame-pointer omission / -fomit-frame-pointer) and use ESP-relative addressing instead.
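
The offsets in the layout fall out of the order in which values are pushed. The Python sketch below replays a cdecl call followed by `push ebp / mov ebp, esp / sub esp, N`; the function name `call_with_frame`, the starting ESP, and the argument and address values are all assumptions for illustration.

```python
def call_with_frame(args, return_address, caller_ebp, esp=0x0019FF80, local_bytes=8):
    """Replay caller pushes, CALL, and the callee prologue; return (ebp, esp, memory)."""
    mem = {}

    def push(value):
        nonlocal esp
        esp -= 4              # the stack grows toward lower addresses
        mem[esp] = value

    for arg in reversed(args):    # cdecl: arguments are pushed right to left
        push(arg)
    push(return_address)          # CALL pushes the address of the next instruction
    push(caller_ebp)              # push ebp
    ebp = esp                     # mov ebp, esp
    esp -= local_bytes            # sub esp, N  (space for locals)
    return ebp, esp, mem


ebp, esp, mem = call_with_frame([0x11, 0x22, 0x33], 0x00401005, 0x0019FFA0)
print(f"saved EBP  [ebp+0x00] = {mem[ebp]:#x}")
print(f"return     [ebp+0x04] = {mem[ebp + 4]:#x}")
print(f"arg 1      [ebp+0x08] = {mem[ebp + 8]:#x}")
print(f"arg 2      [ebp+0x0c] = {mem[ebp + 12]:#x}")
print(f"locals start below ebp, esp = ebp - {ebp - esp}")
```

Two pushes (return address, saved EBP) sit between EBP and the first argument, which is the whole reason arg 1 lives at EBP + 8 rather than EBP + 0.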

Function Prologue and Epilogue

Most functions you see in a disassembler begin and end with boilerplate code that sets up and tears down the stack frame; optimized builds may shorten or omit it.

x86 standard prologue:

push  ebp          ; save caller's frame pointer
mov   ebp, esp     ; establish new frame pointer
sub   esp, 0x28    ; reserve 0x28 (40) bytes for local variables
push  ebx          ; callee-saved registers that this function uses
push  esi
push  edi

x86 standard epilogue:

pop   edi          ; restore callee-saved registers (reverse order)
pop   esi
pop   ebx
mov   esp, ebp     ; collapse stack frame
pop   ebp          ; restore caller's frame pointer
ret                ; pop return address into EIP

The leave instruction is shorthand for mov esp, ebp; pop ebp. You will see it often in GCC output:

leave              ; equivalent: mov esp, ebp; pop ebp
ret

x64 prologue (Windows):

push  rbp
mov   rbp, rsp
push  rbx             ; callee-saved (nonvolatile) registers come first
push  r12
push  r13
push  r14
sub   rsp, 0x40       ; locals + shadow space (0x20), which must stay at the bottom of the frame

In x64, many compilers omit the frame pointer entirely and address locals relative to RSP:

sub   rsp, 0x58       ; allocate stack space for locals + shadow space
; on Windows, [rsp+0x00..0x1F] is shadow space for callees; locals sit at [rsp+0x20] and up
add   rsp, 0x58       ; epilogue: collapse frame
ret

ARM32 prologue/epilogue:

; Prologue — push callee-saved regs and LR onto stack
PUSH  {R4, R5, R6, R7, LR}
SUB   SP, SP, #0x10    ; allocate 16 bytes for locals

; Epilogue — restore and return (loading LR into PC branches back)
ADD   SP, SP, #0x10
POP   {R4, R5, R6, R7, PC}

Writing PC from a pop is ARM’s atomic “restore and return” — it simultaneously restores registers and jumps to the saved LR value.

Core Instructions in Depth

Data Movement

Instruction	Example	Effect
MOV	`mov eax, 5`	EAX ← 5
MOV	`mov eax, [ebx]`	EAX ← memory at EBX
MOV	`mov [eax], ebx`	memory at EAX ← EBX
LEA	`lea eax, [ebx+4]`	EAX ← address EBX+4 (no memory read)
MOVZX	`movzx eax, byte [ebx]`	Load byte, zero-extend to 32 bits
MOVSX	`movsx eax, byte [ebx]`	Load byte, sign-extend to 32 bits
XCHG	`xchg eax, ebx`	Swap EAX ↔ EBX (with a memory operand the swap is implicitly atomic, no LOCK prefix needed)
PUSH	`push eax`	ESP -= 4; `[ESP]` ← EAX
POP	`pop eax`	EAX ← `[ESP]`; ESP += 4

LEA trick: Compilers routinely abuse LEA for fast arithmetic. lea eax, [eax + eax*4] computes EAX * 5 without a multiply instruction. When you see LEA with no obvious pointer, think “fast multiply or multi-operand add.”

ARM equivalents:

; ARM32
MOV   R0, #5          ; R0 = 5 (immediate)
MOV   R0, R1          ; R0 = R1
LDR   R0, [R1]        ; R0 = *(R1)       — equivalent to x86 MOV reg, [reg]
STR   R0, [R1]        ; *(R1) = R0       — equivalent to x86 MOV [reg], reg
LDRB  R0, [R1]        ; load byte (zero-extended)
LDRSB R0, [R1]        ; load byte (sign-extended)
ADR   R0, label       ; R0 = address of label  (LEA equivalent)

Arithmetic

add   eax, 5           ; EAX += 5
sub   eax, ebx         ; EAX -= EBX
imul  eax, ecx, 7      ; EAX = ECX * 7  (signed multiply, 3-operand form)
mul   ecx              ; EDX:EAX = EAX * ECX  (unsigned; high bits in EDX!)
idiv  ecx              ; EAX = EDX:EAX / ECX quotient; EDX = remainder  (signed; sign-extend with CDQ first)
inc   eax              ; EAX++  (does NOT set CF — common gotcha)
dec   eax              ; EAX--
neg   eax              ; EAX = -EAX  (two's complement negation)
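
The implicit EDX:EAX pairing in `MUL` and `IDIV` is the part most easily missed, so here is a small Python model of it. The helper names (`mul32`, `cdq`, `idiv32`, `to_signed`) are hypothetical, and the model covers only the 32-bit forms shown above.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul32(eax: int, operand: int):
    """Unsigned MUL: returns (EAX, EDX) holding the low and high halves of the product."""
    product = (eax & MASK32) * (operand & MASK32)
    return product & MASK32, product >> 32


def cdq(eax: int) -> int:
    """CDQ: EDX becomes a copy of EAX's sign bit in every position."""
    return MASK32 if eax & 0x80000000 else 0


def idiv32(edx: int, eax: int, operand: int):
    """Signed IDIV of EDX:EAX by operand: returns (EAX quotient, EDX remainder)."""
    dividend = to_signed((edx << 32) | eax, 64)
    divisor = to_signed(operand, 32)
    if divisor == 0:
        raise ZeroDivisionError("divide error (#DE)")
    quotient = abs(dividend) // abs(divisor)       # x86 truncates toward zero
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor      # remainder takes the dividend's sign
    if not -2**31 <= quotient < 2**31:
        raise OverflowError("divide error (#DE): quotient does not fit in EAX")
    return quotient & MASK32, remainder & MASK32


print([hex(v) for v in mul32(0x80000000, 4)])       # the high half lands in EDX
eax = (-7) & MASK32
print([hex(v) for v in idiv32(cdq(eax), eax, 2)])   # -7 / 2 as the CPU computes it
```

Python's own `//` rounds toward negative infinity, so the model divides magnitudes and fixes the sign afterwards to match the CPU's truncating behaviour.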

Malware pattern — mul for obfuscation: Malware authors sometimes use MUL or IMUL with unusual constants as a cheap hash function or address offset calculation. If you see a multiply followed by an add and then a memory dereference, you are likely looking at a hash-table lookup.

ARM32 arithmetic:

ADD   R0, R1, R2       ; R0 = R1 + R2
ADD   R0, R0, #4       ; R0 += 4
SUB   R0, R1, R2       ; R0 = R1 - R2
MUL   R0, R1, R2       ; R0 = R1 * R2  (low 32 bits)
UMULL R0, R1, R2, R3   ; R1:R0 = R2 * R3  (64-bit unsigned result)
RSB   R0, R1, #0       ; R0 = 0 - R1  (negate; where assemblers accept NEG, it is a pseudo-instruction for the flag-setting RSBS)

Bitwise & Shift

and   eax, 0xFF        ; mask — keep only low byte
or    eax, 0x04        ; set bit 2
xor   eax, eax         ; EAX = 0  (fastest zero idiom; also clears CF/OF)
xor   eax, key         ; encrypt/decrypt a dword with key (byte-wise loops use AL; most common malware op)
not   eax              ; bitwise complement
shl   eax, 3           ; logical shift left  3 ≡ multiply by 8
shr   eax, 1           ; logical shift right 1 ≡ unsigned divide by 2
sar   eax, 1           ; arithmetic shift right (preserves sign bit)
rol   eax, 4           ; rotate left  4 bits (used in hash functions / crypto)
ror   eax, 4           ; rotate right 4 bits
bswap eax              ; reverse byte order (endian swap)

xor reg, reg is the canonical “zero a register” idiom. It generates a 2-byte encoding versus the 5-byte mov eax, 0. You will see it throughout compiled code, typically zeroing a return value or a loop counter before use.
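
Python has no rotate operator, so when you re-implement a routine that uses the rotates or `BSWAP` from the listing above, you need small helpers. The names `rol32`, `ror32`, and `bswap32` below are assumptions for this sketch; the operand width is fixed at 32 bits.

```python
MASK32 = 0xFFFFFFFF


def rol32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    return rol32(value, 32 - (count % 32))


def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")


x = 0x12345678
print(hex(rol32(x, 4)), hex(ror32(x, 4)), hex(bswap32(x)))
```

Unlike shifts, rotates lose no bits, so `ror32(rol32(x, n), n)` always returns `x`; that property is a quick sanity check when porting a hash or cipher out of a sample.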

ARM32 bitwise:

AND   R0, R1, R2       ; R0 = R1 & R2
ORR   R0, R1, R2       ; R0 = R1 | R2  (note: ORR not OR)
EOR   R0, R1, R2       ; R0 = R1 ^ R2  (XOR)
MVN   R0, R1           ; R0 = ~R1  (NOT + move)
LSL   R0, R1, #3       ; R0 = R1 << 3
LSR   R0, R1, #1       ; R0 = R1 >> 1 (logical)
ASR   R0, R1, #1       ; R0 = R1 >> 1 (arithmetic)
ROR   R0, R1, #4       ; R0 = rotate_right(R1, 4)

; ARM's barrel shifter lets you combine shift with any data op:
ADD   R0, R1, R2, LSL #2   ; R0 = R1 + (R2 << 2)  — all in one instruction!

Comparison and Flags

CMP subtracts two values and discards the result — only flags are updated. TEST ANDs two values and discards the result. Neither instruction writes a register.

cmp   eax, 0           ; sets ZF if EAX==0, SF if EAX<0
test  eax, eax         ; sets ZF/SF/CF/OF the same way as `cmp eax, 0`, but 1 byte shorter

cmp   eax, ebx
jl    less_label       ; signed: jump if EAX < EBX  (SF != OF)
jb    below_label      ; unsigned: jump if EAX < EBX (CF=1)

test  eax, 0x01        ; test bit 0
jnz   odd_label        ; jump if bit 0 was set

ARM32 comparisons:

CMP   R0, R1           ; flags = R0 - R1 (discards result)
TST   R0, #0x01        ; flags = R0 & 0x01
CMN   R0, R1           ; flags = R0 + R1 (compare negative)

; In the ARM32 (A32) encoding, almost any instruction can be conditional.
MOVEQ R0, #1           ; R0 = 1 ONLY if Z flag is set  (x86 would need Jcc, SETcc, or CMOVcc)
ADDNE R2, R2, #4       ; R2 += 4 ONLY if Z flag is clear

This conditional execution is a key ARM32 differentiator: instead of a cmp + jcc + branch target, a short if-else compiles to one compare followed by conditional instructions for each arm, with no branch at all.
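
The signed/unsigned split between JL and JB is easier to trust once you have computed the flags yourself. Below is a small Python model of what CMP does to ZF, SF, CF, and OF for 32-bit operands; the function names are hypothetical helpers for illustration only.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Model the flags that x86 `cmp a, b` produces for 32-bit operands."""
    a &= MASK32
    b &= MASK32
    res = (a - b) & MASK32
    sign = lambda v: v >> 31
    return {
        "ZF": int(res == 0),
        "SF": sign(res),
        "CF": int(a < b),                                          # unsigned borrow
        "OF": int(sign(a) != sign(b) and sign(res) != sign(a)),    # signed overflow
    }

def jl(flags):
    """Signed less-than: taken when SF != OF."""
    return flags["SF"] != flags["OF"]

def jb(flags):
    """Unsigned below: taken when CF = 1."""
    return flags["CF"] == 1
```

Comparing 0xFFFFFFFF with 1 makes the point: as a signed value it is -1, so jl is taken, but as an unsigned value it is the largest 32-bit number, so jb is not.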

Control Flow

; Unconditional
jmp   label            ; EIP = label
call  label            ; push EIP; EIP = label
ret                    ; EIP = [ESP]; ESP += 4
ret   8                ; EIP = [ESP]; ESP += 12  (stdcall — also pops 2 dwords of args)

; Conditional jumps (check after CMP/TEST)
je  / jz   label       ; jump if ZF=1  (equal / zero)
jne / jnz  label       ; jump if ZF=0  (not equal / not zero)
jl  / jnge label       ; signed less than         (SF!=OF)
jle / jng  label       ; signed less-than-or-equal (ZF=1 or SF!=OF)
jg  / jnle label       ; signed greater than       (ZF=0 and SF=OF)
jge / jnl  label       ; signed greater-than-or-equal (SF=OF)
jb  / jnae label       ; unsigned below            (CF=1)
ja  / jnbe label       ; unsigned above            (CF=0 and ZF=0)

; Loop
loop  label            ; ECX--; jump if ECX!=0
loope label            ; ECX--; jump if ECX!=0 AND ZF=1

ARM32 branches:

B     label            ; unconditional branch (x86 JMP)
BL    label            ; branch with link: saves the return address (the next instruction) in LR (x86 CALL)
BX    LR               ; branch to address in LR — function return (x86 RET)
BLX   R0               ; branch-with-link to address in R0 — indirect call

; Conditional branches
BEQ   label            ; branch if Z=1
BNE   label            ; branch if Z=0
BLT   label            ; branch if N!=V  (signed less than)
BGT   label            ; branch if Z=0 and N=V (signed greater than)
BLO   label            ; branch if C=0  (unsigned lower)
BHI   label            ; branch if C=1 and Z=0 (unsigned higher)

String Operations

x86 has a family of bulk-memory instructions that operate on ESI/EDI and auto-increment/decrement them based on the DF flag. Combined with the REP prefix they form efficient memory loops.

cld                    ; clear DF — direction = forward (ESI/EDI increment)
std                    ; set DF   — direction = backward (decrement)

rep  movsb             ; copy ECX bytes from [ESI] to [EDI]
rep  stosd             ; fill ECX dwords at [EDI] with EAX (memset-like)
repe cmpsb             ; compare up to ECX bytes at [ESI] vs [EDI], stopping at the first mismatch (memcmp-like)
repne scasb            ; scan [EDI] for the byte in AL, stopping at the first match (strlen/memchr-like)

; Common Ghidra/BN patterns for these:
; rep movsd  ->  memcpy(edi, esi, ecx*4)
; rep stosd  ->  memset(edi, 0, ecx*4) when EAX = 0 (the usual case; otherwise a dword-pattern fill)

Shellcode pattern: REP MOVSD/STOSD shows up in PE loaders embedded in shellcode — copying sections into allocated memory or zeroing the BSS.
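
To make the REP semantics concrete, here is a hedged Python emulation of two of these idioms: the inline strlen built on repne scasb, and a rep stosd fill. It assumes DF = 0 (forward direction) and models memory as a Python bytes/bytearray; both function names are made up for this sketch.

```python
def repne_scasb_strlen(memory: bytes, edi: int) -> int:
    """Emulate: xor eax,eax / or ecx,-1 / repne scasb / not ecx / dec ecx."""
    ecx = 0xFFFFFFFF                      # "unlimited" count
    al = 0                                # byte to search for (NUL)
    while ecx != 0:
        byte = memory[edi]
        edi += 1                          # DF=0: EDI moves forward
        ecx = (ecx - 1) & 0xFFFFFFFF
        if byte == al:                    # REPNE stops once ZF=1 (match found)
            break
    ecx = ~ecx & 0xFFFFFFFF               # not ecx
    return (ecx - 1) & 0xFFFFFFFF         # dec ecx -> length without the NUL

def rep_stosd(memory: bytearray, edi: int, eax: int, ecx: int) -> int:
    """Emulate rep stosd: store EAX to [EDI] ECX times; returns the final EDI."""
    for _ in range(ecx):
        memory[edi:edi + 4] = (eax & 0xFFFFFFFF).to_bytes(4, "little")
        edi += 4
    return edi
```

The strlen variant counts the terminator as well, which is why the compiled code needs the final not/dec pair to recover the length.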

Calling Conventions

Calling conventions define: where arguments go, who cleans the stack, and which registers must be preserved across a call.

x86 cdecl and stdcall

          cdecl                   stdcall
          ─────────────────────   ──────────────────────
args      pushed right to left    pushed right to left
cleanup   CALLER cleans stack     CALLEE cleans (RET n)
return    EAX (small values)      EAX
          EDX:EAX (64-bit)        EDX:EAX (64-bit)
saved     EBX, ESI, EDI, EBP     EBX, ESI, EDI, EBP

Spotting cdecl vs stdcall in a disassembler:

cdecl: after CALL, you see add esp, N — the caller cleaning up N bytes of arguments
stdcall: the CALL target ends with RET N — callee cleans its own arguments

; cdecl call: add(3, 7)
push  7
push  3
call  _add
add   esp, 8        ; caller pops 2 x 4-byte args

; stdcall call: MessageBoxA(NULL, "hi", "cap", 0)
push  0
push  offset caption
push  offset text
push  0
call  MessageBoxA   ; MessageBoxA does: ret 0x10  (cleans 16 bytes itself)

x64 Microsoft ABI

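On 64-bit Windows, the first four integer/pointer arguments go in registers. The caller owns the stack argument area, but it usually reserves that area once in the prologue, so you rarely see a per-call ADD RSP, N, and the callee returns with a plain RET.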

Arg 1 -> RCX     (or XMM0 if float)
Arg 2 -> RDX     (or XMM1)
Arg 3 -> R8      (or XMM2)
Arg 4 -> R9      (or XMM3)
Arg 5+ -> stack (above shadow space)

The shadow space (also called “home space”) is 32 bytes (4 x 8) that the caller must always allocate on the stack before a call, even if the function takes fewer than 4 arguments. The callee may spill its register arguments into this space.

; x64 call: CreateFileA(name, GENERIC_READ, ...)
sub   rsp, 0x38         ; shadow space (0x20) + 3 stack args (0x18)
mov   rcx, rax          ; arg1 = filename
mov   edx, 0x80000000   ; arg2 = GENERIC_READ
xor   r8d, r8d          ; arg3 = 0 (share mode)
xor   r9d, r9d          ; arg4 = NULL (security attrs)
; arg5-arg7 go on stack at [rsp+0x20], [rsp+0x28], [rsp+0x30]
mov   dword [rsp+0x20], 3          ; arg5 = OPEN_EXISTING
mov   dword [rsp+0x28], 0x80       ; arg6 = FILE_ATTRIBUTE_NORMAL
mov   qword [rsp+0x30], 0          ; arg7 = NULL (template file)
call  CreateFileA
add   rsp, 0x38
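
The size of the SUB RSP before a call can be derived from the argument count. The sketch below is a simplified Python calculation that assumes the function was entered with RSP at 16n+8 (the return address was just pushed) and has no other pushes or locals; win64_call_area is a hypothetical helper name, not a toolchain function.

```python
def win64_call_area(nargs: int) -> int:
    """Bytes to subtract from RSP so a call with `nargs` arguments is legal.

    Assumes RSP = 16n + 8 on entry and no other stack usage in the function.
    """
    slots = max(nargs, 4)          # shadow space always covers 4 slots
    size = slots * 8               # one 8-byte slot per argument
    if (size + 8) % 16:            # +8 for the return address already on the stack
        size += 8                  # pad so RSP is 16-byte aligned at the CALL
    return size
```

For four or fewer arguments this gives 0x28, the value seen in countless small functions, and for seven arguments it gives 0x38. A real prologue adds its locals and saved registers on top of this figure.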

x86 fastcall and thiscall

Two more conventions appear constantly in Windows binaries — especially those compiled with MSVC.

__fastcall passes the first two integer arguments in ECX and EDX (skipping the stack for them), with the rest pushed right-to-left. The callee cleans the stack.

; __fastcall: myfunc(3, 7, 99)
mov   ecx, 3      ; arg1 → ECX
mov   edx, 7      ; arg2 → EDX
push  99          ; arg3 on stack (right-to-left)
call  myfunc      ; callee does: ret 4  (cleans only arg3)

Recognition tip: if you see MOV ECX, value and MOV EDX, value before a CALL and there is no ADD ESP, N after it, you are likely in __fastcall.

__thiscall is MSVC’s calling convention for C++ member functions. The hidden this pointer goes in ECX; remaining arguments are pushed right-to-left; the callee cleans.

; C++: obj->method(42)
mov   ecx, obj_ptr    ; ECX = this  ← the telltale sign
push  42              ; first explicit arg
call  MyClass_method  ; callee does: ret 4

C++ recognition shortcut: When you see MOV ECX, [some_ptr] (or LEA ECX, [...]) immediately before a CALL, you are likely looking at a C++ method call. If the sequence is MOV EAX, [ECX] followed by CALL [EAX + N], it is a vtable dispatch: the first load fetches the vtable pointer from the object and N selects the slot, so follow that pointer to find the virtual function table.

x64 System V (Linux)

Linux and macOS use a different ABI:

Arg 1 -> RDI
Arg 2 -> RSI
Arg 3 -> RDX
Arg 4 -> RCX
Arg 5 -> R8
Arg 6 -> R9
Arg 7+ -> stack
Callee-saved: RBX, R12-R15, RBP
No shadow space required
Syscall number -> RAX; invoke with SYSCALL instruction (Linux syscalls pass arg 4 in R10 instead of RCX)

Calling Convention Comparison — At a Glance

Use this table when you need to quickly identify which convention a binary uses and reconstruct the argument list from the disassembly:

Convention	Arg 1	Arg 2	Arg 3	Arg 4	Arg 5+	Stack cleanup	Callee must preserve	Common context
cdecl	stack	stack	stack	stack	stack	Caller (`ADD ESP, N` after CALL)	EBX, ESI, EDI, EBP	C functions, GCC x86 default, `printf`-style varargs
stdcall	stack	stack	stack	stack	stack	Callee (`RET N`)	EBX, ESI, EDI, EBP	Win32 API (`WINAPI` / `PASCAL` macros)
fastcall	ECX	EDX	stack	stack	stack	Callee (`RET N`)	EBX, ESI, EDI, EBP	MSVC `/Gr` flag, Windows kernel internal functions
thiscall	ECX (`this`)	stack	stack	stack	stack	Callee (`RET N`)	EBX, ESI, EDI, EBP	MSVC C++ non-virtual & virtual methods
x64 Windows	RCX	RDX	R8	R9	stack (above shadow)	Caller	RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15	All 64-bit Windows code; `RCX = this` in C++
x64 System V	RDI	RSI	RDX	RCX	R8, R9, then stack	Caller	RBX, RBP, R12–R15	Linux, macOS x64; `RDI = this` in C++
ARM32 AAPCS	R0	R1	R2	R3	stack (right-to-left)	Caller	R4–R11, SP	Android NDK, iOS (older), embedded ARM
AArch64 AAPCS64	X0	X1	X2	X3	X4–X7, then stack	Caller	X19–X28, X29 (FP), X30 (LR)	Apple Silicon, Android ARM64, Windows on Arm

Identifying the convention from disassembly:

Clue	Convention
`ADD ESP, N` immediately after `CALL`	cdecl — caller cleaning N bytes
`RET N` inside the callee	stdcall, fastcall, or thiscall: callee cleaning N bytes (check the ECX/EDX setup before the call to tell them apart)
`MOV ECX, ptr` then `CALL` with no `ADD ESP` after	thiscall — ECX is `this`
`MOV ECX, val; MOV EDX, val` before a `CALL`	fastcall — first two args in registers
`MOV RCX, …; MOV RDX, …; MOV R8D, …` before `CALL`	x64 Windows ABI
`MOV RDI, …; MOV RSI, …; MOV RDX, …` before `CALL`	x64 System V (Linux/macOS)
`MOV R0, …; MOV R1, …; BL func`	ARM32 AAPCS
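
The x86 clues in the table above can be turned into a rough triage heuristic. The Python sketch below works on plain text lines of disassembly; the function name and its three inputs are assumptions made for illustration, and a real classifier would need data-flow analysis rather than regex matching.

```python
import re

def guess_x86_convention(pre_call, post_call, callee_ret):
    """Guess a 32-bit calling convention from text around one call site.

    pre_call   -- instruction strings just before the CALL
    post_call  -- instruction strings just after the CALL
    callee_ret -- the RET instruction that ends the callee
    """
    pre = [line.strip().lower() for line in pre_call]
    post = [line.strip().lower() for line in post_call]
    caller_cleans = any(re.match(r"add\s+esp\s*,", line) for line in post)
    callee_cleans = bool(re.match(r"ret\s+\S+", callee_ret.strip().lower()))
    sets_ecx = any(re.match(r"(mov|lea)\s+ecx\s*,", line) for line in pre)
    sets_edx = any(re.match(r"(mov|lea)\s+edx\s*,", line) for line in pre)

    if caller_cleans:
        return "cdecl"
    if callee_cleans and sets_ecx and sets_edx:
        return "fastcall"
    if callee_cleans and sets_ecx:
        return "thiscall"
    if callee_cleans:
        return "stdcall"
    return "unknown"          # e.g. register-only fastcall/thiscall ending in a bare RET
```

The heuristic deliberately returns "unknown" when a callee takes no stack arguments, because a bare RET carries no cleanup information.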

ARM Assembly

ARM vs x86 Philosophy

Aspect	x86 / x64	ARM
Architecture	CISC — complex, variable-length instructions	RISC — uniform 32-bit instructions (mostly)
Memory operands	Allowed in arithmetic: `ADD EAX, [EBX]`	Never — only `LDR`/`STR` touch memory
Instruction size	1–15 bytes	4 bytes (ARM) / 2 or 4 bytes (Thumb)
Condition codes	Branches, plus `SETcc` / `CMOVcc`	Most ARM32 instructions can be conditional
Barrel shifter	Separate shift instructions	Built-in: `ADD R0, R1, R2, LSL #2`
Endianness	Always little-endian	Configurable (usually little-endian)

ARM Registers Deep Dive

The ARM calling convention (AAPCS) assigns specific roles to registers that the disassembler will display without aliases. You must know them:

Saved registers (must be preserved):
  R4  R5  R6  R7  R8  R9  R10 R11(FP)

Scratch / argument registers (caller-saved):
  R0  R1  R2  R3

Special:
  R12 = IP  (intra-procedure scratch; used by PLT stubs on Linux)
  R13 = SP  (stack pointer — never use for anything else)
  R14 = LR  (link register — holds return address after BL)
  R15 = PC  (program counter — read is PC+8 in ARM mode, PC+4 in Thumb)

ARM PC offset gotcha: In ARM32 mode, reading PC gives the address of the current instruction +8 (not +4). This is a pipeline artifact from ARM’s 3-stage pipeline. Ghidra and Binary Ninja compensate automatically, but if you calculate addresses manually, remember the offset.
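
When you do have to compute a PC-relative address by hand, the arithmetic is short. This Python sketch assumes the standard ARM32 rules (PC reads as instruction address + 8 in ARM state and + 4 in Thumb state, and literal loads word-align the PC first); the helper names are hypothetical.

```python
def pc_read_value(instr_addr: int, thumb: bool = False) -> int:
    """Value an instruction observes when it reads PC."""
    return instr_addr + (4 if thumb else 8)

def ldr_literal_address(instr_addr: int, offset: int, thumb: bool = False) -> int:
    """Address accessed by `LDR Rt, [PC, #offset]` at instr_addr."""
    base = pc_read_value(instr_addr, thumb)
    base &= ~3                 # literal loads use Align(PC, 4); only matters in Thumb
    return base + offset
```

For example, an ARM-state LDR at 0x8000 with offset 0x10 reads its literal from 0x8018, not 0x8010.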

Key ARM Instructions

; ── Load / Store ─────────────────────────────────────────────
LDR   R0, [R1]          ; 32-bit load
LDRH  R0, [R1]          ; 16-bit load, zero-extend
LDRB  R0, [R1]          ; 8-bit load, zero-extend
LDRSB R0, [R1]          ; 8-bit load, sign-extend
STR   R0, [R1]          ; 32-bit store
STRB  R0, [R1, #3]      ; byte store with offset

; Pre-indexing (update base before access)
LDR   R0, [R1, #4]!     ; R0 = *(R1+4); R1 += 4

; Post-indexing (update base after access)
LDR   R0, [R1], #4      ; R0 = *R1; R1 += 4  -- very common in loops

; Multiple-register transfer (callee save/restore)
STMFD SP!, {R4-R11, LR} ; push R4..R11 and LR
LDMFD SP!, {R4-R11, PC} ; pop R4..R11 and jump to saved LR

; ── Branching ────────────────────────────────────────────────
B     func              ; jump
BL    func              ; call (saves the return address, i.e. the next instruction, to LR)
BX    LR                ; return (branch to address in LR)
BLX   R0               ; indirect call (enters Thumb if bit 0 of R0 is set, ARM if clear)

; ── Data Processing ──────────────────────────────────────────
MOV   R0, #0xFF         ; R0 = 255
MOVW  R0, #0x1234       ; R0 = 0x1234  (16-bit immediate, ARMv6T2+)
MOVT  R0, #0x5678       ; R0[31:16] = 0x5678  (upper 16 bits)
; Together: MOVW/MOVT pair loads a full 32-bit constant
; This is the ARM equivalent of x86 `mov eax, imm32`

MRS   R0, CPSR          ; read flags/mode register
MSR   CPSR_f, R0        ; write flags field of CPSR
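
Reassembling a MOVW/MOVT pair into the constant it loads is a common manual step when a disassembler fails to merge them. A minimal Python sketch follows, with illustrative function names that are not from any library.

```python
def movw(imm16: int) -> int:
    """MOVW Rd, #imm16: writes the low half and zeroes the upper half."""
    return imm16 & 0xFFFF

def movt(reg: int, imm16: int) -> int:
    """MOVT Rd, #imm16: writes the upper half and keeps the lower half."""
    return (reg & 0xFFFF) | ((imm16 & 0xFFFF) << 16)

def split_constant(value: int):
    """Return the (MOVW, MOVT) immediates that would build `value`."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF
```

Order matters: MOVW must come first because it clears the upper half that MOVT then fills in.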

Thumb and Thumb-2 Mode

ARM processors can switch between ARM mode (4-byte instructions) and Thumb mode (2-byte instructions). Thumb shrinks code noticeably at a small performance cost, though by less than half because some operations need extra instructions. That density matters for embedded and mobile malware.

Detection in disassemblers:

Thumb mode functions have their symbol address OR’d with 1 (e.g., 0x00008001 instead of 0x00008000)
Ghidra and Binary Ninja auto-detect and display the right instruction set
BX Rn with the LSB of Rn set = switch to Thumb; clear = switch to ARM

Thumb-2 (introduced with ARMv6T2, standard from ARMv7 onward) extends Thumb with 32-bit instructions, giving near-ARM performance with compact encoding. Most 32-bit ARM code on Android and iOS, malware included, is compiled as Thumb-2.

; Thumb (16-bit) — notice missing base register in 2-reg ops
PUSH  {R4, LR}          ; save
MOV   R0, #5
ADD   R0, R1            ; R0 += R1  (Thumb: only 2-register form)
POP   {R4, PC}          ; restore and return

; Thumb-2 (first halfword of a 32-bit instruction: 0xE8xx, 0xF0xx, 0xF8xx...)
MOVW  R0, #0xABCD       ; low 16 bits; upper half zeroed
MOVT  R0, #0x1234       ; high 16 bits -> R0 = 0x1234ABCD
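
Two of the checks described in this section are simple bit tests. The Python sketch below assumes the standard Thumb encoding rule (a halfword whose top five bits are 0b11101, 0b11110, or 0b11111 starts a 32-bit instruction) and the interworking convention that bit 0 of a branch target selects Thumb state; the function names are made up for this example.

```python
def is_thumb32_first_halfword(halfword: int) -> bool:
    """True if this 16-bit value starts a 32-bit Thumb-2 instruction."""
    return (halfword >> 11) in (0b11101, 0b11110, 0b11111)

def split_interworking_address(addr: int):
    """Split a BX/BLX target or symbol value into (real_address, is_thumb)."""
    return addr & ~1, bool(addr & 1)
```

A linear-sweep pass over Thumb code can use the first helper to decide whether to consume two or four bytes at each step.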

AArch64 (ARM64)

AArch64 is a complete redesign — not backward compatible with ARM32. Used in Apple Silicon, Raspberry Pi 4+, and Windows on Arm.

; Registers: X0-X30 (64-bit), W0-W30 (low 32 bits), SP, PC
; No condition codes on most instructions (unlike ARM32); conditionals use B.cond, CSEL, CBZ/TBZ
; Shifted and extended register operands still exist: ADD X0, X1, X2, LSL #2 / LDR X0, [X1, X2, LSL #3]

; Load / store
LDR   X0, [X1]          ; 64-bit load
LDR   W0, [X1]          ; 32-bit load (zero-extends into X0)
LDRB  W0, [X1]          ; byte load
STP   X29, X30, [SP, #-16]!  ; store pair (typical frame setup)
LDP   X29, X30, [SP], #16    ; load pair (typical frame teardown)

; Arithmetic
ADD   X0, X1, X2        ; X0 = X1 + X2
ADD   X0, X1, #8        ; X0 = X1 + 8
MUL   X0, X1, X2        ; X0 = X1 * X2  (low 64 bits)

; Branching
BL    func              ; call (saves PC+4 to X30/LR)
RET                     ; return via X30/LR (NOT ret like x86 — no stack pop)
BR    X0                ; indirect branch (x86: jmp rax)
BLR   X0               ; indirect call  (x86: call rax)

; Conditionals (fused compare-and-branch)
CBZ   X0, label         ; branch if X0 == 0  (no CMP needed)
CBNZ  X0, label         ; branch if X0 != 0
TBZ   X0, #3, label     ; branch if bit 3 of X0 == 0

ARM Calling Convention (AAPCS)

ARM32 (AAPCS):

Arguments 1-4 : R0  R1  R2  R3
Arguments 5+  : stack (pushed right-to-left)
Return value  : R0  (64-bit: R1:R0)
Callee-saved  : R4-R11, SP
Caller-saved  : R0-R3, R12, LR
Stack         : 8-byte aligned at public interfaces

AArch64 (AAPCS64):

Arguments 1-8 : X0-X7
Arguments 9+  : stack
Return value  : X0  (128-bit: X1:X0)
Callee-saved  : X19-X28, X29(FP), X30(LR), SP
Caller-saved  : X0-X17 (X18 is the platform register; reserved on Windows and Apple platforms)
Stack         : 16-byte aligned always

Reading Disassembly in Ghidra and Binary Ninja

Function Prologue Recognition

When you open a binary in Ghidra or Binary Ninja, every function begins with a recognizable setup sequence. Train your eye to skip past it instantly:

; Classic x86 frame setup: skim past this
55             push ebp
89 E5          mov  ebp, esp
83 EC 20       sub  esp, 0x20
53             push ebx
56             push esi
57             push edi
; <- HERE is where the actual logic starts

In Ghidra, the Decompiler window (Window menu, Ctrl+E by default; pressing F only creates a function at the cursor) collapses this to nothing: you see int local_24; int local_20; as declarations. In Binary Ninja, HLIL (High-Level IL) also hides the prologue, while the disassembly view and the lower ILs still show it.

Ghidra listing: decode_payload

Listing (disassembly)

decode_payload
0040107b  55          PUSH EBP                    ; save caller frame
0040107c  89 e5       MOV  EBP,ESP
0040107e  83 ec 28    SUB  ESP,0x28               ; 40 bytes of locals
00401081  53          PUSH EBX
00401082  56          PUSH ESI
00401083  57          PUSH EDI
00401084  8b 75 08    MOV  ESI,[EBP + param_1]    ; ← logic starts here
00401087  8b 7d 0c    MOV  EDI,[EBP + param_2]
0040108a  8b 4d 10    MOV  ECX,[EBP + param_3]

Decompiler (Ghidra)

void decode_payload(byte *buf,byte *key,int len)
{
  int local_24;
  int local_20;
  /* prologue variables auto-declared above */
  /* ↓ actual logic the analyst cares about */
  local_24 = 0;
  while (local_24 < len) {
    ...
  }

Key takeaway: The row at 00401084 (marked “logic starts here”) is where the prologue ends and the real function body begins. Everything above it is bookkeeping.

x64 prologue without frame pointer (common in MSVC /O2):

48 83 EC 58       sub  rsp, 0x58   ; allocate 88 bytes
; NO push rbp — rbp may be used as a general register!
; Locals at [rsp+N]; for calls this function makes, callee shadow space is [rsp+0x00]-[rsp+0x1F] and stack args start at [rsp+0x20]

Recognizing Loops

Most loops in high-level code compile to one of two shapes in assembly:

Top-test loop (while / for):

loop_start:
  cmp   ecx, 0
  je    loop_end         ; exit if done
  ; body
  dec   ecx
  jmp   loop_start
loop_end:

Bottom-test loop (do-while — optimized form):

loop_body:
  ; body  (always executes at least once)
  dec   ecx
  jnz   loop_body        ; jump back while ECX != 0

Tip: In Ghidra graph view, a loop appears as a node with a back-edge arrow pointing upward. In Binary Ninja’s graph, the back edge uses the normal branch colors (by default green for taken, red for not taken, blue for unconditional) and closes a cycle. Any upward-pointing edge is a loop candidate.

Ghidra graph view: xor_loop (CFG)

entry:
    XOR   ECX,ECX
    JMP   check
check (loop header):
    CMP   ECX,len
    JGE   loop_end
body:
    MOVZX EAX,[ESI+ECX]
    XOR   EAX,key
    MOV   [ESI+ECX],AL
    INC   ECX
    JMP   check            ; ↑ back-edge
loop_end:
    RET

Edges: entry → check; check → body (T, loop continues); check → loop_end (F, exit); body → check (back-edge).

ARM32 loop pattern:

; Classic counted loop: for (i=10; i>0; i--)
MOV   R2, #10          ; counter
loop_top:
  ; ... body using R0, R1 ...
  SUBS  R2, R2, #1     ; R2 -= 1; update flags (S suffix)
  BNE   loop_top       ; branch while R2 != 0

The S suffix on ARM instructions causes them to update NZCV flags — this is how ARM avoids a separate CMP before every branch.
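
The "upward-pointing edge" rule can be applied mechanically to a list of branches exported from a disassembler. The Python sketch below is a rough heuristic under the assumption that you already have (source, target) address pairs for one function; proper loop detection uses dominator analysis, and the function name here is hypothetical.

```python
def find_back_edges(branches):
    """Return branches whose target is at or before the branch itself.

    branches -- iterable of (source_addr, target_addr) pairs from one function
    """
    return [(src, dst) for src, dst in branches if dst <= src]

def loop_headers(branches):
    """Sorted, de-duplicated targets of back-edges: the likely loop headers."""
    return sorted({dst for _, dst in find_back_edges(branches)})
```

Applied to a function's branch list, the returned headers are good places to start reading, since the loop body sits between each header and the branch that jumps back to it.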

Recognizing Conditionals

if / else:

; if (eax == 0) { A } else { B }
test  eax, eax
jnz   else_branch      ; if eax != 0, skip the 'if' body
; --- true branch (A) ---
; ...
jmp   end_if
else_branch:
; --- false branch (B) ---
; ...
end_if:

In Ghidra graph view: the block that ends in the conditional jump has two outgoing edges, by default green for the taken branch and red for the fall-through (labelled T and F in the sketch below). The merge point is where both paths reconverge.

Ghidra graph view: if/else branch (CFG)

condition:
    TEST  EAX,EAX
    JNZ   else_branch
true_branch:               ; T edge: if (eax == 0) body
    MOV   EAX,1
    JMP   end_if
else_branch:               ; F edge: else body
    MOV   EAX,0
end_if (merge point):
    RET

Edges: condition → true_branch (T); condition → else_branch (F); both paths merge at end_if.

switch statement:

cmp   eax, 5
ja    default_case      ; value > 5 (unsigned): jump to default
jmp   [eax*4 + jump_table]  ; indirect jump through table
jump_table:
dd    case_0, case_1, case_2, case_3, case_4, case_5

Ghidra recognizes jump tables and labels each case. Binary Ninja uses MLIL’s switch construct. If neither tool resolves a JMP [EAX*4 + addr], you are dealing with an obfuscated or dynamically computed jump table.
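
When a tool fails to resolve a table, you can often decode it yourself from the raw bytes. The Python sketch below assumes a plain x86 table of absolute little-endian 32-bit targets, as in the example above; obfuscated or relative tables need a different decoder, and read_jump_table is a name invented for this sketch.

```python
import struct

def read_jump_table(image: bytes, table_offset: int, max_index: int):
    """Decode max_index + 1 absolute 32-bit case targets from a file image.

    max_index is the bound from the `cmp reg, N` / `ja default` guard.
    """
    count = max_index + 1
    return list(struct.unpack_from(f"<{count}I", image, table_offset))
```

The bound comes straight from the guard: cmp eax, 5 followed by ja means indices 0 through 5, so six entries.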

The Decompiler View

Ghidra and Binary Ninja both ship decompilers that convert disassembly to C-like pseudo-code. This is a reconstruction — the type information is guessed. Common pitfalls:

What you see in decompiler	What it really means
`*(int *)(param_1 + 0x3c)`	Structure field access: the decompiler doesn’t know the struct
`uVar1 = uVar2 ^ uVar3`	Likely XOR cipher — look at the key value
`do { ... } while (iVar1 != 0)`	A bottom-test loop (do-while)
`FUN_00401234(...)`	Unnamed function — rename it after analysis
`DAT_00403000`	A global variable — check cross-references
`(code *)DAT_...`	Indirect function call — possible shellcode dispatch table

Common Malware Patterns

XOR Decryption Loop

The simplest and most common obfuscation. Spot it by a loop with an XOR instruction and a byte-size memory reference:

; x86: XOR decrypt: for (i=0; i<len; i++) buf[i] ^= key[i]   (a repeating key adds i % keylen)
xor   ecx, ecx          ; i = 0
xor_loop:
  movzx eax, byte [esi + ecx]   ; load ciphertext byte
  movzx edx, byte [edi + ecx]   ; load key byte
  xor   eax, edx                ; decrypt
  mov   [esi + ecx], al         ; store plaintext
  inc   ecx
  cmp   ecx, dword [ebp - 4]    ; compare to length
  jl    xor_loop

; ARM32 equivalent
MOV   R2, #0            ; i = 0
xor_loop:
  LDRB  R0, [R4, R2]   ; load ciphertext byte
  LDRB  R1, [R5, R2]   ; load key byte
  EOR   R0, R0, R1     ; XOR
  STRB  R0, [R4, R2]   ; store
  ADD   R2, R2, #1
  CMP   R2, R6         ; compare to length
  BLT   xor_loop

In Ghidra: Look for the Decompile window showing bVar = *(byte *)(buf + i) ^ *(byte *)(key + i % keyLen) inside a for-loop. Cross-reference the buffer to see where the decrypted data is used next — that reveals the payload type.

Ghidra listing + decompiler: xor_decrypt (split view)

Listing

xor_loop
004010a0  0f b6 04 0e   MOVZX EAX,[ESI+ECX*1]
004010a4  0f b6 14 0f   MOVZX EDX,[EDI+ECX*1]
004010a8  33 c2         XOR   EAX,EDX            ; ← XOR decrypt
004010aa  88 04 0e      MOV   [ESI+ECX],AL
004010ad  41            INC   ECX
004010ae  3b 4d fc      CMP   ECX,[EBP-0x4]
004010b1  7c ed         JL    xor_loop

Decompiler

void xor_decrypt(byte *buf,byte *key,int len)
{
  int i;
  i = 0;
  while (i < len) {
    buf[i] = buf[i] ^ key[i];
    i = i + 1;
  }
  return;
}
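
Once you have identified the loop and extracted the key, reproducing the decryption offline is usually a few lines. The Python sketch below covers the repeating-key case and adds a crude single-byte key guess based on a known plaintext prefix; the "MZ" default is an assumption that the payload is a PE file, and both function names are illustrative.

```python
def xor_decrypt(buf: bytes, key: bytes) -> bytes:
    """XOR every byte of buf with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(buf))

def guess_single_byte_key(buf: bytes, known_prefix: bytes = b"MZ"):
    """Return the single-byte key that makes buf start with known_prefix, or None."""
    for k in range(256):
        if xor_decrypt(buf[:len(known_prefix)], bytes([k])) == known_prefix:
            return k
    return None
```

Because XOR is its own inverse, the same xor_decrypt call both encrypts and decrypts, which is why malware ships only one routine.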

PEB Walking — API Resolution Without Imports

Shellcode and injected payloads have no import table for the loader to resolve. They must resolve Win32 API addresses at runtime by walking the Process Environment Block (PEB):

; x86 PEB walk to find kernel32.dll base address
mov   eax, fs:[0x30]      ; EAX = &PEB  (on 32-bit Windows FS points at the TEB; TEB+0x30 holds the PEB pointer)
mov   eax, [eax + 0x0C]   ; EAX = PEB->Ldr
mov   eax, [eax + 0x14]   ; EAX = Ldr->InMemoryOrderModuleList.Flink (1st entry: the EXE itself)
mov   eax, [eax]          ; 2nd entry (typically ntdll.dll)
mov   eax, [eax]          ; 3rd entry (typically kernel32.dll)
mov   eax, [eax + 0x10]   ; EAX = DllBase (entry+0x18; EAX points at InMemoryOrderLinks, entry+0x08)

On x64, the PEB is at gs:[0x60] (not fs):

; x64 PEB walk
mov   rax, gs:[0x60]      ; RAX = &PEB64
mov   rax, [rax + 0x18]   ; PEB->Ldr
mov   rax, [rax + 0x20]   ; Ldr->InMemoryOrderModuleList.Flink
...

In Ghidra: You will see PTR fs:0x30 highlighted in blue (a special segment reference). Binary Ninja annotates the PEB structure fields automatically with the Windows types plugin loaded.

Ghidra listing: PEB walk (x86 shellcode)

resolve_kernel32
00000000  64 a1 30 00 00 00   MOV EAX,FS:[0x30]     ; EAX = &PEB (TEB → PEB)
00000006  8b 40 0c            MOV EAX,[EAX+0xc]     ; EAX = PEB.Ldr
00000009  8b 40 14            MOV EAX,[EAX+0x14]    ; EAX = InMemoryOrderModuleList.Flink (EXE entry)
0000000c  8b 00               MOV EAX,[EAX]         ; EAX = ntdll list entry
0000000e  8b 00               MOV EAX,[EAX]         ; EAX = kernel32 list entry
00000010  8b 40 10            MOV EAX,[EAX+0x10]    ; EAX = kernel32.dll base (DllBase)
00000013  89 45 f8            MOV [EBP-0x8],EAX     ; save kernel32 base to local
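
The pointer chasing is easier to follow against a toy memory image. The Python sketch below builds a fake 32-bit process with three loader entries and walks it using the documented x86 offsets (PEB+0x0C for Ldr, Ldr+0x14 for InMemoryOrderModuleList, InMemoryOrderLinks at entry+0x08, DllBase at entry+0x18). Every address in it is invented, and the module order (EXE, ntdll, kernel32) is the typical one rather than a guarantee.

```python
def read32(mem, addr):
    """Read one dword from the toy memory map."""
    return mem[addr]

def find_third_module_base(mem, peb):
    """Follow the same six loads as the shellcode PEB walk."""
    ldr = read32(mem, peb + 0x0C)         # PEB->Ldr
    entry = read32(mem, ldr + 0x14)       # first InMemoryOrder link (EXE)
    entry = read32(mem, entry)            # Flink -> second entry (ntdll)
    entry = read32(mem, entry)            # Flink -> third entry (kernel32)
    return read32(mem, entry + 0x10)      # link is at +0x08, DllBase at +0x18

def build_fake_process():
    """Hypothetical addresses; just enough structure to exercise the walk."""
    peb, ldr = 0x7FFD0000, 0x00250000
    entries = [(0x00251000, 0x00400000),  # (entry address, DllBase): EXE
               (0x00252000, 0x77000000),  # ntdll.dll
               (0x00253000, 0x76000000)]  # kernel32.dll
    mem = {peb + 0x0C: ldr}
    list_head = ldr + 0x14
    links = [entry + 0x08 for entry, _ in entries]
    mem[list_head] = links[0]
    for i, (entry, base) in enumerate(entries):
        mem[links[i]] = links[i + 1] if i + 1 < len(links) else list_head
        mem[entry + 0x18] = base
    return mem, peb
```

Note that the list pointers land on the InMemoryOrderLinks field, not on the start of each entry, which is why the final offset is 0x10 rather than 0x18.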

Anti-Debugging via RDTSC

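RDTSC reads the CPU’s timestamp counter into EDX:EAX. The counter is a 64-bit tick count, not a nanosecond clock; on modern CPUs it advances at a constant rate regardless of the current core frequency. Malware reads it twice and checks whether the delta is larger than expected, because single-stepping or breakpoints between the two reads inflate it.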

rdtsc                  ; first read: EDX:EAX = TSC
mov   esi, eax         ; save low 32 bits
; ... some code ...
rdtsc                  ; second read
sub   eax, esi         ; delta in EAX
cmp   eax, 0x10000     ; threshold
jg    debugger_detected ; too slow: assume a debugger (signed compare; JA is the unsigned form)

Ghidra listing: RDTSC anti-debug check

timing_check
004010c0  0f 31            RDTSC                     ; read TSC → EDX:EAX
004010c2  89 45 f8         MOV  [EBP-0x8],EAX        ; save t1 low dword
004010c5  e8 96 00 00 00   CALL some_work
004010ca  0f 31            RDTSC                     ; read TSC → EDX:EAX (t2)
004010cc  2b 45 f8         SUB  EAX,[EBP-0x8]        ; delta = t2 - t1
004010cf  3d 00 00 01 00   CMP  EAX,0x10000          ; threshold ~65 k cycles
004010d4  7f 06            JG   debugger_detected    ; jump if too slow
004010d6  e8 b4 02 00 00   CALL real_payload
004010db  c3               RET

debugger_detected
004010dc  e8 2f 03 00 00   CALL decoy_payload        ; mislead analyst
004010e1  c3               RET

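ARM equivalent: MRC p15, 0, Rt, c9, c13, 0 reads the Performance Monitors Cycle Count Register (PMCCNTR) on ARM32 cores, which requires privileged mode unless the kernel has enabled user access; AArch64 code reads the virtual counter with MRS Xt, CNTVCT_EL0.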
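
The low-dword subtraction in the x86 check wraps modulo 2^32, and whether the comparison is signed or unsigned changes the outcome for very large deltas. The Python sketch below models both; the function names and the default threshold are taken from the example above purely for illustration.

```python
MASK32 = 0xFFFFFFFF

def tsc_delta_low32(t1_low: int, t2_low: int) -> int:
    """Model `sub eax, saved_low`: the result wraps modulo 2**32."""
    return (t2_low - t1_low) & MASK32

def as_signed32(x: int) -> int:
    """Reinterpret a 32-bit value as signed, the way JG/JL see it."""
    return x - (1 << 32) if x & 0x80000000 else x

def detected_unsigned(t1: int, t2: int, threshold: int = 0x10000) -> bool:
    """Outcome when the check uses JA (unsigned compare)."""
    return tsc_delta_low32(t1 & MASK32, t2 & MASK32) > threshold

def detected_signed(t1: int, t2: int, threshold: int = 0x10000) -> bool:
    """Outcome when the check uses JG (signed compare)."""
    return as_signed32(tsc_delta_low32(t1 & MASK32, t2 & MASK32)) > threshold
```

A delta of 0x90000000 ticks trips the unsigned check but reads as negative to a signed one, so a sample using JG can be slipped past by pausing long enough, which is a useful detail when patching or emulating the check.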

Shellcode Stub Pattern

Position-independent shellcode must locate its own base address (since it doesn’t know where it will be injected). The classic x86 technique:

call  get_eip          ; push EIP of next instruction
get_eip:
pop   ebx              ; EBX = address of get_eip label
sub   ebx, 5           ; EBX = shellcode base (account for CALL encoding)
; Now EBX + offset = address of any data/code inside the shellcode

x64 version using RIP-relative LEA:

lea   rbx, [rip]       ; RBX = address of NEXT instruction
; Or more commonly just use [RIP + offset] directly for data references

ARM32: PC-relative loads are the native mechanism:

LDR   R0, [PC, #offset]  ; load value from (PC+8) + offset
; The assembler resolves `offset` so the literal pool is addressed correctly
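
The x86 call/pop stub has a fixed byte signature, so it can be located in a raw buffer before any disassembly. The Python sketch below searches for E8 00 00 00 00 (a CALL with zero displacement) followed by a POP r32 opcode (0x58 to 0x5F); it is a plain byte scan that will miss obfuscated variants, and the function names are hypothetical.

```python
CALL_NEXT = b"\xE8\x00\x00\x00\x00"       # call $+5

def find_getpc_stubs(code: bytes):
    """Return offsets of `call $+5` immediately followed by `pop r32`."""
    hits = []
    start = 0
    while True:
        i = code.find(CALL_NEXT, start)
        if i < 0:
            break
        if i + 5 < len(code) and 0x58 <= code[i + 5] <= 0x5F:
            hits.append(i)
        start = i + 1
    return hits

def base_from_popped(popped_value: int) -> int:
    """Address of the CALL itself: the popped return address minus its 5 bytes."""
    return popped_value - 5
```

If the CALL is the very first instruction, base_from_popped gives the shellcode's load address, which is exactly what the SUB EBX, 5 in the stub computes.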

Quick Reference Tables

x86 / x64 Jump Instructions

Signed	Unsigned	Condition	Flags (signed / unsigned)
`JE` / `JZ`	`JE` / `JZ`	Equal / Zero	ZF=1
`JNE` / `JNZ`	`JNE` / `JNZ`	Not equal	ZF=0
`JL` / `JNGE`	`JB` / `JNAE`	Less / Below	SF!=OF / CF=1
`JLE` / `JNG`	`JBE` / `JNA`	Less-or-equal / Below-or-equal	ZF=1 or SF!=OF / CF=1 or ZF=1
`JG` / `JNLE`	`JA` / `JNBE`	Greater / Above	ZF=0 and SF=OF / CF=0 and ZF=0
`JGE` / `JNL`	`JAE` / `JNB`	Greater-or-equal	SF=OF / CF=0
—	`JS`	Sign	SF=1
—	`JO`	Overflow	OF=1
—	`JP` / `JPE`	Parity even	PF=1

ARM Condition Codes (suffix on any instruction)

Suffix	Meaning	Flags
`EQ`	Equal	Z=1
`NE`	Not equal	Z=0
`GT`	Signed greater than	Z=0 and N=V
`LT`	Signed less than	N!=V
`GE`	Signed greater-or-equal	N=V
`LE`	Signed less-or-equal	Z=1 or N!=V
`HI`	Unsigned higher	C=1 and Z=0
`LO` / `CC`	Unsigned lower (carry clear)	C=0
`HS` / `CS`	Unsigned higher-or-same (carry set)	C=1
`LS`	Unsigned lower-or-same	C=0 or Z=1
`MI`	Minus / negative	N=1
`PL`	Plus / positive or zero	N=0
`VS`	Overflow set	V=1
`VC`	Overflow clear	V=0
`AL`	Always (default)	—

Register Cheatsheet — What You See in the Disassembler

Color in this article	x86	x64	ARM32	AArch64
Green — general purpose	EAX EBX ECX EDX ESI EDI	RAX RBX RCX RDX RSI RDI R8-R15	R0–R12	X0-X28
Orange — stack/frame/PC	ESP EBP EIP	RSP RBP RIP	SP LR PC	SP LR PC FP
Blue — ARM-specific	—	—	R0-R15 CPSR	X0-X30 NZCV
Amber — flags	ZF CF SF OF PF AF DF	ZF CF SF OF PF AF DF	N Z C V	N Z C V

Size Specifiers — Operand Widths

When a memory operand is ambiguous, the assembler requires an explicit size keyword. These appear constantly in disassembly and tell you the width of the data being read or written:

Keyword	Bits	Bytes	x86 register aliases	Typical use in disassembly
`BYTE PTR`	8	1	AL, BL, CL, DL (and R8B–R15B in x64)	Character data, byte flags, single-byte XOR: `mov byte ptr [eax], 0`
`WORD PTR`	16	2	AX, BX, CX, DX (and R8W–R15W in x64)	UTF-16 code units, port I/O, 16-bit struct fields: `mov ax, word ptr [ebx+2]`
`DWORD PTR`	32	4	EAX, EBX, ECX, EDX, ESI, EDI	Local `int`, pointer on x86, HANDLE: `cmp dword ptr [ebp-4], 0`
`QWORD PTR`	64	8	RAX, RBX, RCX, … (x64 only)	Pointer on x64, `LONGLONG`, timestamp: `mov rax, qword ptr [rsi+8]`
`XMMWORD PTR`	128	16	XMM0–XMM15	SIMD data, AES-NI cipher rounds, fast memcpy: `movdqu xmm0, xmmword ptr [rdi]`
`YMMWORD PTR`	256	32	YMM0–YMM15	AVX bulk operations; rare in malware but present in optimised crypto libraries

Width as a clue: A loop that accesses BYTE PTR [ESI + ECX] is processing raw bytes — likely a string, shellcode buffer, or cipher stream. Switch to DWORD PTR and you are almost certainly processing an array of integers, pointers, or 32-bit hash values.

C ↔ Assembly Equivalents

Use this table to mentally “lift” disassembly back to high-level intent before reaching for the decompiler:

C / C++ pattern	x86 / x64 assembly	Notes
`int x = 0;`	`xor eax, eax` → `mov [ebp-4], eax`	XOR zero is 2 bytes; `mov eax, 0` is 5 bytes, so optimizing compilers nearly always pick XOR
`if (x == 0)`	`test eax, eax` → `je label`	`TEST` is shorter than `cmp eax, 0`; identical flag result
`if (x != 0)`	`test eax, eax` → `jnz label`	The “is this pointer null?” pattern
`if (x < 0)`	`test eax, eax` → `js label`	Testing sign flag directly; no CMP needed
`x++`	`inc eax`	INC does not set CF — subtle source of bugs in flag-dependent code
`x--`	`dec eax`	Same: DEC does not set CF
`x += n`	`add eax, n`
`x -= n`	`sub eax, n`
`x *= 2`	`shl eax, 1` or `add eax, eax`
`x *= 4`	`shl eax, 2` or `lea eax, [eax*4]`	LEA form does not set flags
`x *= 5`	`lea eax, [eax + eax*4]`	Classic LEA-abuse for non-power-of-two multiply — no MUL instruction
`x *= 9`	`lea eax, [eax + eax*8]`	Same pattern; watch for these in hash functions
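`x /= 2` (unsigned)	`shr eax, 1`	Signed `x / 2` needs a fix-up because `sar` alone rounds toward negative infinity: `mov edx, eax` → `shr edx, 31` → `add eax, edx` → `sar eax, 1`
`return x;`	`mov eax, <value>` → `ret`	Integer and pointer return values are in EAX (x86) / RAX (x64); floating-point results come back in ST(0) or XMM0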
`return (struct large)`	Write to `[EDI]` / `[RCX]` then `ret`	Large structs returned via hidden pointer passed as extra arg
`*ptr`	`[eax]` — e.g. `mov eax, [eax]`	Dereference — the pointer value is already in the register
`ptr->field`	`[eax + offset]`	`offset` is the struct field’s byte offset from the base
`array[i]`	`[base + index*scale]`	SIB addressing — scale matches element size (4 for `int[]`)
`array[i].field`	`[base + index*scale + field_offset]`	Full SIB + displacement
`memset(p, 0, n)`	`xor eax, eax` + `rep stosd`	ECX = count in dwords; EDI = destination
`memcpy(dst, src, n)`	`rep movsd` (or `rep movsb`)	ECX = count; ESI = source; EDI = destination
`strlen(s)`	`repne scasb` with AL=0	Scans [EDI] for the null byte with ECX starting at −1; afterwards `not ecx` → `dec ecx` leaves the length in ECX
`x & mask`	`and eax, mask`	Also tests a single bit when mask is a power of two
`x | flag`	`or eax, flag`	Setting a bit without clearing others
`x ^ key`	`xor eax, key`	The single most common malware operation — encryption, decryption, hash mixing
`~x`	`not eax`	Bitwise complement — also seen as `neg eax; dec eax`
`(int)(char)x`	`movsx eax, byte ptr [ebx]`	Sign-extend byte to 32-bit int
`(unsigned int)(unsigned char)x`	`movzx eax, byte ptr [ebx]`	Zero-extend byte — the safe widening idiom
`switch (x)`	`jmp [eax*4 + table_addr]`	Indirect jump through a jump table
`virtual->method()`	`mov ecx, this` → `mov eax, [ecx]` → `call [eax + N]`	vtable dispatch: first dereference gets the vtable, second gets the slot
`GetProcAddress(…)` reimplemented	hash loop over export names → `call [eax]`	No import table entry — common in shellcode and packer stubs
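
One row of this table hides a rounding subtlety: an arithmetic shift right is not the same as C's signed division for negative odd values. The Python sketch below models a bare SAR and the sign-bit fix-up sequence compilers emit for signed division by two; the helper names are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def sar(x: int, n: int) -> int:
    """Arithmetic shift right on a 32-bit pattern (rounds toward -infinity)."""
    return (to_signed(x) >> n) & MASK32

def signed_div2(x: int) -> int:
    """Model: mov edx,eax / shr edx,31 / add eax,edx / sar eax,1."""
    sign_bit = (x & MASK32) >> 31             # 1 for negative inputs, else 0
    return sar((x + sign_bit) & MASK32, 1)    # bias, then shift: rounds toward zero
```

For -7, a bare sar yields -4 while the fix-up sequence yields -3, matching C's truncating division. Seeing that extra shr/add pair before a sar is therefore a hint that the source variable was a signed int.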

Common Malware Indicator Patterns

Memorise these. When you see one in a binary, treat it as a high-confidence signal of a specific technique:

What you see in disassembly	What it almost certainly means	How to confirm
`MOV EAX, FS:[0x30]` or `MOV RAX, GS:[0x60]`	PEB access — reading the process environment block for module list, image base, or heap	Followed by chained dereferences (`[EAX+0x0C]`, `[EAX+0x14]`, …) into the Ldr module list
`XOR` on a byte-granularity loop over a buffer	XOR decryption / encryption of embedded payload or config blob	The decrypted buffer is subsequently called into or passed to a second function
`CALL EAX` / `JMP EAX` or `CALL [EAX + N]` after a hash-compare loop	Dynamically resolved API call — import table is empty or missing	Trace EAX backwards to a `GetProcAddress` re-implementation walking the export table
`RDTSC` … work … `RDTSC` … `CMP EAX, threshold` … `JG`	Timing-based anti-debug check	Two RDTSC with same-register subtract; threshold is usually `0x10000`–`0x100000`
`CALL $+5; POP EBX; SUB EBX, 5` (x86) or `LEA RBX, [RIP]` (x64)	PIC self-location — shellcode finding its own load address	Followed by `EBX + offset` references to embedded data/code within the shellcode
`PUSH 0x40; PUSH 0x3000; PUSH size; PUSH 0; CALL VirtualAlloc` (or the six-argument `NtAllocateVirtualMemory`)	RWX memory allocation: staging area for injected shellcode	`0x40` = `PAGE_EXECUTE_READWRITE`, `0x3000` = `MEM_COMMIT | MEM_RESERVE`; look for a subsequent write then transfer of control
`MOV EAX, [EBX + 0x3C]` → `MOV EAX, [EBX + EAX + 0x78]` → `ADD EAX, EBX`	PE export directory walk: hand-rolled `GetProcAddress`	Classic shellcode technique with EBX = module base; `0x3C` = `e_lfanew` in the DOS header, `0x78` = export directory RVA offset within the PE32 NT headers (`0x88` for PE32+)
`CPUID` → check vendor string or bit 31 of ECX	Hypervisor / VM detection	Malware aborts or switches to benign path when it detects a sandbox
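`MOV EAX, 0x564D5868` ('VMXh') → `MOV DX, 0x5658` → `IN EAX, DX`	VMware backdoor I/O port detection	Often wrapped in an SEH try/except: in user mode on real hardware the `IN` faults (no VMware); if it completes, the code is running in a VMware guest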
`MOV EAX, LARGE FS:[0x0]` (x86 SEH chain head)	SEH chain manipulation — installing a custom exception handler	Malware uses SEH to catch intentional exceptions and redirect control flow
`INT 3` blocks or `0xCC` byte padding inside function body	Debugger trap or anti-attach bait	Malware scans its own code pages for `0xCC` bytes inserted by software breakpoints
`REP STOSD` zeroing a region → `MOV` of bytes → `CALL` into it	Self-copying / decrypting shellcode followed by execution	The classic stager pattern — payload written to zeroed RWX memory, then jumped into
`MOV ECX, [ESI]` immediately before `CALL`	C++ method call (`this` in ECX) — thiscall convention	Trace ESI back to a heap allocation or a global object to identify the class
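`MOV EAX, [EAX + 0x20]` then name-hash loop	Kernel32 export hash walk (`0x20` = `AddressOfNames` in the export directory)	Compare the hash constant against known hash lists (e.g., with the common ROR-13 hash, `0xEC0E4E8E` = `LoadLibraryA` and `0x7C0DFCAA` = `GetProcAddress`)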
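
To match a hash constant from a sample against API names, you can recompute the hash yourself. The Python sketch below implements the widely seen ROR-13 additive scheme (rotate the accumulator right by 13, then add the next character, with no terminator included). Real samples vary the rotation count, casing, and whether the module name is mixed in, so treat this as one candidate algorithm; the function names are illustrative.

```python
def ror32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    x &= 0xFFFFFFFF
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def ror13_hash(name: str) -> int:
    """ROR-13 additive hash over the ASCII bytes of an export name."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h

def build_lookup(names):
    """Map hash -> name for a list of candidate export names."""
    return {ror13_hash(n): n for n in names}
```

Feeding a DLL's export names through build_lookup gives a table you can query with each constant found next to the hash-compare loop; with this scheme, "LoadLibraryA" hashes to 0xEC0E4E8E.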

Written on June 23, 2026
