Introduction to Assembly
Overview
Goals:
- Understand what assembly is and why it matters.
- Learn about the x86-64 architecture and its syntax.
- Introduce mov instructions and how data is moved at the assembly level.
1. What is Assembly?
Definition:
Assembly is a human-readable representation of machine code.
- Compilers like GCC convert C code into assembly.
- Each line of C may generate multiple assembly instructions.
- Assembly is machine-specific (e.g., x86-64, ARM, MIPS).
C → Assembly Example:
int sum = x + y;
Assembly Abstraction:
1) Copy x into register R1
2) Copy y into register R2
3) Add R2 to R1
4) Store result from R1 to sum
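The four steps above can be sketched in C by letting local variables play the role of registers. This is only an illustration: the function name sum_via_registers and the variables r1 and r2 are hypothetical stand-ins for R1 and R2, not something a compiler would emit.

```c
/* r1 and r2 are hypothetical stand-ins for registers R1 and R2. */
int sum_via_registers(int x, int y) {
    int r1 = x;    /* 1) copy x into R1 */
    int r2 = y;    /* 2) copy y into R2 */
    r1 = r1 + r2;  /* 3) add R2 to R1 */
    int sum = r1;  /* 4) store the result from R1 to sum */
    return sum;
}
```

Calling sum_via_registers(2, 3) walks through the same sequence the abstraction describes and yields 5.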
2. Registers
Definition:
Registers are 64-bit storage locations inside the CPU used for fast temporary storage and operations.
Common Registers:
%rax, %rbx, %rcx, %rdx,
%rsi, %rdi, %rbp, %rsp,
%r8 to %r15
Usage:
- Store intermediate values
- Pass function arguments
- Hold return values
3. Viewing Assembly
Use objdump -d <program> to inspect the assembly code of a compiled program.
4. Our First Assembly: sum_array
C Code:
int sum_array(int arr[], int nelems) {
    int sum = 0;
    for (int i = 0; i < nelems; i++) {
        sum += arr[i];
    }
    return sum;
}
Disassembled Assembly:
4005b6: ba 00 00 00 00 mov $0x0,%edx ; i = 0
4005bb: b8 00 00 00 00 mov $0x0,%eax ; sum = 0
4005c0: eb 09 jmp 4005cb ; jump to condition
4005c2: 48 63 ca movslq %edx,%rcx ; rcx = (long) edx
4005c5: 03 04 8f add (%rdi,%rcx,4),%eax ; sum += arr[i]
4005c8: 83 c2 01 add $0x1,%edx ; i++
4005cb: 39 f2 cmp %esi,%edx ; compare i < nelems
4005cd: 7c f3 jl 4005c2 ; loop
4005cf: f3 c3 repz retq ; return
Key:
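- %rdi: arr
- %esi: nelems
- %eax: return value / sum
- %edx: loop index
(%esi, %eax, and %edx name the low 32 bits of %rsi, %rax, and %rdx; they appear here because int is 32 bits wide.)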
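To make the control flow of the disassembly easier to follow, here is a rough C rendering that uses goto to mirror the jumps. The function name sum_array_goto and the labels body and check are hypothetical; the comments map each statement to the instruction it appears to correspond to.

```c
/* A goto-based sketch that mirrors the jump structure of the disassembly. */
int sum_array_goto(const int arr[], int nelems) {
    int i = 0;            /* mov $0x0,%edx */
    int sum = 0;          /* mov $0x0,%eax */
    goto check;           /* jmp 4005cb: test the condition first */
body:
    sum += arr[(long)i];  /* movslq %edx,%rcx ; add (%rdi,%rcx,4),%eax */
    i++;                  /* add $0x1,%edx */
check:
    if (i < nelems)       /* cmp %esi,%edx ; jl 4005c2 */
        goto body;
    return sum;           /* retq */
}
```

Jumping to the condition before the first iteration is what lets the loop body be skipped entirely when nelems is 0.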
5. Assembly Instruction Format
Each instruction has:
- Opcode (operation): e.g., mov, add, cmp
- Operands (data): e.g., registers, memory, constants
6. mov Instruction
mov src, dst
- Copies bytes from src to dst (src is left unchanged)
- At most one of the two operands may be a memory location
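Because a single mov cannot name two memory locations, copying a value from one memory location to another takes two steps through a register. A minimal C sketch of that idea follows; the function name copy_mem_to_mem and the register names in the comments are illustrative assumptions, not output from a real compiler.

```c
#include <stdint.h>

/* Copy one 8-byte value from memory to memory by way of a temporary,
   the way two mov instructions would go through a register. */
void copy_mem_to_mem(const int64_t *src, int64_t *dst) {
    int64_t reg = *src;  /* e.g. mov (%rsi), %rax : memory -> register */
    *dst = reg;          /* e.g. mov %rax, (%rdi) : register -> memory */
}
```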
7. Operand Types
Immediate:
mov $0x42, %rax
Moves constant value 0x42 into %rax.
Register:
mov %rbx, %rax
Copies contents of %rbx into %rax.
Absolute Address:
mov 0x104, %rax
Copies value at memory address 0x104 into %rax.
Indirect:
mov (%rbx), %rax
Copies value at the memory address stored in %rbx into %rax.
Base + Displacement:
mov 0x10(%rax), %rbx
Loads from memory address 0x10 + %rax.
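In C terms, the indirect form behaves like dereferencing a pointer, and base + displacement behaves like reading at a fixed byte offset from a pointer. The sketch below assumes 8-byte values to match the 64-bit registers used above; the function names load_indirect and load_base_disp are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Roughly mov (%rbx), %rax : read the 8 bytes that p points to. */
int64_t load_indirect(const int64_t *p) {
    return *p;
}

/* Roughly mov 0x10(%rax), %rbx : read 8 bytes located 0x10 bytes past p. */
int64_t load_base_disp(const int64_t *p) {
    const unsigned char *bytes = (const unsigned char *)p;
    int64_t value;
    memcpy(&value, bytes + 0x10, sizeof value);  /* byte offset, not element index */
    return value;
}
```

With an array of 8-byte values, a displacement of 0x10 (16 bytes) lands on the element at index 2.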
Indexed:
mov (%rax,%rdx), %rcx
Loads from address (%rax + %rdx).
Indexed + Displacement:
mov 0x10(%rax,%rdx), %rcx
Loads from 0x10 + %rax + %rdx.
Scaled Indexed:
mov (,%rdx,4), %rax
Loads from 4 * %rdx.
mov 0x4(,%rdx,4), %rax
Loads from 0x4 + 4 * %rdx.
Scaled Indexed with Base:
mov (%rax,%rdx,2), %rcx
Loads from %rax + 2 * %rdx.
mov 0x4(%rax,%rdx,2), %rcx
Loads from 0x4 + %rax + 2 * %rdx.
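Scaled indexing with a base is the address arithmetic behind array indexing: the scale plays the role of the element size. The sketch below models only that arithmetic (the load width here follows the element size rather than the 8-byte loads shown above), and the function names load_scaled_2 and load_scaled_4 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models the address (%rax,%rdx,2): base + 2 * index, for 2-byte elements. */
int16_t load_scaled_2(const int16_t *base, size_t index) {
    const unsigned char *addr = (const unsigned char *)base + 2 * index;
    int16_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}

/* Models the address (%rdi,%rcx,4) from sum_array: base + 4 * index. */
int32_t load_scaled_4(const int32_t *base, size_t index) {
    const unsigned char *addr = (const unsigned char *)base + 4 * index;
    int32_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}
```

Both functions return the same value as base[index], which is why a compiler can express arr[i] as a single scaled-index operand.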
General Address Form:
Imm(%rb, %ri, scale)
=> Address = Imm + R[%rb] + R[%ri] * scale, where scale is 1, 2, 4, or 8
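The general form can be written as a small C helper that computes the address from its four parts. This is a sketch for checking answers by hand; the name effective_address and its parameter names are assumptions, and the helper does not check that scale is one of the values the hardware accepts (1, 2, 4, or 8).

```c
#include <stdint.h>

/* Address = Imm + R[%rb] + R[%ri] * scale.
   base and index are the current register values; omitted parts are passed as 0
   (and scale as 1 when no scale is written). */
uint64_t effective_address(int64_t imm, uint64_t base,
                           uint64_t index, uint64_t scale) {
    return (uint64_t)imm + base + index * scale;
}
```

For example, with %rax = 0x100 and %rdx = 3, the operand 0x4(%rax,%rdx,2) corresponds to effective_address(0x4, 0x100, 3, 2), which is 0x10A.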
8. Practice Questions from Slides
Example Setup:
Assume:
- Memory at 0x42 = 5
- %rbx = 8
- %rax = 0x100
- %rdx = 3
- Memory at 0x104 = 0xAB
- Memory at 0x10C = 0x11
Examples:
mov $0x42, %rax ; rax = 0x42
mov 0x42, %rax ; rax = 5 (from memory)
mov %rbx, 0x55 ; memory[0x55] = 8
mov 4(%rax), %rcx ; rcx = memory[0x104] = 0xAB
mov 9(%rax,%rdx), %rcx ; rcx = memory[0x100 + 3 + 9] = memory[0x10C] = 0x11
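One way to check these answers is to replay each example against a toy machine in C. The model below is a deliberate simplification and an assumption for illustration: it stores one 64-bit value per address instead of modeling byte-addressed memory, each example starts from the setup state, and the value at 0x104 is taken from the fourth example. The type Machine and all function names are hypothetical.

```c
#include <stdint.h>

enum { MEM_CELLS = 0x200 };  /* toy memory: one 64-bit value per address */

typedef struct {
    uint64_t rax, rbx, rcx, rdx;
    uint64_t mem[MEM_CELLS];
} Machine;

/* Restore the setup state from the notes. */
void machine_reset(Machine *m) {
    for (int i = 0; i < MEM_CELLS; i++) {
        m->mem[i] = 0;
    }
    m->rax = 0x100;
    m->rbx = 8;
    m->rcx = 0;
    m->rdx = 3;
    m->mem[0x42] = 5;
    m->mem[0x104] = 0xAB;  /* value stated in the fourth example */
    m->mem[0x10C] = 0x11;
}

void ex_mov_imm(Machine *m)   { m->rax = 0x42; }                        /* mov $0x42, %rax */
void ex_mov_abs(Machine *m)   { m->rax = m->mem[0x42]; }                /* mov 0x42, %rax */
void ex_mov_store(Machine *m) { m->mem[0x55] = m->rbx; }                /* mov %rbx, 0x55 */
void ex_mov_disp(Machine *m)  { m->rcx = m->mem[m->rax + 4]; }          /* mov 4(%rax), %rcx */
void ex_mov_index(Machine *m) { m->rcx = m->mem[m->rax + m->rdx + 9]; } /* mov 9(%rax,%rdx), %rcx */
```

The first two functions show the difference the $ makes: $0x42 is the constant itself, while a bare 0x42 is an address to read from.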
9. Summary
Key Points:
- GCC turns C code into x86-64 assembly.
- Assembly uses registers and memory addresses directly.
- mov is the fundamental instruction to move data.
- Assembly instructions work on bytes, not types.
- Understanding operand forms is critical for reading assembly.
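The point that instructions work on bytes rather than types can be seen from C by copying the bytes of a float into an integer of the same size. This sketch assumes float is a 32-bit IEEE 754 value, which holds on x86-64; the function name float_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32 bits stored in a float. The copy moves bytes only;
   nothing in it knows or cares that they started out as a float. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under that assumption, float_bits(1.0f) is 0x3F800000: the same four bytes, read as an integer instead of a float.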