Introduction to Assembly


Overview

Goals:

  • Understand what assembly is and why it matters.
  • Learn about the x86-64 architecture and its syntax.
  • Introduce the mov instruction and how data moves at the assembly level.

1. What is Assembly?

Definition:

Assembly is a human-readable representation of machine code.

  • Compilers like GCC convert C code into assembly.
  • Each line of C may generate multiple assembly instructions.
  • Assembly is machine-specific (e.g., x86-64, ARM, MIPS).

C → Assembly Example:

int sum = x + y;

Assembly Abstraction:

1) Copy x into register R1
2) Copy y into register R2
3) Add R2 to R1
4) Store result from R1 to sum
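The four abstract steps can be mirrored in C by letting two local variables stand in for the registers. This is only an illustration: r1 and r2 are hypothetical stand-ins for R1 and R2, and a real compiler picks its own registers and may merge or reorder the steps.

```c
/* Mirrors the register-level steps for `sum = x + y`.
 * r1 and r2 are hypothetical stand-ins for registers R1 and R2. */
int sum_via_registers(int x, int y) {
    int r1 = x;      /* 1) copy x into R1 */
    int r2 = y;      /* 2) copy y into R2 */
    r1 = r1 + r2;    /* 3) add R2 to R1 */
    int sum = r1;    /* 4) store the result from R1 into sum */
    return sum;
}
```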

2. Registers

Definition:

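Registers are small, fast storage locations inside the CPU. x86-64 provides 16 general-purpose registers, each 64 bits wide, used to hold temporary values and the operands of arithmetic. The low 32 bits of each register have their own name (e.g., %eax is the low half of %rax, %edx of %rdx, %r8d of %r8), and 32-bit int code such as sum_array uses these names.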

Common Registers:

%rax, %rbx, %rcx, %rdx,
%rsi, %rdi, %rbp, %rsp,
%r8 to %r15

Usage:

  • Store intermediate values
  • Pass function arguments
  • Hold return values
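As an illustration of the last two uses, assume the System V AMD64 calling convention used on Linux and macOS: the first six integer or pointer arguments arrive in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, and an integer result is returned in %rax. Other ABIs (for example, Windows x64) assign different registers. The function below is a hypothetical example whose comments mark where each value would live under that assumption.

```c
/* Under the System V AMD64 ABI (assumed here):
 *   a arrives in %rdi, b in %rsi, c in %rdx,
 *   and the result is returned in %rax. */
long add3(long a, long b, long c) {
    return a + b + c;
}
```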

3. Viewing Assembly

Use objdump -d <program> to inspect the assembly code of a compiled program.
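One way to try this, assuming GCC and GNU binutils are installed: compile a small file to an object file without linking, then disassemble it; gcc -S is an alternative that writes the compiler's assembly to a .s file. The file name square.c and the function are placeholders.

```c
/* square.c: a tiny function to disassemble.
 *
 * Possible commands (GCC + GNU binutils assumed):
 *   gcc -Og -c square.c      -> produces square.o
 *   objdump -d square.o      -> disassembles its machine code
 *   gcc -Og -S square.c      -> alternatively, writes square.s
 */
int square(int x) {
    return x * x;
}
```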


4. Our First Assembly: sum_array

C Code:

int sum_array(int arr[], int nelems) {
    int sum = 0;
    for (int i = 0; i < nelems; i++) {
        sum += arr[i];
    }
    return sum;
}

Disassembled Assembly:

4005b6: ba 00 00 00 00     mov $0x0,%edx        ; i = 0
4005bb: b8 00 00 00 00     mov $0x0,%eax        ; sum = 0
4005c0: eb 09              jmp 4005cb           ; jump to condition
4005c2: 48 63 ca           movslq %edx,%rcx     ; rcx = (long) edx
4005c5: 03 04 8f           add (%rdi,%rcx,4),%eax ; sum += arr[i]
4005c8: 83 c2 01           add $0x1,%edx        ; i++
4005cb: 39 f2              cmp %esi,%edx        ; compare i < nelems
4005cd: 7c f3              jl 4005c2            ; loop
4005cf: f3 c3              repz retq            ; return

Key:

  • %rdi: arr
  • %esi: nelems
  • %eax: return value / sum
  • %edx: loop index i
  • %rcx: i sign-extended to 64 bits (by movslq) for the address calculation
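To connect the listing back to C, here is a sketch that follows the same control flow using goto: it jumps to the loop test first (the jmp at 4005c0), then repeats the body while i < nelems (the jl at 4005cd). The function name and the labels loop and test are hypothetical; the code is meant to be read alongside the disassembly, not as a recommended C style.

```c
/* sum_array rewritten with goto to mirror the disassembly.
 * Comments give the matching addresses from the listing. */
int sum_array_goto(int arr[], int nelems) {
    int i = 0;                  /* 4005b6: mov $0x0,%edx */
    int sum = 0;                /* 4005bb: mov $0x0,%eax */
    goto test;                  /* 4005c0: jmp 4005cb */
loop:
    sum += arr[(long)i];        /* 4005c2-4005c5: movslq, then add (%rdi,%rcx,4),%eax */
    i++;                        /* 4005c8: add $0x1,%edx */
test:
    if (i < nelems) goto loop;  /* 4005cb-4005cd: cmp %esi,%edx; jl 4005c2 */
    return sum;                 /* 4005cf: sum is already in %eax */
}
```

Jumping to the test first means an empty array (nelems = 0) never touches arr, exactly as in the assembly.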

5. Assembly Instruction Format

Each instruction has:

  • Opcode (operation): e.g., mov, add, cmp
  • Operands (data): e.g., registers, memory, constants

6. mov Instruction

mov src, dst
  • Copies bytes from src to dst (src is left unchanged)
  • At most one operand may be a memory location; there is no memory-to-memory mov
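Because mov cannot take two memory operands, copying a value from one memory location to another takes two instructions: a load into a register and a store from it. The hypothetical C function below performs such a copy; compiled with optimization for x86-64 under the System V ABI, it typically becomes a load through %rdi and a store through %rsi via %rax, though the exact output depends on the compiler and flags.

```c
/* Memory-to-memory copy: *dst = *src.
 * x86-64 needs a register in between, roughly:
 *   mov (%rdi),%rax   load *src  (src assumed in %rdi)
 *   mov %rax,(%rsi)   store *dst (dst assumed in %rsi)
 * The source is only read, never modified. */
void copy_long(const long *src, long *dst) {
    *dst = *src;
}
```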

7. Operand Types

Immediate:

mov $0x42, %rax

Moves constant value 0x42 into %rax.


Register:

mov %rbx, %rax

Copies contents of %rbx into %rax.


Absolute Address:

mov 0x104, %rax

Copies value at memory address 0x104 into %rax.


Indirect:

mov (%rbx), %rax

Copies value at the memory address stored in %rbx into %rax.


Base + Displacement:

mov 0x10(%rax), %rbx

Loads from memory address 0x10 + %rax.
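In C terms, indirect addressing is a pointer dereference, and a displacement adds a fixed number of bytes before dereferencing. With 8-byte elements (int64_t here, so the size does not depend on the platform's long), 0x10 bytes past the start of an array is element 2, because 0x10 = 16 = 2 × 8. The function names are hypothetical.

```c
#include <stdint.h>

/* mov (%rbx), %rax      reads 8 bytes at the address in %rbx: load_indirect(p)
 * mov 0x10(%rax), %rbx  reads 8 bytes at %rax + 0x10:        load_disp16(p)
 * The displacement counts bytes, so the pointer is cast to char * before
 * adding 0x10; with 8-byte elements that skips two of them. */
int64_t load_indirect(const int64_t *p) {
    return *p;
}

int64_t load_disp16(const int64_t *p) {
    return *(const int64_t *)((const char *)p + 0x10);
}
```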


Indexed:

mov (%rax,%rdx), %rcx

Loads from address (%rax + %rdx).


Indexed + Displacement:

mov 0x10(%rax,%rdx), %rcx

Loads from 0x10 + %rax + %rdx.


Scaled Indexed:

mov (,%rdx,4), %rax

Loads from 4 * %rdx.

mov 0x4(,%rdx,4), %rax

Loads from 0x4 + 4 * %rdx.


Scaled Indexed with Base:

mov (%rax,%rdx,2), %rcx

Loads from %rax + 2 * %rdx.

mov 0x4(%rax,%rdx,2), %rcx

Loads from 0x4 + %rax + 2 * %rdx.
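The scale factor lets one instruction index an array: the base register holds the array's address, the index register holds i, and the scale is the element size. Note that the scale only shapes the address; the number of bytes read comes from the destination, so mov (%rax,%rdx,2), %rcx reads 8 bytes, while a 2-byte load such as movw (%rax,%rdx,2), %cx reads exactly one 2-byte element. The sketch below uses hypothetical names and int16_t elements to show that base + 2 * index is the same as arr[i].

```c
#include <stdint.h>

/* (%rax,%rdx,2) with %rax = arr and %rdx = i addresses arr + 2*i.
 * A 2-byte load from there (e.g. movw (%rax,%rdx,2),%cx) reads arr[i]. */
int16_t load_scaled2(const int16_t *arr, int64_t i) {
    const char *addr = (const char *)arr + 2 * i;   /* base + scale * index */
    return *(const int16_t *)addr;                  /* same as arr[i] */
}

/* 0x4(%rax,%rdx,2) adds 4 more bytes, i.e. two int16_t elements. */
int16_t load_scaled2_disp4(const int16_t *arr, int64_t i) {
    const char *addr = (const char *)arr + 0x4 + 2 * i;
    return *(const int16_t *)addr;                  /* same as arr[i + 2] */
}
```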


General Address Form:

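Imm(%rb, %ri, scale)
=> Address = Imm + R[%rb] + R[%ri] * scale

Here R[%r] means the contents of register %r, and scale must be 1, 2, 4, or 8. Any of the parts may be omitted; a missing part contributes 0 (and a missing scale means 1).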
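The general form can be written as a small function. This sketch uses a hypothetical name, treats the base and index as plain 64-bit register values, and rejects any scale other than 1, 2, 4, or 8. Unsigned arithmetic wraps modulo 2^64, like the processor's address calculation, so a negative displacement works as expected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes Imm + R[rb] + R[ri] * scale for the operand Imm(%rb,%ri,scale).
 * Returns false (leaving *out untouched) if scale is not 1, 2, 4 or 8.
 * Unsigned arithmetic wraps modulo 2^64. */
bool effective_address(int64_t imm, uint64_t rb, uint64_t ri, unsigned scale,
                       uint64_t *out) {
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
        return false;
    }
    *out = (uint64_t)imm + rb + ri * scale;
    return true;
}
```

For example, with %rax = 0x100 and %rdx = 3, the operand 0x4(%rax,%rdx,2) gives 0x4 + 0x100 + 3 * 2 = 0x10A.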

8. Practice Questions from Slides

Example Setup:

Assume (each example starts from this same initial state):

  • Memory at 0x42 = 5
  • %rbx = 8
  • %rax = 0x100
  • %rdx = 3
  • Memory at 0x104 = 0xAB
  • Memory at 0x10C = 0x11

Examples:

mov $0x42, %rax        ; rax = 0x42
mov 0x42, %rax         ; rax = 5 (from memory)
mov %rbx, 0x55         ; memory[0x55] = 8
mov 4(%rax), %rcx      ; rcx = memory[0x104] = 0xAB
mov 9(%rax,%rdx), %rcx ; rcx = memory[0x100 + 3 + 9] = memory[0x10C] = 0x11
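The answers can be checked with a toy model of the machine. Everything here is a hypothetical simulation: a small byte array stands in for memory, each example runs from a fresh copy of the setup state (the first example overwrites %rax, so the examples cannot share one state), and memory values are assumed to be 8 bytes wide, stored little-endian as on x86-64. The value 0xAB at 0x104 is taken from the fourth example's comment; the last example works out to memory[0x10C] = 0x11.

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 0x200  /* toy memory, large enough for every address used */

/* Hypothetical machine state: four registers plus a small byte array. */
struct machine {
    uint64_t rax, rbx, rcx, rdx;
    uint8_t mem[MEM_SIZE];
};

/* 8-byte little-endian load (x86-64 is little-endian).
 * No bounds checking: callers keep addr + 8 <= MEM_SIZE. */
static uint64_t load64(const struct machine *m, uint64_t addr) {
    uint64_t v = 0;
    for (int k = 7; k >= 0; k--) {
        v = (v << 8) | m->mem[addr + k];
    }
    return v;
}

/* 8-byte little-endian store. */
static void store64(struct machine *m, uint64_t addr, uint64_t v) {
    for (int k = 0; k < 8; k++) {
        m->mem[addr + k] = (uint8_t)(v >> (8 * k));
    }
}

/* Initial state from the setup; memory values assumed 8 bytes wide. */
static void machine_init(struct machine *m) {
    memset(m, 0, sizeof *m);
    m->rax = 0x100;
    m->rbx = 8;
    m->rdx = 3;
    store64(m, 0x42, 5);
    store64(m, 0x104, 0xAB);  /* value taken from the fourth example */
    store64(m, 0x10C, 0x11);
}

/* Runs practice example n (1 to 5) from a fresh setup state and
 * returns the value that ends up in the destination. */
uint64_t run_example(int n) {
    struct machine m;
    machine_init(&m);
    switch (n) {
    case 1: /* mov $0x42, %rax */
        m.rax = 0x42;
        return m.rax;
    case 2: /* mov 0x42, %rax */
        m.rax = load64(&m, 0x42);
        return m.rax;
    case 3: /* mov %rbx, 0x55 */
        store64(&m, 0x55, m.rbx);
        return load64(&m, 0x55);
    case 4: /* mov 4(%rax), %rcx */
        m.rcx = load64(&m, m.rax + 4);
        return m.rcx;
    case 5: /* mov 9(%rax,%rdx), %rcx */
        m.rcx = load64(&m, m.rax + m.rdx + 9);
        return m.rcx;
    default:
        return 0;
    }
}
```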

9. Summary

Key Points:

  • GCC turns C code into x86-64 assembly.
  • Assembly uses registers and memory addresses directly.
  • mov is the fundamental instruction to move data.
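  • Assembly instructions work on fixed-size groups of bytes (1, 2, 4, or 8), not on C types; the size comes from the register names or the instruction suffix.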
  • Understanding operand forms is critical for reading assembly.

Readings

An Introduction to 64-bit Computing and x86-64

The story of Mel (Annotated version)