List of x86 SIMD instructions (Wikipedia Lab Guide)

Advanced x86 SIMD Instruction Set Study Guide
1) Introduction and Scope
This study guide provides a deep technical dive into the evolution and mechanics of x86 Single Instruction, Multiple Data (SIMD) instruction set extensions. SIMD architectures are fundamental to modern high-performance computing, enabling parallel processing of data elements within wide registers. This guide focuses on the underlying principles, architectural details, practical implications, and defensive engineering considerations for MMX, SSE, AVX, AVX2, AVX-512, FMA, and AMX. The scope is strictly technical, aiming to equip learners with a robust understanding of these powerful instruction sets for system analysis, optimization, and security.
2) Deep Technical Foundations
SIMD instructions operate on vectors, which are fixed-size arrays of data elements (lanes). A single SIMD instruction performs the same operation on each corresponding lane of multiple operand vectors simultaneously. This contrasts with scalar operations, which process a single data element at a time.
Key Concepts:
Vector Registers: Wide registers designed to hold multiple data elements. Their width has progressively increased across SIMD extensions:
- MMX: 64-bit registers (mm0-mm7), aliased with the x87 FPU registers. This aliasing requires careful state management.
- SSE/SSE2: 128-bit registers (xmm0-xmm15 in 64-bit mode, xmm0-xmm7 in 32-bit mode). These are dedicated registers, separate from x87.
- AVX/AVX2: Extends the xmm registers to 256-bit registers (ymm0-ymm15). The lower 128 bits of ymmN map directly to xmmN.
- AVX-512: Extends the ymm registers to 512-bit registers (zmm0-zmm31). The lower 256 bits of zmmN map to ymmN, and the lower 128 bits map to xmmN.
Lane Width: The size of individual data elements within a vector. Common lane widths include 8-bit (byte), 16-bit (word), 32-bit (doubleword), 64-bit (quadword), and 128-bit (double quadword). The choice of lane width impacts the number of elements that fit into a vector.
Data Types: SIMD instructions can operate on various data types, including integers (signed/unsigned) and floating-point numbers (single-precision FP32, double-precision FP64, half-precision FP16, bfloat16). The specific instruction mnemonic often indicates the data type (e.g., PS for Packed Single-precision, PD for Packed Double-precision, PI for Packed Integer).
Instruction Prefixes: The encoding of SIMD instructions has evolved significantly to support wider registers, more operands, and advanced features.
- Legacy Prefixes (66h, 0Fh, 0F38h, 0F3Ah): Used for MMX, SSE, and early SSE2 instructions. These bytes modify or extend the primary opcode. For example, 0Fh selects an extended opcode map, and 66h often selects the 128-bit SSE/SSE2 form of an instruction.
- VEX Prefix: Introduced with AVX. A 2-byte (C5h) or 3-byte (C4h) prefix that enables three-operand syntax, access to registers up to ymm15, and more compact encoding. Key fields:
  - R, X, B: Extend the ModR/M reg field, the SIB index field, and the ModR/M/SIB base field, giving access to registers 8-15.
  - vvvv: Specifies an additional (usually second source) register, stored inverted.
  - W: Operand-size or opcode-extension bit; its meaning is instruction-specific (commonly 0 for 32-bit, 1 for 64-bit operand size).
  - L: Vector length (0 for 128-bit xmm, 1 for 256-bit ymm).
  - pp, mmmmm: Encode an implied legacy prefix (none/66h/F3h/F2h) and the opcode map (0Fh/0F38h/0F3Ah).
- EVEX Prefix: Introduced with AVX-512. A 4-byte prefix (first byte 62h) enabling 512-bit operations, opmask registers, embedded broadcast, rounding control, and access to zmm0-zmm31. Key fields:
  - R, X, B, R', V': Extended register specifiers, allowing all 32 vector registers to be addressed.
  - W: Operand-size bit (commonly 0 for 32-bit, 1 for 64-bit elements).
  - L'L: Vector length (00 for 128-bit, 01 for 256-bit, 10 for 512-bit).
  - pp, mmm: Implied legacy prefix and opcode map selection.
  - b: Broadcast / rounding / suppress-all-exceptions control.
  - z: Zeroing bit (masked-out destination lanes are zeroed if set, merged if clear).
  - aaa: Opmask register specifier (k0-k7).
Opcode Encoding: The specific operation, operand types, and vector length are encoded within the instruction's opcode and prefixes. The ModR/M byte, SIB byte (if present), and immediate operands further define the instruction's behavior.
Register Aliasing (MMX): MMX registers (mm0-mm7) share physical storage with the x87 FPU registers (st0-st7). This means that using an MMX instruction can overwrite x87 state, and vice versa, so careful management is required to avoid data corruption. For example, the EMMS (Empty MMX State) instruction is often needed to clear the MMX state and allow normal x87 operation.
3) Internal Mechanics / Architecture Details
3.1) MMX (MultiMedia eXtensions)
- Registers: 64-bit mm0 to mm7. These are aliases for the x87 FPU data registers.
- Data Types: Packed bytes (8 x 8-bit), packed words (4 x 16-bit), packed doublewords (2 x 32-bit). Operations are integer-based.
- Operation: Primarily two-operand instructions where the destination is also a source (e.g., PADDW mm1, mm2 performs mm1 = mm1 + mm2).
- Encoding: Uses legacy prefixes (0Fh followed by specific opcodes).
- Example: PADDW mm1, mm2 adds packed words from mm2 to mm1.
; Example: Adding two packed words (16-bit integers)
; mm1 = [w1, w2, w3, w4]
; mm2 = [v1, v2, v3, v4]
; Result in mm1: [w1+v1, w2+v2, w3+v3, w4+v4] (each w_i + v_i is a 16-bit addition)
; If overflow occurs in a lane, it wraps around (standard 16-bit integer arithmetic).
PADDW mm1, mm2
3.2) SSE (Streaming SIMD Extensions) & SSE2
- Registers: 128-bit xmm0 to xmm15 in 64-bit mode (xmm0-xmm7 in 32-bit mode). These are dedicated registers.
- Data Types:
  - SSE: Packed single-precision floating-point (4 x 32-bit). Instructions often end in PS.
  - SSE2: Packed double-precision floating-point (2 x 64-bit) and packed integers (16 x 8-bit, 8 x 16-bit, 4 x 32-bit, 2 x 64-bit). Instructions often end in PD (double-precision float) or use the P-prefixed integer forms.
- Operation: Two-operand form (OP xmm_dst, xmm_src). Three-operand forms (VOP xmm_dst, xmm_src1, xmm_src2) exist only as the later VEX-encoded (AVX) versions of these instructions.
- Encoding: Legacy prefixes for SSE and SSE2.
- Data Movement: Instructions like MOVDQA (Move Double Quadword Aligned) and MOVDQU (Move Double Quadword Unaligned) are crucial for efficient data loading/storing. MOVDQA requires the memory operand to be 16-byte aligned; MOVDQU does not.
- Example: PADDD xmm1, xmm2 adds packed doublewords (32-bit integers) from xmm2 to xmm1.
; Example: Adding two packed doublewords (32-bit integers)
; xmm1 = [d1, d2, d3, d4]
; xmm2 = [v1, v2, v3, v4]
; Result in xmm1: [d1+v1, d2+v2, d3+v3, d4+v4] (each d_i + v_i is a 32-bit addition)
; Overflow wraps around.
PADDD xmm1, xmm2
3.3) AVX (Advanced Vector Extensions)
Registers: Extends the xmm registers to 256-bit ymm0 to ymm15. The lower 128 bits of ymmN are accessible as xmmN.
Encoding: Primarily uses the 2- or 3-byte VEX prefix, enabling three-operand syntax and wider vectors.
Data Types: Supports packed single-precision (8 x 32-bit) and double-precision (4 x 64-bit) floating-point operations. Integer operations on 256-bit vectors were introduced with AVX2.
VEX Prefix Fields (see Section 2): R/X/B extend the register fields, L selects the vector length (0 for 128-bit, 1 for 256-bit), W is an instruction-specific operand-size bit, and pp/mmmmm select the implied legacy prefix and opcode map.
Example: VADDPS ymm1, ymm2, ymm3 performs element-wise addition of packed single-precision floats from ymm2 and ymm3, storing the result in ymm1. This is a three-operand instruction.
; Example: Adding two packed single-precision floats (32-bit floats)
; ymm1 = [f1, f2, f3, f4, f5, f6, f7, f8]
; ymm2 = [v1, v2, v3, v4, v5, v6, v7, v8]
; ymm3 = [w1, w2, w3, w4, w5, w6, w7, w8]
; Result in ymm1: [v1+w1, v2+w2, ..., v8+w8]
VADDPS ymm1, ymm2, ymm3
3.4) AVX2
- Registers: Continues to use 256-bit ymm registers.
- Key Additions:
  - Integer SIMD: Extends most 128-bit packed-integer instructions to 256-bit vectors (8, 16, 32, 64-bit elements).
  - Gather Instructions: Enable non-contiguous memory loads into vector registers. For example, VPGATHERDD loads doublewords from memory addresses computed from indices held in a vector register.
  - Fused Multiply-Add (FMA): FMA3 is a separate CPUID feature that typically ships alongside AVX2. FMA instructions perform (a * b) + c in a single step with a single rounding, often at higher precision and lower latency than separate multiply and add.
  - Bitwise/Shift Operations: Adds per-lane variable shifts (e.g., VPSLLVD, VPSRLVD); packed rotate instructions arrived later with AVX-512.
- Example: VPADDD ymm1, ymm2, ymm3 adds packed doublewords from ymm2 and ymm3, storing the result in ymm1.
; Example: Adding two packed 32-bit integers
; ymm1 = [d1, d2, d3, d4, d5, d6, d7, d8]
; ymm2 = [v1, v2, v3, v4, v5, v6, v7, v8]
; ymm3 = [w1, w2, w3, w4, w5, w6, w7, w8]
; Result in ymm1: [v1+w1, v2+w2, ..., v8+w8]
VPADDD ymm1, ymm2, ymm3
3.5) AVX-512
Registers: Introduces 512-bit zmm0 to zmm31. Each zmmN register is an extension of ymmN and xmmN. There are also 8 opmask registers (k0-k7), each architecturally 64 bits wide.
Encoding: Uses the 4-byte EVEX prefix, which is more flexible than VEX.
Key Features:
- Opmask Registers (k0-k7): Allow conditional execution of lanes within a vector operation. A 1 in a bit position enables the operation for that lane; a 0 disables it. This enables fine-grained control and efficient handling of irregular data.
- Broadcast: Allows a single scalar value from memory to be replicated across all lanes of a vector for an operation, controlled by the b bit in the EVEX prefix.
- Rounding Control: The EVEX prefix supports explicit per-instruction control of floating-point rounding modes.
- Zeroing vs. Merging: The z bit in the EVEX prefix determines whether masked-out lanes are zeroed or retain their previous values.
- Subsets: AVX-512 is modular, with subsets such as AVX-512F (Foundation), AVX-512CD (Conflict Detection), AVX-512BW (Byte/Word), AVX-512DQ (Doubleword/Quadword), AVX-512VL (Vector Length Extensions, allowing 128/256-bit operations with EVEX encoding), and AVX-512VNNI (Vector Neural Network Instructions).
EVEX Prefix Fields (see Section 2): extended register specifiers (R, X, B, R', V') address all 32 zmm registers; W selects the element/operand size; L'L selects the vector length (128/256/512-bit); b controls broadcast and rounding; z selects zeroing vs. merging; and aaa names the opmask register.
Example: VPADDD zmm1 {k2}, zmm2, zmm3 adds packed doublewords from zmm2 and zmm3, storing the result in zmm1. The operation is conditionally executed based on the bits set in opmask register k2. If a bit in k2 is 0, the corresponding lane in zmm1 is not updated (merging semantics, assuming the z bit is 0).
; Example: Masked addition of packed 32-bit integers
; zmm1 = [d1, ..., d16] (16x 32-bit lanes)
; zmm2 = [v1, ..., v16]
; zmm3 = [w1, ..., w16]
; k2 = [b1, ..., b16] (where b_i is 1 or 0, mapped to the 64-bit k register)
; Assuming EVEX prefix with z=0 (merge) and k=k2:
; Result in zmm1: [d1 if b1, v1+w1 else d1, ..., d16 if b16, v16+w16 else d16]
VPADDD zmm1 {k2}, zmm2, zmm3
3.6) FMA (Fused Multiply-Add)
- Concept: Combines a multiplication and an addition into a single instruction, often with higher precision and reduced latency. This is crucial for numerical stability and performance in linear algebra and signal processing.
- FMA3: Three-operand instructions (the common form). The digits in the mnemonic give the operand order for the multiply and add:
  - vfmadd132sd xmm1, xmm2, xmm3: xmm1 = (xmm1 * xmm3) + xmm2 (scalar double-precision)
  - vfmadd213sd xmm1, xmm2, xmm3: xmm1 = (xmm2 * xmm1) + xmm3
  - vfmadd231sd xmm1, xmm2, xmm3: xmm1 = (xmm2 * xmm3) + xmm1
  Similar mnemonics exist for packed data (vfmadd132ps, vfmadd132pd, etc.) and for fused multiply-subtract (vfmsub).
- FMA4: Four-operand instructions (less common, primarily on older AMD processors).
- Encoding: Uses VEX or EVEX prefixes.
- Data Types: FP32, FP64, FP16 (with AVX512-FP16), and BF16 (with the later AVX10.2 extensions).
- Example: VFMADD231PD ymm1, ymm2, ymm3 (packed double-precision) performs ymm1 = (ymm2 * ymm3) + ymm1 for each pair of double-precision elements.
; Example: Fused Multiply-Add for double-precision floats
; ymm1 = [a1, a2, a3, a4]
; ymm2 = [b1, b2, b3, b4]
; ymm3 = [c1, c2, c3, c4]
; Result in ymm1: [ (b1*c1)+a1, (b2*c2)+a2, (b3*c3)+a3, (b4*c4)+a4 ]
; The intermediate product (b_i * c_i) is kept at higher precision before adding a_i.
VFMADD231PD ymm1, ymm2, ymm3
3.7) AMX (Advanced Matrix Extensions)
- Registers: Introduces 8 tile registers (tmm0-tmm7), each capable of holding a two-dimensional block (tile) of data. These are not general-purpose vector registers.
- Concept: Designed for efficient matrix multiplication and other tensor operations, crucial for AI/ML workloads. AMX operates on tiles rather than fixed-width vectors.
- Tile Configuration: A tile-configuration state (loaded with the LDTILECFG instruction) defines the dimensions (rows, bytes per row) and layout of the tmm registers. This configuration must be set before using AMX compute instructions.
- Instructions: TDPBSSD, TDPBSUD, TDPBUSD, and TDPBUUD compute tile dot products over signed/unsigned byte operands; TDPBF16PS does the same for bfloat16 pairs. For example, TDPBSSD computes Tile_Dest = Tile_Dest + (Tile_A * Tile_B), where Tile_A and Tile_B hold signed bytes and products are accumulated into Tile_Dest as signed doublewords.
- Operation: Operates on tiles of data, not individual lanes as in SSE/AVX, which requires a different programming model focused on tile configuration and block operations.
4) Practical Technical Examples
4.1) Vector Addition in C with Intrinsics (AVX2)
This example demonstrates vector addition of two arrays using AVX2 intrinsics in C.
#include <immintrin.h> // For AVX/AVX2 intrinsics
#include <stdio.h>
#include <stdlib.h> // For aligned_alloc
// Define vector size for AVX2 (256 bits / 32 bits/float = 8 floats)
#define AVX2_FLOAT_VECTOR_SIZE 8
void vector_add_avx2(float *a, float *b, float *c, int n) {
int i = 0;
// Process elements in chunks of AVX2_FLOAT_VECTOR_SIZE floats
for (; i + AVX2_FLOAT_VECTOR_SIZE - 1 < n; i += AVX2_FLOAT_VECTOR_SIZE) {
// Load 8 floats from array a into a ymm register (unaligned load)
__m256 va = _mm256_loadu_ps(a + i);
// Load 8 floats from array b into a ymm register (unaligned load)
__m256 vb = _mm256_loadu_ps(b + i);
// Add the two ymm registers element-wise
__m256 vc = _mm256_add_ps(va, vb);
// Store the result back to array c (unaligned store)
_mm256_storeu_ps(c + i, vc);
}
// Handle remaining elements if n is not a multiple of AVX2_FLOAT_VECTOR_SIZE
for (; i < n; ++i) {
c[i] = a[i] + b[i];
}
}
int main() {
const int N = 20; // Example size, not a multiple of 8
// Allocate aligned memory for better performance with aligned loads/stores.
// C11 aligned_alloc requires the size to be a multiple of the alignment,
// so round the byte count up to a multiple of 32 (the ymm register width).
size_t bytes = ((N * sizeof(float) + 31) / 32) * 32;
float *arr_a = (float *)aligned_alloc(32, bytes);
float *arr_b = (float *)aligned_alloc(32, bytes);
float *arr_c = (float *)aligned_alloc(32, bytes);
if (!arr_a || !arr_b || !arr_c) {
perror("Failed to allocate aligned memory");
return 1;
}
// Initialize arrays
for (int i = 0; i < N; ++i) {
arr_a[i] = (float)(i + 1);
arr_b[i] = (float)(i * 0.5f);
}
vector_add_avx2(arr_a, arr_b, arr_c, N);
printf("Result of vector addition (AVX2):\n");
for (int i = 0; i < N; ++i) {
printf("%.2f ", arr_c[i]);
}
printf("\n");
free(arr_a);
free(arr_b);
free(arr_c);
return 0;
}
Explanation:
- _mm256_loadu_ps(ptr): Loads 256 bits (8 floats) from an unaligned memory address ptr into a __m256 register. _mm256_load_ps requires 32-byte alignment.
- _mm256_add_ps(a, b): Performs element-wise addition of two __m256 registers containing packed single-precision floats.
- _mm256_storeu_ps(ptr, val): Stores the contents of val to an unaligned memory address ptr. _mm256_store_ps requires 32-byte alignment.
- Alignment: Using aligned_alloc (or equivalent) aligns the data to 32 bytes, permitting the potentially faster aligned load/store intrinsics (_mm256_load_ps, _mm256_store_ps). The _loadu_/_storeu_ variants are used here for simplicity and robustness against unaligned data.
4.2) Masked Bitwise XOR with AVX-512
This example demonstrates a masked bitwise XOR operation using AVX-512 on unsigned int (32-bit) elements. Masked 32-bit integer operations on 512-bit vectors are part of AVX-512F (Foundation); AVX-512VL would additionally allow the same EVEX-masked operations on 128/256-bit vectors.
#include <immintrin.h> // For AVX-512 intrinsics
#include <stdio.h>
#include <stdlib.h> // For aligned_alloc
// Define vector size for AVX-512 with 32-bit integers (512 bits / 32 bits/uint = 16 uints)
#define AVX512_UINT_VECTOR_SIZE 16
// Function to convert a pattern of 0s and 1s into an AVX-512 opmask register
// This is a simplified helper; real implementations might use more direct intrinsics
// or bit manipulation.
__mmask16 create_mask16(const unsigned int *mask_pattern, int n) {
__mmask16 k_mask = 0;
for (int i = 0; i < n && i < 16; ++i) {
if (mask_pattern[i]) {
k_mask |= (1 << i);
}
}
return k_mask;
}
void masked_xor_avx512(unsigned int *a, unsigned int *b, unsigned int *c, const unsigned int *mask_pattern, int n) {
int i = 0;
// Process elements in chunks of AVX512_UINT_VECTOR_SIZE uints
for (; i + AVX512_UINT_VECTOR_SIZE - 1 < n; i += AVX512_UINT_VECTOR_SIZE) {
// Load 16 unsigned ints from array a
__m512i va = _mm512_loadu_si512(a + i);
// Load 16 unsigned ints from array b
__m512i vb = _mm512_loadu_si512(b + i);
// Create the opmask register from the pattern for this chunk
__mmask16 k_mask = create_mask16(mask_pattern + i, AVX512_UINT_VECTOR_SIZE);
// Perform masked XOR.
// _mm512_mask_xor_epi32:
// - First arg: The destination register to merge into (or zero if z=1).
// Using _mm512_setzero_si512() for zeroing semantics.
// - Second arg: The opmask register (k_mask).
// - Third arg: First source vector (va).
// - Fourth arg: Second source vector (vb).
// Lanes where k_mask bit is 1: va ^ vb is computed and written to dest.
// Lanes where k_mask bit is 0: The original value of the dest lane is kept (merge).
// If we wanted to zero out masked lanes, we'd use:
// __m512i vc = _mm512_mask_xor_epi32(_mm512_setzero_si512(), k_mask, va, vb);
__m512i vc = _mm512_mask_xor_epi32(va, k_mask, va, vb); // Merging XOR
// Store the result back to array c
_mm512_storeu_si512(c + i, vc);
}
// Handle remaining elements
for (; i < n; ++i) {
if (mask_pattern[i]) { // Conceptual masking for scalar part
c[i] = a[i] ^ b[i];
} else {
c[i] = a[i]; // Keep original value if mask is 0
}
}
}
int main() {
const int N = 32; // Example size, multiple of 16 for full 512-bit vector
// Allocate aligned memory (64 bytes for AVX-512 zmm registers)
unsigned int *arr_a = (unsigned int *)aligned_alloc(64, N * sizeof(unsigned int));
unsigned int *arr_b = (unsigned int *)aligned_alloc(64, N * sizeof(unsigned int));
unsigned int *arr_c = (unsigned int *)aligned_alloc(64, N * sizeof(unsigned int));
unsigned int *arr_mask_pattern = (unsigned int *)aligned_alloc(64, N * sizeof(unsigned int));
if (!arr_a || !arr_b || !arr_c || !arr_mask_pattern) {
perror("Failed to allocate aligned memory");
return 1;
}
// Initialize arrays
for (int i = 0; i < N; ++i) {
arr_a[i] = 0x11111111 * (i + 1);
arr_b[i] = 0x22222222 * (i + 1);
arr_mask_pattern[i] = (i % 4 == 0) ? 1 : 0; // XOR every 4th element
}
masked_xor_avx512(arr_a, arr_b, arr_c, arr_mask_pattern, N);
printf("Result of masked XOR operation (AVX-512):\n");
for (int i = 0; i < N; ++i) {
printf("0x%08X ", arr_c[i]);
if ((i + 1) % 4 == 0) printf("\n"); // Newline every 4 elements for readability
}
printf("\n");
free(arr_a);
free(arr_b);
free(arr_c);
free(arr_mask_pattern);
return 0;
}
Explanation of Masking:
- Opmask Registers (k0-k7): Architecturally 64-bit registers that control individual lanes. With 32-bit elements in a 512-bit vector there are 512 / 32 = 16 lanes, so the low 16 mask bits are used. A 1 in a bit position enables the operation for that lane; a 0 disables it.
- Masked Instructions: Intrinsics like _mm512_mask_xor_epi32 take an opmask (__mmask16 for 16 lanes of 32-bit data) as an argument.
- Zeroing vs. Merging:
  - Zeroing: Lanes whose mask bit is 0 are set to zero. The intrinsic call _mm512_mask_xor_epi32(_mm512_setzero_si512(), k_mask, va, vb) achieves this (as does the dedicated _mm512_maskz_xor_epi32 form).
  - Merging: Lanes whose mask bit is 0 retain the value of the first (merge-source) argument. The example uses merging semantics: _mm512_mask_xor_epi32(va, k_mask, va, vb), where va supplies both the merge source and the first XOR operand.
4.3) Packet Structure Analysis (Conceptual)
Understanding SIMD instructions is crucial for analyzing high-performance network packet processing. Intrusion detection systems (IDS), firewalls, and deep packet inspection (DPI) engines often leverage SIMD for rapid pattern matching and data extraction.
Consider a scenario where we need to check for a specific 16-byte signature within a network packet payload.
Scenario: Extracting and comparing 16-byte chunks of a packet payload using SSE2 for signature matching.
#include <emmintrin.h> // For SSE2 intrinsics
#include <stdio.h>
#include <string.h>
#include <stdlib.h> // For malloc, free
// Example signature to search for (16 bytes)
const unsigned char signature[16] = "SECRET_PATTERN_1"; // exactly 16 bytes; no terminating NUL is stored
// Function to find a 16-byte signature in a payload using SSE2
// Returns the offset of the signature if found, otherwise -1.
int find_signature_sse2(const unsigned char *payload, int payload_len) {
    // Load the target signature into an SSE register.
    // _mm_loadu_si128 loads 128 bits (16 bytes) from an unaligned address.
    __m128i sig = _mm_loadu_si128((const __m128i *)signature);
    // Slide a 16-byte window across the payload, one byte at a time.
    for (int i = 0; i + 16 <= payload_len; ++i) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(payload + i));
        // PCMPEQB: a byte lane becomes 0xFF where the two bytes are equal.
        __m128i eq = _mm_cmpeq_epi8(chunk, sig);
        // PMOVMSKB: gather the high bit of each byte lane into a 16-bit mask;
        // 0xFFFF means all 16 bytes matched.
        if (_mm_movemask_epi8(eq) == 0xFFFF)
            return i;
    }
    return -1;
}
---
## Source
- Wikipedia page: https://en.wikipedia.org/wiki/List_of_x86_SIMD_instructions