Replace hardcoded magic numbers with symbolic constants for ARM64
instruction opcodes, matching the style used in the x86_64 backend.
Changes:
- arm64-tok.h: Add 93 new opcode constants and helper macros
  - Instruction opcodes: ARM64_ADD_IMM, ARM64_LDR_X, ARM64_B, etc.
  - Helper macros: ARM64_RD(), ARM64_RN(), ARM64_IMM12(), etc.
  - Field encodings: ARM64_SF(), ARM64_S(), ARM64_SH(), etc.
- arm64-asm.c: Refactor all instruction generation functions
  - gen_movz/gen_movn/gen_movk: Use ARM64_MOVZ/MOVN/MOVK
  - gen_add_imm/gen_sub_imm: Use ARM64_ADD_IMM/SUB_IMM
  - gen_dp_reg: Use symbolic opcodes
  - gen_ldst_imm/gen_ldst_pair: Use ARM64_LDR_*/STR_*
  - gen_b/gen_bl/gen_br/gen_blr/gen_ret: Use ARM64_B/BL/BR/BLR/RET
  - gen_cbz/gen_cbnz: Use ARM64_CBZ/CBNZ
  - gen_shift: Use ARM64_LSL_REG/LSR_REG/ASR_REG/ROR_REG
  - gen_barrier: Use ARM64_ISB/DSB/DMB
  - gen_mrs/gen_msr: Use symbolic constants
  - Inline asm save/restore: Use ARM64_STP_X/LDP_X
- arm64-gen.c: Begin systematic refactoring (first batch)
  - arm64_sub_sp: Use ARM64_SUB_IMM with helper macros
Benefits:
- Readability: Self-documenting code (ARM64_LDR_X vs 0xF9400000)
- Maintainability: Easier to spot encoding errors
- Consistency: Matches x86_64 backend style
- Safety: Helper macros prevent bit-shift mistakes
All tests pass with no functional changes.
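To illustrate the style, here is a minimal sketch of how an opcode constant and the field helper macros could combine into one instruction word. The macro names come from the list above, but the values and macro bodies are assumptions made for this example (the real definitions in arm64-tok.h may differ), and encode_sub_sp() is a hypothetical stand-in for arm64_sub_sp.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values and macro bodies, for illustration only; the real
   definitions live in arm64-tok.h and may differ. */
#define ARM64_SUB_IMM   0x51000000u                   /* SUB (immediate) base opcode */
#define ARM64_SF(sf)    ((uint32_t)((sf) & 1) << 31)  /* 1 = 64-bit operation */
#define ARM64_SH(sh)    ((uint32_t)((sh) & 1) << 22)  /* 1 = imm12 shifted left by 12 */
#define ARM64_IMM12(v)  ((uint32_t)((v) & 0xFFF) << 10)
#define ARM64_RN(r)     ((uint32_t)((r) & 0x1F) << 5)
#define ARM64_RD(r)     ((uint32_t)((r) & 0x1F))

/* Hypothetical helper: encode "sub sp, sp, #imm" (64-bit).
   Register number 31 means SP in this instruction form. */
static uint32_t encode_sub_sp(uint32_t imm)
{
    assert(imm <= 0xFFF); /* the sketch handles only the unshifted 12-bit form */
    return ARM64_SUB_IMM | ARM64_SF(1) | ARM64_SH(0)
         | ARM64_IMM12(imm) | ARM64_RN(31) | ARM64_RD(31);
}
```

With these assumed definitions, encode_sub_sp(16) yields 0xD10043FF, the encoding of "sub sp, sp, #16". Each field is named at the point of use, which is the readability benefit listed above.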
- Remove unnecessary braces from single-statement if blocks
- Remove trailing whitespace throughout file
- Remove duplicate comment
Style now matches existing ARM64 backend and TCC conventions:
- Allman style for function definitions
- No braces for single-statement control structures
- Consistent 4-space indentation
Implement full GCC-style extended inline assembly for the ARM64 backend:
- Add constraint parsing (constraint_priority, skip_constraint_modifiers)
- Implement register allocation (asm_compute_constraints)
- Add code generation for prolog/epilog and load/store (asm_gen_code)
- Support output/input/read-write operands with r, w, f, x, m, g constraints
- Support immediate constraints (i, I, J, K, L, n)
- Handle clobber lists (registers, memory, cc)
- Support constraint references, early clobber, named operands
- Fix '#' character handling in tccpp.c for ARM64 asm mode
Tests: Add comprehensive test suite with 18 test cases covering all features.
All existing TCC tests continue to pass.
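As a rough illustration of the syntax this covers, the sketch below uses named operands, an "=r" output, two "r" inputs and a clobber list. The function name add_via_asm is hypothetical and is not taken from the test suite; the non-AArch64 branch is only there so the example builds on other hosts.

```c
/* Adds two values through an extended asm statement when built for
   AArch64; other targets take a plain C fallback so the sketch stays
   portable. */
static long add_via_asm(long a, long b)
{
    long sum;
#if defined(__aarch64__)
    __asm__ ("add %[sum], %[lhs], %[rhs]"
             : [sum] "=r" (sum)               /* output: any general register */
             : [lhs] "r" (a), [rhs] "r" (b)   /* inputs: general registers */
             : "cc");                         /* clobber list (illustrative; add does not set flags) */
#else
    sum = a + b;
#endif
    return sum;
}
```

Calling add_via_asm(2, 3) returns 5 on either path.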
parse_addr_operand() silently accepted invalid register names like
[xyz] without error. Now explicitly validates the register and calls
tcc_error() if arm64_parse_regvar() returns -1 or >= 32.
Before: invalid registers caused silent wrong code or confusing errors
After: clear error message 'invalid register in address operand'
LSL/LSR/ASR immediate shifts are UBFM/SBFM aliases with specific
immr/imms field encodings:
- LSL #shift: immr = (width - shift) & 0x3F, imms = width - 1 - shift
- LSR #shift: immr = shift & 0x3F, imms = width - 1
- ASR #shift: immr = shift & 0x3F, imms = width - 1
Fixes:
- immr field now always masked with 0x3F (6 bits), not width-1
- imms field is the constant width-1 for LSR/ASR (the LSL alias is
  defined with imms = width - 1 - shift)
- ROR uses EXTR format (Rm=shift, Rn=src, Rd=dest), not UBFM format
Based on ARM ARM documentation for UBFM/SBFM/EXTR instructions.
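For reference, a minimal sketch of the field computation follows, written against the alias definitions in the Arm Architecture Reference Manual, where LSL #shift is UBFM with immr = (-shift) mod width and imms = width - 1 - shift, and LSR/ASR use immr = shift and imms = width - 1. The type and function names are hypothetical and do not come from arm64-asm.c.

```c
#include <assert.h>

struct bitfield_imm { unsigned immr, imms; };

enum shift_kind { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR };

/* Hypothetical helper: compute the immr/imms fields for an immediate
   shift. width is 32 or 64; shift must be in 0..width-1. */
static struct bitfield_imm shift_imm_fields(enum shift_kind kind,
                                            unsigned width, unsigned shift)
{
    struct bitfield_imm f;
    assert((width == 32 || width == 64) && shift < width);
    if (kind == SHIFT_LSL) {
        f.immr = (width - shift) % width;  /* (-shift) mod width */
        f.imms = width - 1 - shift;
    } else {
        /* LSR (UBFM) and ASR (SBFM) share the same field values */
        f.immr = shift;
        f.imms = width - 1;
    }
    return f;
}
```

For example, "lsl x0, x1, #4" gives immr = 60 and imms = 59, while "lsr x0, x1, #4" gives immr = 4 and imms = 63. Reducing modulo width rather than masking with 0x3F keeps immr in range for a 32-bit shift of 0.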
OPT_VREG, OPT_IM12, OPT_SHIFT, and OPT_REGSET were defined in the enum
and as OP_* bit masks but never used by any parsing function or
instruction handler in arm64-asm.c.
These appear to be artifacts copied from other assembler implementations
(arm-asm.c uses OP_VREG32/OP_VREG64/OP_REGSET32, riscv64-asm.c uses
OP_IM12S) but were never integrated into the ARM64 operand parsing logic.
Removing these unused definitions:
- Eliminates confusion for developers
- Reduces code clutter
- Makes the actual operand types (OPT_REG, OPT_IM, OPT_ADDR, OPT_COND)
clearer
asm_branch() had two identical 15-case switch blocks (30 lines total)
that duplicated condition code mapping. This also duplicated the logic
in the existing parse_condition() helper.
Added get_branch_condition() helper that:
1. Maps branch tokens (TOK_ASM_beq) to condition tokens (TOK_ASM_eq)
2. Calls the existing parse_condition() helper
3. Returns the condition code (0-13) or -1 for non-conditional branches
This reduces code duplication from 30 lines to a single 29-line helper
function, and ensures all condition mapping logic is in one place.
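The shape of such a helper can be sketched as below. This is a simplified stand-in that works on mnemonic strings rather than TCC tokens, so parse_condition_name() and branch_condition() are hypothetical names and not the actual parse_condition()/get_branch_condition() code.

```c
#include <string.h>

/* Condition mnemonics in encoding order: array index == condition code. */
static const char *const cond_names[] = {
    "eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
    "hi", "ls", "ge", "lt", "gt", "le"
};

/* Stand-in for the condition parser: returns 0..13, or -1 if unknown. */
static int parse_condition_name(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof cond_names / sizeof cond_names[0]; i++)
        if (strcmp(name, cond_names[i]) == 0)
            return (int)i;
    return -1;
}

/* Stand-in for the branch helper: strip the leading 'b' (and an optional
   '.') to get the condition name, then reuse the single lookup above.
   Returns -1 for non-conditional branches such as "b" or "bl". */
static int branch_condition(const char *mnemonic)
{
    if (mnemonic[0] != 'b')
        return -1;
    mnemonic++;
    if (mnemonic[0] == '.')
        mnemonic++;
    return parse_condition_name(mnemonic);
}
```

Here branch_condition("beq") returns 0 and branch_condition("ble") returns 13, while "b" and "bl" return -1. The point of the design is that the mnemonic-to-code table exists exactly once.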
Multiple instruction handlers were extracting op->reg without checking
that the operand was actually a register. When parse_operand() failed
to recognize a token, it set op->reg = -1, which when masked with 0x1F
became 31 (xzr/sp), silently encoding wrong instructions.
Now each handler validates operand types before extraction:
- asm_shift: validates op1 and op2 are registers
- asm_data_proc: validates op1, op2, and op3 are registers
- asm_ldst: validates op1 is register, op2 is address
- asm_ldst_pair: validates op1 and op2 are registers, op3 is address
This implements fail-fast behavior to catch typos and invalid operands
immediately rather than producing silently incorrect code.
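The masking hazard is easy to reproduce in isolation. In the sketch below the operand struct and both helper functions are hypothetical simplifications (only the OPT_* names mirror the operand types mentioned elsewhere in this log); the real handlers report the failure through tcc_error() rather than a return code.

```c
#include <stdint.h>

enum operand_type { OPT_REG, OPT_IM, OPT_ADDR, OPT_COND };

/* Simplified operand: reg is -1 when no register was recognized. */
struct operand { enum operand_type type; int reg; };

/* Old behavior: mask without checking. A reg of -1 becomes 31. */
static uint32_t reg_field_unchecked(const struct operand *op)
{
    return (uint32_t)op->reg & 0x1F;
}

/* New behavior: validate first. Returns 0 and stores the field on
   success, -1 if the operand is not a valid register. */
static int reg_field_checked(const struct operand *op, uint32_t *field)
{
    if (op->type != OPT_REG || op->reg < 0 || op->reg > 31)
        return -1;
    *field = (uint32_t)op->reg;
    return 0;
}
```

With an unrecognized operand (reg == -1), reg_field_unchecked() returns 31, which encodes xzr/sp, whereas reg_field_checked() returns -1 and leaves the output untouched.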
Previously, parse_operand() would silently accept any unrecognized token
and pass it to asm_expr() as an immediate, causing typos like:
    add x0, x1, xyz   ; 'xyz' is not a valid register
to be silently assembled as a symbol reference instead of erroring.
Now an identifier token that is neither a register nor a condition code,
and that does not follow a valid immediate prefix (#, :, @, $), is
rejected with an error.
This implements fail-fast behavior for invalid operands, making it easier
to catch typos and mistakes in assembly code.
The asm_data_proc function was OR-ing register widths together, which
allowed invalid ARM64 instructions like 'add x0, w1, w2' (mixed widths).
ARM64 requires all registers in data processing instructions to have
the same width (all X or all W).
Fix by validating that all three operand registers have matching widths
and emitting an error if they don't match.
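A small sketch shows why OR-ing hides the problem. The reg_operand struct and both functions are hypothetical and only model the width flag; they are not the asm_data_proc code.

```c
/* Simplified register operand: is64 is 1 for Xn and 0 for Wn. */
struct reg_operand { int reg; int is64; };

/* Old approach: OR the width flags together. Any single X register
   makes the whole instruction 64-bit, so mixed widths go unnoticed. */
static int sf_by_or(const struct reg_operand *rd,
                    const struct reg_operand *rn,
                    const struct reg_operand *rm)
{
    return rd->is64 | rn->is64 | rm->is64;
}

/* New approach: require all three widths to agree. Returns 1 if they
   match and 0 otherwise (the real code would emit an error here). */
static int widths_match(const struct reg_operand *rd,
                        const struct reg_operand *rn,
                        const struct reg_operand *rm)
{
    return rd->is64 == rn->is64 && rn->is64 == rm->is64;
}
```

For 'add x0, w1, w2' the OR-based version quietly yields a 64-bit instruction (sf = 1), while widths_match() returns 0 so the mismatch can be reported.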
workflow:
- revert 'pinact for security' for readability
  from 831c3fa184
tccpp.c:
- remove code that allows tcc to parse numbers incorrectly (*)
  from 829c848520
tccgen.c:
- Revert "Relaxed the 'incompatible pointer type' warning a bit" (*)
  from d9ec17d334
tccrun.c:
- remove support for -nostdlib -run
  for simplicity, we require "main" with tcc -run always
tccpp.c:
- Revert "Free all preprocessor memmory in case of error."
  from c96f0cad61
  Remove TinyAlloc->limit instead. Thus it can also do bigger
  allocs. Big TokenStrings (like 200kb+ when compiling tcc)
  may come from inline functions or from large initializers.
Makefile/configure:
- use --config-pie for configuring tcc output only
- use -fPIC with clang-x86_64 to avoid 32-bit relocs
libtcc.c:
- fix "tcc file.c -run" i.e. -run as last argument
i386-gen.c:
- PIC refactor
(*) sorry, but code in tcc should have a minimum of generic relevance