diff --git a/README.md b/README.md
index d0d05db..6bdba8b 100644
--- a/README.md
+++ b/README.md
@@ -74,10 +74,123 @@ To validate this program, execute the compiler to build the data:
 cargo run --release -- -o main hello.rpg
 ```
 
-## Implementation
+## Architecture
 
-The RPG language was converted to an BNF, and fed into the bnf crate (https://docs.rs/bnf/latest/bnf/).
+The compiler is split across two crates in a Cargo workspace:
 
-The parse tree generated here is then given to LLVM, via Inkwell (https://crates.io/crates/inkwell) to create executable binaries.
+| Crate | Role |
+|-------|------|
+| `rust-langrpg` | Compiler front-end, mid-end, and LLVM back-end |
+| `rpgrt` | C-compatible runtime shared library (`librpgrt.so`) |
 
-The binary can run standalone on a Linux system. Only the built-in function make hello world are implemented, and link to functions writen in Rust that are available as a shared library.
+### Compilation pipeline
+
+```
+RPG IV source (.rpg)
+                 │
+                 ▼
+┌─────────────────────────────────────────┐
+│  1. BNF validation (bnf crate)          │
+│     src/rpg.bnf — embedded at compile   │
+│     time via include_str!               │
+└────────────────┬────────────────────────┘
+                 │ parse tree (validation only)
+                 ▼
+┌─────────────────────────────────────────┐
+│  2. Lowering pass (src/lower.rs)        │
+│     Hand-written recursive-descent      │
+│     tokenizer + parser → typed AST      │
+└────────────────┬────────────────────────┘
+                 │ ast::Program
+                 ▼
+┌─────────────────────────────────────────┐
+│  3. LLVM code generation (src/codegen.rs│
+│     inkwell bindings → LLVM IR module   │
+└────────────────┬────────────────────────┘
+                 │ .o object file
+                 ▼
+┌─────────────────────────────────────────┐
+│  4. Linking (cc + librpgrt.so)          │
+│     Produces a standalone Linux ELF     │
+└─────────────────────────────────────────┘
+```
+
+### Stage 1 — BNF validation (`src/rpg.bnf` + `bnf` crate)
+
+The RPG IV free-format grammar is encoded in BNF notation in `src/rpg.bnf` and embedded at compile time with `include_str!`. At startup the compiler parses the grammar with the [`bnf`](https://docs.rs/bnf/latest/bnf/) crate to build a `GrammarParser`. Each source file is validated against the top-level `` rule before any further processing. This stage acts as a gate: malformed source is rejected early with a clear parse error.
+
+### Stage 2 — Lowering to a typed AST (`src/lower.rs`)
+
+The BNF parser only validates structure; it does not produce a typed tree suitable for code generation. A hand-written tokenizer and recursive-descent parser in `lower.rs` converts the raw source text into the typed `Program` AST defined in `src/ast.rs`.
+
+The AST covers the full language surface that the compiler handles:
+
+- **Declarations** — `CTL-OPT`, `DCL-S`, `DCL-C`, `DCL-DS`, `DCL-F`, subroutines
+- **Procedures** — `DCL-PROC … END-PROC` with `DCL-PI … END-PI` parameter interfaces
+- **Statements** — assignment, `IF/ELSEIF/ELSE`, `DOW`, `DOU`, `FOR`, `SELECT/WHEN`, `MONITOR/ON-ERROR`, `CALLP`, `DSPLY`, `RETURN`, `LEAVE`, `ITER`, `LEAVESR`, `EXSR`, `CLEAR`, `RESET`, all I/O opcodes
+- **Expressions** — literals, variables, qualified names (`ds.field`), arithmetic, logical operators, comparisons, built-in functions (`%LEN`, `%TRIM`, `%SUBST`, `%SCAN`, `%EOF`, `%SIZE`, `%ADDR`, `%SQRT`, `%ABS`, `%REM`, `%DIV`, and more)
+- **Types** — `CHAR`, `VARCHAR`, `INT`, `UNS`, `FLOAT`, `PACKED`, `ZONED`, `BINDEC`, `IND`, `DATE`, `TIME`, `TIMESTAMP`, `POINTER`, `LIKE`, `LIKEDS`
+
+Unrecognised constructs produce `Statement::Unimplemented` or placeholder declaration variants rather than hard errors, so the compiler continues to lower the parts it understands.
+
+### Stage 3 — LLVM code generation (`src/codegen.rs`)
+
+The typed `Program` is handed to the code generator, which uses [`inkwell`](https://crates.io/crates/inkwell) (safe Rust bindings to LLVM 21) to build an LLVM IR module:
+
+- Each `DCL-PROC … END-PROC` becomes an LLVM function.
+- An exported procedure named `main` (or the first exported procedure) is wrapped in a C `main()` entry point so the resulting binary is directly executable.
+- `DCL-S` standalone variables are allocated as `alloca` stack slots inside their owning function, or as LLVM global variables for module-scope declarations.
+- String literals are stored as null-terminated byte arrays in `.rodata`.
+- `DSPLY expr;` is lowered to a call to `rpg_dsply(ptr, len)` (or `rpg_dsply_i64` / `rpg_dsply_f64` for numeric types) provided by the runtime library.
+- Control-flow constructs (`IF`, `DOW`, `DOU`, `FOR`, `SELECT`) are lowered to LLVM basic blocks and conditional / unconditional branches.
+- `LEAVE` / `ITER` are lowered to `br` to the loop-exit / loop-header block respectively, tracked via a `FnState` per function.
+
+The module is then compiled to a native `.o` object file for the host target via LLVM's target machine API, with optional optimisation passes (`-O0` through `-O3`).
+
+### Stage 4 — Linking
+
+The object file is linked into a standalone ELF executable by invoking the system C compiler (`cc`). The executable is linked against `librpgrt.so`.
+
+### Runtime library (`rpgrt/`)
+
+`rpgrt` is a separate Cargo crate built as a `cdylib`, producing `librpgrt.so`. It is written in Rust and exports a C ABI used by compiled RPG programs:
+
+| Symbol | Signature | Purpose |
+|--------|-----------|---------|
+| `rpg_dsply` | `(ptr: *const u8, len: i64)` | Display a fixed-length `CHAR` field (trims trailing spaces) |
+| `rpg_dsply_cstr` | `(ptr: *const c_char)` | Display a null-terminated C string |
+| `rpg_dsply_i64` | `(n: i64)` | Display a signed 64-bit integer |
+| `rpg_dsply_f64` | `(f: f64)` | Display a double-precision float |
+| `rpg_halt` | `(code: i32)` | Abnormal program termination |
+| `rpg_memset_char` | `(ptr, len, fill)` | Fill a char buffer with a repeated byte |
+| `rpg_move_char` | `(dst, dst_len, src, src_len)` | Copy between fixed-length char fields (pad / truncate) |
+| `rpg_trim` | `(dst, src, src_len) -> i64` | Trim leading and trailing spaces, return trimmed length |
+| `rpg_len` | `(len: i64) -> i64` | Identity — returns the static `%LEN` of a field |
+| `rpg_scan` | `(needle, n_len, haystack, h_len, start) -> i64` | `%SCAN` substring search |
+| `rpg_subst` | `(src, src_len, start, length, dst, dst_len)` | `%SUBST` extraction |
+
+`DSPLY` output is written to **stdout** and flushed immediately, mirroring IBM i's interactive operator message queue format:
+
+```/dev/null/example.txt#L1
+DSPLY Hello, World!
+```
+
+### Project layout
+
+```
+rust-langrpg/
+├── src/
+│   ├── rpg.bnf — RPG IV free-format BNF grammar (embedded at compile time)
+│   ├── lib.rs — Grammar loader and demo helpers
+│   ├── ast.rs — Typed AST node definitions
+│   ├── lower.rs — Tokenizer + recursive-descent lowering pass
+│   ├── codegen.rs — LLVM IR code generation (inkwell)
+│   ├── main.rs — Compiler CLI (clap) + linker invocation
+│   └── bin/
+│       └── demo.rs — Grammar demo binary
+├── rpgrt/
+│   └── src/
+│       └── lib.rs — Runtime library (librpgrt.so)
+├── hello.rpg — Hello World example program
+└── count.rpg — Counting loop example program
+```
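
Note on the runtime table in this diff: the `rpg_dsply` and `rpg_move_char` rows describe fixed-length `CHAR` semantics (trim trailing spaces on display, pad or truncate on copy). The sketch below illustrates that behaviour in safe Rust. It is not the code in `rpgrt/src/lib.rs`: the helper names `format_dsply` and `move_char`, the slice-based signatures, and the single space after `DSPLY` are assumptions made for illustration only.

```rust
/// Hypothetical helper (not part of `rpgrt`): format a fixed-length CHAR
/// field as the `rpg_dsply` row describes, with trailing spaces trimmed.
fn format_dsply(field: &[u8]) -> String {
    // Find the end of the content: one past the last non-space byte.
    // Leading spaces are kept; an all-space field yields an empty payload.
    let end = field.iter().rposition(|&b| b != b' ').map_or(0, |i| i + 1);
    format!("DSPLY {}", String::from_utf8_lossy(&field[..end]))
}

/// Hypothetical helper mirroring the `rpg_move_char` description:
/// copy `src` into `dst`, truncating when `src` is longer and
/// padding with spaces when it is shorter.
fn move_char(dst: &mut [u8], src: &[u8]) {
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]);
    dst[n..].fill(b' ');
}

fn main() {
    // A 10-byte CHAR field receiving a 5-byte literal is space-padded.
    let mut field = [0u8; 10];
    move_char(&mut field, b"Hello");
    println!("{}", format_dsply(&field)); // prints "DSPLY Hello"
}
```

The real runtime exports take raw pointers and explicit lengths over a C ABI, so an actual implementation would need `unsafe` and `extern "C"` wrappers around logic of this kind.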