2026-03-12 21:57:25 -07:00

Rust RPG Lang

An implementation of the RPG language from IBM.

Language reference: https://www.ibm.com/docs/en/i/7.5.0?topic=introduction-overview-rpg-iv-programming-language

Usage

Building

cargo build --release

Running

The demo binary loads the embedded BNF grammar, builds a parser, and runs a suite of RPG IV snippets to demonstrate the grammar in action:

cargo run --bin demo

You will see output similar to:

=== RPG IV Free-Format Parser ===

[grammar] Loaded successfully.
[parser]  Built successfully (all non-terminals resolved).

=== Parsing Examples ===

  ┌─ simple identifier (identifier) ─────────────────────
  │  source : "myVar"
  │  result : OK
  └──────────────────────────────────────────────
...
=== Summary ===
  total   : 42
  matched : 42
  failed  : 0

All examples parsed successfully.

Hello World in RPG IV

The following is a complete Hello World program written in RPG IV free-format syntax, as understood by this parser:

hello.rpg:

CTL-OPT DFTACTGRP(*NO);

DCL-S greeting CHAR(25) INZ('Hello, World!');

DCL-PROC main EXPORT;
  DSPLY greeting;
  RETURN;
END-PROC;

Breaking it down:

  • CTL-OPT DFTACTGRP(*NO); — control option spec declaring the program does not run in the default activation group
  • DCL-S greeting CHAR(25) INZ('Hello, World!'); — standalone variable declaration: a 25-character field initialised to 'Hello, World!'
  • DCL-PROC main EXPORT; ... END-PROC; — a procedure named main, exported so it can be called as a program entry point
  • DSPLY greeting; — displays the value of greeting to the operator message queue
  • RETURN; — returns from the procedure

To compile this program, run the compiler on hello.rpg:

cargo run --release -- -o main hello.rpg

Architecture

The compiler is split across two crates in a Cargo workspace:

| Crate | Role |
|---|---|
| rust-langrpg | Compiler front-end, mid-end, and LLVM back-end |
| rpgrt | C-compatible runtime shared library (librpgrt.so) |

Compilation pipeline

RPG IV source (.rpg)
       │
       ▼
┌─────────────────────────────────────────┐
│  1. BNF validation  (bnf crate)         │
│     src/rpg.bnf  — embedded at compile  │
│     time via include_str!               │
└────────────────┬────────────────────────┘
                 │  parse tree (validation only)
                 ▼
┌─────────────────────────────────────────┐
│  2. Lowering pass  (src/lower.rs)       │
│     Hand-written recursive-descent      │
│     tokenizer + parser → typed AST      │
└────────────────┬────────────────────────┘
                 │  ast::Program
                 ▼
┌─────────────────────────────────────────┐
│  3. LLVM code generation                │
│     (src/codegen.rs)                    │
│     inkwell bindings → LLVM IR module   │
└────────────────┬────────────────────────┘
                 │  .o object file
                 ▼
┌─────────────────────────────────────────┐
│  4. Linking  (cc + librpgrt.so)         │
│     Produces a standalone Linux ELF     │
└─────────────────────────────────────────┘

Stage 1 — BNF validation (src/rpg.bnf + bnf crate)

The RPG IV free-format grammar is encoded in BNF notation in src/rpg.bnf and embedded at compile time with include_str!. At startup the compiler parses the grammar with the bnf crate to build a GrammarParser. Each source file is validated against the top-level <program> rule before any further processing. This stage acts as a gate: malformed source is rejected early with a clear parse error.
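The gate behaviour described above can be sketched in plain Rust. This is a minimal illustration, not the project's code: `ParseError`, `validate`, and `compile_with_gate` are hypothetical names, and the stand-in check (non-empty source ending in `;`) replaces the real match against the <program> rule performed with the bnf crate.

```rust
/// Error reported when a source file fails the grammar gate (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub message: String,
}

/// Stand-in for the grammar check. The real compiler matches the source
/// against the <program> rule; this stub only checks that the source is
/// non-empty and that it ends with a terminating `;`.
pub fn validate(source: &str) -> Result<(), ParseError> {
    let trimmed = source.trim();
    if trimmed.is_empty() {
        return Err(ParseError { message: "empty source".to_string() });
    }
    if !trimmed.ends_with(';') {
        return Err(ParseError { message: "missing terminating ';'".to_string() });
    }
    Ok(())
}

/// Runs validation first and only calls `lower` when the gate passes,
/// so later stages never see source that failed validation.
pub fn compile_with_gate<T>(
    source: &str,
    lower: impl FnOnce(&str) -> T,
) -> Result<T, ParseError> {
    validate(source)?;
    Ok(lower(source))
}
```

Calling `compile_with_gate("DSPLY greeting", ...)` returns an error without ever invoking the lowering closure, which is the property the gate is meant to provide.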

Stage 2 — Lowering to a typed AST (src/lower.rs)

The BNF parser only validates structure; it does not produce a typed tree suitable for code generation. A hand-written tokenizer and recursive-descent parser in lower.rs converts the raw source text into the typed Program AST defined in src/ast.rs.

The AST covers the full language surface that the compiler handles:

  • Declarations — CTL-OPT, DCL-S, DCL-C, DCL-DS, DCL-F, subroutines
  • Procedures — DCL-PROC … END-PROC with DCL-PI … END-PI parameter interfaces
  • Statements — assignment, IF/ELSEIF/ELSE, DOW, DOU, FOR, SELECT/WHEN, MONITOR/ON-ERROR, CALLP, DSPLY, RETURN, LEAVE, ITER, LEAVESR, EXSR, CLEAR, RESET, all I/O opcodes
  • Expressions — literals, variables, qualified names (ds.field), arithmetic, logical operators, comparisons, built-in functions (%LEN, %TRIM, %SUBST, %SCAN, %EOF, %SIZE, %ADDR, %SQRT, %ABS, %REM, %DIV, and more)
  • Types — CHAR, VARCHAR, INT, UNS, FLOAT, PACKED, ZONED, BINDEC, IND, DATE, TIME, TIMESTAMP, POINTER, LIKE, LIKEDS

Unrecognised constructs produce Statement::Unimplemented or placeholder declaration variants rather than hard errors, so the compiler continues to lower the parts it understands.
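To make the fallback concrete, here is a small sketch of lowering with an Unimplemented variant. It is an assumption-laden illustration: the `Token` and `Statement` types below are invented for the example and are not the types in src/ast.rs, only three statement forms are recognised, string literals are not handled, and slice patterns stand in for the real recursive-descent parser.

```rust
/// Tokens for a tiny, illustrative subset of free-format RPG IV.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Word(String), // keywords and identifiers, upper-cased (RPG is case-insensitive)
    Number(u32),
    LParen,
    RParen,
    Semi,
}

/// Statements in the sketch AST (hypothetical, not the project's real types).
#[derive(Debug, PartialEq)]
pub enum Statement {
    DclChar { name: String, len: u32 },
    Dsply(String),
    Return,
    /// Anything the sketch does not recognise, kept as raw tokens.
    Unimplemented(Vec<Token>),
}

pub fn tokenize(src: &str) -> Vec<Token> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            '(' => { out.push(Token::LParen); chars.next(); }
            ')' => { out.push(Token::RParen); chars.next(); }
            ';' => { out.push(Token::Semi); chars.next(); }
            c if c.is_ascii_digit() => {
                let mut n = 0u32;
                while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                    n = n.saturating_mul(10).saturating_add(d);
                    chars.next();
                }
                out.push(Token::Number(n));
            }
            c if c.is_ascii_alphabetic() => {
                let mut word = String::new();
                while let Some(&ch) = chars.peek() {
                    if ch.is_ascii_alphanumeric() || ch == '-' || ch == '_' {
                        word.push(ch.to_ascii_uppercase());
                        chars.next();
                    } else {
                        break;
                    }
                }
                out.push(Token::Word(word));
            }
            _ => { chars.next(); } // skip whitespace and unsupported characters
        }
    }
    out
}

/// Lowers one `;`-delimited statement, falling back to Unimplemented.
fn lower_statement(toks: &[Token]) -> Statement {
    use Token::*;
    match toks {
        [Word(k), Word(name), Word(ty), LParen, Number(len), RParen]
            if k == "DCL-S" && ty == "CHAR" =>
        {
            Statement::DclChar { name: name.clone(), len: *len }
        }
        [Word(k), Word(name)] if k == "DSPLY" => Statement::Dsply(name.clone()),
        [Word(k)] if k == "RETURN" => Statement::Return,
        other => Statement::Unimplemented(other.to_vec()),
    }
}

/// Splits the token stream at `;` and lowers each statement in turn.
pub fn lower(src: &str) -> Vec<Statement> {
    tokenize(src)
        .split(|t| *t == Token::Semi)
        .filter(|stmt| !stmt.is_empty())
        .map(lower_statement)
        .collect()
}
```

In this sketch an `EXSR cleanup;` statement lowers to `Statement::Unimplemented` while the statements around it are still lowered normally, which mirrors the "keep going" behaviour described above.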

Stage 3 — LLVM code generation (src/codegen.rs)

The typed Program is handed to the code generator, which uses inkwell (safe Rust bindings to LLVM 21) to build an LLVM IR module:

  • Each DCL-PROC … END-PROC becomes an LLVM function.
  • An exported procedure named main (or the first exported procedure) is wrapped in a C main() entry point so the resulting binary is directly executable.
  • DCL-S standalone variables are allocated as alloca stack slots inside their owning function, or as LLVM global variables for module-scope declarations.
  • String literals are stored as null-terminated byte arrays in .rodata.
  • DSPLY expr; is lowered to a call to rpg_dsply(ptr, len) (or rpg_dsply_i64 / rpg_dsply_f64 for numeric types) provided by the runtime library.
  • Control-flow constructs (IF, DOW, DOU, FOR, SELECT) are lowered to LLVM basic blocks and conditional / unconditional branches.
  • LEAVE / ITER are lowered to br to the loop-exit / loop-header block respectively, tracked via a FnState per function.

The module is then compiled to a native .o object file for the host target via LLVM's target machine API, with optional optimisation passes (-O0 through -O3).
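The LEAVE / ITER lowering listed above relies on knowing the innermost enclosing loop. A minimal sketch of that bookkeeping follows, under stated assumptions: this `FnState` is a stand-in written for illustration, it stores label strings where the real code generator presumably stores inkwell basic-block handles, and the `loopN.header` / `loopN.exit` label names are invented.

```rust
/// Branch targets recorded for each enclosing loop.
#[derive(Debug, Clone, PartialEq)]
pub struct LoopTargets {
    pub header: String, // block that re-tests the loop condition (ITER target)
    pub exit: String,   // block that follows the loop (LEAVE target)
}

/// Hypothetical per-function state: a stack of enclosing loops.
#[derive(Debug, Default)]
pub struct FnState {
    loops: Vec<LoopTargets>,
    next_id: usize,
}

impl FnState {
    /// Called when lowering starts on a DOW / DOU / FOR body.
    pub fn enter_loop(&mut self) -> LoopTargets {
        let id = self.next_id;
        self.next_id += 1;
        let targets = LoopTargets {
            header: format!("loop{}.header", id),
            exit: format!("loop{}.exit", id),
        };
        self.loops.push(targets.clone());
        targets
    }

    /// Called when lowering of the loop body is finished.
    pub fn exit_loop(&mut self) {
        self.loops.pop();
    }

    /// LEAVE branches to the exit block of the innermost loop.
    /// Returns None when there is no enclosing loop.
    pub fn lower_leave(&self) -> Option<String> {
        self.loops.last().map(|l| format!("br label %{}", l.exit))
    }

    /// ITER branches to the header block of the innermost loop.
    pub fn lower_iter(&self) -> Option<String> {
        self.loops.last().map(|l| format!("br label %{}", l.header))
    }
}
```

Using a stack means nested loops resolve naturally: inside two nested loops, LEAVE targets the inner exit block, and once the inner loop is popped it targets the outer one again.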

Stage 4 — Linking

The object file is linked into a standalone ELF executable by invoking the system C compiler (cc). The executable is linked against librpgrt.so.
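As a rough sketch of what such a linker invocation could look like from Rust, the function below builds a `cc` command with `std::process::Command` without running it. The function name, the `runtime_dir` parameter, and the exact flag set are assumptions for illustration; the flags actually passed by src/main.rs may differ.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) a `cc` invocation that links one object file
/// against the runtime library. `runtime_dir` is a hypothetical parameter:
/// the directory that contains librpgrt.so.
pub fn link_command(object: &Path, output: &Path, runtime_dir: &Path) -> Command {
    let mut cmd = Command::new("cc");
    cmd.arg(object)
        .arg("-o")
        .arg(output)
        .arg("-L") // add the runtime directory to the library search path
        .arg(runtime_dir)
        .arg("-lrpgrt"); // resolves to librpgrt.so
    cmd
}
```

A caller would run the command with `.status()` and treat a non-zero exit status as a link failure. Because the runtime is a shared library, the resulting executable also needs to locate librpgrt.so when it starts, for example through an rpath or LD_LIBRARY_PATH.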

Runtime library (rpgrt/)

rpgrt is a separate Cargo crate built as a cdylib, producing librpgrt.so. It is written in Rust and exports a C ABI used by compiled RPG programs:

| Symbol | Signature | Purpose |
|---|---|---|
| rpg_dsply | (ptr: *const u8, len: i64) | Display a fixed-length CHAR field (trims trailing spaces) |
| rpg_dsply_cstr | (ptr: *const c_char) | Display a null-terminated C string |
| rpg_dsply_i64 | (n: i64) | Display a signed 64-bit integer |
| rpg_dsply_f64 | (f: f64) | Display a double-precision float |
| rpg_halt | (code: i32) | Abnormal program termination |
| rpg_memset_char | (ptr, len, fill) | Fill a char buffer with a repeated byte |
| rpg_move_char | (dst, dst_len, src, src_len) | Copy between fixed-length char fields (pad / truncate) |
| rpg_trim | (dst, src, src_len) -> i64 | Trim leading and trailing spaces, return trimmed length |
| rpg_len | (len: i64) -> i64 | Identity: returns the static %LEN of a field |
| rpg_scan | (needle, n_len, haystack, h_len, start) -> i64 | %SCAN substring search |
| rpg_subst | (src, src_len, start, length, dst, dst_len) | %SUBST extraction |
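To illustrate the pad / truncate semantics of a fixed-length copy such as rpg_move_char, here is a hedged sketch with a C ABI. The parameter types are assumptions (the table does not list them; `i64` lengths mirror rpg_dsply), the name `move_char_sketch` is invented, and a real cdylib export would additionally carry a no-mangle attribute.

```rust
/// Copies `src` into the fixed-length field `dst`: truncates when the source
/// is longer, pads on the right with spaces when it is shorter.
///
/// # Safety
/// `dst` must be valid for writes of `dst_len` bytes, `src` must be valid for
/// reads of `src_len` bytes, and the two regions must not overlap.
pub unsafe extern "C" fn move_char_sketch(
    dst: *mut u8,
    dst_len: i64,
    src: *const u8,
    src_len: i64,
) {
    if dst.is_null() || dst_len <= 0 {
        return; // nothing to write
    }
    let dst = unsafe { std::slice::from_raw_parts_mut(dst, dst_len as usize) };
    // Treat a null or non-positive-length source as an empty string.
    let src: &[u8] = if src.is_null() || src_len <= 0 {
        &[]
    } else {
        unsafe { std::slice::from_raw_parts(src, src_len as usize) }
    };
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]); // copy what fits
    dst[n..].fill(b' ');                 // blank-pad the remainder
}
```

Copying "Hello" into an 8-byte field leaves "Hello" followed by three spaces, while copying it into a 3-byte field leaves "Hel".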

DSPLY output is written to stdout and flushed immediately, mirroring IBM i's interactive operator message queue format:

DSPLY  Hello, World!
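The output format shown above could be produced along the following lines. This is a sketch under assumptions: `format_dsply` and `dsply_to` are hypothetical helpers, the field bytes are decoded as UTF-8 (lossily), and the writer is generic so the behaviour can be checked without touching stdout.

```rust
use std::io::{self, Write};

/// Formats one DSPLY line: the "DSPLY" tag, two spaces, then the field
/// contents with trailing spaces trimmed.
pub fn format_dsply(field: &[u8]) -> String {
    let text = String::from_utf8_lossy(field);
    format!("DSPLY  {}", text.trim_end_matches(' '))
}

/// Writes the formatted line to `out` and flushes immediately, so output
/// appears even if the program terminates abnormally right afterwards.
pub fn dsply_to<W: Write>(out: &mut W, field: &[u8]) -> io::Result<()> {
    writeln!(out, "{}", format_dsply(field))?;
    out.flush()
}
```

Passing `&mut std::io::stdout()` as the writer gives the stdout behaviour described above; a `Vec<u8>` works as an in-memory writer for tests.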

Project layout

rust-langrpg/
├── src/
│   ├── rpg.bnf       — RPG IV free-format BNF grammar (embedded at compile time)
│   ├── lib.rs        — Grammar loader and demo helpers
│   ├── ast.rs        — Typed AST node definitions
│   ├── lower.rs      — Tokenizer + recursive-descent lowering pass
│   ├── codegen.rs    — LLVM IR code generation (inkwell)
│   ├── main.rs       — Compiler CLI (clap) + linker invocation
│   └── bin/
│       └── demo.rs   — Grammar demo binary
├── rpgrt/
│   └── src/
│       └── lib.rs    — Runtime library (librpgrt.so)
├── hello.rpg         — Hello World example program
└── count.rpg         — Counting loop example program