Astreum

Language Specification

This page defines the Astreum language syntax, expression model, evaluation semantics, operator set, actor model, metering, and module system as implemented in the machine package and CLI module loader.

1. Lexical elements

The tokenizer produces a flat list of string tokens from source text.

  • Whitespace: Spaces, tabs, and newlines delimit tokens and are otherwise ignored.
  • Parentheses: ( and ) are standalone tokens that delimit list expressions.
  • Quote token: ' is emitted as a standalone token.
  • Integer literals: Decimal integers such as 1, 255, and -128 parse to Expr.Int.
  • Float literals: Values such as 3.14 and -2.5 parse to Expr.Float.
  • String literals: Double-quoted text such as "hello world" parses to Expr.String.
  • Hex bytes: 0x1f and 0Xab parse to Expr.Bytes.
  • Symbols: Any other contiguous non-whitespace, non-parenthesis string. Examples: sum, def, math.sum.
  • Line comments: ; starts a comment that runs to end-of-line.
  • Block comments: #; skips the next complete expression, including nested lists.
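The lexical rules above can be sketched as a small scanner. This is a minimal illustration, not the actual tokenizer: the name tokenize_sketch is hypothetical, escape sequences inside strings are not handled, and #; is simply passed through as a token on the assumption that a later stage skips the following expression.

```python
from typing import List


def tokenize_sketch(source: str) -> List[str]:
    """Split source text into a flat list of string tokens (illustrative only)."""
    tokens: List[str] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            # Whitespace only delimits tokens.
            i += 1
        elif ch == ";":
            # Line comment: skip to end of line.
            while i < n and source[i] != "\n":
                i += 1
        elif ch in "()'":
            # Parentheses and the quote mark are standalone tokens.
            tokens.append(ch)
            i += 1
        elif ch == '"':
            # String literal: keep the quotes so a parser can recognize it.
            j = i + 1
            while j < n and source[j] != '"':
                j += 1
            if j >= n:
                raise ValueError("unterminated string literal")
            tokens.append(source[i:j + 1])
            i = j + 1
        else:
            # Any other run of non-whitespace, non-parenthesis characters.
            j = i
            while j < n and not source[j].isspace() and source[j] not in "()":
                j += 1
            tokens.append(source[i:j])
            i = j
    return tokens
```

For example, tokenize_sketch('(1 2 +) ; add') yields the five tokens (, 1, 2, +, and ).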

2. Expression data model

All runtime values are instances of Expr. The implementation exposes six variants:

Expr.Link

  • A pair of optional Expr references: head and tail.
  • Can also carry unresolved hash pointers (head_hash/tail_hash) for lazy DAG traversal.
  • Link(None, None) is the NIL sentinel.
  • Hash domain tag: \x00.
  • Serialization tag: 0x00.

Expr.Symbol

  • Wraps a UTF-8 string name such as "sum", "def", or "my.var".
  • Hash domain tag: \x01.
  • Serialization tag: 0x01.

Expr.Bytes

  • Wraps raw byte data such as b"\x01" or b"\x00\x80".
  • Used for byte literals and hex literals.
  • Hash domain tag: \x02.
  • Serialization tag: 0x02.

Expr.Int

  • Arbitrary-precision signed integer.
  • Hash domain tag: \x03.
  • Serialization tag: 0x03.

Expr.Float

  • IEEE 754 double-precision float.
  • Hash domain tag: \x04.
  • Serialization tag: 0x04.

Expr.String

  • UTF-8 string content.
  • Hash domain tag: \x05.
  • Serialization tag: 0x05.

Content-addressed hashing

  • Every Expr has a cached 32-byte Blake3 hash computed lazily.
  • Expr.to_bytes(expr) uses a tag byte followed by variant-specific payload bytes.
  • Expr.from_bytes(data) deserializes back to an Expr tree and raises ValueError on invalid data.
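The tag-plus-payload layout and the lazily cached hash can be illustrated for atoms as follows. This is a sketch under several assumptions: the payload encodings (big-endian two's complement for Int, big-endian IEEE 754 for Float) are guesses, HashedAtom and encode_atom are hypothetical names, and BLAKE2b from the Python standard library stands in for Blake3, which is not in the standard library. Digests produced here will not match real Astreum hashes.

```python
import hashlib
import struct

# Serialization tags for the atom variants, taken from the lists above.
TAG_BYTES, TAG_INT, TAG_FLOAT, TAG_STRING = 0x02, 0x03, 0x04, 0x05


def encode_atom(value) -> bytes:
    """Encode a Python value as one tag byte followed by a payload."""
    if isinstance(value, bytes):
        return bytes([TAG_BYTES]) + value
    if isinstance(value, int):
        # Assumed payload: compact big-endian two's complement.
        length = max(1, (value.bit_length() + 8) // 8)
        return bytes([TAG_INT]) + value.to_bytes(length, "big", signed=True)
    if isinstance(value, float):
        # Assumed payload: 8-byte big-endian IEEE 754 double.
        return bytes([TAG_FLOAT]) + struct.pack(">d", value)
    if isinstance(value, str):
        return bytes([TAG_STRING]) + value.encode("utf-8")
    raise TypeError(f"unsupported atom type: {type(value).__name__}")


class HashedAtom:
    """Wraps a value and computes its 32-byte digest on first access."""

    def __init__(self, value):
        self.value = value
        self._hash = None  # filled in lazily

    @property
    def hash(self) -> bytes:
        if self._hash is None:
            # BLAKE2b is a stand-in here; the specification names Blake3.
            self._hash = hashlib.blake2b(
                encode_atom(self.value), digest_size=32
            ).digest()
        return self._hash
```

Because the tag byte is part of the hashed input, Int 1 and Float 1.0 produce different digests even though they compare equal in Python.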

3. Parsing

  • tokenize(source: str) -> List[str] produces tokens.
  • parse(tokens: List[str]) -> (Expr, List[str]) consumes one expression and returns it with the remaining tokens.
  • ( opens a list. Items are parsed recursively until ). Empty list () produces Link(None, None).
  • Decimal integers parse to Expr.Int.
  • Float tokens parse to Expr.Float.
  • Double-quoted strings parse to Expr.String.
  • 0x or 0X prefixed hex tokens parse to Expr.Bytes.
  • All other tokens become Expr.Symbol.
  • ParseError is raised on unexpected end-of-input or unmatched ).
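The parsing rules above can be sketched as a recursive function that consumes one expression and returns it with the remaining tokens. This is an illustration only: parse_sketch and parse_atom are hypothetical names, lists are modeled as Python lists rather than Link chains, symbols and strings are modeled as tagged tuples, and the exact literal grammar (for example, requiring an even number of hex digits) is an assumption.

```python
import re
from typing import Any, List, Tuple

INT_RE = re.compile(r"-?[0-9]+")
FLOAT_RE = re.compile(r"-?[0-9]+\.[0-9]+")
HEX_RE = re.compile(r"0[xX]((?:[0-9a-fA-F]{2})+)")  # assumes whole bytes


class ParseError(Exception):
    """Stand-in for the ParseError named in the list above."""


def parse_atom(token: str) -> Any:
    """Classify a single non-parenthesis token."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return ("string", token[1:-1])
    match = HEX_RE.fullmatch(token)
    if match:
        return bytes.fromhex(match.group(1))
    if INT_RE.fullmatch(token):
        return int(token)
    if FLOAT_RE.fullmatch(token):
        return float(token)
    return ("symbol", token)


def parse_sketch(tokens: List[str]) -> Tuple[Any, List[str]]:
    """Consume one expression; return it with the remaining tokens."""
    if not tokens:
        raise ParseError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == ")":
        raise ParseError("unmatched )")
    if head == "(":
        items = []
        while True:
            if not rest:
                raise ParseError("unexpected end of input")
            if rest[0] == ")":
                return items, rest[1:]
            item, rest = parse_sketch(rest)
            items.append(item)
    return parse_atom(head), rest
```

Under these assumptions, the tokens for (1 2.5 +) parse to a three-element list holding an int, a float, and the symbol +.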

4. Evaluation model

evaluation(machine, expr, stack, env) -> List[Expr] is the core recursive evaluator.

Symbol dispatch

  • Operator: If the symbol is in OPERATOR_LIST, the handler is called. The operator pops arguments from the stack and pushes results.
  • Variable: Otherwise, the symbol is looked up via env.get(value). If bound, the value is pushed. If unbound, NIL is pushed.
  • Meter charges: bound lookups cost symbol_size + value_size; unbound lookups cost symbol_size + 1.

Atom evaluation

  • Bytes, Int, Float, and String values push themselves onto the stack.
  • Charges are size-based and depend on the concrete value.

Link evaluation

  • Quote: If the list head is quote or ', the tail is pushed unevaluated. (quote) with no tail pushes NIL.
  • Normal: Evaluate the head, then evaluate the tail recursively. Operands are therefore pushed before the operator symbol that follows them is dispatched, which is how postfix dispatch works.

Result

  • Machine.run(expr, env) calls evaluation and returns the top of stack, or NIL if the stack is empty.
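The dispatch rules in this section can be sketched as a tiny stack evaluator. This is a simplified model, not the real evaluator: evaluate_sketch, run_sketch, and the OPERATORS table are hypothetical, lists are Python lists, symbols are ("symbol", name) tuples, NIL is modeled as None, the environment is a plain dict, and quote handling and metering are omitted.

```python
from typing import Any, Callable, Dict, List, Optional

NIL = None  # stand-in for Link(None, None)


def _op_add(stack: List[Any]) -> None:
    a = stack.pop()
    b = stack.pop()
    stack.append(b + a)


def _op_dup(stack: List[Any]) -> None:
    stack.append(stack[-1])


# A two-entry stand-in for OPERATOR_LIST.
OPERATORS: Dict[str, Callable[[List[Any]], None]] = {"+": _op_add, "dup": _op_dup}


def evaluate_sketch(expr: Any, stack: List[Any], env: Dict[str, Any]) -> List[Any]:
    if isinstance(expr, tuple) and expr[0] == "symbol":
        name = expr[1]
        if name in OPERATORS:
            # Operator: pops its arguments and pushes its result.
            OPERATORS[name](stack)
        else:
            # Variable: push the bound value, or NIL if unbound.
            stack.append(env.get(name, NIL))
    elif isinstance(expr, list):
        # List: evaluate items left to right, so operands land on the
        # stack before the operator that follows them.
        for item in expr:
            evaluate_sketch(item, stack, env)
    else:
        # Atom: pushes itself.
        stack.append(expr)
    return stack


def run_sketch(expr: Any, env: Optional[Dict[str, Any]] = None) -> Any:
    stack = evaluate_sketch(expr, [], env if env is not None else {})
    return stack[-1] if stack else NIL
```

With this model, run_sketch([1, 2, ("symbol", "+")]) returns 3, and an unbound symbol evaluates to NIL.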

5. Operators

Stack notation below uses (before -> after).

5.1 Arithmetic

  • + - (b a -> sum). Int + Int returns Int. Float + Float returns Float. Mixed Int/Float promotes to Float.
  • - - (b a -> diff). Same type rules as +.
  • * - (b a -> product). Same type rules as +.
  • / - (b a -> quotient). Int / Int uses integer division. Float / Float uses float division. Mixed Int/Float promotes to Float.
  • % - (b a -> remainder). Int only.
  • sqrt - (a -> sqrt(a)). Float only.

Example: (1 2 +) -> 3. (1.5 2.5 +) -> 4.0.
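The type rules for + and / can be sketched in Python, whose numeric tower happens to promote the same way. The function names are hypothetical, the operand order (b as the left operand) is an assumption, and so is the rounding direction of integer division: Python's // rounds toward negative infinity, which may differ from the implementation for negative operands.

```python
from typing import Union

Number = Union[int, float]


def add_sketch(b: Number, a: Number) -> Number:
    # Int + Int stays Int; any Float operand promotes the result to Float.
    return b + a


def divide_sketch(b: Number, a: Number) -> Number:
    if isinstance(b, int) and isinstance(a, int):
        # Int / Int: integer division (floor rounding is an assumption).
        return b // a
    # Float / Float or mixed: promote both operands to Float.
    return float(b) / float(a)
```

Under these assumptions, divide_sketch(7, 2) returns the Int 3, while divide_sketch(7.0, 2) returns the Float 3.5.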

5.2 Bitwise

  • & - (b a -> a & b). Bytes only.
  • | - (b a -> a | b). Bytes only.
  • ^ - (b a -> a ^ b). Bytes only.
  • ~ - (a -> ~a). Bytes only.
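Byte-wise versions of these operators might look like the following sketch. The function names are hypothetical, and requiring equal-length operands is an assumption: the text above does not say how operands of different lengths are handled.

```python
def bytes_and(b: bytes, a: bytes) -> bytes:
    """Byte-wise AND of two equal-length byte strings."""
    if len(a) != len(b):
        # Assumption: mismatched lengths are rejected rather than padded.
        raise ValueError("operands must have the same length")
    return bytes(x & y for x, y in zip(a, b))


def bytes_xor(b: bytes, a: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def bytes_not(a: bytes) -> bytes:
    """Byte-wise complement; the mask keeps each result within 0..255."""
    return bytes(~x & 0xFF for x in a)
```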

5.3 Shifts and rotates

  • << - logical left shift on Bytes.
  • >>> - logical right shift on Bytes.
  • >> - arithmetic right shift on Bytes.
  • rol - rotate left on Bytes.
  • ror - rotate right on Bytes.
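The difference between the logical shift, the arithmetic shift, and the rotate can be illustrated by treating a Bytes value as one fixed-width bit string. This sketch assumes big-endian bit order and a result of the same length as the input; neither is stated above, and the function names are hypothetical.

```python
def shift_right_logical(data: bytes, count: int) -> bytes:
    """Shift right, filling vacated high bits with zeros."""
    value = int.from_bytes(data, "big") >> count
    return value.to_bytes(len(data), "big")


def shift_right_arithmetic(data: bytes, count: int) -> bytes:
    """Shift right, filling vacated high bits with copies of the sign bit."""
    value = int.from_bytes(data, "big", signed=True) >> count
    return value.to_bytes(len(data), "big", signed=True)


def rotate_left(data: bytes, count: int) -> bytes:
    """Rotate left: bits shifted out at the top re-enter at the bottom."""
    width = len(data) * 8
    if width == 0:
        return data
    count %= width
    value = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    rotated = ((value << count) | (value >> (width - count))) & mask
    return rotated.to_bytes(len(data), "big")
```

For the single byte 0x80, a logical right shift by one gives 0x40, while an arithmetic right shift by one gives 0xc0 because the sign bit is replicated.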

5.4 Stack operations

  • dip - temporarily removes one value, evaluates the next expression, then restores the saved value.
  • drop - discard the top stack value.
  • dup - duplicate the top stack value.
  • swap - swap the top two stack values.
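On a Python list used as a stack (top at the end), these four operations can be sketched as follows. The function names are hypothetical, and dip takes the expression to run as a Python callable, which is a modeling shortcut: how the real operator obtains the next expression is not specified here.

```python
from typing import Any, Callable, List

Stack = List[Any]


def dup(stack: Stack) -> None:
    stack.append(stack[-1])


def drop(stack: Stack) -> None:
    stack.pop()


def swap(stack: Stack) -> None:
    stack[-1], stack[-2] = stack[-2], stack[-1]


def dip(stack: Stack, run: Callable[[Stack], None]) -> None:
    saved = stack.pop()   # temporarily remove the top value
    run(stack)            # evaluate against the rest of the stack
    stack.append(saved)   # restore the saved value on top
```

For example, with the stack [2, 1], a dip that multiplies the top value by ten leaves [20, 1]: the 1 is set aside while the 2 is transformed.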

5.5 Expression construction

  • link - (head tail -> Link(head, tail)).
  • head - (link -> head). Pushes NIL if the head is missing.
  • tail - (link -> tail). Pushes NIL if the tail is missing.
  • is_atom - (expr -> 1|0). Returns 1 for non-Link values, including Bytes, Int, Float, String, and Symbol.
  • is_eq - (b a -> 1|0). Structural equality.
  • eval - (expr -> evaluated). Re-enters the evaluator on the value.
  • quote - (a -> (' a)). Stack operator that wraps a value in a quotation.
  • symbol - (a -> symbol|NIL). Converts Bytes, String, Int, or Float to Expr.Symbol.
  • str - (a -> string|NIL). Converts any atom to Expr.String.
  • float - (a -> float|NIL). Converts Int, Bytes (exactly 8 bytes), String, or Symbol to Expr.Float.
  • int - (a -> int|NIL). Converts Bytes, String, Symbol, or Float to Expr.Int.
  • bytes - (a -> bytes|NIL). Converts Int, Float, String, or Symbol to Expr.Bytes.
  • ref - (hash -> expr|NIL). Resolves a 32-byte hash to a stored expression.
  • load - (hash -> full_expr|NIL). Deep-resolves a 32-byte hash recursively.

5.6 Control flow

  • fn - pops a body and parameter list, then binds arguments in a child environment with lexical parentage.
  • lambda - same as fn but with no parent environment and no def_target.
  • if - (cond then else -> result). The condition is evaluated first. A value is truthy if it is a non-zero Bytes, a non-zero Int, a non-zero Float, or a non-NIL Link.
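The truthiness rule for if can be written as a predicate. In this sketch, is_truthy is a hypothetical name, links are modeled as (head, tail) tuples with NIL as (None, None), and "non-zero Bytes" is read as "at least one non-zero byte". Treating every other value (such as a String) as falsy is this sketch's reading of the list above, not a confirmed behavior.

```python
from typing import Any

NIL = (None, None)  # stand-in for Link(None, None)


def is_truthy(value: Any) -> bool:
    if isinstance(value, bytes):
        # Non-zero Bytes: at least one byte differs from zero (assumed reading).
        return any(byte != 0 for byte in value)
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, tuple):
        # Any link other than NIL counts as true.
        return value != NIL
    # Values not named in the rule are treated as falsy in this sketch.
    return False
```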

5.7 Definition

  • def - (value name -> ). Stores value under name in env.def_target, or in env if def_target is None. Write-once per target environment.

Example: (10 x def) binds x to Int(10).

5.8 Actor model

  • spawn - (body name -> name|NIL). Spawns a new actor thread running body in a child environment. name must be a Symbol.
  • send - (target msg -> ). Sends msg to the mailbox of actor target.
  • receive - (target -> msg|NIL). Blocks until a message arrives in the mailbox of actor target.

6. Environment and scoping

  • Env(data, parent, def_target) stores local bindings, an optional lexical parent, and the environment that receives def writes.
  • get(key) checks local data first, then walks parent environments.
  • put(key, value) binds in the local environment.
  • def writes to env.def_target, falling back to the current environment when def_target is None. fn sets def_target=global_env, so def inside a function writes globally while lookups still resolve lexically.
  • lambda creates an environment with no parent and no def target.
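The interaction between lexical lookup and def_target can be sketched as follows. EnvSketch and define_sketch are hypothetical names, unbound lookups return None as a stand-in for NIL, and the handling of a second write is an assumption: this sketch silently keeps the first binding, whereas the implementation may signal the conflict differently.

```python
from typing import Any, Dict, Optional


class EnvSketch:
    def __init__(
        self,
        data: Optional[Dict[str, Any]] = None,
        parent: Optional["EnvSketch"] = None,
        def_target: Optional["EnvSketch"] = None,
    ) -> None:
        self.data: Dict[str, Any] = dict(data or {})
        self.parent = parent
        self.def_target = def_target

    def get(self, key: str) -> Any:
        # Check local bindings first, then walk the parent chain.
        env: Optional["EnvSketch"] = self
        while env is not None:
            if key in env.data:
                return env.data[key]
            env = env.parent
        return None

    def put(self, key: str, value: Any) -> None:
        self.data[key] = value


def define_sketch(env: EnvSketch, name: str, value: Any) -> bool:
    """Write-once definition into env.def_target, or env if it is None."""
    target = env.def_target if env.def_target is not None else env
    if name in target.data:
        return False  # assumption: later writes are ignored
    target.put(name, value)
    return True
```

A function-style environment created as EnvSketch(parent=global_env, def_target=global_env) therefore writes definitions into global_env while still reading through its parent chain.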

7. Machine and metering

Machine class

  • Machine(node, mode="dynamic", meter_enabled=True, meter_limit=None) orchestrates evaluation.
  • In "dynamic" mode, all operators execute normally.
  • In "deterministic" mode, spawn, send, receive, and eval push NIL instead of executing.
  • run(expr, env=None) evaluates an expression and returns the top of the stack or NIL.

Meter (gas)

  • Meter(enabled, limit) tracks byte-level computation cost.
  • charge_bytes(n) is a no-op when disabled. If the limit would be exceeded, it raises MeterExceededError.
  • Operator charges are size-based; arithmetic does not use width-squared charging in the current implementation.
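The metering contract above can be sketched in a few lines. MeterSketch and its used attribute are hypothetical, MeterExceededError is redefined locally so the example is self-contained, and the choice not to record a charge that exceeds the limit is an assumption.

```python
from typing import Optional


class MeterExceededError(Exception):
    """Local stand-in for the error named above."""


class MeterSketch:
    def __init__(self, enabled: bool = True, limit: Optional[int] = None) -> None:
        self.enabled = enabled
        self.limit = limit
        self.used = 0  # bytes charged so far

    def charge_bytes(self, n: int) -> None:
        if not self.enabled:
            return  # no-op when metering is disabled
        if self.limit is not None and self.used + n > self.limit:
            # The charge would push usage past the limit.
            raise MeterExceededError(
                f"charge of {n} exceeds limit {self.limit} (used {self.used})"
            )
        self.used += n
```

With a limit of 10, charges of 6 and 4 succeed and exhaust the budget exactly; any further non-zero charge raises MeterExceededError.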

8. Module system

The module system is implemented by the CLI tool, not the machine evaluator. It processes .aex files into environments at load time.

Module file structure

  • A module file is a sequence of top-level S-expressions, each parsed independently.
  • Each expression must be a 3-element form: (value name_or_prefix terminator).
  • The terminator must be def or import.

Definitions

  • (value name def) stores value under name. The value is stored as parsed and is not evaluated at load time.
  • Names are UTF-8 symbols. Example: (1 version def).

Path imports

  • (prefix "path/to/module.aex" import) or (prefix path/to/module.aex import) loads another module file.
  • Paths may be absolute or relative to the importing module's directory.
  • All definitions in the loaded module are prefixed with the prefix followed by a dot. For example, sum becomes math.sum under the prefix math.

Reference imports

  • (prefix (0x... ref) import) loads a module expression stored in atom storage by content hash.

Symbol rewriting

  • When a module is loaded under a prefix, all symbol references within its definitions are rewritten to fully qualified names. Example: sum -> math.sum.
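Symbol rewriting can be sketched as a recursive walk over an expression tree. The function name is hypothetical, expressions use the same simplified model as elsewhere (Python lists for lists, ("symbol", name) tuples for symbols), and restricting the rewrite to names defined in the module, so that operators such as + are left alone, is an assumption about how the loader decides which symbols to qualify.

```python
from typing import Any, Set


def rewrite_symbols(expr: Any, prefix: str, local_names: Set[str]) -> Any:
    """Qualify references to module-local names with the import prefix."""
    if isinstance(expr, tuple) and len(expr) == 2 and expr[0] == "symbol":
        name = expr[1]
        if name in local_names:
            return ("symbol", f"{prefix}.{name}")
        return expr  # operators and foreign names are left unchanged
    if isinstance(expr, list):
        return [rewrite_symbols(item, prefix, local_names) for item in expr]
    return expr  # other atoms are unaffected
```

Under the prefix math, a body that references sum and + is rewritten to reference math.sum and +.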

Circular import protection

  • The loader maintains an active_stack set. If a module is encountered while already on the stack, ValueError is raised.
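The cycle check can be illustrated with a loader that tracks which modules are currently being loaded. This sketch replaces file access with an in-memory dictionary mapping each module name to the modules it imports; load_order and that dictionary are hypothetical, and only the active_stack idea and the ValueError come from the text above.

```python
from typing import Dict, List, Optional, Set


def load_order(
    name: str,
    imports: Dict[str, List[str]],
    active_stack: Optional[Set[str]] = None,
) -> List[str]:
    """Return modules in the order they finish loading; reject cycles."""
    active = set() if active_stack is None else active_stack
    if name in active:
        # The module is already being loaded further up the call chain.
        raise ValueError(f"circular import detected: {name}")
    active.add(name)
    try:
        order: List[str] = []
        for dependency in imports.get(name, []):
            order.extend(load_order(dependency, imports, active))
        order.append(name)
        return order
    finally:
        # Leaving the module: it is no longer on the active stack.
        active.discard(name)
```

A module is removed from the set once it finishes loading, so importing the same module from two different places is allowed; only a module that re-enters itself through its own imports is rejected.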