# LambdaLisp - A Lisp Interpreter That Runs on Lambda Calculus

*Woodrush’s Blog (Hikaru Ikuta), 2022-09-17*

LambdaLisp is a Lisp interpreter written as an untyped lambda calculus term. The input and output text is encoded into closed lambda terms using the Mogensen-Scott encoding, so the entire computation process solely consists of the beta-reduction of lambda calculus terms.

When run on a lambda calculus interpreter that runs on the terminal, it presents a REPL where you can interactively define and evaluate Lisp expressions. Supported interpreters include the binary lambda calculus interpreters Blc, tromp, and uni, the Universal Lambda interpreter clamb, and Lazy K interpreters.

Supported features are:

• Signed 32-bit integers
• Strings
• Closures, lexical scopes, and persistent bindings with let
• Object-oriented programming feature with class inheritance
• Reader macros with set-macro-character
• Access to the interpreter’s virtual heap memory with malloc, memread, and memwrite
• Call stack traces when an error is raised
• Garbage collection during macro evaluation

and much more.

LambdaLisp can be used to write interactive programs. Execution speed is fast as well - the number guessing game runs on the terminal with an almost instantaneous response.

Here is a PDF showing its entire lambda term, which is 42 pages long:

The embedded PDF may not show on mobile. The same PDF is also available at https://woodrush.github.io/lambdalisp.pdf.

Page 32 is particularly interesting, consisting entirely of ( characters.

## Overview

LambdaLisp is a Lisp interpreter written as a closed untyped lambda calculus term. It is written as a lambda calculus term ${\rm LambdaLisp} = \lambda x. \cdots$ which takes a string $x$ as an input and returns a string as an output. The input $x$ is the Lisp program and the user’s standard input, and the output is the standard output. Characters are encoded into lambda term representations of natural numbers using the Church encoding, and strings are encoded as a list of characters with lists expressed as lambdas in the Mogensen-Scott encoding, so the entire computation process solely consists of the beta-reduction of lambda terms, without introducing any non-lambda-type object. In this sense, LambdaLisp operates on a “truly purely functional language” without any primitive data types except for lambda terms.

Supported features are closures and persistent bindings with let, reader macros, 32-bit signed integers, a built-in object-oriented programming feature based on closures, and much more. LambdaLisp is tested by running code on both Common Lisp and LambdaLisp and comparing their outputs. The largest LambdaLisp-Common-Lisp polyglot program that has been tested is lambdacraft.cl, which runs the Lisp-to-lambda-calculus compiler LambdaCraft I wrote for this project, also used to compile LambdaLisp itself.

When run on a lambda calculus interpreter that runs on the terminal, LambdaLisp presents a REPL where you can interactively define and evaluate Lisp expressions. These interpreters automatically process the string-to-lambda encoding for handling I/O through the terminal.

Lisp has been described by Alan Kay as the Maxwell’s equations of software. In the same sense, I believe that lambda calculus is the particle physics of computation. LambdaLisp may therefore be a gigantic electromagnetic Lagrangian that connects the realm of human-friendly programming to the origins of the notion of computation itself.

## Usage

LambdaLisp is available on GitHub at https://github.com/woodrush/lambdalisp. Here we will explain how to try LambdaLisp right away on x86-64-Linux and other platforms such as Mac.

### Running the LambdaLisp REPL (on x86-64-Linux)

You can try the LambdaLisp REPL on x86-64-Linux by simply running:

git clone https://github.com/woodrush/lambdalisp
cd lambdalisp
make run-repl


The only requirement is cc, which should be installed by default. To try it on a Mac, please see the next section.

This will run LambdaLisp on SectorLambda, the 521-byte lambda calculus interpreter. The source code being run is lambdalisp.blc, which is the lambda calculus term shown in lambdalisp.pdf written in binary lambda calculus notation.

SectorLambda automatically takes care of the string-to-lambda I/O encoding to run LambdaLisp on the terminal. Interaction is done by writing LambdaLisp in continuation-passing style, allowing a Haskell-style interactive I/O to work on lambda calculus interpreters.

When building SectorLambda, Make runs the following commands to fetch its sources:

• Blc.S: wget https://justine.lol/lambda/Blc.S?v=2
• flat.lds: wget https://justine.lol/lambda/flat.lds

After running make run-repl, the REPL can also be run as:

( cat ./bin/lambdalisp.blc | ./bin/asc2bin; cat ) | ./bin/Blc


### Running the LambdaLisp REPL (on Other Platforms)

SectorLambda is x86-64-Linux exclusive. On other platforms such as a Mac, the following command can be used:

git clone https://github.com/woodrush/lambdalisp
cd lambdalisp
make run-repl-ulamb


This runs LambdaLisp on the lambda calculus interpreter clamb. The requirement for this is gcc or cc.

After running make run-repl-ulamb, the REPL can also be run as:

( cat ./bin/lambdalisp.ulamb | ./bin/asc2bin; cat ) | ./bin/clamb -u


LambdaLisp runs on various other lambda calculus interpreters as well. For instructions for other interpreters, please see the GitHub repo.

### Playing the Number Guessing Game

Once make run-repl is run, you can play the number guessing game with:

( cat ./bin/lambdalisp.blc | ./bin/asc2bin; cat ./examples/number-guessing-game.cl; cat ) | ./bin/Blc


If you ran make run-repl-ulamb, you can run:

( cat ./bin/lambdalisp.ulamb | ./bin/asc2bin; cat ./examples/number-guessing-game.cl; cat ) | ./bin/clamb -u


You can run the same script on Common Lisp. If you use SBCL, you can run it with:

sbcl --script ./examples/number-guessing-game.cl


## Examples

### Closures

The following LambdaLisp code runs right out of the box:

(defun new-counter (init)
  ;; Return a closure.
  ;; Use the let over lambda technique for creating independent and persistent variables.
  (let ((i init))
    (lambda () (setq i (+ 1 i)))))

;; Instantiate counters
(setq counter1 (new-counter 0))
(setq counter2 (new-counter 10))

(print (counter1)) ;; => 1
(print (counter1)) ;; => 2
(print (counter2)) ;; => 11
(print (counter1)) ;; => 3
(print (counter2)) ;; => 12
(print (counter1)) ;; => 4
(print (counter1)) ;; => 5


An equivalent JavaScript code is:

// Runs on the browser's console
function new_counter (init) {
  let i = init;
  return function () {
    return ++i;
  }
}

var counter1 = new_counter(0);
var counter2 = new_counter(10);

console.log(counter1()); // => 1
console.log(counter1()); // => 2
console.log(counter2()); // => 11
console.log(counter1()); // => 3
console.log(counter2()); // => 12
console.log(counter1()); // => 4
console.log(counter1()); // => 5


### Object-Oriented Programming With Class Inheritance

As described in Let Over Lambda, when you have closures, you get object-oriented programming for free. LambdaLisp has a built-in OOP feature implemented as predefined macros based on closures. It supports Python-like classes with class inheritance:

;; Runs on LambdaLisp
(defclass Counter ()
  (i 0)

  (defmethod inc ()
    (setf (. self i) (+ 1 (. self i))))

  (defmethod dec ()
    (setf (. self i) (- (. self i) 1))))

;; A class that inherits from Counter
(defclass Counter-with-init (Counter)
  (defmethod *init (i)
    (setf (. self i) i))

  (defmethod add (n)
    (setf (. self i) (+ (. self i) n))))

(defparameter counter1 (new Counter))
(defparameter counter2 (new Counter-with-init 100))

((. counter1 inc))

(setf (. counter1 i) 5)
(setf (. counter2 i) 500)


An equivalent Python code is:

class Counter ():
    i = 0

    def inc (self):
        self.i += 1
        return self.i

    def dec (self):
        self.i -= 1
        return self.i

class CounterWithInit (Counter):
    def __init__ (self, i):
        self.i = i

    def add (self, n):
        self.i += n
        return self.i

counter1 = Counter()
counter2 = CounterWithInit(100)

counter1.inc()

counter1.i = 5
counter2.i = 500


### More Examples

More examples can be found in the GitHub repo. The largest LambdaLisp program currently written is lambdacraft.cl, which runs the lambda calculus compiler LambdaCraft I wrote for this project, also used to compile LambdaLisp itself.

## Features

Key features are:

• Signed 32-bit integers
• Strings
• Closures, lexical scopes, and persistent bindings with let
• Object-oriented programming feature with class inheritance
• Reader macros with set-macro-character
• Access to the interpreter’s virtual heap memory with malloc, memread, and memwrite
• Call stack traces when an error is raised
• Garbage collection during macro evaluation

Supported special forms and functions are:

• defun, defmacro, lambda (&rest can be used)
• quote, atom, car, cdr, cons, eq
• +, -, *, /, mod, =, >, <, >=, <=, integerp
• read (reads Lisp expressions), print, format (supports ~a and ~%), write-to-string, intern, stringp
• let, let*, labels, setq, boundp
• progn, loop, block, return, return-from, if, cond, error
• list, append, reverse, length, position, mapcar
• make-hash-table, gethash (setf can be used)
• equal, and, or, not
• eval, apply
• set-macro-character, peek-char, read-char, ` , ,@ ' #\
• carstr, cdrstr, str, string comparison with =, >, <, >=, <=, string concatenation with +
• defun-local, defglobal, type, macro
• new, defclass, defmethod, ., field assignment by setf

## Tests

There are 2 types of tests written for LambdaLisp. The GitHub Actions CI runs these tests.

### Output Comparison Test

The files examples/*.cl run both on Common Lisp and LambdaLisp producing identical results, except for the initial > prompt printed by the REPL in LambdaLisp. This test first runs *.cl on both SBCL and LambdaLisp and compares their outputs.

The files examples/*.lisp are LambdaLisp-exclusive programs. The outputs of these files are compared with test/*.lisp.out.

### LambdaCraft Compiler Hosting Test

examples/lambdacraft.cl runs LambdaCraft, a Common-Lisp-to-lambda-calculus compiler written in Common Lisp, used to compile the lambda calculus source for LambdaLisp. The script defines a binary lambda calculus (BLC) program that prints the letter A and exits, and prints the BLC source code for the defined program.

The LambdaCraft compiler hosting test first executes examples/lambdacraft.cl on LambdaLisp, then runs the output BLC program on a BLC interpreter, and checks if it prints the letter A and exits.

### Experimental: Self-Hosting Test

This test is currently theoretical since it requires a lot of time and memory, and is not included in make test-all. This test extends the previous LambdaCraft compiler hosting test and checks whether the Common Lisp source code for LambdaLisp runs on LambdaLisp itself. Since the LambdaCraft compiler hosting test runs properly, this test should theoretically pass as well, although it requires a tremendous amount of memory and time. The test is run on the binary lambda calculus interpreter Blc.

One concern is whether the 32-bit heap address space used internally in LambdaLisp is enough to compile this program. This can be solved by compiling LambdaLisp with an address space of 64 bits or larger, which can be done simply by replacing the literal 32 (which appears only once in src/lambdalisp.cl) with 64, etc. Another concern is whether the execution hits Blc’s maximum term limit. This can be solved by compiling Blc with a larger memory limit, by editing the rule for $(BLC) in the Makefile.

## General Lambda Calculus Programming

Before introducing LambdaLisp-specific implementation details, we’ll cover some general topics about programming in lambda calculus.

### Handling I/O in Lambda Calculus

LambdaLisp is written as a function ${\rm LambdaLisp} = \lambda x. \cdots$ which takes one string as an input and returns one string as an output. The input string $x$ represents the Lisp program and the user’s standard input, and the output represents the standard output. A string is represented as a list of bits of its ASCII representation. In untyped lambda calculus, a method called the Mogensen-Scott encoding can be used to express a list of lambda terms as a pure untyped lambda calculus term, without the help of introducing a non-lambda-type object.

Bits are encoded as:

\begin{aligned} 0 &= \lambda x. \lambda y. x \\ 1 &= \lambda x. \lambda y. y \\ \end{aligned}

Lists are made using the list constructor and terminator ${\rm cons}$ and ${\rm nil}$:

\begin{aligned} {\rm cons} &= \lambda x. \lambda y. \lambda f. (f x y) \\ {\rm nil} &= \lambda x. \lambda y. y \end{aligned}

Note that $1 = {\rm nil}$. Under these rules, the bit sequence 0101 can be expressed as a composition of lambda terms:

$({\rm cons} ~ 0 ~ ({\rm cons} ~ 1 ~ ({\rm cons} ~ 0 ~ ({\rm cons} ~ 1 ~ {\rm nil}))))$

which is exactly the same as how lists are constructed in Lisp. This beta-reduces to:

\begin{aligned} \lambda f. &(f (\lambda x.\lambda y.x) \lambda g.(g (\lambda x.\lambda y.y) \\ &\lambda h.(h (\lambda x.\lambda y.x) \lambda i.(i (\lambda x.\lambda y.y) (\lambda x.\lambda y.y))))) \end{aligned}
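To make this concrete, here is a small Python sketch (illustrative only, not part of LambdaLisp) that mirrors these definitions with Python lambdas and decodes the resulting list back into bits:

```python
# Bits and list constructors as plain Python lambdas, mirroring the
# definitions above (a sketch for illustration, not part of LambdaLisp).
bit0 = lambda x: lambda y: x                   # 0 = λx.λy.x
bit1 = lambda x: lambda y: y                   # 1 = λx.λy.y
cons = lambda x: lambda y: lambda f: f(x)(y)   # cons = λx.λy.λf.(f x y)
nil  = lambda x: lambda y: y                   # nil = λx.λy.y (= 1)

# The bit sequence 0101 as a composition of lambda terms:
seq = cons(bit0)(cons(bit1)(cons(bit0)(cons(bit1)(nil))))

def decode_bits(lst):
    """Walk a Mogensen-Scott-encoded list, returning Python ints."""
    out = []
    # nil applied to a selector and a marker returns the marker;
    # a cons cell feeds its head and tail to the selector instead.
    while lst(lambda h: lambda t: lambda d: False)(True) is not True:
        out.append(lst(lambda h: lambda t: h)(0)(1))  # bit0 -> 0, bit1 -> 1
        lst = lst(lambda h: lambda t: t)              # move to the tail
    return out

print(decode_bits(seq))  # → [0, 1, 0, 1]
```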

Using this method, both the standard input and output strings can be entirely encoded into pure lambda calculus terms, letting LambdaLisp operate with beta reduction of lambda terms as its sole rule of computation, without the requirement of introducing any non-lambda-type object.

The LambdaLisp execution flow is thus as follows: you first encode the input string (Lisp program and stdin) as lambda terms, apply it to ${\rm LambdaLisp} = \lambda x. \cdots$, beta-reduce it until it is in beta normal form, and parse the output lambda term as a Mogensen-Scott-encoded list of bits (inspecting the equivalence of lambda terms is quite simple in this case since it is in beta normal form). This rather complex flow is supported exactly as is in 3 lambda-calculus-based programming languages: Binary Lambda Calculus, Universal Lambda, and Lazy K.

### Lambda-Calculus-Based Programming Languages

Binary Lambda Calculus (BLC) and Universal Lambda (UL) are programming languages with the exact same I/O strategy described above - a program is expressed as one pure lambda term that takes a Church-Mogensen-Scott-encoded string and returns a Church-Mogensen-Scott-encoded string. When the interpreters for these languages Blc and clamb are run on the terminal, the interpreter automatically encodes the input bytestream to lambda terms, performs beta-reduction, parses the output lambda term as a list of bits, and prints the output as a string in the terminal.

BLC and UL differ only in slight details of the method used for encoding the I/O. Otherwise, both of these languages follow the same principle, where lambda terms are the only available object type in the language.

In BLC and UL, lambda terms are written in a notation called binary lambda calculus. Details on the BLC notation are described in the Appendix.

### Lazy K

Lazy K is a language with the same I/O strategy as BLC and UL, except programs are written as SKI combinator calculus terms instead of lambda terms. The SKI combinator calculus is a system equivalent to lambda calculus, in which there are only 3 functions:

\begin{aligned} S &= \lambda x.\lambda y.\lambda z.(x z (y z)) \\ K &= \lambda x.\lambda y.x \\ I &= \lambda x.x \end{aligned}

Every SKI combinator calculus term is written as a combination of these 3 functions.
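These three definitions can be mirrored directly as Python lambdas (an illustrative sketch):

```python
# S, K, and I as Python lambdas, mirroring the definitions above.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = lambda x: x

# (S K K) behaves as the identity: S K K z = K z (K z) = z,
# so I is definable from S and K alone.
print(S(K)(K)(42))  # → 42
```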

Every SKI term can easily be converted to an equivalent lambda calculus term by simply rewriting the term with these rules. Very interestingly, the conversion also works the other way around - there is a consistent method to convert an arbitrary lambda term with an arbitrary number of variables to an equivalent SKI combinator calculus term. This equivalence with lambda calculus proves that SKI combinator calculus is Turing-complete.
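The lambda-to-SKI direction can be sketched with the classic bracket-abstraction algorithm. The AST shape and function names below are illustrative, not LambdaLisp’s actual tooling:

```python
# A minimal sketch of bracket abstraction, the classic algorithm for
# converting lambda terms to S/K/I terms (illustrative only).
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = lambda x: x

def free_in(v, t):
    if t[0] == "var": return t[1] == v
    if t[0] == "app": return free_in(v, t[1]) or free_in(v, t[2])
    return False  # S, K, I contain no free variables

def abstract(v, t):
    """Compute [v]t: a lambda-free term such that ([v]t) x = t[v := x]."""
    if t == ("var", v):
        return ("I",)
    if not free_in(v, t):
        return ("app", ("K",), t)
    # t must be an application here
    return ("app", ("app", ("S",), abstract(v, t[1])), abstract(v, t[2]))

def to_ski(t):
    if t[0] == "lam":
        return abstract(t[1], to_ski(t[2]))  # convert the body first
    if t[0] == "app":
        return ("app", to_ski(t[1]), to_ski(t[2]))
    return t

def run(t):
    """Evaluate a closed SKI term using the Python combinators above."""
    if t[0] == "app":
        return run(t[1])(run(t[2]))
    return {"S": S, "K": K, "I": I}[t[0]]

# λx.λy.x converts to S (K K) I, which behaves like K:
const = to_ski(("lam", "x", ("lam", "y", ("var", "x"))))
print(run(const)(1)(2))  # → 1
```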

Apart from the strong condition that only 3 predefined functions exist, the beta-reduction rules for the SKI combinator calculus are exactly identical to those of lambda calculus, so the computation flow and the I/O strategy for Lazy K are the same as for BLC and Universal Lambda - all programs can be written purely as SKI combinator calculus terms without the need of introducing any function other than S, K, and I. This allows Lazy K’s syntax to be astonishingly simple, where only 4 keywords exist - s, k, i, and ` for function application.

As mentioned in the original Lazy K design proposal, if BF captures the distilled essence of structured imperative programming, Lazy K captures the distilled essence of functional programming. It might as well be the assembly language of lazily evaluated functional programming. With its simple syntax and rules orchestrating a Turing-complete language, I find Lazy K to be a very beautiful language, and one of my all-time favorites.

LambdaLisp is written in these 3 languages - Binary Lambda Calculus, Universal Lambda, and Lazy K. In each of these languages, LambdaLisp is expressed as one lambda term or SKI combinator calculus term. Therefore, to run LambdaLisp, an interpreter for one of these languages is required. To put in a rather convoluted way, LambdaLisp is a Lisp interpreter that runs on another language’s interpreter.

### Interactive I/O in Lambda-Calculus-Based Languages

The I/O model described previously looks static - at first sight it seems as if the entire value of ${\rm stdin}$ needs to be supplied beforehand at the start of execution, making interactive programs impossible. However, this is not the case. The interpreter handles interactive I/O through the following clever combination of input buffering and lazy evaluation:

• The input string is buffered into memory. Its values are lazily evaluated - execution proceeds until the last moment an input must be referenced in order for beta reduction to proceed.
• The output string is printed as soon as the interpreter deduces the first characters of the output string.

As an example, consider the BLC program ${\rm ROT13}$ which initially prints a prompt In>, accepts standard input, then outputs the ROT13 encoding of the standard input. After the user starts the program, the interpreter’s beta-reduction proceeds as follows:

\begin{aligned} {\rm ROT13} ~ {\rm stdin} &= (\lambda s.{\rm Code} ~ s) ~ {\rm stdin} \\ &= ({\rm Code} ~ {\rm stdin}) \\ &= ({\rm cons} ~ \tilde I ~ ({\rm Code}_1 ~ {\rm stdin})) \\ &= ({\rm cons} ~ \tilde I ~ ({\rm cons} ~ \tilde n ~ ({\rm Code}_2 ~ {\rm stdin}))) \\ &= ({\rm cons} ~ \tilde I ~ ({\rm cons} ~ \tilde n ~ ({\rm cons} ~ \tilde > ~ ({\rm Code}_3 ~ {\rm stdin})))) \\ \end{aligned}

Here, $\tilde I$ is a lambda term representing the character I.

As ${\rm Code}$ is beta-reduced, it “weaves out” its output string In> on the way. This is somewhat akin to the central dogma of molecular biology, where ribosomes transcribe genetic information to polypeptide chains - the program is the gene, the interpreter is the ribosome, and the list of output characters is the polypeptide chain.

The interpreter continues its evaluation until further beta reduction is impossible without the knowledge of the value of ${\rm stdin}$, which happens at ${\rm Code}_3$. At this point, the string In> is shown on the terminal since its values are already determined and available.

Seeing the prompt In>, suppose that the user types the string “Hi”. The interpreter then buffers its lambda-encoded expression into the pointer that points to ${\rm stdin}$, making the evaluation proceed as:

\begin{aligned} r.h.s. &= ({\rm cons} ~ \tilde I ~ ({\rm cons} ~ \tilde n ~ ({\rm cons} ~ \tilde > ~ ({\rm cons} ~ \tilde U ~ ({\rm Code}_4 ~ {\rm stdin}))))) \\ &= ({\rm cons} ~ \tilde I ~ ({\rm cons} ~ \tilde n ~ ({\rm cons} ~ \tilde > ~ ({\rm cons} ~ \tilde U ~ ({\rm cons} ~ \tilde v ~ ({\rm Code}_5 ~ {\rm stdin})))))) \\ \end{aligned}

On the terminal, the interaction process would look like this:

In>[**Halts; User types string**]Hi[**User types return; Hi is buffered into stdin**]
Uv
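The observable behavior of this scheme can be imitated in Python with a generator (a rough sketch - the real interpreters achieve this at the level of beta reduction, not with generators):

```python
# A rough imitation of the lazy interactive I/O scheme using a Python
# generator: output characters are produced as soon as they are
# determined, and the input is consumed only at the moment a character
# of it is actually needed.
def rot13_program(stdin_chars):
    yield from "In>"             # the prompt needs no input to determine
    for c in stdin_chars:        # each iteration forces one lazy read
        if c.isalpha():
            base = ord('a') if c.islower() else ord('A')
            yield chr((ord(c) - base + 13) % 26 + base)
        else:
            yield c

print("".join(rot13_program(iter("Hi"))))  # → In>Uv
```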


## LambdaLisp Implementation Details

We will now cover LambdaLisp-specific implementation details.

### The Core Implementation Strategy

When viewed as a programming language, lambda calculus is a purely functional language. This leads to the following two basic programming strategies for LambdaLisp:

• In order to use global variables, the global state is passed to every function that affects or relies on the global state.
• In order to control the evaluation flow for I/O, continuation-passing style (CPS) is used.

Writing in continuation-passing style also helps the lambda calculus interpreter avoid re-evaluating the same term multiple times. This is very important when writing programs for the 521-byte binary lambda calculus interpreter Blc, as well as tromp and uni, since these interpreters seem to lack a memoization feature, although they readily have garbage collection. Writing in direct style gradually slows down the runtime execution speed, since memoization does not occur and the same computation is repeated multiple times. However, by carefully writing the entire program in continuation-passing style, the evaluation flow can be controlled so that the program only proceeds when the value under attention is in beta-normal form. In this situation, since values are already evaluated to their final form, memoization becomes unnecessary in the first place.

The continuation-passing style technique suddenly transforms Blc into a very fast and efficient lambda calculus interpreter. In fact, the largest program, lambdacraft.cl, runs the fastest and even the most memory-efficiently on Blc, using only about 1.7 GB of memory, while clamb uses about 5 GB. I suspect that the speed is due to an efficient memory configuration and the memory efficiency is due to the garbage collection feature. I was very surprised by how large a galaxy fits into the microcosmos of 521 bytes! The realization that continuation-passing-style programs run very fast on Blc was what made everything possible and what motivated me to set out on the journey of building LambdaCraft and LambdaLisp.

The difference between continuation-passing style and direct style is explained in the Appendix, with JavaScript code samples that run on the browser’s console.

### The Startup Phase - A Primitive Form of Asynchronous Programming

Right after LambdaLisp starts running, it runs the following steps:

• The string generator decompresses the “prelude Lisp code”, and also generates keyword strings such as quote.
• The initial prompt caret > is printed.
• The prelude Lisp code is silently evaluated (the results are not printed).
• The user’s initial input is evaluated and the results are printed.
• The interpreter enters the basic read-eval-print-loop.

The prelude Lisp code is LambdaLisp code that defines core functions such as defun and defmacro as macros. The prelude is hard-coded as a compressed string, which is decompressed by the string generator before being passed to the interpreter. The compression method is described later. The LambdaLisp core code written in LambdaCraft hard-codes only the essential and speed-critical functions. Other non-critical functions are defined through LambdaLisp code in the prelude. A lot of features that look like keywords, including defun and defmacro, are in fact defined as macros. Due to this implementation, it is in fact possible to override the definition of defun to something else using setq, but I consider that flexibility a feature.

There is in fact a subtle form of asynchronous programming in action in this startup phase. The prelude code takes a little time to evaluate since it contains a lot of code, so if the initial prompt caret > were shown only after evaluating the prelude, there would be a slightly stressful lag until the prelude finishes running.

To circumvent this, the initial prompt caret > is printed before the prelude is loaded. This allows the prelude code to be evaluated while the user is typing their initial Lisp code. Since the input is buffered in all of the lambda calculus interpreters, the input does not get lost even while the prelude is running in the background. If the user types very fast, there will be a short wait until the result of the initial input is shown, but if the prelude has already been loaded, the result will be shown right away. All later inputs are evaluated faster because the prelude is only read once at initialization. In a way, this is a primitive form of asynchronous programming, where processing the user input and the execution of some code are done concurrently.

### The Basic Evaluation Loop

After the prelude is loaded, the interpreter enters its basic read-eval-print loop.

As mentioned before, the basic design is that all state-affecting functions must pass the state as arguments, and basically all functions are written in CPS. This makes the core eval function have the following signature:

;; LambdaCraft
(defrec-lazy eval (expr state cont)
  ...)


Where expr is a Lisp expression, state is the global state, and cont is the continuation. The comment ;; LambdaCraft indicates that this is the source code for the LambdaLisp interpreter written in LambdaCraft.

The direct return type of eval is string, and not expr or state. This is because the entire program is expected to be a program that “takes a string and outputs a string”. This design also allows print debugging to be written very intuitively in an imperative style as discussed later.

Instead of using the direct return values, the evaluation results are “returned” to later processes through the continuation cont. cont is actually just a callback function that is called at the end of eval, with the evaluated expr and the new state as its arguments. For example, if the evaluation result is 42 and the renewed state is state, eval calls the callback function cont as

(cont result state) ;; Where result == 42 and state is the renewed state


in the final step of the code. Here, two values result and state are “returned” to the callback function cont and are used later in cont.

Having the direct return type of eval to be a string makes the implementation of exit very simple. It is implemented in eval as:

;; LambdaCraft
nil)


Here is how it works:

• Normally, every successive chain of computations is invoked by calling the callback function (the continuation). Here, since eval no longer calls a continuation when exit is called, no further computation happens and the interpreter stops there.
• eval’s direct return value is set to nil, which is a string terminator. This leaves nil at the end of the output string, completing the output string.
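This control flow can be sketched as a toy CPS evaluator in Python (illustrative only - the form names and state shape here are made up, not LambdaLisp’s actual code):

```python
# Toy CPS evaluator sketching the structure described above.
# eval_ directly returns the *output string*; evaluation results are
# passed onward via the continuation cont, and exit simply returns the
# string terminator without calling any continuation.
def eval_(expr, state, cont):
    kind = expr[0]
    if kind == "exit":
        return ""                        # "nil": ends the output string
    if kind == "lit":
        return cont(expr[1], state)      # pass the value to the callback
    if kind == "print":
        def after(value, new_state):     # continuation for the subexpression
            return str(value) + "\n" + cont(value, new_state)
        return eval_(expr[1], state, after)
    if kind == "progn":                  # evaluate forms in sequence
        if len(expr) == 1:
            return cont(None, state)
        def next_form(value, new_state):
            return eval_(("progn",) + expr[2:], new_state, cont)
        return eval_(expr[1], state, next_form)
    raise ValueError("unknown form: " + kind)

program = ("progn",
           ("print", ("lit", 42)),
           ("exit",),
           ("print", ("lit", 1)))        # never reached: exit cuts the chain

out = eval_(program, {}, lambda value, state: "")
print(repr(out))  # → '42\n'
```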

A similar implementation is used for read-expr, which exits the interpreter when there is no more text to read:

;; LambdaCraft
;; Exit the program when EOF is reached
(if (isnil stdin)
  nil)


### Basic Data Structures

The state is a 3-tuple that contains reg, heap and stdin. In lambda terms:

\begin{aligned} {\rm state} &:= {\rm cons3} ~ {\rm reg} ~ {\rm heap} ~ {\rm stdin} \\ &= (\lambda a. \lambda b. \lambda c. \lambda f. (f~a~b~c)) ~ {\rm reg} ~ {\rm heap} ~ {\rm stdin} \\ &= \lambda f. ~ (f ~ {\rm reg} ~ {\rm heap} ~ {\rm stdin}) \end{aligned}

reg is used to store values of global variables used internally by the interpreter. heap is a virtual heap memory used to store the let and lambda bindings caused by the code. stdin is the pointer to the standard input provided by the interpreter.

As with cons, state is a function that accepts a callback function and applies it to the values it is storing. Therefore, the contents of state can be extracted by passing a callback function to state:

;; LambdaCraft
(state
  (lambda (reg heap stdin)
    [do something with reg/heap/stdin]))


which is continuation-passing style code, since we are using callback functions that accept values and describe what to do with those values.
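The cons3 definition and the callback-style extraction above can be mirrored in Python (an illustrative sketch, with placeholder strings standing in for the actual values):

```python
# cons3 and the state 3-tuple as Python lambdas, mirroring the
# definitions above (reg/heap/stdin stand-ins are placeholder strings).
cons3 = lambda a: lambda b: lambda c: lambda f: f(a)(b)(c)

state = cons3("reg")("heap")("stdin")

# Extract the contents by passing a callback function to state:
result = state(lambda reg: lambda heap: lambda stdin: (reg, heap, stdin))
print(result)  # → ('reg', 'heap', 'stdin')
```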

expr is a Lisp expression. Expressions in LambdaLisp belong to one of the following 5 types: atom, cons, lambda, integer, and string. All expressions are a 2-tuple with a type tag and its value:

${\rm expr} = {\rm cons} ~ {\rm typetag} ~ {\rm value} = \lambda f. ~ (f ~ {\rm typetag} ~ {\rm value})$

The structure of ${\rm value}$ depends on the type. For all types, ${\rm typetag}$ is a selector for a 5-tuple:

\begin{aligned} {\rm type\_ atom} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. a \\ {\rm type\_ cons} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. b \\ {\rm type\_ lambda} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. c \\ {\rm type\_ integer} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. d \\ {\rm type\_ string} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. e \\ \end{aligned}

This way, we can do type matching by writing:

(typetag
  [code for atom case]
  [code for cons case]
  [code for lambda case]
  [code for integer case]
  [code for string case])


Type matching ensures that the ${\rm value}$ for each type is always processed correctly according to its type. Since each tag is a selector for a 5-tuple, the tag selects the code that will be executed next. Since everything is lazily evaluated, the code for the unselected cases will not be evaluated.

As in the case of state, the type and value can be extracted by passing a callback to expr:

;; LambdaCraft
(expr
  (lambda (dtype value)
    (dtype
      [do something with value])))
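Both patterns - the 5-tuple type tags and the callback-style extraction - can be sketched in Python (illustrative only; zero-argument thunks stand in for lazy evaluation):

```python
# Type tags as 5-tuple selectors, mirroring the definitions above.
# Python thunks (zero-argument lambdas) imitate lazy evaluation,
# so the unselected branches are never run.
type_atom    = lambda a: lambda b: lambda c: lambda d: lambda e: a
type_cons    = lambda a: lambda b: lambda c: lambda d: lambda e: b
type_lambda  = lambda a: lambda b: lambda c: lambda d: lambda e: c
type_integer = lambda a: lambda b: lambda c: lambda d: lambda e: d
type_string  = lambda a: lambda b: lambda c: lambda d: lambda e: e

cons = lambda x: lambda y: lambda f: f(x)(y)
expr = cons(type_integer)(42)            # expr = (cons typetag value)

def describe(expr):
    def with_parts(typetag):
        def with_value(value):
            branch = typetag(lambda: "atom")(
                             lambda: "cons")(
                             lambda: "lambda")(
                             lambda: "integer: " + str(value))(
                             lambda: "string")
            return branch()              # only the selected branch runs
        return with_value
    return expr(with_parts)

print(describe(expr))  # → integer: 42
```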


### Virtual Heap Memory (RAM)

The virtual heap memory (RAM) is expressed as a binary tree constructed by the tuple constructor cons. The idea of using a binary tree data structure for expressing a RAM unit is borrowed from irori’s Unlambda VM (in Japanese). The implementation of the binary tree is modified from this definition so that the tree could be initialized as nil.

One RAM cell can store a value of any size and any type - Lisp terms, strings, or even the interpreter’s current continuation. This is because the RAM can actually only store one type, lambdas, but everything in lambda calculus belongs to that one type.

Trees are constructed using the same constructor ${\rm cons} = \lambda x. \lambda y. \lambda f. (f x y)$ as lists. A list $L$ containing A B C can be written using ${\rm cons}$ as:

$L = ({\rm cons} ~ A ~ ({\rm cons} ~ B ~ (\rm cons ~ C ~ {\rm nil})))$

Using the same mechanism, a binary tree $T$ with leaves $A$, $B$, $C$, $D$ can be expressed using ${\rm cons}$ cells as follows:

$T = ({\rm cons} ~ ({\rm cons} ~ A ~ B) ~ ({\rm cons} ~ C ~ D))$

Every node where all of the leaves below it hold unwritten values has its branch trimmed and set to ${\rm nil}$. If all values with addresses of the form $1*$ are unwritten, the tree becomes:

$T = ({\rm cons} ~ ({\rm cons} ~ A ~ B) ~ {\rm nil})$

When the tree search function encounters ${\rm nil}$, it returns the integer zero (a list of $N$ consecutive $0$s). The tree grows past ${\rm nil}$ only when a value is written past that address. The initial state of the RAM is ${\rm root} = {\rm nil}$, which effectively initializes the entire $N$-bit memory space to zero without creating $2^N$ nodes. Afterwards, the RAM unit grows only approximately linearly as values are written, instead of growing exponentially with the machine’s bit size.
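The nil-trimmed tree behavior can be sketched in Python, with None playing the role of ${\rm nil}$ and nested pairs playing the role of ${\rm cons}$ cells (function names here are illustrative, not LambdaLisp’s internals):

```python
# A sketch of the nil-trimmed binary-tree RAM. None plays the role of
# nil; a missing subtree reads as 0, and a write grows the tree only
# along the path to the written address.
def tree_read(tree, addr):
    for bit in addr:
        if tree is None:
            return 0                     # nil: the whole subtree reads as zero
        tree = tree[bit]
    return 0 if tree is None else tree

def tree_write(tree, addr, value):
    if not addr:
        return value
    left, right = tree if tree is not None else (None, None)
    if addr[0] == 0:
        return (tree_write(left, addr[1:], value), right)
    return (left, tree_write(right, addr[1:], value))

ram = None                               # root = nil: all of memory reads 0
ram = tree_write(ram, (1, 0, 1), 42)
print(tree_read(ram, (1, 0, 1)))         # → 42
print(tree_read(ram, (0, 0, 0)))         # → 0 (untouched address)
```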

LambdaLisp uses a 32-bit address space for the RAM, which is specified here. The address space can be modified to an arbitrary integer by replacing the literal 32 which shows up only once in the source code with another Church-encoded numeral.

LambdaLisp exposes malloc and memory reading/writing to the user through the special forms malloc, memread and memwrite. malloc returns an integer indicating the pointer to the heap, which is initialized with nil. The pointer can be used with memread and memwrite to read and store LambdaLisp objects inside the interpreter’s heap cell. This can be used to implement C-style arrays, as demonstrated in malloc.lisp.

### Registers (Global Variables)

The register object reg uses the same binary tree data structure as the RAM, except reg uses variable-length addresses, while heap uses fixed-length addresses. The variable-length addresses make the address of each cell shorter, speeding up the read and write processes. reg is used to store global variables that are frequently read and written by the interpreter.

For example, we can let the register tree ${\rm reg}$ have the following structure:

The addresses for $A, B, C, D$ become $0$, $10$, $110$, and $111$, respectively.

LambdaLisp uses 7 registers which are defined here.
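As a sketch of the variable-length addressing (again a hypothetical JavaScript model, not the interpreter’s code), a register’s address simply stops at whatever depth its leaf sits, so frequently used registers can be given the shortest paths:

```javascript
// A model of the register tree: leaves sit at different depths,
// so addresses are variable-length bit strings.
// Addresses: A -> "0", B -> "10", C -> "110", D -> "111"
const tree = {
  left: "A",
  right: { left: "B", right: { left: "C", right: "D" } },
};

function regread(node, addr) {
  if (addr === "") return node;               // reached the leaf
  const next = addr[0] === "0" ? node.left : node.right;
  return regread(next, addr.slice(1));
}

console.log(regread(tree, "0"));    // "A"
console.log(regread(tree, "110"));  // "C"
```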

### The Prelude Generator and String Compression

As mentioned before, the prelude is hard-coded as text that is embedded into the LambdaLisp source. When embedding it as a lambda, the text is compressed into an efficient lambda term, optimized for the binary lambda calculus notation.

The prelude is generated by consecutively applying lots and lots of characters to a function called string-concatenator:

;; LambdaCraft
(def-lazy prelude
  (let
    (("a" ...)
     ("b" ...)
     ("c" ...)
     ...)
    (string-concatenator ... "n" "u" "f" "e" "d" "(" nil)))

(defrec-lazy string-concatenator (curstr x)
  (cond
    ((isnil x)
     curstr)
    (t
     (string-concatenator (cons x curstr)))))


The string concatenator is something close to a generator object in Python that:

• When called with a character, it pushes the character to the current stack curstr, and returns string-concatenator itself (with a renewed curstr).
• When called with nil, it returns the stocked curstr instead of returning itself.

To obtain the string (aaa), you can use string-concatenator as:

(string-concatenator ")" "a" "a" "a" "(" nil)


The characters are written in reverse order, since string-concatenator uses a stack to create strings.
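The behavior of string-concatenator can be sketched in JavaScript (a hypothetical model of the curried lambda term, with nil modeled as null):

```javascript
// A model of string-concatenator: called with a character, it pushes the
// character onto curstr and returns a renewed version of itself; called
// with nil, it returns the stocked string instead. Pushing onto a stack
// is why the characters must be supplied in reverse order.
const nil = null;

function stringConcatenator(curstr) {
  return function (x) {
    if (x === nil) return curstr;            // done: return the stocked string
    return stringConcatenator(x + curstr);   // push and return a renewed self
  };
}

// (string-concatenator ")" "a" "a" "a" "(" nil)
const s = stringConcatenator("")(")")("a")("a")("a")("(")(nil);
console.log(s); // "(aaa)"
```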

This helps a lot for compressing strings in binary lambda calculus notation. The let in the above code is a macro that expands to:

;; LambdaCraft
(def-lazy prelude
  ((lambda ("(")
     ((lambda (")")
        ((lambda ("a")
           (string-concatenator ")" "a" "a" "a" "(" nil))
         [definition of "a"]))
      [definition of ")"]))
   [definition of "("]))


In binary lambda calculus, the innermost expression encodes to

(string-concatenator ")" "a" "a" "a" "(" nil)
= apply apply apply apply apply string-concatenator ")" "a" "a" "a" "("
= 01 01 01 01 01 [string-concatenator] 110 10 10 10 1110


Notice that the letter “a” is encoded as 01 ... 10, which effectively takes only 4 bits. Similarly, “)” takes 5 bits and “(” takes 6 bits. Since apply doesn’t increase the De Bruijn indices no matter how many times it appears, every occurrence of the same character can be encoded into the same number of bits. Therefore, by encoding each character in the order of appearance, its BLC notation can be optimized to a short bit notation.

This part of the string generator can be seen as an interesting pattern on page 33 of lambdalisp.pdf: here you can see lots of variables being applied to a lambda term shown on the top line, which actually represents string-concatenator.

This consecutive application causes a lot of consecutive (s, which makes page 32 consist entirely of (s:

## Language Features

### The Memory Model for Persistent Bindings

let bindings are stored in the heap inside the interpreter’s state and passed on as persistent bindings. Each let binding holds its own environment stack, and each environment stack points to its lexical parent environment stack. For example, the following LambdaLisp code:

;; LambdaLisp
(let ((x 10) (y 20))
  (setq x 5)
  (let ((f (lambda ...)) (a '(a b c)))
    (print x)
    ...)
  (let ((g (lambda ...)) (b '(a b c)))
    (print b)
    ...))


induces the following memory tree (the node containing name = hello is not relevant here). The root node at the bottom contains bindings to basic functions initialized when running the prelude. This memory tree is expressed in the heap as shown in the second figure. There, the virtual RAM address space is shown as 16 bits for simplicity (the actual address space is 32 bits).

When a let binding occurs, the newest unallocated heap address is allocated by the malloc function, and the interpreter’s “current environment pointer” global variable contained in reg is rewritten to the newly allocated address. The same happens when a lambda is called, creating an environment binding the lambda’s arguments.

Each stack is initialized with a redirection pointer that points to its parent environment, shown at the bottom of the stack in the second figure. The bindings for each variable name are then pushed on top of the stack. On variable lookup, the lookup function first looks at the current environment’s topmost binding. If the target variable is not contained in the stack, the lookup reaches the redirection pointer at the bottom of the stack, where it runs the lookup again for the environment pointed to by the redirection pointer. The lookup process ends when the redirection pointer is nil, where it concludes that the target variable is not bound to any value in the target environment. The use of redirection pointers effectively models the environments’ tree structure.

For example, the x in (print x) in the example code invokes a lookup for x. The lookup function first looks at the binding stack at address 0x0002 and searches the stack until it reaches the redirection pointer to 0x0001. The lookup function then searches the binding stack at 0x0001, where it finds the newest binding of x, which is x = 5. Since new bindings are pushed onto the stack, they are found before old bindings (here, x = 10) are reached.

Variable rewriting with setq is done by pushing assignments onto the stack. The (setq x 5) first searches for the variable x, finds it at environment 0x0001, then pushes x = 5 on top of y = 20 :: x = 10 on the environment 0x0001.

The address 0x0000 represents the base environment. It first starts out by being initialized by the function and macro definitions in the prelude. Variables stored in the base environment behave as global variables. This is where setq and defglobal behave differently. When (setq x ...) is called, setq first searches for x from the environment tree. If it finds x somewhere in the tree, it rewrites the found x at that environment. If it doesn’t find x, it defaults to pushing a new variable x to the current surrounding environment. This way, setq will only affect known variables or local variables. On the other hand, defglobal pushes the bindings to address 0x0000 no matter where it is called. This way, the address 0x0000 can behave as a global environment. The macro defun is defined using defglobal so that it always writes to the global environment. The macro defun-local is defined using setq so that it writes in the local environment, allowing for Python and JavaScript-style local function definitions:

;; LambdaLisp
(defun f (x y)
  (defun-local helper (z)
    (* z z))
  (print helper) ;; => @lambda
  (+ (helper x) (helper y)))

(print helper) ;; => (), since it is invisible from the global environment
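The lookup and setq behavior described above can be sketched with the following JavaScript model (hypothetical; environments are plain objects holding a binding stack and a parent pointer, standing in for the heap addresses):

```javascript
// A model of environment stacks with redirection (parent) pointers.
const GLOBAL = { bindings: [], parent: null };   // plays the role of 0x0000

function lookup(env, name) {
  for (let e = env; e !== null; e = e.parent) {  // follow redirection pointers
    // Search newest-first, so new bindings shadow older ones.
    for (let i = e.bindings.length - 1; i >= 0; i--) {
      if (e.bindings[i].name === name) return e.bindings[i].value;
    }
  }
  return undefined;                              // unbound in this environment
}

function setq(env, name, value) {
  // Find the environment that already binds `name`:
  for (let e = env; e !== null; e = e.parent) {
    if (e.bindings.some(b => b.name === name)) {
      e.bindings.push({ name, value });          // push a newer binding there
      return;
    }
  }
  env.bindings.push({ name, value });            // unknown: bind locally
}

const env1 = { bindings: [{ name: "x", value: 10 }, { name: "y", value: 20 }],
               parent: GLOBAL };                 // plays the role of 0x0001
const env2 = { bindings: [], parent: env1 };     // plays the role of 0x0002

setq(env1, "x", 5);             // pushes x = 5 on top of y = 20 :: x = 10
console.log(lookup(env2, "x")); // 5, found via the redirection pointer
```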


### Garbage Collection

Although there is no garbage collection for let and setq bindings, there is a minimal form of GC for macro expansion. During the evaluation of a macro, two Lisp form evaluations occur: one for constructing the expanded form, and another for evaluating the expanded form in the calling environment. Once the macro has been expanded, we know for sure that the bindings it has used will not be used again. Therefore, macro bindings are allocated to the stack region, which has negative addresses growing downward from 0xFFFF, shown on the left half of the memory tree diagram and the heap diagram. The stack region is freed once the macro expansion finishes. This mechanism also supports nested macros.

In the memory tree in the first figure in the previous section, the bindings for macros look the same as the bindings for lambdas, since at evaluation time they are treated the same. In the second figure, the environments for the macro bindings are shown on the left side of the RAM, since they are allocated in the stack region.

Although the same garbage-collection feature could be implemented for lambdas, it causes problems for lambdas that return closures. If lambda bindings are created in the stack region, the environment of the returned closure will point to the stack region, which will be freed right after the closure is returned. This causes a problem when the returned closure refers to a variable defined in the global environment (for example, to basic macros defined in the prelude such as cond), since the lookup function will eventually start looking at the stack region, which could be occupied by an old environment or some other lambda’s environment, causing the risk for bugs. This could be circumvented by carefully writing the code to not shadow global variables in let bindings, but that would severely restrict the coding style, so I chose to allocate new bindings for each lambda call. The growing memory can be freed by implementing a mark-and-sweep garbage collection function, which is currently not supported by LambdaLisp.
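The two allocation disciplines can be sketched as follows (a hypothetical JavaScript model; the addresses are illustrative):

```javascript
// A model of the two regions: the heap grows upward and is never freed,
// while the stack region grows downward and is freed wholesale once a
// macro expansion finishes.
let heapTop = 0x0001;
let stackTop = 0xFFFF;

function heapAlloc()  { return heapTop++; }   // let bindings, lambda calls
function stackAlloc() { return stackTop--; }  // macro-expansion bindings

function expandMacro(expand) {
  const saved = stackTop;
  const result = expand();  // may call stackAlloc() freely, even for nested macros
  stackTop = saved;         // free the entire stack region in one step
  return result;
}

const a = expandMacro(() => stackAlloc());
const b = expandMacro(() => stackAlloc());
console.log(a === b);       // true: the region was reused after being freed
console.log(heapAlloc());   // 1: the heap, in contrast, is never recycled
```

Saving and restoring the stack pointer around each expansion is also why nested macros work: each inner expansion frees only its own slice of the stack region.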

### Macros

Macros are implemented as first-class objects in LambdaLisp. Both macros and lambdas are subtypes of the type lambda, each annotated with a subtype tag.

Macros are treated the same as lambdas, except for the following differences:

• The arguments are taken verbatim and are not evaluated before being bound to the argument variable names.
• In lambdas, the arguments are evaluated before being bound.
• Two evaluations happen during a macro evaluation. The first evaluation evaluates the macro body to build the macro-expanded expression. The expanded expression is then evaluated in the environment in which the macro was called.
• In lambdas, only one evaluation (the latter one) happens.
• The first evaluation always has the base environment 0x00000000 set as its parent environment so that expansion always happens in the global environment, as explained in the previous section.
• The second evaluation works the same as lambdas, where the environment where the macro/lambda was called is used for evaluation.
• The environment for the bound variables are stored in the stack region (negative addresses starting from 0xFFFFFFFF).
• In lambdas, they are stored in the heap region (starting from 0x00000001).

An anonymous macro can be made with the macro keyword as (macro (...) ...), in the exact same syntax as lambdas. In the prelude, defmacro is defined as the following macro:

;; LambdaLisp
(defglobal defmacro (macro (name e &rest b)
  `(defglobal ,name (macro ,e (block ,name ,@b)))))


### Object-Oriented Programming

In Let Over Lambda, it is mentioned that object-oriented programming can be implemented by using closures (Chapter 2). A primitive example is the counter example we’ve seen at the beginning:

(defun new-counter (init)
  ;; Return a closure.
  ;; Use the let over lambda technique for creating independent and persistent variables.
  (let ((i init))
    (lambda () (setq i (+ 1 i)))))

;; Instantiate counters
(setq counter1 (new-counter 0))
(setq counter2 (new-counter 10))

(print (counter1)) ;; => 1
(print (counter1)) ;; => 2
(print (counter2)) ;; => 11
(print (counter1)) ;; => 3
(print (counter2)) ;; => 12
(print (counter1)) ;; => 4
(print (counter1)) ;; => 5


LambdaLisp extends this concept and implements OOP as a predefined macro in the prelude. LambdaLisp supports the following Python-like object system with class inheritance:

(defclass Counter ()
  (i 0)

  (defmethod inc ()
    (setf (. self i) (+ 1 (. self i))))

  (defmethod dec ()
    (setf (. self i) (- (. self i) 1))))

;; (Counter-add and Counter-addsub are illustrative subclass names.)
(defclass Counter-add (Counter)
  (defmethod *init (i)
    (setf (. self i) i))

  (defmethod add (n)
    (setf (. self i) (+ (. self i) n))))

(defclass Counter-addsub (Counter-add)
  (defmethod *init (c)
    ((. (. self super) *init) c))

  (defmethod sub (n)
    (setf (. self i) (- (. self i) n))))

(defparameter counter1 (new Counter))
(defparameter counter2 (new Counter-add 0))     ;; illustrative instantiations
(defparameter counter3 (new Counter-addsub 0))

((. counter1 inc))
((. counter3 sub) 10000)

(setf (. counter1 i) 5)
(setf (. counter2 i) 500)
(setf (. counter3 i) 50000)


### Blocks

The notion of blocks in LambdaLisp is a feature borrowed from Common Lisp, close to loops that can be escaped with break in C and Java. A block creates a code block that can be escaped by running (return [value]) or (return-from [name] [value]).

For example:

;; LambdaLisp
(block block-a
  (if some-condition
      (return-from block-a some-value))
  ...
  some-other-value)


Here, when some-condition is true, the return-from lets the control immediately break from block-a, setting the value of the block to some-value. If some-condition is false, the program proceeds until the end, and the value of the block becomes some-other-value, which is the same behavior as progn. Nested blocks are also possible, as shown in examples/block.cl.

Since defun is defined to wrap its contents with an implicit block, you can write return-from statements with the function name:

;; LambdaLisp
(defun f (x)
  (if some-condition
      (return-from f some-value))
  (if some-condition2
      (return-from f some-value2)
      ...))


Here is the definition of defun in the prelude:

;; LambdaLisp
(defmacro defun (name e &rest b)
  `(defglobal ,name (lambda ,e (block ,name ,@b))))


In order to implement blocks, the interpreter keeps track of the name and the returning point of each block. This is done by preparing a global variable reg-block-cont in the register, used as a stack to push and pop pairs of names and returning points. Since LambdaLisp is written in continuation-passing style, the returning point is explicitly available as the callback function cont at any time in the eval function. Using this feature, when a block form appears, the interpreter first pushes the name and the current cont to the reg-block-cont global variable. The pushed cont is a continuation that expects the return value of the block to be applied as its argument. Whenever a (return-from [name] [value]) form is called, the interpreter searches the reg-block-cont stack for the specified [name]. Since the found cont expects the return value of the block as its argument, the block escape control flow is realized by applying [value] to cont after popping the reg-block-cont stack.
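Here is a hypothetical JavaScript sketch of this mechanism (in genuine CPS, applying a continuation never returns; an explicit return models that below):

```javascript
// A model of blocks: entering a block pushes a (name, cont) pair onto a
// stack modeling reg-block-cont; return-from pops until it finds the named
// block and applies the value to that continuation.
const blockConts = [];

function block(name, body, cont) {
  blockConts.push({ name, cont });
  body(value => {          // normal fall-through exit: pop our own entry
    blockConts.pop();
    cont(value);
  });
}

function returnFrom(name, value) {
  while (blockConts.length > 0) {
    const top = blockConts.pop();
    if (top.name === name) { top.cont(value); return; }
  }
  throw new Error("no such block: " + name);
}

const someCondition = true;
let result;
block("block-a", next => {
  if (someCondition) return returnFrom("block-a", "some-value");
  next("some-other-value");
}, v => { result = v; });
console.log(result); // "some-value"
```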

### Loops

A loop is a special form equivalent to while (true) in languages such as C, used to create infinite loops. A loop creates an implicit block with the name (), and can be exited by running (return) inside:

;; LambdaLisp
(defparameter i 0)
(loop
  (if (= i 10)
      (return))
  (print i)
  (setq i (+ i 1)))


Loops can also be exited by surrounding it with a block:

;; LambdaLisp
(defparameter i 0)
(block loop-block
  (loop
    (if (= i 10)
        (return-from loop-block))
    (print i)
    (setq i (+ i 1))))


Loops are implemented by passing a self-referenced continuation that runs the contents of loop again.
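This can be sketched in JavaScript as follows (a hypothetical model): the loop body receives a continuation that re-runs the body itself, and exiting means calling the implicit block’s escape continuation instead:

```javascript
// A model of loop: `again` is a self-referencing continuation that runs
// the body once more; `exit` models (return) escaping the implicit block.
function loop(body, exitCont) {
  const again = () => body(again, exitCont);
  again();
}

let i = 0;
const printed = [];
loop((again, exit) => {
  if (i === 10) return exit();  // (return)
  printed.push(i);              // (print i)
  i += 1;                       // (setq i (+ i 1))
  again();
}, () => {});
console.log(printed.join(" ")); // "0 1 2 3 4 5 6 7 8 9"
```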

### Error Invocation and Stack Traces

Some examples of situations when errors are invoked in LambdaLisp are:

• In read-expr:
• When an unexpected ) is seen
• In eval:
• When an unbound variable is being referenced
• In eval-apply (used when lambdas or macros are called):
• When the value in the function cell does not belong to a subtype of lambda

When an error occurs during eval, read-expr or any function, the interpreter does the following:

• It immediately stops what it’s doing
• It shows an error message
• It shows the function call stack trace
• It returns to the REPL, awaiting the user’s input

Immediately stopping the current task and returning to the REPL is implemented very simply thanks to continuation-passing style: when an error is invoked, instead of calling the continuation (the callback), the repl function is called. This simple trick implements error invocation.

Since invoking an error calls repl, and repl calls read-expr, eval, and eval-apply, the four functions read-expr, eval, eval-apply, and repl are mutually recursive. How mutually recursive functions are implemented in LambdaLisp is described in the next section.

The function call stack trace is printed by managing a call stack in one of the interpreter’s global variables. Every time a lambda or a macro is called, the interpreter pushes the expression that invoked the function call to the call stack. When the function call exits properly, the call stack is popped. When an error is invoked during a function call, the interpreter prints the contents of the call stack.

## Other General Lambda Calculus Programming Techniques

Below are some other general lambda calculus programming techniques used in LambdaLisp.

### Mutual Recursion

In LambdaCraft, recursive functions can be defined using defrec-lazy as follows:

;; LambdaCraft
(defrec-lazy fact (n)
  (if (<= n 0)
      1
      (* n (fact (- n 1)))))


defrec-lazy uses the Y combinator to implement anonymous recursion, a technique used to write self-referencing functions under the absence of a named function feature. Since LambdaLisp is based on macro expansion, when a self-referencing function is written using defun-lazy, the function body becomes infinitely expanded, causing the program to not compile. LambdaCraft shows an error message in this case. Using the Y combinator through defrec-lazy prevents this infinite expansion from happening.
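The idea can be sketched in JavaScript. Since JavaScript evaluates strictly, the sketch below uses the Z combinator, the eta-expanded variant of the Y combinator, but the principle of anonymous recursion is the same:

```javascript
// The Z combinator, a strict-evaluation variant of the Y combinator.
const Z = f => (x => f(v => x(x)(v)))(x => f(v => x(x)(v)));

// fact never names itself: Z hands it a reference to "itself" as `self`.
const fact = Z(self => n => (n <= 0 ? 1 : n * self(n - 1)));
console.log(fact(5)); // 120
```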

Things get more complex in the case of mutual recursion. In LambdaLisp, the functions read-expr, eval, eval-apply, and repl are mutually recursive, meaning that these functions call each other inside their definitions. Although using the normal Y combinator would still make the code compile in this case, it causes the same functions to be inlined over and over again, severely expanding the total output lambda term size.

The redundant inlining problem can be solved if each function holds a reference to the others. This can be done by implementing a multiple-function version of the Y combinator. The derivation of a fixed point combinator for mutual recursion is described very intuitively in Wessel Bruinsma’s blog post, A Short Note on The Y Combinator, in the section “Deriving the Y Combinator”. Below is a summary of the derivation process introduced in this post.

Suppose that we have two functions $f$ and $g$ that are mutually recursive. Since they reference each other in their definitions, $f$ and $g$ can be defined in terms of some functions $h_f$ and $h_g$, which take the definitions of $f$ and $g$ as their arguments:

\begin{aligned} f &:= h_f f g & (1)\\ g &:= h_g f g & (2)\\ \end{aligned}

$h_f$ and $h_g$ look something like:

\begin{aligned} h_f &= \lambda f'. \lambda g'. \lambda x. k_f [f',g',x] \\ h_g &= \lambda f'. \lambda g'. \lambda x. k_g [f',g',x] \\ \end{aligned}

Where $k_* [f',g',x]$ is a term containing $f', g'$ and $x$.

Now suppose that $f$ and $g$ could be written using some unknown functions $\hat f$ and $\hat g$ as:

\begin{aligned} f &:= \hat f \hat f \hat g \\ g &:= \hat g \hat f \hat g \end{aligned}

Plugging this into Equations (1) and (2),

\begin{aligned} \hat f \hat f \hat g &= h_f (\hat f \hat f \hat g) (\hat g \hat f \hat g ) \\ \hat g \hat f \hat g &= h_g (\hat f \hat f \hat g) (\hat g \hat f \hat g ) \\ \end{aligned}

which can be abstracted as

\begin{aligned} \hat f \hat f \hat g &= (\lambda x. \lambda y. h_f (x x y) (y x y)) \hat f \hat g \\ \hat g \hat f \hat g &= (\lambda x. \lambda y. h_g (x x y) (y x y)) \hat f \hat g \\ \end{aligned}

Comparing both sides, we have

\begin{aligned} \hat f &= \lambda x. \lambda y. h_f (x x y) (y x y) \\ \hat g &= \lambda x. \lambda y. h_g (x x y) (y x y) \\ \end{aligned}

which are closed-form lambda expressions. Plugging this into our definitions $f := \hat f \hat f \hat g$ and $g := \hat g \hat f \hat g$, we get the mutually recursive definitions of $f$ and $g$.
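To make the derivation concrete, here is the two-function fixed point transcribed into JavaScript, applied to a toy mutually recursive pair (an evenness/oddness checker, a hypothetical example; eta-expansions are added for strict evaluation, as with the Z combinator):

```javascript
// h_f and h_g take the definitions of f and g as their arguments:
const hEven = (f, g) => n => (n === 0 ? true  : g(n - 1));
const hOdd  = (f, g) => n => (n === 0 ? false : f(n - 1));

// f-hat = λx.λy. h_f (x x y) (y x y), and likewise for g-hat:
const evenHat = (x, y) => n => hEven(m => x(x, y)(m), m => y(x, y)(m))(n);
const oddHat  = (x, y) => n => hOdd (m => x(x, y)(m), m => y(x, y)(m))(n);

// f = f-hat f-hat g-hat,  g = g-hat f-hat g-hat:
const isEven = evenHat(evenHat, oddHat);
const isOdd  = oddHat(evenHat, oddHat);

console.log(isEven(10)); // true
console.log(isOdd(7));   // true
```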

A 4-function version of this mutual recursion setup is used in the init function in LambdaLisp to define repl, eval-apply, read-expr, and eval:

;; LambdaCraft
(let* read-expr-hat  (lambda (x y z w) (def-read-expr  (x x y z w) (y x y z w) (z x y z w) (w x y z w))))
(let* eval-hat       (lambda (x y z w) (def-eval       (x x y z w) (y x y z w) (z x y z w) (w x y z w))))
(let* eval-apply-hat (lambda (x y z w) (def-eval-apply (x x y z w) (y x y z w) (z x y z w) (w x y z w))))
(let* repl-hat       (lambda (x y z w) (def-repl       (x x y z w) (y x y z w) (z x y z w) (w x y z w))))
(let* repl       (repl-hat       read-expr-hat eval-hat eval-apply-hat repl-hat))
(let* eval-apply (eval-apply-hat read-expr-hat eval-hat eval-apply-hat repl-hat))
(let* eval       (eval-hat       read-expr-hat eval-hat eval-apply-hat repl-hat))


The functions *-hat and def-* correspond to $\hat *$ and $h_*$ in the derivation, respectively. The final functions repl, eval-apply, read-expr, and eval are defined in terms of these auxiliary functions.

The functions def-* are defined in a separate location. def-eval, used to define eval, is defined as follows:

;; LambdaCraft
(defun-lazy def-eval (read-expr eval eval-apply repl expr state cont)
...)


def-eval is defined as a non-recursive function using defun-lazy, taking the four mutually dependent functions read-expr, eval, eval-apply, and repl as its first four arguments, followed by its “actual” arguments expr, state, and cont. The first four arguments are filled in by the expression (def-eval (x x y z w) (y x y z w) (z x y z w) (w x y z w)) in the definition of eval-hat. By currying, this lets eval take only the 3 remaining unbound arguments, expr, state, and cont.

### Deriving ‘isnil’

isnil is one of the most important functions in cons-based lambda calculus programming, bearing the importance comparable to the atom special form in McCarthy’s original pure Lisp.

Consider the following function reverse that reverses a cons-based list:

;; LambdaCraft
(defrec-lazy reverse (l tail)
  (do
    (if-then-return (isnil l)
      tail)
    (<- (car-l cdr-l) (l))
    (reverse cdr-l (cons car-l tail))))


reverse is a recursive function that reverses a nil-terminated list made of cons cells. The base case used to end the recursion for reverse is when l is nil, decided by (isnil l).

Basically, any recursive function that takes a cons-based list must check if the incoming list is either a cons or a nil to write its base case. However, these data types have very different definitions:

\begin{aligned} {\rm cons} ~ A ~ B &:= \lambda f. (f ~ A ~ B) \\ {\rm nil} &:= \lambda x. \lambda y. y \end{aligned}

The function isnil must return ${\rm true} := \lambda x. \lambda y. x$ when its argument is ${\rm nil}$, and return ${\rm nil}$ (which doubles as false) when the argument is any ${\rm cons}$ cell. At first sight this seems impossible, since there is no general way to check the equivalence of two given lambda terms. Moreover, cons cells are not one fixed term but an entire class of lambda terms of the form $\lambda f. (f ~ A ~ B)$, with arbitrary contents $A$ and $B$.

While checking the equivalence of general lambda terms is impossible due to the halting problem (c.f. here), it becomes possible in special cases where the terms are known to have predefined shapes. This is the case for isnil, whose concrete definition can be derived as follows.

First observe that a ${\rm cons}$ cell is a function that takes one callback function and applies to it the two values that it holds. On the other hand, ${\rm nil}$ is a function that takes two functions as its argument. This difference makes the following sequence of applications to turn out differently for ${\rm cons}$ and ${\rm nil}$:

\begin{aligned} ({\rm cons} ~ A ~ B) ~ (\lambda a. \lambda b. x) ~ c &= (\lambda f. (f ~ A ~ B)) ~ (\lambda a. \lambda b. x) ~ c \\ &= x ~ c \\ {\rm nil} ~ (\lambda a. \lambda b. x) ~ c &= (\lambda x. \lambda y. y) ~ (\lambda a. \lambda b. x) ~ c \\ &= c \\ \end{aligned}

Here, we applied $(\lambda a. \lambda b. x)$ and $c$ to $({\rm cons} ~ A ~ B)$ and ${\rm nil}$, where $x$ and $c$ are free variables. Notice that the abstractions in $(\lambda a. \lambda b. x)$ are used to ignore the values contained in the ${\rm cons}$ cell.

These sequences of applications can directly be used to define ${\rm isnil}$. To implement ${\rm isnil}$, we want the latter expression to evaluate to ${\rm true}$. This can be done by setting $c := {\rm true}$.

It then remains to find an $x$ where $(x ~ c) = (x ~ {\rm true}) := {\rm nil}$. From here we immediately get $x := \lambda w. {\rm nil}$.

Therefore, ${\rm isnil}$ can be written by abstracting this process:

\begin{aligned} {\rm isnil} &= \lambda z. (z ~ (\lambda a. \lambda b. \lambda w. {\rm nil}) ~ {\rm true}) \\ &= \lambda z. (z ~ (\lambda a. \lambda b. \lambda w. \lambda x. \lambda y. y) ~ (\lambda x. \lambda y. x)) \\ \end{aligned}

which can be used as

\begin{aligned} {\rm isnil} ~ ({\rm cons} ~ A ~ B) &= {\rm nil} \\ {\rm isnil} ~ {\rm nil} &= {\rm true} \end{aligned}

The pattern $\lambda a. \lambda b. \lambda w. \lambda x. \lambda y. y$ is noticeable in many places in lambdalisp.pdf, since isnil is used many times in LambdaLisp.
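The derivation can be checked directly in JavaScript, with cons curried here so that the terms match the lambda notation exactly:

```javascript
// The lambda terms from the derivation, transcribed into JavaScript.
const TRUE = a => b => a;         // λx.λy.x
const NIL  = a => b => b;         // λx.λy.y (nil doubles as false)
const cons = (A, B) => f => f(A)(B);

// isnil = λz. (z (λa.λb.λw. nil) true)
const isnil = z => z(a => b => w => NIL)(TRUE);

console.log(isnil(cons(1, 2)) === NIL);  // true: isnil (cons A B) = nil
console.log(isnil(NIL) === TRUE);        // true: isnil nil = true
```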

### LambdaCraft’s ‘do’ Macro

The heavy use of CPS in LambdaLisp is supported by LambdaCraft’s do macro. Writing raw code in CPS produces heavily nested code, since CPS is based on callbacks. The do macro lets such nested code be written flat, in a syntax close to Haskell’s do notation for monads.

Here is the do notation in Haskell:

do
  (y, z) <- f x
  w <- g y z
  return w


A similar code can be written in LambdaCraft as:

(do
  (<- (y z) (f x))
  (<- (w) (g y z))
  w)


which is macro-expanded to:

(f x
  (lambda (y z)
    (g y z
      (lambda (w)
        w))))


where f, g is defined in CPS:

(defun-lazy f (x cont) ...)
(defun-lazy g (y z cont) ...)


Here, the cont represents the callback function, which is directly written as the inner lambdas.

The return type of eval is string, and not expr. This is because expr is “returned” with CPS, by applying it to the provided callback:

;; LambdaCraft
(eval expr state
  (lambda (eval-result state)
    [do something with eval-result and state]))


This is written using do as:

(do
  (<- (eval-result state) (eval expr state))
  [do something with eval-result and state])


The nice part about the do macro and having eval return strings is that it makes print debugging very intuitive. Since LambdaLisp is written in CPS, an arbitrary point in the eval function eventually becomes the head of the current evaluation. Therefore, at any point in the program, you can write

;; LambdaCraft
(eval expr state
  (lambda (eval-result state)
    (cons "a" (cons "b" (cons "c" [do something with eval-result and state])))))


which will make the entire expression eventually evaluate to

;; LambdaCraft
;; The outermost expression
(cons "a" (cons "b" (cons "c" ...)))


which will print “abc” in the console.

The previous code can be written in imperative style using do as

;; LambdaCraft
(do
  (<- (eval-result state) (eval expr state))
  (cons "a")
  (cons "b")
  (cons "c")
  [do something with eval-result and state])


which is virtually identical to writing (print "a") in an imperative language. Note that the default behavior of do is to nest the successive argument at the end of the list, starting from the last argument, and <- is a specially handled case by do.

(cons "a") can be replaced with a string printing function that accepts a continuation, such as:

;; LambdaCraft
(defun-lazy show-message (message state cont)
  (do
    (print-string message)
    (cons "\\n")
    (cont state)))


which will print message, print a newline, and proceed with (cont state).

This design was very helpful during debugging, since it let you track the execution flow using print debugging. This design and technique can be used in other general lambda-calculus based programs as well.

### Type Checking with Macro Call Signatures

Another large-scale programming technique is using macro signatures as a type-checking functionality.

Since all lambdas defined by def-lazy, defun-lazy or defrec-lazy are curried in LambdaCraft, there is no simple way to tell how many arguments a lambda takes at its call site. This is different for LambdaCraft macros defined by defmacro-lazy, since LambdaCraft macros are implemented as Common Lisp functions that run in the Common Lisp environment to expand the macro at compile time. Therefore, when a LambdaCraft macro is called with an excessive or insufficient number of arguments, it causes a Common Lisp error at compile time. This effectively works as a simple type checker for macro call signatures, which significantly helps the debugging process, letting you know when a macro is called with the wrong number of arguments. The Common Lisp call stack even tells you which line got the macro call wrong. It is therefore convenient to use as many macros as possible when writing programs in LambdaCraft.

Writing in terms of macros helps reduce the code size as well, since using macros can be seen as running compile-time beta reductions ahead of runtime. For example, while cons can be written as a function (defun-lazy cons (x y) (lambda (f) (f x y))), it can also be written as a macro (defmacro-lazy cons (x y) `(lambda (f) (f ,x ,y))), which expands to the beta-reduced form of applying the function-based definition. Either way, writing (cons p q) evaluates to the same result, except the function-based one requires extra beta-reduction steps at runtime, affecting the performance.

## Appendix

### JavaScript Examples of Continuation-Passing Style Code

CPS has the largest impact when extracting the values of a cons cell. This is illustrated in the JavaScript code below, which runs on a browser’s JavaScript console.

In direct style, destructuring the values of a cons cell is written as:

// Runs on the browser's JavaScript console
function car (x) {
return x(function (a, b) { return a; });
}
function cdr (x) {
return x(function (a, b) { return b; });
}

function cons (a, b) {
return function (f) {
return f(a,b);
};
}
function t (a, b) {
return a;
}
function nil (a, b) {
return b;
}

(function(x) {
return (function (y) {
return (function (z) {
return z(z, z);
})(car(y));
})(cdr(x));
})(cons(t, cons(nil, t)))


The last expression in this code first binds (cons t (cons nil t)) to x, and calculates (car (cdr x)). Running this on the browser’s console should return the function nil.

Here, car and cdr work by passing the following selector lambda terms to the pair, which take the pair’s two stored values and return the one in the desired position:

\begin{aligned} {\rm car} &= \lambda x. \lambda y. x \\ {\rm cdr} &= \lambda x. \lambda y. y \\ \end{aligned}

On the other hand, in continuation-passing style, the same code is written as:

// Runs on the browser's JavaScript console
function cons (a, b) {
return function (f) {
return f(a,b);
};
}
function t (a, b) {
return a;
}
function nil (a, b) {
return b;
}

cons(t, cons(nil, t))(            // cons returns a function that accepts a callback function. We pass a callback to it
function (tmp1, y){           // This function works as cdr; It takes the second value (and discards the first)
return y(                 // y == cons(nil, t) now. Inside, we write what we want to do with the y we receive via the callback. Here we pass another callback to the return value of the inner cons.
function (z, tmp2) {  // This function works as car; It takes the first value (and discards the second)
return z(z, z);   // z == nil now. nil selects the second value among its arguments, which here evaluates to nil.
}
);
}
)


Here, values are extracted without using car or cdr at all. It instead uses the fact that a cons cell is itself a function that takes a callback and applies both of its contents to it.

This significantly improves performance when reading the stdin, which is a list made from cons cells:

;; LambdaCraft
(stdin
  (lambda (c cdr-stdin)
    (if (=-bit c "(")
        ...)))


### The Binary Lambda Calculus Notation

The interpreters Blc, tromp, and uni run programs written in binary lambda calculus (BLC; also see here). The difference between binary lambda calculus and ordinary lambda calculus lies only in the notation, i.e. how lambda terms are written; everything else, including the rules of beta reduction, is the same. BLC’s notation is based on the De Bruijn notation, where variable names such as $x, y$ and $z$ are eliminated and replaced by integers describing the relative nesting depth between each variable and its binding lambda.

The De Bruijn notation works as follows. Consider for example the term $\lambda x. \lambda y. \lambda z. y$. Here, from the viewpoint of the term $y$, the lambda that binds $y$ is reached by hopping over 1 abstraction, $\lambda z$, so the De Bruijn index for this occurrence of $y$ is 1. We can therefore rewrite the term as

$\lambda x. \lambda y. \lambda z. y = \lambda x. \lambda y. \lambda z. 1$

and still recover its meaning. Similarly, the index for $x$ in $\lambda x. \lambda y. \lambda z. x$ would be 2, and for $z$ in $\lambda x. \lambda y. \lambda z. z$ would be 0, since no hops are required. This works in more complicated settings as well: for example, in $\lambda x. (x \lambda y. \lambda z.((x z) y))$ the index for $y$ is 1. $x$ occurs twice in this term, and each occurrence is encoded differently in this case. The innermost $x$ has index 2 since 2 hops are required past $\lambda z$ and $\lambda y$, but the outer $x$’s index is 0 since no hops are required to reach $\lambda x$ from the outermost $x$. We can then write

$\lambda x. (x \lambda y. \lambda z.((x z) y)) = \lambda x. (0 \lambda y. \lambda z.((2 0) 1))$

and we can still deduce which variable each integer corresponds to.
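This renaming can be mechanized. The following Python sketch converts a named term into De Bruijn form; the tuple-based term format is an assumption made here for illustration:

```python
def debruijn(term, env=()):
    """Convert a named lambda term to De Bruijn form.
    Terms (an assumed format): ("lam", var, body), ("app", f, x), or a variable name."""
    if isinstance(term, str):
        return env.index(term)      # number of lambdas hopped to reach the binder
    if term[0] == "lam":
        return ("lam", debruijn(term[2], (term[1],) + env))
    return ("app", debruijn(term[1], env), debruijn(term[2], env))

# λx.(x λy.λz.((x z) y))  ->  λ(0 λλ((2 0) 1))
named = ("lam", "x", ("app", "x",
         ("lam", "y", ("lam", "z", ("app", ("app", "x", "z"), "y")))))
assert debruijn(named) == ("lam", ("app", 0,
         ("lam", ("lam", ("app", ("app", 2, 0), 1)))))
```

Pushing each binder onto the front of the environment makes `env.index` count exactly the hops described above.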

We then notice that when written in De Bruijn indices, the variable names in the lambda declaration becomes entirely redundant. The expression can thus be rewritten as

\begin{aligned} \lambda x. (x \lambda y. \lambda z.((x z) y)) &= \lambda x. (0 \lambda y. \lambda z.((2 0) 1)) \\ &= \lambda (0 \lambda \lambda ((2 0) 1)) \end{aligned}

and it would still hold the same meaning.

We can simplify the notation more by writing function application $(A B)$ as ${\rm apply}~A~B$. Doing this we get:

\begin{aligned} \lambda x. (x \lambda y. \lambda z.((x z) y)) &= \lambda (0 \lambda \lambda ((2 0) 1)) \\ &= \lambda ~ {\rm apply} ~ 0 ~ \lambda \lambda ~{\rm apply} ~ {\rm apply} ~ 2 ~ 0 ~ 1 \end{aligned}

By assuming that ${\rm apply}$ always takes exactly 2 parameters, we can eliminate the need for writing parentheses to express ${\rm apply}$.

Binary lambda calculus then encodes this sequence as follows:

\begin{aligned} \lambda &= 00 \\ {\rm apply} &= 01 \\ i &= 1^{i+1}0 \\ \end{aligned}

We thus have

\begin{aligned} \lambda x. (x \lambda y. \lambda z.((x z) y)) &= \lambda ~ {\rm apply} ~ 0 ~ \lambda \lambda ~{\rm apply} ~ {\rm apply} ~ 2 ~ 0 ~ 1 \\ &= 00~01~10~00~00~01~01~1110~10~110 \end{aligned}

which completes the definition of the binary lambda calculus notation.
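A minimal encoder for this scheme might look like the following Python sketch (the tuple term format is an assumption for illustration, not Blc's actual data structure):

```python
def encode(term):
    """Encode a De Bruijn term into BLC bits. Terms are tuples (an assumed
    format): ("lam", body), ("app", f, x), or an int De Bruijn index."""
    if isinstance(term, int):
        return "1" * (term + 1) + "0"                    # index i -> 1^(i+1) 0
    if term[0] == "lam":
        return "00" + encode(term[1])                    # lambda -> 00
    return "01" + encode(term[1]) + encode(term[2])      # apply  -> 01

# λx.(x λy.λz.((x z) y)) == λ (0 λλ ((2 0) 1))
term = ("lam", ("app", 0, ("lam", ("lam", ("app", ("app", 2, 0), 1)))))
assert encode(term) == "00011000000101111010110"
```

The output matches the bitstream derived by hand above, with the spaces removed.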

The Blc interpreter (and the Universal Lambda interpreter) accepts this bitstream and parses it to a lambda term, which then applies beta reduction to execute the program.

Some more examples are

\begin{aligned} \lambda x. x &= \lambda 0 &= 00~10 \\ \lambda x. \lambda y. x &= \lambda \lambda 1 &= 00~00~110 \\ (\lambda x. x) (\lambda x. x) &= {\rm apply}~ \lambda 0 ~ \lambda 0 &= 01~00~10~00~10 \end{aligned}

The elegance of this notation is that it is a prefix code, so no delimiting characters are required - the spaces between the $0$ and $1$ can be removed in practice. Moreover, for any valid program $P$, there exists no valid program $PQ$ that starts with $P$ followed by a nonempty $Q$.
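The prefix property means a parser always knows exactly where a term ends. A Python sketch (terms are represented as tuples here, an assumption for illustration):

```python
def parse(bits, i=0):
    """Parse one BLC term starting at bits[i]; return (term, next_index)."""
    if bits[i] == "0":
        if bits[i + 1] == "0":                  # 00 -> lambda
            body, j = parse(bits, i + 2)
            return ("lam", body), j
        f, j = parse(bits, i + 2)               # 01 -> apply
        x, k = parse(bits, j)
        return ("app", f, x), k
    j = i                                       # 1^(k+1) 0 -> De Bruijn index k
    while bits[j] == "1":
        j += 1
    return j - i - 1, j + 1

# Because BLC is a prefix code, parsing stops exactly at the program's end;
# any trailing bits can be handed to the program as stdin:
term, end = parse("0010" + "110101")            # λ0 followed by extra input
assert term == ("lam", 0) and end == 4
```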

Due to the prefix code property, the interpreter can distinguish the boundary between the program and the stdin even if they are provided as concatenated byte streams. For example, when running lambdalisp.blc, we run:

cat lambdalisp.blc | ./asc2bin > lambdalisp.blc.bin
cat lambdalisp.blc.bin [filepath] | ./Blc


Normally, it is difficult or impossible to deduce the boundary between the binary lambdalisp.blc.bin and [filepath] since they are concatenated together with cat, but it is easily possible in BLC since all valid programs are prefix codes.

## The LambdaLisp Logo

• The rounded edges of the cube represent Lisp’s parentheses syntax.
• Alan Kay described Lisp in the quote, “Lisp isn’t a language, it’s a building material.” The cube is a depiction of a block, which represents the building blocks of software.

## Credits

LambdaLisp was written by Hikaru Ikuta. The lambda calculus term compiler LambdaCraft was written by Hikaru Ikuta, inspired by Ben Rudiak-Gould’s Scheme program Lazier, a compiler from lambda terms written in Scheme to Lazy K. The LambdaLisp logo was designed by Hikaru Ikuta. The 521-byte lambda calculus interpreter SectorLambda was written by Justine Tunney. The IOCCC 2012 “Most functional” interpreter was written by John Tromp. The Universal Lambda interpreter clamb and the Lazy K interpreter lazyk were written by Kunihiko Sakamoto.

# Building a Neural Network in Pure Lisp Without Built-In Numbers Using Only Atoms and Lists

2022-01-16 (https://woodrush.github.io/blog/posts/neural-networks-in-pure-lisp)

At the dawn of Lisp after its birth in 1958, Lisp was used as a language for creating advanced artificial intelligence. This project makes that a reality once again by implementing a neural network for pattern recognition, written in pure Lisp without built-in integers or floating-point numbers, that runs on the IBM PC model 5150.

## Building Neural Networks only using Symbolic Manipulation

SectorLISP is an amazing project where a fully functional Lisp interpreter is fit into the 512 bytes of the boot sector of a floppy disk. Since it works as a boot sector program, the binary can be written to a disk to be used as a boot drive, where the computer presents an interface for writing and evaluating Lisp programs, all running in the booting phase of bare metal on the 436-byte program. I have written another blog post on SectorLISP about extending SectorLISP to implement BASIC REPLs and games.

SectorLISP is implemented as a pure Lisp. Pure Lisp has no built-in types for integers or floating-point numbers, and supports only atoms and lists as data structures. Surprisingly, even with the lack of numbers, such a Lisp is Turing-complete, meaning that it is basically capable of any calculation that can be done on modern computers.

In this project, we implement a neural network that runs on SectorLISP. Since there are no features of built-in numbers, we have to reinvent the notion of numbers from scratch only by using symbolic manipulation. We first start off by constructing a fixed-point number calculation system based solely on list manipulations, and finally, implement matrix multiplication and activation functions using this fixed-point number system.

Since SectorLISP runs on the IBM PC model 5150, this implementation allows neural networks to run on the booting phase of vintage PCs.

## Running the Neural Network on Your Computer

The source code for the SectorLISP neural network, as well as the training and testing scripts used to obtain the model parameters, are available at my GitHub repository:

https://github.com/woodrush/sectorlisp-nn

Here I will describe the instructions for running the SectorLISP program to calculate predictions for custom digit images in detail. The instructions for training and evaluating the neural network to obtain the model parameters used for this network is available at the repository.

The available emulators are QEMU and the i8086 emulator Blinkenlights. I will also describe how to run SectorLISP on physical hardware, although with this method you must type the entire Lisp program into the computer by hand. In the emulators, you can either automatically load the code or paste it into the console.

### Running on QEMU

If you have QEMU installed, running the Lisp neural network on QEMU can be done with the following make procedure:

git clone https://github.com/woodrush/sectorlisp
cd sectorlisp
git checkout nn
git submodule update --init --recursive
cd test
make nn


This will start QEMU with SectorLISP loaded as the boot sector program, and will automatically type the Lisp program into the emulator’s console.

Due to the way the test script handles the text stream between the host PC and QEMU, it first takes 10 minutes to type the entire Lisp source code to the emulator’s console. After waiting for 10 minutes, the actual inference time only takes about 4 seconds, where the program will show a message on the screen indicating the predicted digit. The running time was measured using a 2.8 GHz Intel i7 CPU.

To input a custom 3x5 digit image, edit the following expression at the end of the program, ./sectorlisp-nn/nn.lisp, inside the sectorlisp-nn submodule:

(QUOTE
  ;; input
  )
(QUOTE (* * *
        * . .
        * * *
        . . *
        * * *))


### Running on Blinkenlights

Here are the instructions on running the network on the i8086 emulator Blinkenlights.

First, git clone the SectorLISP repository and make SectorLISP’s binary, sectorlisp.bin:

git clone https://github.com/jart/sectorlisp
cd sectorlisp
make


This will generate sectorlisp.bin under ./sectorlisp.

By building a fork of SectorLISP that supports I/O, additional messages indicating the input and the output will be printed. In this case, git checkout to the io branch by running git checkout io before running make. Since the source code for this project is backwards compatible with the main SectorLISP branch, the same code can be run on both versions.

Update (2022/4/6): The fork mentioned here was merged into the original SectorLISP repository. The features mentioned here can now be used without using the fork, and by using the original SectorLISP repository.

Next, download the Blinkenlights emulator:

curl https://justine.lol/blinkenlights/blinkenlights-latest.com >blinkenlights.com


You can then run SectorLISP by running:

./blinkenlights.com -rt sectorlisp.bin


On some Ubuntu systems, a graphics-related error may appear and the emulator may not start. In that case, first run the following command, which is available on the download page:

sudo sh -c "echo ':APE:M::MZqFpD::/bin/sh:' >/proc/sys/fs/binfmt_misc/register"


After starting Blinkenlights, expand the size of your terminal large enough so that the TELETYPEWRITER region shows up at the center of the screen. This region is the console used for input and output. Then, press c to run the emulator in continuous mode. The cursor in the TELETYPEWRITER region should move one line down. You can then start typing in text or paste a long code from your terminal into Blinkenlight’s console to run your Lisp program.

To run the neural network program, copy the contents of nn.lisp from the repository to your clipboard, and paste it inside the terminal into Blinkenlight’s console. After waiting for about 2 minutes, the result will be shown on the console. Note that it is important to copy the newline at the end of the program, which will trigger the turbo mode on Blinkenlights which makes it run significantly faster. In this case, the screen will initially show nothing after you paste the code, but you can confirm that it is running by checking the CPU usage of your computer. If the code shows up right away after pasting with the cursor right next to the final parentheses of the code, you may have not included the newline, which takes significantly more time since it does not run in turbo mode.

On Blinkenlights, it took 2 minutes from pasting the code to obtaining the final inference results. The running time was measured using a 2.8 GHz Intel i7 CPU.

To input a custom 3x5 digit image, edit the following expression at the end of the program:

(QUOTE
  ;; input
  )
(QUOTE (* * *
        * . .
        * * *
        . . *
        * * *))


### Running on Physical Hardware

You can also run SectorLISP on an actual physical machine if you have a PC with an Intel CPU that boots with a BIOS, and a drive such as a USB drive or a floppy disk that can be used as a boot drive. Note that when running the neural network program this way, you must type the entire program by hand into the console.

First, mount your drive to the PC you’ve built sectorlisp.bin on, and check:

lsblk -o KNAME,TYPE,SIZE,MODEL


Among the list of the hardware, check for the device name for your drive you want to write SectorLISP onto. After making sure of the device name, run the following command, replacing [devicename] with your device name. [devicename] should be values such as sda or sdb, depending on your setup.

Caution: The following command used for writing to the drive will overwrite anything that exists in the target drive’s boot sector, so it’s important to make sure which drive you’re writing into. If the command or the device name is wrong, it may overwrite the entire content of your drive or other drives mounted in your PC, probably causing your computer to be unbootable (or change your PC to a SectorLISP machine that always boots SectorLISP, which is cool, but is hard to recover from). Please perform these steps with extra care, and at your own risk.

sudo dd if=sectorlisp.bin of=/dev/[devicename] bs=512 count=1


After you have written your boot drive, insert the drive to the PC you want to boot it from. You may have to change the boot priority settings from the BIOS to make sure the PC boots from the target drive. When the drive boots successfully, you should see a cursor blinking in a blank screen, which indicates that you’re ready to type your Lisp code into bare metal.

We will now discuss the implementation details of this project.

## Training the Neural Network

We first start off by training a neural network on a modern computer using TensorFlow to get its model parameters. The parameters are then converted to 18-bit fixed-point numbers when loaded into the SectorLISP program.

### The Dataset

Training Dataset

Test Dataset

The entire dataset for training and testing this neural network is shown above. The input images are 3x5-sized binary monochrome images, which are converted to fixed-point vectors when being provided to the network.

The dataset, as well as the fully connected neural network model, were inspired by a blog post (in Japanese) about pattern recognition using neural networks, written by Koichiro Mori (aidiary).

The upper half is the training dataset that is used to train the neural network. The bottom half is the testing dataset, which is not shown at all to the network at training time, and will be shown for the first time when evaluating the neural network’s performance, to check if the digits for these newly shown images are predicted correctly.

In the final Lisp program, the input image is provided as follows:

(QUOTE
;; input
)
(QUOTE (* * *
* . .
* * *
. . *
* * *))


### The Model

The model for our neural network is very simple. It is a two-layered fully connected network with a ReLU activation function. In TensorFlow, it is written like this:

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(5, 3)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10),
])


The model and its implementation were referenced from the TensorFlow 2 quickstart for beginners entry from the TensorFlow documentation. As mentioned before, the fully-connected model was also inspired by a blog post (in Japanese) written by Koichiro Mori (aidiary).

This model takes a 3x5 image and outputs a 1x10 vector, where each element represents the log-confidence of each digit from 0 to 9. The final prediction result of the neural network is defined by observing the index that has the largest value in the output 1x10 vector.

Each fully connected layer contains two trainable tensors A and B, which are the coefficient matrix and the bias vector, respectively. This network thus consists of 4 model parameter tensors, A_1, B_1, A_2, and B_2, of sizes 15x10, 10x1, 10x10, and 10x1, respectively.

The Dropout function is included for inducing generalization and is only activated at training time.

We use the categorical cross-entropy loss and the Adam optimizer for training:

model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])


The model is then trained for 1000 epochs:

model.fit(data_x, data_y_category, epochs=1000, verbose=0)


After training, the model parameters are saved:

model.save("params.h5")


The model parameters A_1, B_1, A_2, and B_2 are contained in this file. Since the model parameters have the sizes 15x10, 10x1, 10x10, and 10x1, the total number of fixed-point numbers is 270. Since we are using 18 bits for each fixed-point number, the total number of bits for the model parameters of the entire neural network is 4860 bits.

Note that although fixed-point numbers are used in the final Lisp implementation, the training process uses 64-bit floating-point numbers. Since the number of layers and the matrix sizes were both small enough for truncating the precision, we were able to directly convert the trained floating-point model parameter values to fixed-point numbers when loading them into the Lisp implementation.

The training time for the neural network in TensorFlow was 6.5 seconds on a 6GB GTX 1060 GPU.

### Testing for Noise Resistance

The training accuracy was 100%, meaning that all of the 15 images in the training dataset are correctly predicted to the true digit.

The testing accuracy was 85%, meaning that 17 out of 20 newly seen images that were not shown at all during training were predicted correctly.

Here is the confusion matrix for the test dataset. For a 100% accuracy performance, the matrix becomes completely diagonal, meaning that the prediction results always match the ground truth labels. The three off-diagonal elements indicate the 3 prediction errors that occurred at test time.

Here are the 3 images that were not predicted correctly:

The predictions for these three test dataset images were 1, 3, and 4, respectively.

Since all of the other images were predicted correctly, this means that the neural network was able to correctly predict 85% of the unknown data that was never shown at training time. This capability of flexible generalization for newly encountered images is a core feature of neural networks.

## Implementing Neural Networks in Pure Lisp

“Lisp has jokingly been called ‘the most intelligent way to misuse a computer’. I think that description is a great compliment because it transmits the full flavor of liberation: it has assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.” – Edsger W. Dijkstra

Now that we have obtained the model parameters for our neural network, it’s time to build it into pure Lisp.

As explained in the SectorLISP blog post, SectorLISP does not have a built-in feature for integers or floating-point numbers. The only data structures that SectorLISP has are lists and atoms, so we must implement a system for calculating fractional numbers only by manipulating lists of atoms. Our goal is to implement matrix multiplication in fixed-point numbers.

The fixed-point number system used in this project is also available as a SectorLISP library at my numsectorlisp GitHub repo.

### The Number Representations

The number system for this project will be 18-bit fixed-point numbers, with 13 bits for the fractional part, 4 bits for the integer part, and 1 bit for the sign.

Here are some examples of numbers expressed in this fixed-point system:

(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 0    0 0 0 0 0)) ;; Zero
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 0    1 0 0 0 0)) ;; One
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 1    0 0 0 0 0)) ;; 0.5
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 1 0    0 0 0 0 0)) ;; 0.25
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 0    1 1 1 1 1)) ;; -1
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 1    1 1 1 1 1)) ;; -0.5
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 1 1    1 1 1 1 1)) ;; -0.25
(QUOTE (1  1 1 1 1  1 1 1 1  1 1 1 1    1 1 1 1 1)) ;; Negative epsilon (-2^(-13))
;;     |----------------------------|  |------||-|
;;            Fractional part       Integer part \Sign bit
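As a cross-check, the same 18-bit two's-complement encoding can be sketched in Python; these helpers are illustrative and not part of the Lisp code:

```python
FRAC, INT, SIGN = 13, 4, 1
WIDTH = FRAC + INT + SIGN                 # 18 bits total

def to_fixed(x):
    """Encode x as an LSB-first, two's-complement, 18-bit list (13 fractional bits)."""
    n = round(x * (1 << FRAC)) % (1 << WIDTH)    # wrap negatives into two's complement
    return [(n >> i) & 1 for i in range(WIDTH)]

def from_fixed(bits):
    n = sum(b << i for i, b in enumerate(bits))
    if bits[-1] == 1:                     # the last element is the sign bit
        n -= 1 << WIDTH
    return n / (1 << FRAC)

assert to_fixed(1)[13] == 1               # the "one" bit of the integer part
assert from_fixed(to_fixed(-0.5)) == -0.5
```

Note that the bit at index 13 is the low bit of the integer part, matching the table above.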


We first start by making a half adder, which computes single-digit binary addition. A half adder takes two single-digit binary variables A and B and outputs a pair of variables S and C, representing the sum and the carry flag, respectively. The carry C occurs when both input digits are 1, requiring a carry into the next digit. Therefore, it can be written as:

(QUOTE
)
(QUOTE (LAMBDA (X Y)
  (COND
    ((EQ X (QUOTE 1))
     (COND
       ((EQ Y (QUOTE 1)) (CONS (QUOTE 0) (QUOTE 1)))
       ((QUOTE T) (CONS (QUOTE 1) (QUOTE 0)))))
    ((QUOTE T)
     (CONS Y (QUOTE 0))))))


Next we make a full adder. A full adder also computes single-digit binary addition, except it takes 3 variables including the carry digit, A, B, and C, and outputs the pair S and C for the sum and the carry flag. Including C will help to recursively compute multiple-digit addition in the next section. This can be written as:

(QUOTE
  )
(QUOTE (LAMBDA (X Y C)
  ((LAMBDA (HA1)
     ((LAMBDA (HA2)
        (CONS (CAR HA2)
              (COND
                ((EQ (QUOTE 1) (CDR HA1)) (QUOTE 1))
                ((QUOTE T) (CDR HA2)))))
      ;; Apply the half adder (here named addhalf) to the first sum and the carry
      (addhalf (CAR HA1) C)))
   (addhalf X Y))))


Now that we have constructed a full adder, we can recursively connect these full adders to construct a multiple-binary-digit adder. We first start off by constructing an adder for unsigned integers.

Addition is done by first adding the least significant bits, computing the sum and the carry, and then adding the next significant bits as well as the carry flag if it exists. Since the full adder does just this for each digit, we can recursively connect full adders together to make a multiple-digit adder:

(QUOTE
  ;; The output binary is in reverse order (the msb is at the end)
  ;; The same applies to the entire system
  )
(QUOTE (LAMBDA (X Y C)
  (COND
    ((EQ NIL X) Y)
    ((EQ NIL Y) X)
    ((QUOTE T)
     ((LAMBDA (XYC)
        (CONS (CAR XYC) (uaddnofc (CDR X) (CDR Y) (CDR XYC))))
      (addfull (CAR X) (CAR Y) C))))))


Here, X and Y are multiple-digit numbers such as (QUOTE (0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0)) ;; One, and C is a single-digit carry flag.

This is where the reverse-ordered binary list format becomes useful. Since addition is started by adding the least significant bits first, we can immediately extract this bit just by applying (CAR X) to the numbers.

The u stands for unsigned, addn means the addition of N (arbitrary) digits, of means that overflow is prevented, and c means that a carry flag is taken as an argument. Since overflow is prevented, the resulting sum may become one digit longer than the original inputs X and Y, instead of overflowing to zero. This is compensated later in other functions.
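The recursion can be cross-checked with a Python sketch over LSB-first bit lists (illustrative only; it assumes equal-length inputs, and appending the final carry is one reading of "overflow is prevented"):

```python
def addfull(x, y, c):
    """Full adder on single bits: returns (sum, carry)."""
    s = x ^ y ^ c
    carry = (x & y) | (x & c) | (y & c)
    return s, carry

def uaddnofc(xs, ys, c=0):
    """Ripple-carry addition of equal-length LSB-first bit lists.
    A final carry appends one extra bit, so the result may grow by one digit."""
    if not xs and not ys:
        return [c] if c else []
    s, carry = addfull(xs[0], ys[0], c)
    return [s] + uaddnofc(xs[1:], ys[1:], carry)

assert uaddnofc([1, 0, 1], [1, 1, 0]) == [0, 0, 0, 1]   # 5 + 3 = 8
```

Because the lists are LSB-first, each recursive step peels off the current digit with plain head access, just like `(CAR X)` in the Lisp version.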

Finally, to add two unsigned integers X and Y, we wrap uaddnofc with the carry flag initially set to 0, for unsigned integer addition:

(QUOTE
  )
(QUOTE (LAMBDA (X Y)
  (uaddnofc X Y (QUOTE 0))))


### Unsigned Integer Multiplication

Multiplication can be done similarly to addition, except we add multiple digits instead of one in each step. In multiplication, we recursively shift X by one bit at a time and add up the shifted values of X whose corresponding digit in Y is 1. When the digit in Y is 0, we add nothing. Shifting X to the right has the effect of multiplying the number by 2; note that the direction of the shift is reversed since the bit order is reversed.
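With ordinary machine integers instead of bit lists, the same shift-and-add recursion looks like this Python sketch (illustrative only):

```python
def umult(x, y):
    """Shift-and-add multiplication, mirroring the recursion described above."""
    if y == 0:
        return 0
    partial = x if y & 1 else 0              # add x only when y's current bit is 1
    return partial + umult(x << 1, y >> 1)   # shift x, consume one bit of y

assert umult(6, 7) == 42
```

Consing a 0 onto the front of an LSB-first list plays the role of `x << 1` here.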

Following this design, unsigned integer multiplication is implemented as follows:

(QUOTE
  ;; umultnof : Unsigned N-bit mult
  )
(QUOTE (LAMBDA (X Y)
  (COND
    ((EQ NIL Y) u0)
    ((QUOTE T)
     ;; Add X (or u0 when the current bit of Y is 0) to the shifted recursion
     (uaddnof
      (COND
        ((EQ (QUOTE 0) (CAR Y)) u0)
        ((QUOTE T) X))
      (umultnof (CONS (QUOTE 0) X) (CDR Y)))))))


Now we are ready to start thinking about fixed-point numbers. In fact, we have already implemented unsigned fixed-point addition at this point. This is because of the most significant feature of fixed-point numbers: addition and subtraction can be implemented exactly the same as for integers.

This is because fixed-point numbers can be thought of as integers with a fixed exponent bias 2^(-N). Since the fractional part of our system is 13 bits, the exponent bias is 2^(-13). Therefore, for two numbers A_fix and B_fix, we represent these numbers using underlying integers A and B, as A_fix == A * 2^(-13), B_fix == B * 2^(-13).

Now, when calculating A_fix + B_fix, the exponent 2^(-13) can be factored out, leaving (A+B)*2^(-13). Therefore, we can directly use unsigned integer addition for unsigned fixed-point addition.
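This factoring can be verified numerically with a quick Python check (the variable names are illustrative):

```python
FRAC = 13
A_fix, B_fix = 0.75, -0.25
A = int(A_fix * 2**FRAC)       # underlying integer of A_fix
B = int(B_fix * 2**FRAC)       # underlying integer of B_fix

# Integer addition of the underlying values gives the fixed-point sum exactly:
assert (A + B) / 2**FRAC == A_fix + B_fix
```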

### Unsigned Fixed-Point Multiplication

Multiplication is similar except that the exponent bias changes. For A_fix * B_fix in the previous example, the result becomes (A*B)*2^(-26), with a smaller exponent bias factor. Here, we have a gigantic number A*B compensated by the small exponent bias factor 2^(-26). Therefore, to adjust the exponent bias factor, we can divide A*B by 2^13, bringing the exponent bias factor back to 2^(-13). In this case, dividing by 2^13 means dropping the 13 least significant bits and keeping the rest.

In the case where the output number still has a bit length longer than A and B, the result has overflowed and cannot be captured by the number of bits in our system. This is a difference from floating-point numbers. For floating-point numbers, the most significant bit can always be preserved by moving around the decimal point. In fixed-point numbers, on the other hand, large numbers must have their significant bits discarded since the decimal point is fixed. Therefore, although it may seem odd to drop significant bits, this implementation yields the correct results.
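Using machine integers as the underlying representation, the multiply-then-rescale step can be sketched as (illustrative only):

```python
FRAC = 13

def fixmul(a, b):
    """Multiply underlying integers, then drop the 13 low bits to rescale."""
    return (a * b) >> FRAC

half = 1 << (FRAC - 1)        # underlying integer of 0.5
quarter = 1 << (FRAC - 2)     # underlying integer of 0.25
assert fixmul(half, quarter) == 1 << (FRAC - 3)   # underlying integer of 0.125
```

The right shift by 13 is exactly the `drop fracbitsize` step in the Lisp code below.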

Following this design, we can implement unsigned fixed-point multiplication as follows:

(QUOTE
  ;; ufixmult : Unsigned fixed-point multiplication
  )
(QUOTE (LAMBDA (X Y)
  (take u0 (+ u0 (drop fracbitsize (umultnof X Y))))))


u0 indicates the unsigned integer zero, and fracbitsize is a list of length 13 indicating the fraction part’s bit size.

u0 is added after dropping bits from the multiplication result, since the bit length may be shorter than our system after dropping the bits.

take and drop are defined as follows:

(QUOTE
  ;; take : Take a list of (len L) atoms from X
  )
(QUOTE (LAMBDA (L X)
  (COND
    ((EQ NIL L) NIL)
    ((QUOTE T) (CONS (CAR X) (take (CDR L) (CDR X)))))))
(QUOTE
  ;; drop : Drop the first (len L) atoms from X
  )
(QUOTE (LAMBDA (L X)
  (COND
    ((EQ NIL X) NIL)
    ((EQ NIL L) X)
    ((QUOTE T) (drop (CDR L) (CDR X))))))


### Negation

Now we will start taking the signs of the numbers into account.

In our fixed-point number system, negative numbers are expressed by taking the two’s complement of a number. Negation using two’s complement is best understood as taking the additive inverse of the number modulo 2^18. This yields a very simple implementation for negation:

(QUOTE
  ;; negate : Two's complement of int
  )
(QUOTE (LAMBDA (N)
  (take u0 (umultnof N umax))))


Here, umax is a number filled with 1, (QUOTE (1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1)). When the smallest positive number (QUOTE (1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)) is added to it, umax overflows to u0, which is filled with 0, meaning the number zero. Since negative numbers are numbers that become zero when added to their absolute value, umax represents the negative number with the smallest absolute value in our fixed-point number system.

Similarly, multiplying by umax yields a number with the same property where the number exactly overflows to zero with only one bit overflowing at the end. Since the addition function in fixed-point numbers is defined exactly the same as unsigned integers, this property means that the output of negate works as negation in fixed-point numbers as well. Therefore, this implementation suffices to implement negation in our number system.
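The umax trick can be cross-checked with machine integers (a Python sketch; masking with `& MASK` plays the role of `take u0`):

```python
WIDTH = 18
MASK = (1 << WIDTH) - 1
umax = MASK                      # all ones: the "negative epsilon"

def negate(n):
    """Negation by multiplying by umax (= -1 modulo 2^18), then keeping 18 bits."""
    return (n * umax) & MASK

one = 1 << 13                            # fixed-point 1.0
assert (negate(one) + one) & MASK == 0   # -1 + 1 overflows to exactly zero
assert negate(negate(5)) == 5            # negation is an involution
```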

### Signed Fixed-Point Subtraction

At this point, we can define our final operators for + and - used for fixed-point numbers:

(QUOTE
  ;; +
  )
(QUOTE (LAMBDA (X Y)
  (take u0 (uaddnof X Y (QUOTE 0)))))
(QUOTE
  ;; -
  )
(QUOTE (LAMBDA (X Y)
  (take u0 (uaddnof X (negate Y) (QUOTE 0)))))


Subtraction is implemented by adding the negated version of the second operand.

We will now see how signed multiplication is implemented.

### Signed Fixed-Point Multiplication

Signed fixed-point number multiplication is almost the same as unsigned ones, except that the signs of the numbers have to be managed carefully. Signed multiplication is implemented by reducing the operation to unsigned multiplication by negating the number beforehand if the operand is a negative number, and then negating back the result after multiplication. This simple consideration of signs yields the following implementation:

(QUOTE
  ;; *
  )
(QUOTE (LAMBDA (X Y)
  (COND
    ((< X u0)
     (COND
       ((< Y u0)
        (ufixmult (negate X) (negate Y)))
       ((QUOTE T)
        (negate (ufixmult (negate X) Y)))))
    ((< Y u0)
     (negate (ufixmult X (negate Y))))
    ((QUOTE T)
     (ufixmult X Y)))))


### Comparison

Comparison is first done by checking the sign of the numbers. If the signs of both operands are different, we can immediately deduce that one operand is less than another. In the case where the signs are the same for both operands, we subtract the absolute value of each operand and check if the result is less than zero, i.e., it is a negative number.

So we start with a function that checks if a number is negative:

(QUOTE
  ;; isnegative
  )
(QUOTE (LAMBDA (X)
  (EQ (QUOTE 1) (CAR (drop (CDR u0) X)))))


This can be done by simply checking if the sign bit at the end is 1, since we have defined to use two’s complement as the representation of negative numbers.

We can then use this to write our algorithm mentioned before:

(QUOTE
  ;; <
  )
(QUOTE (LAMBDA (X Y)
  (COND
    ((isnegative X)
     (COND
       ((isnegative Y) (isnegative (- (negate Y) (negate X))))
       ((QUOTE T) (QUOTE T))))
    ((QUOTE T)
     (COND
       ((isnegative Y) NIL)
       ((QUOTE T) (isnegative (- X Y))))))))
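The same sign-dispatch logic can be sketched with machine integers in Python (illustrative; here `isnegative` checks the top bit of the 18-bit word):

```python
WIDTH = 18
MASK = (1 << WIDTH) - 1

def isnegative(n):
    return (n >> (WIDTH - 1)) & 1 == 1    # the top bit is the sign bit

def negate(n):
    return (-n) & MASK

def less(x, y):
    """Sign-dispatching comparison over 18-bit two's-complement words."""
    if isnegative(x):
        if isnegative(y):
            # both negative: compare absolute values, reversed
            return isnegative((negate(y) - negate(x)) & MASK)
        return True                       # negative < nonnegative
    if isnegative(y):
        return False                      # nonnegative is never below a negative
    return isnegative((x - y) & MASK)     # both nonnegative: check the sign of x - y
```

For example, `less(negate(3), negate(1))` holds because |1| - |3| comes out negative.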


Comparison in the other direction is done by simply reversing the operands:

(QUOTE
  ;; >
  )
(QUOTE (LAMBDA (X Y)
  (< Y X)))


### Division by Powers of Two

Although division for general numbers can be tricky, dividing by powers of two can be done by simply shifting the bits by the exponent of the divisor:

(QUOTE
  ;; << : Shift X by Y_u bits, where Y_u is in unary.
  ;;      Note that since the bits are written in reverse order,
  ;;      this works as division and makes the input number smaller.
  )
(QUOTE (LAMBDA (X Y_u)
  (+ (drop Y_u X) u0)))


As mentioned in the comment, shifting left becomes division since we are using a reverse order representation for numbers.

### ReLU

At this point, we can actually implement our first neural-network-related function, the rectified linear unit (ReLU). Despite its intimidating name, it behaves like numpy’s clip function with a lower bound: numbers below a threshold are clipped to the threshold value. For ReLU, the threshold is zero, so it can be implemented by simply checking the input’s sign and returning zero if it is negative:

(QUOTE
  ;; ReLUscal
  )
(QUOTE (LAMBDA (X)
  (COND
    ((isnegative X) u0)
    ((QUOTE T) X))))


ReLUscal takes scalar inputs. This is recursively applied inside ReLUvec which accepts vector inputs.

### Vector Dot Products

At this point, we have finished implementing all of the scalar operations required for constructing a fully-connected neural network! From now on we will write functions for multiple-element objects.

The most simple one is the dot product of two vectors, which can be written by recursively adding the products of the elements of the input vectors:

(QUOTE
  ;; ================================================================
  ;; vdot : Vector dot product
  )
(QUOTE (LAMBDA (X Y)
  (COND
    (X (+ (* (CAR X) (CAR Y))
          (vdot (CDR X) (CDR Y))))
    ((QUOTE T) u0))))


Here, vectors are simply expressed as a list of scalars. The vector (1 2 3) can be written as follows:

(QUOTE ((0  0 0 0 0  0 0 0 0  0 0 0 0    1 0 0 0 0)
        (0  0 0 0 0  0 0 0 0  0 0 0 0    0 1 0 0 0)
        (0  0 0 0 0  0 0 0 0  0 0 0 0    1 1 0 0 0)))


Vector addition works similarly except we construct a list instead of calculating the sum:

(QUOTE
  )
(QUOTE (LAMBDA (X Y)
  (COND
    (X (CONS (+ (CAR X) (CAR Y)) (vecadd (CDR X) (CDR Y))))
    ((QUOTE T) NIL))))


### Vector-Matrix Multiplication

Surprisingly, we can jump to vector-matrix multiplication right away once we have vector dot products. We first implement matrices as a list of vectors. Since each element in a matrix is a vector, we can write vector-matrix multiplication by recursively iterating over each element of the input matrix:

(QUOTE
  ;; vecmatmulVAT : vec, mat -> vec : Vector V times transposed matrix A
  )
(QUOTE (LAMBDA (V AT)
  ((LAMBDA (vecmatmulVAThelper)
     (vecmatmulVAThelper AT))
   (QUOTE (LAMBDA (AT)
     (COND
       (AT (CONS (vdot V (CAR AT)) (vecmatmulVAThelper (CDR AT))))
       ((QUOTE T) NIL)))))))


An important property of this function is that the input matrix must be transposed before calculating the correct result. Usually, V @ A where @ is matrix multiplication is defined by multiplying the rows of V with the columns of A. Taking the columns of A is expensive in our Lisp implementation since we have to manage all of the vector elements in A at once in one iteration. On the other hand, if we transpose A before the multiplication, all of the elements in each column become aligned in a single row which can be extracted at once as a single vector element. Since we already have vector-vector multiplication, i.e., vector dot products defined, this way of transposing A beforehand blends in nicely with our function. The name vecmatmulVAT emphasizes this fact by writing AT which means A transposed.
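The transposed-matrix convention can be sketched in Python (illustrative; plain lists of small integers stand in for the Lisp vectors of fixed-point numbers):

```python
def vdot(x, y):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def vecmatmul_vat(v, at):
    """v @ A, where 'at' is A pre-transposed: each row of 'at' is a column of A."""
    return [vdot(v, row) for row in at]

v = [1, 2]
a = [[1, 2, 3],
     [4, 5, 6]]                          # 2x3 matrix A
at = [list(col) for col in zip(*a)]      # transpose once, up front
assert vecmatmul_vat(v, at) == [9, 12, 15]   # equals v @ A
```

Transposing once up front turns every column access into a cheap row access, which is exactly why the Lisp code stores A_1 and A_2 pre-transposed.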

### Matrix-Matrix Multiplication

Using vector-matrix multiplication, matrix-matrix multiplication can be implemented right away, by iterating over the matrix at the first operand:

(QUOTE
;; matmulABT : mat, mat -> mat : Matrix A times transposed matrix B
)
(QUOTE (LAMBDA (A BT)
  ((LAMBDA (matmulABThelper)
     (matmulABThelper A))
   (QUOTE (LAMBDA (A)
     (COND
       (A (CONS (vecmatmulVAT (CAR A) BT) (matmulABThelper (CDR A))))
       ((QUOTE T) NIL)))))))


Similar to vecmatmulVAT, the second operand matrix B is transposed as BT in this function.

Note that we actually do not use matrix-matrix multiplication in our final neural network, since the first operand is always a flattened vector, and each subsequent function also yields a vector.

### Vector Argmax

Taking the argmax of the vector, i.e., finding the index of the largest value in a vector can simply be implemented by recursive comparison:

(QUOTE
;; vecargmax
)
(QUOTE (LAMBDA (X)
  ((LAMBDA (vecargmaxhelper)
     (vecargmaxhelper (CDR X) (CAR X) () (QUOTE (*))))
   (QUOTE (LAMBDA (X curmax maxind curind)
     (COND
       (X (COND
            ((< curmax (CAR X))
             (vecargmaxhelper (CDR X) (CAR X) curind (CONS (QUOTE *) curind)))
            ((QUOTE T)
             (vecargmaxhelper (CDR X) curmax maxind (CONS (QUOTE *) curind)))))
       ((QUOTE T) maxind)))))))


A similar recursive function is img2vec, which transforms the `*`/`.` notation of the input image into ones and zeros:

(QUOTE
;; img2vec
)
(QUOTE (LAMBDA (img)
  (COND
    (img (CONS (COND
                 ((EQ (CAR img) (QUOTE *)) 1)
                 ((QUOTE T) u0))
               (img2vec (CDR img))))
    ((QUOTE T) NIL))))


Here, the variable 1 is bound to the fixed-point number one in the source code.

### The Neural Network

We are finally ready to define our neural network! Following the model, our network can be defined as a chain of functions as follows:

(QUOTE
;; nn
)
(QUOTE (LAMBDA (input)
  ((LAMBDA (F1 F2 F3 F4 F5 F6 F7 F8)
     (F8 (F7 (F6 (F5 (F4 (F3 (F2 (F1 input)))))))))
   (QUOTE (LAMBDA (X) (img2vec X)))
   (QUOTE (LAMBDA (X) (vecmatmulVAT X A_1_T)))
   (QUOTE (LAMBDA (X) (vecadd X B_1)))
   (QUOTE (LAMBDA (X) (ReLUvec X)))
   (QUOTE (LAMBDA (X) (vecmatmulVAT X A_2_T)))
   (QUOTE (LAMBDA (X) (vecadd X B_2)))
   (QUOTE (LAMBDA (X) (vecargmax X)))
   (QUOTE (LAMBDA (X) (nth X digitlist))))))


This represents a chain of functions from the input to the nth argument of digitlist, which is a list of atoms of the digits, (QUOTE (0 1 2 3 4 5 6 7 8 9)).

Here, A_1_T, B_1, A_2_T, and B_2 are the model parameters obtained from the training section, converted to our fixed-point number system.

## Results

Now let’s try actually running our Lisp neural network! We will use the i8086 emulator Blinkenlights. Instructions for running the program in this emulator are described in the running the neural network on your computer section.

Let’s first try giving the network the following image of the digit 5:

(QUOTE (* * *
        * . .
        * * *
        . . *
        * * *))


It turns out like this: The network correctly predicts the digit shown in the image!

Although the original network was trained in an environment where 64-bit floating-point numbers were available, our system of 18-bit fixed-point numbers was also capable of running this network with the same parameters truncated to fit in 18 bits.

### New Unseen Input with Noise

Now let’s try giving another digit:

(QUOTE (* * .
        . . *
        . * *
        * . .
        * * *))


Notice that this image appears neither in the training set nor in the test dataset. The network has therefore never seen this image before; this is the very first time it sees it.

Can the network correctly predict the digit shown in this image? The results were as follows: The network predicted the digit correctly!

Even for images it had never seen before, the neural network was able to interpret digit images correctly, having only been shown a handful of examples. This is the magic of neural networks!

Therefore, in a way, we have taught a Lisp interpreter running on the IBM PC model 5150 what digits are, only by providing example pictures of digits. Of course, the accumulation of knowledge through training the network was done on a modern computer, but that knowledge was handed down to a 512-byte program capable of running on vintage hardware.

### Statistics

The training time for the neural network in TensorFlow was 6.5 seconds on a 6GB GTX 1060 GPU. The training was run on a 15-image dataset for 1000 epochs.

Here are the inference times of the neural network run in the emulators:

| Emulator | Inference Time | Runtime Memory Usage |
|----------|----------------|----------------------|
| QEMU     | 4 seconds      | 64 kiB               |

The emulation was done on a 2.8 GHz Intel i7 CPU. When run on a 4.77 MHz IBM PC, I believe it should run 590 times slower than in QEMU, which is roughly 40 minutes.

The memory usage including the SectorLISP boot sector program, the S-expression stack for the entire Lisp program, and the RAM used for computing the neural network fits in 64 kiB of memory. This means that this program is capable of running on the original IBM 5150 PC.

## Closing Remarks

It was very fun building a neural network from the bottom up, using only first principles of symbolic manipulation. This is what it means for a programming language to be Turing-complete - it can do essentially anything that any other modern computer is capable of.

As mentioned at the beginning of this post, Lisp was used as a language for creating advanced artificial intelligence after its birth in 1958. 60 years later in 2018, Yoshua Bengio, Geoffrey Hinton, and Yann LeCun received the Turing Award for establishing the foundations of modern Deep Learning. In a way, using a Turing-complete Lisp interpreter to implement neural networks revisits this history of computer science.

## Credits

The neural network for SectorLISP and its fixed-point number system discussed in this blog post were implemented by Hikaru Ikuta. The SectorLISP project was started by Justine Tunney and built by its contributors, who are credited in the original SectorLISP blog post. The i8086 emulator Blinkenlights was created by Justine Tunney. The neural network diagram was created using diagrams.net. The training and testing dataset, as well as the fully connected neural network model, were inspired by a blog post (in Japanese) written by Koichiro Mori (aidiary) from DeNA. The TensorFlow implementation of the model follows the "TensorFlow 2 quickstart for beginners" entry in the TensorFlow documentation.

# A Lisp Interpreter Implemented in Conway’s Game of Life

2022-01-12 · https://woodrush.github.io/blog/posts/lisp-in-life

Lisp in Life is a Lisp interpreter implemented in Conway’s Game of Life.

The entire pattern is viewable on the browser here.

To the best of my knowledge, this is the first time a high-level programming language was interpreted in Conway’s Game of Life.

## Running Lisp on the Game of Life

Lisp is a language with a simple and elegant design, with an extensive ability to express sophisticated ideas as simple programs. Notably, its powerful macro feature can be used to modify the language’s syntax, allowing programs to be written in a highly flexible way. For example, macros can introduce new programming paradigms to the language, as demonstrated in object-oriented-like.lisp (which can actually be evaluated by the interpreter, although complex programs take quite a long time to finish running), where a structure and syntax similar to classes in Object-Oriented Programming are constructed. Despite this expressibility, Lisp is the world’s second-oldest high-level programming language, introduced in 1958 and preceded only by Fortran.

Conway’s Game of Life is a cellular automaton proposed in 1970. Despite having a very simple set of rules, it is known to be Turing-complete. Lisp in Life demonstrates this fact in a rather straightforward way.

How can a simple system allow human thoughts to be articulated and expanded? With the expressibility of Lisp running on the simple basis of Conway’s Game of Life, Lisp in Life provides one answer to this question.

### Input and Output

The Lisp program is provided by editing certain cells within the pattern to represent the ASCII encoding of the Lisp program. The pattern directly reads this text, evaluates it, and writes out the results. You can also load your own Lisp program into the pattern and run it. The standard output is written at the bottom end of the RAM module, where it can be easily located and directly examined in a Game of Life viewer. The Lisp implementation supports lexical closures and macros, allowing one to write Lisp programs in a Lisp-like taste, as far as the memory limit allows.

The Lisp interpreter is written in C. Using the build system for this project, you can also compile your own C11-compatible C code and run it on Conway’s Game of Life.

### Previous Work

As previously mentioned, to the best of my knowledge, this is the first time a high-level programming language was interpreted in Conway’s Game of Life.

The entry featuring universal computers on LifeWiki has a list of computers created in the Game of Life. Two important instances not mentioned in that entry are the Quest For Tetris (QFT) computer, created by the authors of the QFT project, and APGsembly, created by Adam P. Goucher. All of these works are designed to run an assembly language, and none are designed to interpret a high-level language per se.

An example of a compiled high-level language targeting the Game of Life is Cogol, from the QFT project. Cogol programs are compiled to the assembly language QFTASM, which runs on the QFT architecture; the code must first be compiled to QFTASM before it can run there.

In Lisp in Life, a modified version of the QFT architecture is first created for improving the pattern’s runtime. Modifications include introducing a new cascaded storage architecture for the ROM, new opcodes, extending the ROM and RAM address space, etc. The Lisp source code is then written into the computer’s RAM module as its raw binary ASCII format. The Conway’s Game of Life pattern directly reads, parses, and evaluates this Lisp source code to produce its output. This feature of allowing a Conway’s Game of Life pattern to evaluate a high-level programming language expressed as a string of text is a novel feature that was newly achieved in this project.

## Video

Here is a YouTube video showing Lisp in Life in action.

## Screenshots

An overview of the entire architecture.

An overview of the CPU and its surrounding modules. On the top are the ROM modules, with the lookup module on the right and the value modules on the left. On the bottom left is the CPU. On the bottom right is the RAM module.

This pattern is the VarLife version of the architecture. VarLife is an 8-state cellular automaton defined in the Quest For Tetris (QFT) Project, which is used as an intermediate layer to create the final Conway’s Game of Life pattern. The colors of the cells indicate the 8 distinct states of the VarLife rule.

The architecture is based on Tetris8.mc in the original QFT repository. Various modifications were made to make the pattern compact, such as introducing a new lookup table architecture for the ROM, removing and adding opcodes, and expanding the ROM and RAM address space.

The Conway’s Game of Life version of the architecture, converted from the VarLife pattern. What appears to be a single cell in this image is actually an OTCA metapixel, zoomed out to be shown 2048 times smaller.

A close-up view of a part of the ROM module in the Conway’s Game of Life version. Each pixel in the previous image is actually the square-shaped structure shown in this image. These structures are OTCA metapixels, which can be seen here in the On and Off meta-states. The OTCA metapixel is a special Conway’s Game of Life pattern that can emulate cellular automata with customized rules. The original VarLife pattern is simulated this way so that it can run in Conway’s Game of Life.

The OTCA metapixel simulating Life in Life can be seen in this wonderful video by Phillip Bradbury: https://www.youtube.com/watch?v=xP5-iIeKXE8

A video of the RAM module in the VarLife rule in action, with the computer showing the results of the following Lisp program:

(define mult (lambda (m n)
  (* m n)))

(print (mult 3 14))


The result is 42, shown in binary ASCII format (0b110100 and 0b110010, i.e. the characters "4" and "2"), read in bottom-to-top order.

As shown in this image, the standard output of the Lisp program gets written at the bottom end of the RAM module, and can be directly viewed in a Game of Life viewer. This repository also contains scripts that run on Golly to decode and view the contents of the output as strings.

## How is it Done?

The Lisp interpreter, written in C, is compiled to an assembly language for a CPU architecture implemented in the Game of Life, which is a modification of the computer used in the Quest For Tetris (QFT) project. The compilation is done using an extended version of ELVM (the Esoteric Language Virtual Machine). The Game of Life backend for ELVM was implemented by myself.

Generating a small enough pattern that runs in a reasonable amount of time required a lot of effort. This required optimizations and improvements in every layer of the project; a brief summary would be:

• The C Compiler layer - adding the computed goto feature to the C compiler, preserving variable symbols to be used after compilation, etc.
• The C layer (the Lisp interpreter) - using a string hashtable and binary search for Lisp symbol lookup, minimization of stack region usage with union memory structures, careful memory region map design, etc.
• The QFTASM layer - writing a compiler optimizer to optimize the length of the assembly code
• The VarLife layer (the CPU architecture) - creating a lookup table architecture for faster ROM access, expanding the size and length of the RAM module, adding new opcodes, etc.
• The Game of Life layer - Hashlife-specific optimization

A more detailed description of the optimizations done in this project is available in the Implementation Details section.

### Conversion from VarLife to Conway’s Game of Life

VarLife is an 8-state cellular automaton defined in the Quest For Tetris (QFT) Project. It is used as an intermediate layer to generate the final Conway’s Game of Life pattern; the computer is first created in VarLife, and then converted to a Game of Life pattern.

When converting VarLife to Conway’s Game of Life, each VarLife cell is mapped to an OTCA Metapixel (OTCAMP). The conversion from VarLife to the Game of Life is done in a way so that the behavior of the states of the VarLife pattern matches exactly with the meta-states of the OTCA Metapixels in the converted Game of Life pattern. Therefore, it is enough to verify the behavior of the VarLife pattern to verify the behavior of the Game of Life pattern.

Due to the use of OTCA Metapixels, each VarLife cell becomes extended to a 2048x2048 Game of Life cell, and 1 VarLife generation requires 35328 Game of Life generations. Therefore, the VarLife patterns run significantly faster than the Game of Life (GoL) version.

Additional details on VarLife are available in the Miscellaneous section.

## Pattern Files

| Program | VarLife Pattern | Conway’s Game of Life Pattern |
|---|---|---|
| print.lisp | QFT_print.mc | QFT_print_metafied.mc |
| lambda.lisp | QFT_lambda.mc | QFT_lambda_metafied.mc |
| printquote.lisp | QFT_printquote.mc | QFT_printquote_metafied.mc |
| factorial.lisp | QFT_factorial.mc | QFT_factorial_metafied.mc |
| z-combinator.lisp | QFT_z-combinator.mc | QFT_z-combinator_metafied.mc |
| backquote-splice.lisp | QFT_backquote-splice.mc | QFT_backquote-splice_metafied.mc |
| backquote.lisp | QFT_backquote.mc | QFT_backquote_metafied.mc |
| object-oriented-like.lisp | QFT_object-oriented-like.mc | QFT_object-oriented-like_metafied.mc |
| primes-print.lisp | QFT_primes-print.mc | QFT_primes-print_metafied.mc |
| primes.lisp | QFT_primes.mc | QFT_primes_metafied.mc |

Pattern files preloaded with various Lisp programs are available here. Detailed statistics such as the running time and the memory consumption are available in the Running Times and Statistics section.

The patterns can be simulated on the Game of Life simulator Golly.

The VarLife patterns can be simulated on Golly as well. To run the VarLife patterns, open Golly’s File -> Preferences -> Control and check the “Your Rules” directory. Open the directory, and copy https://github.com/woodrush/QFT-devkit/blob/main/QFT-devkit/Varlife.rule into it.

## Descriptions of the Lisp Programs

• object-oriented-like.lisp: This example creates a structure similar to classes in Object-Oriented Programming, using closures.

• The class has methods and field variables, where each instance carries distinct and persistent memory locations of their own. The example instantiates two counters and concurrently modifies the value held by each instance.
• New syntaxes for instantiation and method access, (new classname) and (. instance methodname), are introduced using macros and functions.

The Lisp interpreter’s variable scoping and macro features are powerful enough to manage this complex memory management, and even to provide a new syntax supporting the target paradigm.

• printquote.lisp: A simple demonstration of macros.

• factorial.lisp: A simple demonstration of recursion with the factorial function.

• z-combinator.lisp: Demonstration of the Z Combinator to implement a factorial function using anonymous recursion.

• backquote-splice.lisp: Implements the backquote macro used commonly in Lisp to construct macros. It also supports the unquote and unquote-splice operations, each written as ~ and ~@.

• primes.lisp: Prints a list of prime numbers up to 20. This example highlights the use of the while syntax.

The contents of print.lisp is quite straightforward - it calculates and prints the result of 3 * 14. backquote.lisp and primes-print.lisp are similar to backquote-splice.lisp and primes.lisp, mainly included for performance comparisons. backquote.lisp doesn’t implement the unquote-splice operation, and demonstrates some more examples. primes-print.lisp reduces the number of list operations to save memory usage.

## Details of the Lisp Interpreter

### Special Forms and Builtin Functions

• define
• if
• quote
• car, cdr
• cons
• list
• atom
• print
• progn
• while
• lambda, macro
• eval
• eq
• +, -, *, /, mod, <, >

### Lexical Closures

This Lisp interpreter supports lexical closures. The implementation of lexical closures is powerful enough to write an object-oriented-like code as shown in object-oriented-like.lisp, where classes are represented as lexical closures over the field variables and the class methods.

### Macros

This Lisp interpreter supports macros. A Lisp macro can be thought of as a function that receives code and returns code. Following this design, macros are treated exactly the same as lambdas, except that a macro takes its arguments as raw S-expressions and evaluates the result twice (the first time to build the expression, and the second time to actually evaluate the built expression).

## Running Times and Statistics

VarLife Patterns

| Lisp Program and Pattern (VarLife) | #Halting Generations (VarLife) | Running Time (VarLife) | Memory Usage (VarLife) |
|---|---|---|---|
| print.lisp [pattern] | 105,413,068 (exact) | 1.159 mins | 5.0 GiB |
| lambda.lisp [pattern] | 700,000,000 | 2.966 mins | 12.5 GiB |
| printquote.lisp [pattern] | 800,000,000 | 3.424 mins | 12.5 GiB |
| factorial.lisp [pattern] | 1,000,000,000 | 5.200 mins | 17.9 GiB |
| z-combinator.lisp [pattern] | 1,700,000,000 | 9.823 mins | 23.4 GiB |
| backquote-splice.lisp [pattern] | 4,100,000,000 | 20.467 mins | 27.5 GiB (max.) |
| backquote.lisp [pattern] | 4,100,000,000 | 21.663 mins | 27.5 GiB (max.) |
| object-oriented-like.lisp [pattern] | 4,673,000,000 | 22.363 mins | 27.5 GiB (max.) |
| primes-print.lisp [pattern] | 8,880,000,000 | 27.543 mins | 27.5 GiB (max.) |
| primes.lisp [pattern] | 9,607,100,000 | 38.334 mins | 27.5 GiB (max.) |

Conway’s Game of Life (GoL) Patterns

| Lisp Program and Pattern (GoL) | #Halting Generations (GoL) | Running Time (GoL) | Memory Usage (GoL) |
|---|---|---|---|
| print.lisp [pattern] | 3,724,032,866,304 | 382.415 mins | 27.5 GiB (max.) |
| lambda.lisp [pattern] | 24,729,600,000,000 | 1372.985 mins | 27.5 GiB (max.) |
| printquote.lisp [pattern] | 28,262,400,000,000 | 1938.455 mins | 27.5 GiB (max.) |
| factorial.lisp [pattern] | 35,328,000,000,000 | 3395.371 mins | 27.5 GiB (max.) |
| z-combinator.lisp [pattern] | 60,057,600,000,000 | - | - |
| backquote-splice.lisp [pattern] | 144,844,800,000,000 | - | - |
| backquote.lisp [pattern] | 144,844,800,000,000 | - | - |
| object-oriented-like.lisp [pattern] | 165,087,744,000,000 | - | - |
| primes-print.lisp [pattern] | 313,712,640,000,000 | - | - |
| primes.lisp [pattern] | 339,399,628,800,000 | - | - |

Common Statistics

| Lisp Program | #QFT CPU Cycles | QFT RAM Usage (Words) |
|---|---|---|
| print.lisp | 4,425 | 92 |
| lambda.lisp | 13,814 | 227 |
| printquote.lisp | 18,730 | 271 |
| factorial.lisp | 28,623 | 371 |
| z-combinator.lisp | 58,883 | 544 |
| backquote-splice.lisp | 142,353 | 869 |
| backquote.lisp | 142,742 | 876 |
| object-oriented-like.lisp | 161,843 | 838 |
| primes-print.lisp | 281,883 | 527 |
| primes.lisp | 304,964 | 943 |

The running times for each program are shown above. The Hashlife algorithm used for the simulation requires a lot of memory in exchange for its speedups. The simulations were run on a 32GB-RAM computer, with Golly’s memory usage limit set to 28000 MB and the default base step set to 2 (both configurable from the preferences). Memory usage was measured with Ubuntu’s activity monitor; “(max.)” marks where the maximum permitted memory was used. The number of CPU cycles and the QFT memory usage were obtained by running the QFTASM interpreter on the host PC. The QFT memory usage counts the number of RAM addresses that were written at least once, measured in words, which are 16 bits in this architecture.

All of the VarLife patterns can actually be run on a computer. The shortest running time is about 1 minute for print.lisp. A sophisticated program such as object-oriented-like.lisp can even run in about 22 minutes.

On the other hand, the Game of Life patterns take significantly more time than the VarLife patterns, but for short programs it can be run in a moderately reasonable amount of time. For example, print.lisp finishes running in about 6 hours in the Game of Life pattern. As mentioned in the “Conversion from VarLife to Conway’s Game of Life” section, since the Game of Life pattern emulates the behavior of the VarLife pattern using OTCA Metapixels, the behavior of the Game of Life patterns can be verified by running the VarLife patterns.

## Tests

There are tests to check the behavior of the Lisp interpreter: one checks the QFTASM-compiled Lisp interpreter using the QFTASM interpreter, and another checks the GCC-compiled Lisp interpreter on the host PC. To run these tests, use the following commands:

git submodule update --init --recursive # Required for building the source

make test             # Run the tests for the QFTASM-compiled Lisp interpreter, using the QFTASM interpreter
make test_executable  # Run the tests for the executable compiled by GCC


Running make test requires Hy, a Clojure-like Lisp implemented in Python available via pip install hy. Some of the tests compare the output results of Hy and the output of the QFTASM Lisp interpreter.

The tests were run on Ubuntu and Mac.

## Building from Source

This section explains how to load the Lisp interpreter (written in C) to the Game of Life pattern, and also how to load a custom Lisp program into the pattern to run it on Game of Life.

Please see build.md from the GitHub repository.

## Implementation Details

This section describes the implementation details for the various optimizations for the QFT assembly and the resulting Game of Life pattern.

### The C Compiler layer

• Added the computed goto feature to ELVM
• This was merged into the original ELVM project.
• Modified the compiler to preserve and output memory address symbols and program address symbols, for their usage in the compiler optimization tool in the QFTASM layer
• This makes it possible to use memheader.eir, so that symbols used in the C source can be referenced in the ELVM assembly layer using the same variable symbols.

### The ELVM Assembly layer

• Wrote the QFTASM backend for ELVM
• This was merged into the original ELVM project.
• Added further improvements to the QFTASM backend:
• Let the ELVM assembly’s memory address space match QFT’s native memory address space
• Originally, the ELVM assembly had to convert its memory address every time when a memory access occurs.
• Support new opcodes added in the improved QFT architecture

### The C layer (the implementation of the Lisp interpreter)

#### Usage of binary search and hashtables for string representations and comparisons

By profiling the GCC-compiled version of the Lisp interpreter, it was found that the string table lookup process was a large performance bottleneck, leaving considerable room for optimization.

The optimized string lookup process is as follows. First, when the Lisp parser accepts a symbol token, it creates a 4-bit hash of the string from the checksum of its ASCII representation. The hash points into a hashtable holding the roots of binary search trees for string comparison. Each node in a tree holds a symbol token’s string and two child nodes, for the tokens that come before and after it in alphabetical order. When a query symbol token arrives in the parsing phase, the node with a matching token is returned, or a new node for the token is added to the tree if the token does not exist yet. This allows each distinct symbol in the S-expression to have a distinct memory address.

In the interpretation phase, since each distinct symbol has a distinct memory address, and every string required for the Lisp program has already been parsed, string comparison can be done by simply comparing the memory addresses of the tokens. Since the interpreter only uses string equality operations for string comparison, checking for integer equality suffices, speeding up the interpretation phase. And since the hash key is 4 bits long, the 16 separate trees save up to 4 comparisons per lookup compared to using a single binary tree.

#### Usage of jump hash tables for the special form evaluation procedure searches

There are 17 distinct procedures for evaluating the special forms in the Lisp interpreter: define, if, quote, car, cdr, cons, atom, print, progn, while, {lambda, macro}, eval, eq, {+, -, *, /, mod}, {<, >}, list, and lambda/macro invocation (when the token is not a special form). Using an if statement to find the corresponding procedure for a given token amounts to a linear search over token comparisons. To speed up this search, a hash table is used for jumping to the corresponding procedures. Since the memory addresses for the special forms can be determined before parsing the Lisp program, all of the symbols for the special forms have fixed memory addresses. Therefore, the hash key can be created by subtracting an offset from the symbol’s memory address, pointing into a hashtable created near the register locations. This hashtable is provided in memheader.eir. When the hash key is larger than the region of this hashtable, the symbol is not a special form, so the evaluation jumps to the lambda/macro invocation procedure.

#### Usage of 2-bit headers to represent value types

The Lisp implementation has 3 distinct value types, ATOM, INT, and LAMBDA. Each value only consumes one QFT byte of memory; the ATOM value holds the pointer to the symbol’s string hashtable, the INT value holds the signed integer value, and LAMBDA holds a pointer to the Lambda struct, as well as its subtype information, of either LAMBDA, MACRO, TEMPLAMBDA and TEMPMACRO. (The TEMPLAMBDA and TEMPMACRO subtypes are lambda and macro types that recycles its argument value memory space every time it is called, but is unused in the final lisp programs.) Since the RAM’s address space is only 10 bits, there are 6 free bits that can be used for addresses holding pointers. Therefore, the value type and subtype information is held in these free bits. This makes the integer in the Lisp implementation to be a 14-bit signed integer, ranging from -8192 to 8191.

#### Minimization of Stack Region Usage

Since the C compiler used in this project does not have memory optimization features, this has to be done manually within the C source code. This led to the largest reason why the interpreter’s source code seems to be obfuscated.

One of the largest bottlenecks for memory access was stack region usage. Every time a stack region memory access occurs, the assembly code performs memory address offset operations to access the stack region. This does not happen when accessing the heap memory, since there is only one heap region used in the entire program, so the pointers for global variables can be hard-coded by the assembler. Therefore, it is favorable optimization-wise to use the heap memory as much as possible.

One way to make use of this fact is to use as many global variables as possible. Since registers and common RAM memory share the same memory space, global variables can be accessed at a speed comparable to registers. (However, since the physical location of a RAM memory slot within the pattern affects the I/O signal arrival time, and the registers have the smallest RAM addresses, i.e. they are closest to the CPU unit, the registers still have the fastest memory access times.)

Another method of saving memory was to use union memory structures to minimize stack region usage. In the C compiler used in this project, every time a new variable is introduced in a function, the function’s per-call stack region usage is increased to fit all of the variables, even when two variables never appear at the same time. Therefore, using the fact that some variables never appear simultaneously, unions are used for every occurrence of such variables so that they share a region within the stack space. This minimized the stack region usage. Since the stack region is only 233 hextets large (1 byte in the QFT RAM is 16 bits), this allowed a larger number of nested function calls, especially nested calls of eval, which evaluates the S-expressions. Since S-expressions have a list structure, and eval becomes nested when lambdas are called in the Lisp program, this optimization was significant for allowing more sophisticated Lisp programs to run in the architecture.

### The QFTASM layer

The QFT assembly generated by the C compiler has a lot of room for optimization. I therefore created a compiler optimization tool to reduce the QFTASM assembly size.

#### Constant folding

Immediate constant expressions such as ADD 1 2 destination is folded to a MOV operation.

#### MOV folding

The QFT assembly code can be split into subregions by jump operations, such that:

• Each subregion doesn’t contain any jump operations
• Each subregion ends with a jump operation
• Every jump operation in the assembly is guaranteed to jump to the beginning of a subregion, and never to the middle of any subregion

The last guarantee, that jumps never occur in the middle of a subregion, is provided by the C compiler. The ELVM assembly’s program counter is designed so that it increases only when a jump instruction appears. This makes an ELVM program counter point to a sequence of multiple instructions instead of a single instruction. Since the ELVM assembly uses the ELVM program counter for its jump instructions, it is guaranteed that the jump instructions in the QFT assembly never jump to the middle of any subregion, and always jump to the beginning of a subregion.

In each subregion, the dependency graph for the memory address is created. If a memory address becomes written but is later overwritten without becoming used in that subregion at all, the instruction to write to that memory address is removed. Since it is guaranteed that jump operations never jump to the middle of any subregion, it is guaranteed that the overwritten values can be safely removed without affecting the outcome of the program. The MOV folding optimization makes use of this fact to remove unnecessary instructions.

This folding process is also done with dereferences: if a dereferenced memory address is written, the address is later overwritten without being used, and the dereference source is not overwritten at any point during this process, the instruction writing to the dereferenced memory address is removed.

#### Jump folding

If the destination of a conditional or fixed-destination jump instruction points to another jump instruction with a fixed destination, the jump destination is folded to the latter jump instruction’s destination.

A similar folding is done when a fixed jump instruction points to a conditional jump instruction, where the fixed jump instruction is replaced by the latter conditional jump instruction.
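A sketch of the fixed-destination case (hypothetical instruction tuples; the conditional-jump cases described above are omitted for brevity):

```python
# Jump-threading sketch: if a fixed jump targets another fixed jump,
# redirect it to the final destination, following chains and guarding
# against jump cycles.

def fold_jumps(program):
    folded = []
    for ins in program:
        if ins[0] == "JMP":
            target = ins[1]
            seen = set()
            # Follow chains of fixed jumps until a non-jump is reached.
            while program[target][0] == "JMP" and target not in seen:
                seen.add(target)
                target = program[target][1]
            folded.append(("JMP", target))
        else:
            folded.append(ins)
    return folded

prog = [("JMP", 2), ("NOP",), ("JMP", 3), ("NOP",)]
print(fold_jumps(prog))
# -> [('JMP', 3), ('NOP',), ('JMP', 3), ('NOP',)]
```

Instruction 0 originally jumped to instruction 2, which itself jumps to 3, so its destination is folded straight to 3.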

### The Varlife layer (the computer architecture)

#### Created a lookup table structure for the ROM module

In this image of the CPU and its surrounding modules, the two modules on the top are the ROM modules. The original ROM module had one table, with the memory address as the key and the instruction as the value. I recreated the ROM module to add a lookup table layer, where each distinct instruction (not just the opcode, but the entire instruction including the operand values) is assigned a distinct serial integer key. The ROM module on the right accepts a program counter address and returns the instruction key for that program counter. The module on the left accepts the instruction key and returns the actual bits of the instruction as the output. This allows dictionary compression to be performed on the ROM data, saving a lot of space. Since the instructions are 45 bits and the instruction keys are only 10 bits, the instruction key table is roughly 1/4 the width of the original ROM module. And although the ROM is 3223 words long for the entire Lisp interpreter, there are only 616 distinct instructions, so the instruction value table only needs to be 616 ROM units high, substantially reducing the overall size of the ROM module.

The ROM module features another form of compression, where the absence of cells is used to represent 0-valued bits within an instruction. Below is a close-up look at the ROM value module. Notice that some cells on the left are absent, even though the table would be expected to have a rectangular shape. This is because absent cells do not emit any signals, hence effectively emitting 0-valued bits as the output. To exploit this fact, all of the instructions are sorted alphabetically at table creation time, so that instructions beginning with runs of zeroes are located higher in the table (further from the signal source). This allows the maximum number of cells to be replaced with absent units representing 0-valued bits. In fact, the no-op instruction is represented as all zeroes, so all of its units in the value module are replaced by absent cells. The no-op instruction appears many times immediately after jump operations, because the QFT architecture has a branch delay when invoking a jump instruction, requiring a no-op instruction to compensate for the delay.
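The dictionary-compression scheme can be sketched as follows (instruction strings and sizes are illustrative; only the two-table structure comes from the text):

```python
# Dictionary-compression sketch for the ROM: map each distinct
# instruction to a serial key, producing a key table (indexed by program
# counter) and a value table (indexed by key).

def compress_rom(rom):
    value_table = []        # distinct instructions, indexed by key
    keys = {}               # instruction -> serial key
    key_table = []          # program counter -> key
    for ins in rom:
        if ins not in keys:
            keys[ins] = len(value_table)
            value_table.append(ins)
        key_table.append(keys[ins])
    return key_table, value_table

rom = ["NOP", "ADD 1 2 3", "NOP", "MLZ 0 1 2", "NOP"]
key_table, value_table = compress_rom(rom)
print(key_table)     # -> [0, 1, 0, 2, 0]
print(value_table)   # -> ['NOP', 'ADD 1 2 3', 'MLZ 0 1 2']
```

Looking up value_table[key_table[pc]] recovers the original instruction for any program counter, which is exactly what the two-module ROM does in hardware.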

#### Added new optimized instructions to the ALU, and removed unused ones

I removed the AND, OR, SL (shift left), SRL (shift right logical), and SRA (shift right arithmetic) opcodes, and added the SRU (shift right unit) and SRE (shift right eight) opcodes to the architecture. Since there already were opcodes for XOR (bitwise xor) and ANT (bitwise and-not), AND and OR, which were not used much in the interpreter, could be replaced by these opcodes. The bitshift operations had significantly larger patterns than the other opcodes, more than 10 times larger. These were reduced to fixed-size shift operations, which could be implemented in the same size as the other opcodes. Since a shift left can be replaced by repeatedly adding a value to itself, effectively multiplying by powers of 2, that opcode could be safely removed. The main reason the original bitshift units were large was that the shift amounts depended on values in the RAM; converting a binary value to a physical (in-pattern) shift amount required a large pattern, whereas shifting by a fixed amount can be implemented by a significantly simpler one. The shift right eight instruction is mainly used for reading the standard input, where each ASCII character in the input string is packed into one 16-bit RAM memory address.

This resulted in a total of exactly 8 opcodes: ANT, XOR, SRE, SRU, SUB, ADD, MLZ, and MNZ. Since these fit in 3 bits, the opcode field of the instruction was reduced by 1 bit. Also, since the RAM address space is 10 bits, the third operand of an instruction is always a RAM write destination, and the first operand can be arranged to always be a RAM read address, an additional 6*2=12 bits could be shaved off the instruction length. Altogether, this reduced the ROM word size from 58 to 45 bits, cutting nearly 1/4 of the original instruction size.
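The arithmetic of this reduction can be checked directly (assuming the two address operands each shrank from 16 to 10 bits, which is what the 6*2=12 figure suggests):

```python
# Word-size arithmetic from the text: dropping 1 opcode bit plus 6 bits
# from each of the two RAM-address operands shrinks each ROM word.
OLD_WORD = 58
opcode_saving = 1        # 8 opcodes fit in 3 bits instead of 4
address_saving = 6 * 2   # two operands become 10-bit RAM addresses
print(OLD_WORD - opcode_saving - address_saving)  # -> 45
```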

#### Extended the ROM and RAM address space from 9,7-bit to 12,10-bit

The original QFT architecture had ROM and RAM address spaces of 9 and 7 bits. I extended them to 12 and 10 bits, respectively. This was not as straightforward a task as it first seemed, since the signal arrival timings between the modules had to be carefully adjusted for the signals to line up correctly. This involved reverse-engineering and experimenting with undocumented VarLife pattern units used in the original QFT architecture. The same held when redesigning other parts of the architecture.

#### Reducing the Standard Input Size

Since each byte of the RAM module can be ordered arbitrarily in the CPU’s architecture, the RAM is arranged so that the standard output is written at the very bottom of the RAM module, and proceeds upwards. Therefore, the contents of the RAM can easily be observed in a Game of Life viewer by directly examining the bottom of the RAM module.

Since the RAM has 16 bits of memory per address, each address can fit two ASCII-encoded characters. The standard input is therefore read out two characters per address. For the standard output, one character is written per address for aesthetic reasons, so that the characters can be observed more easily when examining the pattern directly in a Game of Life viewer. Also, so that the standard output proceeds upwards within the RAM module pattern, the memory pointer for the standard output proceeds backwards in the memory space, while the pointer for the standard input proceeds forwards.
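A minimal sketch of this packing scheme (the byte order within a 16-bit word is an assumption here, not taken from the source):

```python
# Packing sketch: two ASCII characters per 16-bit RAM word for the
# standard input, matching the text's description. The low byte is
# assumed to hold the first character of each pair.

def pack_input(s):
    words = []
    for i in range(0, len(s), 2):
        lo = ord(s[i])
        hi = ord(s[i + 1]) if i + 1 < len(s) else 0
        words.append((hi << 8) | lo)
    return words

def unpack_word(w):
    chars = [chr(w & 0xFF)]
    if w >> 8:
        chars.append(chr(w >> 8))
    return "".join(chars)

words = pack_input("AB")
print(words)                  # -> [16961]  (0x4241)
print(unpack_word(words[0]))  # prints AB
```

A shift-right-by-eight, like the SRE opcode described earlier, is what extracts the second character from such a word.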

### The Game of Life layer

Optimizing the Game of Life layer mainly revolved around understanding the Macrocell format for representing and saving Game of Life patterns, and the Hashlife algorithm. The Macrocell format uses quadtrees and memoization for compressing repeated patterns. Since the final Game of Life pattern is an array of OTCA metapixels which are 2048x2048 large, and even has repeated patterns in the VarLife layer (meaning that there are repeated configurations of OTCA metapixels), this compression reduces the file size for the QFT pattern significantly. The best example that let me understand the Macrocell format was an example provided by Adam P. Goucher in this thread in Golly’s mailing list.

The Hashlife algorithm also uses quadtrees and memoization to speed up the Game of Life simulations. This algorithm makes use of the fact that the same pattern in a same time frame influences only a fixed extent of its surrounding regions, hence allowing for memoization.

As for optimization, I first noticed that the QFT pattern had a 1-pixel-high pattern concatenated to the entire pattern. The original QFT pattern in the original QFT repository was carefully designed to be composed of 8x8-sized pattern units, so most of the pattern can be represented by 8x8 tiles. However, the 1-pixel-high pattern at the top creates an offset that shifts the pattern away from this 8x8 grid, so the pattern has fewer repeated tiles when interpreted from the corner of its bounding box, causing the memoization to work inefficiently. I therefore tried placing a redundant cell (which does not interfere with the rest of the pattern) to realign the entire pattern to its 8x8 grid, which in fact slightly reduced the resulting Macrocell file size. Although I didn't compare the running times, since the Hashlife algorithm memoizes repeated patterns as well, I expect this optimization to at least slightly contribute to the performance of the simulation.

Another optimization was improving the metafier script used to convert VarLife patterns to Game of Life (MetafierV3.py). The original script used a square region to fit the entire pattern when creating the quadtree representation. However, since the Lisp in Life VarLife pattern is 968 pixels wide but 42354 pixels high, it tried to allocate a 65536x65536-sized integer array, which was prohibitively large to run. I modified the script to use a rectangular region, where absent regions of the quadtree are represented as absent cells. Although this is very straightforward with knowledge of the Macrocell format, it was difficult at first, before I became familiar with the algorithms surrounding the Game of Life.
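As a rough illustration of why this node sharing matters (a hypothetical miniature of the Macrocell/Hashlife quadtree idea, not the actual metafier code):

```python
# Hash-consing sketch behind the Macrocell format: identical subpatterns
# (including fully empty ones) collapse into a single shared quadtree
# node, so a tall, repetitive pattern stores far fewer nodes than cells.

from functools import lru_cache

@lru_cache(maxsize=None)
def node(nw, ne, sw, se):
    # lru_cache canonicalizes: equal child tuples yield the same object.
    return (nw, ne, sw, se)

def build(grid, x, y, size):
    if size == 1:
        return grid[y][x]
    h = size // 2
    return node(build(grid, x, y, h), build(grid, x + h, y, h),
                build(grid, x, y + h, h), build(grid, x + h, y + h, h))

# A 4x4 grid whose four 2x2 quadrants are identical:
grid = [[1, 0, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0]]
build(grid, 0, 0, 4)
print(node.cache_info().currsize)  # -> 2 (one shared 2x2 node, one root)
```

Sixteen cells are represented by only two distinct nodes, which is the same effect that makes the repeated OTCA metapixel configurations compress so well.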

### Memory Region Map and the Phases of Operation

The memory region map is carefully designed to save space. This is best described with the operation phases of the interpreter.

#### Phase 0: Precalculations

Various precalculations are done after the interpreter starts running. The construction of the string interning hashtable for reserved atoms such as define, quote, etc. is done in this phase. For the GCC-compiled interpreter, some variables that are defined in the QFT memory header are defined in the C source.

Since the outcome of these precalculations is always the same for any incoming Lisp program, this phase is done on the host PC, and the results are saved as ramdump.csv at QFTASM compile time. The results are then pre-loaded into the RAM when the VarLife and Game of Life patterns are created. This saves some CPU cycles when running the interpreter.

As explained earlier, the QFT architecture holds register values in the RAM. There are 11 registers, which are placed in the addresses from 0 to 10.

The reserved values in the image include strings such as reserved atoms and the destinations of the jump hashtable used for evaluation. The rest of the region is used for storing global variables in the interpreter’s C source code.

#### Phase 1: Parsing

The Lisp program provided from the standard input is parsed into S-expressions, which is written into the heap region.

Notice that the string interning hashtables are created at the later end of the stack region. This is because these hashtables are only used during the parsing phase, and can be overwritten during the evaluation phase. For most Lisp programs, including the ones in this repository, the stack region does not grow far enough to overwrite these values. This arrangement makes room for three growing memory regions during the parsing phase: the stack region used for nested S-expressions, the heap region which stores the parsed S-expressions, and the string interning hashtables, which grow as new strings are detected within the Lisp program. Newly detected strings such as variable names in the Lisp program are also written into the heap region.

The heap region is also designed to overwrite the standard input as the program is parsed. Since older parts of the program can be discarded once they are parsed, this naturally frees the standard input region, saving a lot of space after parsing. The standard input also gets overwritten by the standard output if the output is long enough. Due to this design, however, long programs may fail to parse, since the input may be overwritten too far and get deleted before it is parsed. A workaround is to use indentation, which places the program further ahead in memory and prevents it from being overwritten by the growing heap region. For all of the programs included in this repository, this is not an issue and the programs parse successfully.

#### Phase 2: Evaluation

By this time, all of the contents of the stack region, as well as everything ahead of the head of the heap region, can be overwritten in later steps. Note that an issue similar to the one with the standard input arises with the standard output: when too many Lisp objects are created at runtime, the heap may overwrite the existing standard output, or simply exceed the heap region and proceed into the stack region. Since the heap region is connected to the later end of the stack region, this may be safe if the standard output is carefully handled, but the interpreter will eventually start overwriting values in the stack region if the heap continues to grow.

### Miscellaneous

#### How can a 2-state OTCA Metapixel emulate the behavior of an 8-state VarLife pattern?

This is one of the most interesting ideas in the original QFT project, and what makes the QFT architecture possible. As explained in the original QFT post, the 8 states of VarLife are actually a mixture of 4 different birth/survival rules with binary states. This means that each VarLife cell can only transition between two fixed states, and the birth/survival rule for that cell never changes at any point in time. Moreover, the OTCA Metapixel is designed so that each metapixel can carry its own birth/survival rule. Therefore, each VarLife cell can be encoded into an OTCA Metapixel by specifying its birth/survival rule and its binary state. This means that the array of OTCA Metapixels in the metafied pattern is actually a mixture of metapixels with different birth/survival rules, arranged in a way that makes the computation possible.
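A minimal sketch of this factorization (the state numbering and rule names here are invented for illustration; only the 4-rules-times-2-states structure comes from the text):

```python
# Decoding sketch: treating each of the 8 VarLife states as a
# (rule, alive) pair: 4 birth/survival rules times 2 binary states.
# The rule component of a cell is fixed forever; only the binary
# alive/dead component changes during the simulation.

RULES = ["B1/S", "B2/S", "B12/S1", "B1/S12"]  # illustrative names only

def decode(state):
    rule, alive = divmod(state, 2)
    return RULES[rule], alive

print(decode(5))  # -> ('B12/S1', 1)
```

The metafier then emits an OTCA Metapixel configured with the rule component, in the on or off meta-state given by the alive component.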

#### Halting Time

After the program counter is set to 65535 and the program exits, no more ROM and RAM I/O signals appear anywhere in the module. This makes the VarLife pattern completely stationary, with every subsequent generation identical. Defining this as the halting time of the calculation, the pattern for print.lisp halts at exactly 105,413,068 VarLife generations.

The halting time for the Game of Life patterns is defined similarly in terms of the meta-states of the OTCA Metapixels. Since OTCA Metapixels never become stationary, the Game of Life states do not become stationary after the halting time, but the meta-states of the OTCA Metapixels do.

For the VarLife pattern of print.lisp, by generation 105,387,540, the value 65535 gets written to the program counter. At generation 105,413,067, the last signal is just one step from disappearing, and from generation 105,413,068 onwards, the pattern is completely stationary, with every generation identical to the last. In the Game of Life version, since the OTCA Metapixels continue running indefinitely, the pattern does not become completely stationary, but the meta-states of the OTCA Metapixels do, since the pattern is an emulation of the VarLife pattern. Note that the halting times for programs other than print.lisp are just sufficient numbers of generations, not exact values.

The required number of generations per CPU cycle depends on many factors, such as the ROM and RAM addresses and the types of opcodes involved, since the arrival times of the I/O signals depend on these factors as well. The number of generations required for a program to halt therefore differs between programs. For example, print.lisp has a rate of 23822.16 generations per CPU cycle (GpC), but z-combinator.lisp has a rate of 28870.81 GpC, and primes-print.lisp has 31502.43 GpC. 23822.16 GpC is in fact insufficient for z-combinator.lisp to finish running, and 28870.81 GpC is insufficient for primes-print.lisp.
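As a quick sanity check on these figures (an estimate derived only from the two numbers quoted above for print.lisp, assuming cycles are simply generations divided by the GpC rate):

```python
# Back-of-the-envelope check: dividing the halting generation of
# print.lisp by its generations-per-cycle rate gives the implied
# number of CPU cycles executed.
halting_generation = 105_413_068
generations_per_cycle = 23822.16
print(round(halting_generation / generations_per_cycle))  # -> 4425
```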

#### Miscellaneous Screenshots

The ALU unit in the CPU. From the left are the modules for the ANT, XOR, SRE, SRU, SUB, ADD, MLZ, and MNZ opcodes.

The SRE and the SRU opcodes were newly added for this project.

## Credits

The CPU architecture used in this project was originally created by the members of the Quest For Tetris (QFT) project, and was later optimized and modified by Hikaru Ikuta for the Lisp in Life project. The VarLife cellular automaton rule was also defined by the members of the QFT project. The metafier for converting VarLife patterns to Conway’s Game of Life patterns was written by the members of the QFT project, and was later modified by Hikaru Ikuta to support the pattern size of the Lisp in Life architecture. The assembly language for the QFT architecture, QFTASM, was also originally designed by the members of the QFT project, and was later modified by Hikaru Ikuta for this project for achieving a feasible running time. The Lisp interpreter was written by Hikaru Ikuta. The compilation of the interpreter’s C source code to the ELVM assembly is done using an extended version of 8cc written by Rui Ueyama from Google. The compilation from the ELVM assembly to QFTASM is done by an extended version of ELVM (the Esoteric Language Virtual Machine), a project by Shinichiro Hamaji from Preferred Networks, Inc. The Game of Life backend for ELVM was written by Hikaru Ikuta, and was later further extended by Hikaru for the Lisp in Life project.

# Extending SectorLISP to Implement BASIC REPLs and Games

2022-01-12T11:01:35+09:00 · https://woodrush.github.io/blog/posts/sectorlisp-io

SectorLISP is an amazing project where a fully functional Lisp interpreter is fit into the 512 bytes of the boot sector of a floppy disk. Since it works as a boot sector program, the binary can be written to a disk and used as a boot drive, where the computer presents an interface for writing and evaluating Lisp programs, all running on bare metal during the boot phase, in a 436-byte program. As it hosts the Turing-complete language of Lisp, I was in fact able to write a BASIC interpreter in 120 lines of SectorLISP code, which evaluates BASIC programs embedded as expressions within the Lisp code, shown in the screenshot above.

When I first saw SectorLISP and got it to actually run on my machine, I was struck with awe by how such a minimal amount of machine code could be used to open up the vast ability to host an entire programming language. You can write clearly readable programs which the interpreter will accurately evaluate to the correct result. I find it beautiful how such a small program is capable of interpreting a form of human thought and generating a sensible response that contains the meaning encapsulated in the inquired statement.

## The Issue - Designing Interactions

After writing various programs for SectorLISP, there was a particular thought that came into my mind. Even after writing the BASIC interpreter, I felt that there was one very important feature that could significantly enhance the capabilities of SectorLISP - that is, the ability to accept feedback from the user depending on the program’s output, by designing the interaction between the user and the computer.

The prime example of this is games. Games can be played on a computer because the player can react to the output of the computer. Of course, even with pure functions as in SectorLISP, it's still possible to create a game if we make the user run the same program again every time it demands a new input. The entire history of user inputs can be expressed as a list in the program, the input and output states can be passed through the course of the entire program, and the program can stop whenever a required input is missing, showing its accumulated outputs. However, such an interface requiring repeated inputs is rather inconvenient for the user: inconvenient in the same sense that IF is less convenient than COND, and that lambdas taking only one argument are less convenient than lambdas taking any number of arguments, both of which exist to make the experience of interacting with SectorLISP as simple and natural as possible.

When you think about it, the reason why computers are such a powerful device used almost everywhere in our lives today, is because they can be redesigned into an entirely different tool for an arbitrary purpose. The computer is then no longer a tool that is used only by the programmer, but can be used by anybody to run its applications. The transition from ENIAC to the dawn of the personal computing era was possible since computers became capable of general tasks other than computing equations, such as writing and saving documents for a business. Today, computers are being used for creating artwork, for playing games, for communicating with others, to only give a few examples. The entire history of computers is shaped by what new tasks computers became capable of, which is inseparable from the means of interaction between the human and the computer.

At the heart of the diverse applications for computers is the language used to program them. This is why programming languages capable of designing interactions are special - once a computer is programmed, it can leave the hands of the programmer and lie in the hands of the user, who interacts with it in a newly designed way.

As a matter of fact, all of the other languages mentioned in the SectorLISP blog post support I/O functionality. SectorFORTH has the key and emit instructions, which read a keystroke from the user and print a character to the console. BootBasic has the input and print instructions, where input stores a user input into a variable. Even BF has the instructions , and . for arbitrary user text input and output. @rdebath has in fact made a text adventure game written entirely in BF.

Although the goal of SectorLISP is set in the realm of pure functions, I thought that it would be a massive gain if it were able to handle I/O and still have a smaller program size than the other languages mentioned in the SectorLISP blog post. In the context of comparing the binary footprint of programs, it would be a better comparison if all of the programs under discussion had even more functionalities in common. All of this could be achieved if we could construct a version of SectorLISP that is capable of handling user input and outputs that still has a small program size.

## The Solution

What could we do to empower SectorLISP with the puzzle piece of interaction? What is a natural way of implementing I/O? To answer this, I created a fork of SectorLISP that supports two new special forms, READ and PRINT. These two special forms are the counterparts for the , and . instructions in BF. READ accepts an arbitrary S-Expression from the user, and PRINT prints the value of the evaluated argument to the console. PRINT also prints a newline when called with no arguments as (PRINT).

The fork is available here: https://github.com/woodrush/sectorlisp/tree/io

Update (2022/4/6): The fork was merged into the original SectorLISP repository. Thanks for reviewing and merging it!

Adding all of these features only amounted to an extra 35 bytes in the binary, for a total of 469 bytes, or 471 bytes including the boot signature. This is still 22 or more bytes smaller than the two former champions of minimal languages that fit in a boot sector mentioned in the SectorLISP blog post, SectorFORTH (491 bytes) and BootBasic (510 bytes). The rather minimal increase was achievable since most of the code for handling input and output was already available from the REPL's functionality. This fork successfully shows that adding an I/O feature to SectorLISP still allows it to have a smaller binary footprint than the two former champions.

Update: Thanks to a pull request by @jart, the author of the original SectorLISP, we’re down to 465 bytes or 467 bytes including the boot signature. Thank you @jart for your contribution! The details of the assembly optimizations including the one used in this pull request are discussed in the Assembly Optimizations section.

## Usage

To run the SectorLISP fork, first git clone and make SectorLISP’s binary, sectorlisp.bin:

git clone https://github.com/woodrush/sectorlisp
cd sectorlisp
git checkout io
make


Update (2022/4/6): Since the fork was merged into the original SectorLISP repository, you could now checkout https://github.com/jart/sectorlisp instead for using these features.

This will generate sectorlisp.bin under ./sectorlisp.

Next, download the Blinkenlights emulator:

curl https://justine.lol/blinkenlights/blinkenlights-latest.com >blinkenlights.com


You can then run SectorLISP by running:

./blinkenlights.com -rt sectorlisp.bin


In some cases on Ubuntu, a graphics-related error may appear and the emulator may not start. In that case, first run the following command, available on the download page:

sudo sh -c "echo ':APE:M::MZqFpD::/bin/sh:' >/proc/sys/fs/binfmt_misc/register"


After starting Blinkenlights, expand the size of your terminal large enough so that the TELETYPEWRITER region shows up at the center of the screen. This region is the console used for input and output. Then, press c to run the emulator in continuous mode. The cursor in the TELETYPEWRITER region should move one line down. You can then start typing in text or paste a long code from your terminal into Blinkenlight’s console to run your Lisp program.

### Running on Physical Hardware

You can also run SectorLISP on an actual physical machine if you have a PC with an Intel CPU that boots with a BIOS, and a drive such as a USB drive or a floppy disk that can be used as a boot drive. First, mount your drive to the PC you’ve built sectorlisp.bin on, and check:

lsblk -o KNAME,TYPE,SIZE,MODEL


Among the list of the hardware, check for the device name for your drive you want to write SectorLISP onto. After making sure of the device name, run the following command, replacing [devicename] with your device name. [devicename] should be values such as sda or sdb, depending on your setup.

Caution: The following command used for writing to the drive will overwrite anything that exists in the target drive’s boot sector, so it’s important to make sure which drive you’re writing into. If the command or the device name is wrong, it may overwrite the entire content of your drive or other drives mounted in your PC, probably causing your computer to be unbootable. Please perform these steps with extra care, and at your own risk.

sudo dd if=sectorlisp.bin of=/dev/[devicename] bs=512 count=1


After you have written your boot drive, insert the drive to the PC you want to boot it from. You may have to change the boot priority settings from the BIOS to make sure the PC boots from the target drive. When the drive boots successfully, you should see a cursor blinking in a blank screen, which indicates that you’re ready to type your Lisp code into bare metal.

## Applications

Here we present examples to showcase the capabilities of READ and PRINT.

### Games

A major example of interactive programs is games. I created a simple number guessing game that works on the fork of SectorLISP.

Here is a screenshot of the game in action, run in Blinkenlights: Here is the text shown in the console:

(LET ' S PLAY A NUMBER GUESSING GAME. I ' M THINKING OF A CERTAIN NUMBER BETWEEN
1 AND 10. SAY A NUMBER, AND I ' LL TELL YOU IF IT ' S LESS THAN, GREATER THAN,
OR EQUAL TO MY NUMBER. CAN YOU GUESS WHICH NUMBER I ' M THINKING OF?)
(PLEASE INPUT YOUR NUMBER IN UNARY. FOR EXAMPLE, 1 IS (*) , 3 IS (* * *) , ETC.)
NUMBER>(* * *)
(YOUR GUESS IS LESS THAN MY NUMBER.)
NUMBER>*
(PLEASE INPUT YOUR NUMBER IN UNARY. FOR EXAMPLE, 1 IS (*) , 3 IS (* * *) , ETC.)
NUMBER>(* * * * * * * *)
(YOUR GUESS IS GREATER THAN MY NUMBER.)
NUMBER>


We can see that the game is able to produce interactive outputs based on the feedback from the user, which is an essential feature for creating games. Note that there is also robust input handling in action, where in the second input NUMBER>*, the user writes an invalid input *, which is not a list. The game can handle such inputs without crashing.

The code is available at https://github.com/woodrush/sectorlisp-examples/blob/main/lisp/number-guessing-game.lisp.

### Extended Lisp REPL - Transforming the Language Itself

The I/O feature can be used to transform the SectorLISP language itself as well. As an example, I made an extended Lisp REPL where macro, define, progn, as well as print and read are all implemented as new special forms.

Here is an example session of the program:

REPL>(define defmacro (quote (macro (name vars body)
( (define (~ name) (quote (macro (~ vars) (~ body))))))))
=>(macro (name vars body) ( (define (~ name) (quote (macro (~ vars) (~ body))))
))

REPL>(defmacro repquote (x)
( (quote ((~ x) (~ x)))))
=>(macro (x) ( (quote ((~ x) (~ x)))))

REPL>(repquote (1 2 3))
=>((1 2 3) (1 2 3))

REPL>


The code is available at https://github.com/woodrush/sectorlisp-examples/blob/main/lisp/repl-macro-define.lisp.

In the example above, the user first uses the backquote macro to define defmacro as a new macro, then uses defmacro to define a new macro, repquote. These newly added features allow an interaction that is much closer to those in modern Lisp dialects.

In the code, these additional user inputs are included at the end of the code, which can be directly pasted into the console. However, we can look at this another way: by writing the REPL code as a header, we have effectively transformed the syntax of the language itself, introducing new special forms which were not present in the original interface. The DEFINE special form is also introduced in SectorLISP's friendly branch, which adds some extra bytes. With READ and PRINT, we can instead build these new features on top of the interface as software, saving a lot of program space.

### Interactive BASIC REPL

As a final example for drastically modifying the means of user interactions, I made an interactive BASIC interpreter written in the I/O SectorLISP. It runs a subset of BASIC with the instructions LET, IF, GOTO, PRINT, REM, and the infix operators +, -, %, and <=. Integers are expressed in unary as a list of atoms, such as (1 1 1).

Here is a screenshot of the final results, run in Blinkenlights: @jart has created a video of it running on Blinkenlights (Thank you @jart!):

The code is available at https://github.com/woodrush/sectorlisp-examples/blob/main/lisp/basic-repl.lisp.

In this example, SectorLISP no longer presents an interface for evaluating Lisp expressions, but provides a new interface for recording and evaluating BASIC programs, transforming SectorLISP into an entirely different application. This highlights how programming languages can be used to redesign computers into tools for arbitrary purposes - using this SectorLISP program, users can now interact with the computer in a new way using the BASIC language.

Although it is indeed possible to run this evaluator as a static program, as in the code shown at the beginning, the new program is able to hide and encapsulate the details of the underlying Lisp program by presenting a new interface. In the static version, the evaluator must also be entirely retyped to evaluate a new BASIC program, which is a major difference in terms of interaction. This shows how features as simple as READ and PRINT can be used to create a powerful application with the language. In a way, SectorLISP now works as a minimal operating system, and programs within it such as this REPL work as applications that extend the capabilities of the underlying OS.

## Implementation Details

Let’s look at some details for dealing with I/O.

### Sequential Execution - Defining PROGN using Pre-existing Features

First of all, side effects are inseparable from the notion of sequential execution. Although lambda bodies in SectorLISP can only contain one expression, there is in fact an already built-in way to naturally manage sequential execution: you can pass expressions as the arguments of a lambda to have them executed sequentially!

For example, the following program allows the execution of three consecutive PRINTs:

((LAMBDA () NIL)
 (PRINT (QUOTE A))
 (PRINT (QUOTE B))
 (PRINT (QUOTE C)))


Here, each PRINT statement is passed as an argument to the empty lambda expression (LAMBDA () NIL), and all of them are executed in the order of appearance. This is possible because EVLIS evaluates all of the arguments before calling PAIRLIS to bind the values to the variables, so every expression gets evaluated in order regardless of the number of arguments the lambda expects.
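This evaluation order can be sketched with a small Python model of EVLIS and PAIRLIS (an assumed simplification of the mechanism, not the interpreter's actual code):

```python
# Assumed, simplified model of SectorLISP's EVLIS + PAIRLIS.
log = []

def PRINT(x):
    log.append(x)       # the side effect: print to the console
    return None         # return value is undefined/unused

def apply_lambda(params, body, args):
    # EVLIS: evaluate *every* argument expression, in order,
    # before PAIRLIS binds values to parameters.
    values = [arg() for arg in args]
    env = dict(zip(params, values))   # surplus values are dropped
    return body(env)

# ((LAMBDA () NIL) (PRINT (QUOTE A)) (PRINT (QUOTE B)) (PRINT (QUOTE C)))
result = apply_lambda((), lambda env: None,
                      [lambda: PRINT("A"),
                       lambda: PRINT("B"),
                       lambda: PRINT("C")])
print(log)      # ['A', 'B', 'C'] -- all three ran, in order
print(result)   # None, playing the role of NIL
```

The key point the model captures is that the arity mismatch is harmless: all three argument expressions are evaluated for their side effects, and the unused values are simply discarded.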

Since this empty lambda can be used anywhere with an arbitrary number of expressions, you can name it PROGN and use it as follows:

((LAMBDA (PROGN)
   (PROGN (PRINT (QUOTE A))
          (PRINT (QUOTE B))
          (PRINT (QUOTE C))))
 (QUOTE (LAMBDA () NIL)))


Note that PROGN always returns NIL instead of the last expression in the sequence, which differs from the behavior of most Lisp dialects. To extract a value from a PROGN sequence, you can use repeated lambda arguments as follows:

((LAMBDA (PROGN)
   (PROGN (PRINT (QUOTE A))
          (PRINT (QUOTE B))
          (PRINT (QUOTE C))
          (QUOTE D)))
 (QUOTE (LAMBDA (X X X X) X)))
;; Returns (QUOTE D)


Note that the return value of PRINT is designed to be undefined to save program space. This does not become a problem, as will be discussed later.

You can use CONS instead of PROGN as well for the same purpose:

(CDR (CDR (CDR
  (CONS (PRINT (QUOTE A))
        (CONS (PRINT (QUOTE B))
              (CONS (PRINT (QUOTE C))
                    (QUOTE D)))))))


These tools are enough to handle sequential execution and to extract values from the executed expressions.

When I first came up with the PROGN solution, it felt as if SectorLISP had been waiting for sequential execution to be used. Although pure expressions, as in the original SectorLISP implementation, do not require this feature, it was a nice realization that it had already been built into SectorLISP so naturally. It is also pleasing that the syntax is the same as in modern Lisp dialects, with the only difference being that it always returns NIL instead of the final value, which can still be worked around using the methods discussed earlier.

### Comments inside PROGN

Since all of the values inside PROGN are discarded after execution, you can write comments inside a PROGN block, at the expense of some RAM in the string-interning region and a few extra evaluations:

((LAMBDA (
PROGN ;;
)
(PROGN ;; (QUOTE - THIS PRINTS 3 CONSECUTIVE LETTERS.)
(PRINT (QUOTE A))
(PRINT (QUOTE B))
(PRINT (QUOTE C))))
(QUOTE (LAMBDA () NIL))
NIL)


Here, the variable ;; is bound to NIL and placed inside PROGN. Since ;; immediately evaluates to NIL and is discarded, it does not affect any relevant state of the interpreter or the program. Because ;; does not actually comment out the following statement in SectorLISP, the comment body after it is enclosed in a QUOTE form to prevent it from being executed, so that its result is also discarded after evaluation.

Also, note that the closing parentheses of the outer lambda are preceded by some extra newlines, to prevent text editors from treating the parentheses ) as part of the ;; comment. This format is used in the number guessing program as well.

### Loops using Recursion

Although this is not a newly added feature, it is worth noting that loops can be implemented as recursion, by calling a function within itself. In the number guessing game example, the functions MAIN and GAMELOOP are called within themselves to be executed an arbitrary number of times. This combined with PROGN provides a natural way for writing sequential programs.
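This pattern can be sketched in Python as follows (`game_loop` is a hypothetical stand-in for GAMELOOP, not the actual Lisp code): the function simply calls itself until the exit condition holds, which is exactly how a loop is expressed without any loop primitive.

```python
# Loop via recursion: the function re-invokes itself until the
# answer is guessed, accumulating output into a transcript list.
def game_loop(guesses, answer, transcript):
    guess = guesses[0]
    if guess == answer:
        transcript.append("CORRECT")
        return
    transcript.append("WRONG")
    game_loop(guesses[1:], answer, transcript)   # "loop" by self-call

transcript = []
game_loop([1, 5, 3], 3, transcript)
print(transcript)   # ['WRONG', 'WRONG', 'CORRECT']
```

In the Lisp version, the recursive call sits inside a PROGN so that the prompt, the user input, and the next iteration happen in sequence.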

### Print Debugging

The PRINT feature is not only convenient for the user of the program, but in fact provides a helpful interface for the programmer as well: it allows for print debugging, i.e. checking the values that occur at runtime. Even in a language with a syntax as comfortable as Lisp's, the most experienced programmer would have a difficult time debugging a large program if the internal states and variables could not be observed at runtime.

This can be done by simply wrapping the expression with a predefined DEBUG function:

((LAMBDA (DEBUG)
...
(DEBUG EXPR)
...
)
(QUOTE (LAMBDA (X) (CDR (CONS (PRINT X) X)))))


The extra wrapper function DEBUG is needed because the return value of PRINT is designed to be undefined to save program size.

The art of writing a program always comes with the act of deleting and revising it, by observing its behavior and internal states. Print debugging is a simple yet powerful interface that is a de facto requirement for writing large programs. Such an interface is comparable to the choice of implementing COND in SectorLISP instead of IF, which usually induces a more obfuscated program structure. I myself heavily used print debugging to write the BASIC interpreter, as well as the version that runs in the original SectorLISP, which I wrote and debugged in the I/O SectorLISP fork.

### Return Values of PRINT

As mentioned earlier, PRINT is designed to return an undefined value to save program size. Since values passed to PRINT can be extracted using DEBUG, and PRINT can be used inside PROGN where values are discarded, having PRINT return an undefined value was not a problem in any of the examples discussed above. Running a bare PRINT expression in the REPL also didn't print any unwanted strings in the console, so I consider this property safe to manage in most use cases. Running various PRINT expressions in the REPL turns out like this:

(PRINT (QUOTE A))
A
(PRINT (PRINT (QUOTE A)))
ANIL
AAA


Notice that the result of the first expression is slightly odd: the REPL is supposed to show the return value of PRINT in addition to its effect of printing A in the console, but nothing extra is printed. In the second expression, the nested PRINT turns out to return NIL, which is printed after A as a return value by the REPL. This phenomenon should not occur in well-written large programs, as long as the program is written so that the return values of PRINT are never referenced, which is the natural result if they are all executed inside PROGN.

READ is much safer, since by definition it must produce a valid return value regardless of its context. At first, there was a bug where the first character was ignored by READ, but it was fixed by caching the lookahead character from the user input inside GetChar, as fixed in 162969d (the latest version uses the %bp register instead of %fs, as fixed in 1af3db7).
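The idea behind the fix can be sketched as follows, assuming a simple model of one-character lookahead (the class and method names here are illustrative; the real fix lives in the GetChar assembly routine):

```python
# Assumed model of lookahead caching: peeking at the next character
# stores it in a one-slot cache, so a later read does not lose it.
class Input:
    def __init__(self, text):
        self.chars = list(text)
        self.lookahead = None   # the cached lookahead character

    def peek(self):
        if self.lookahead is None:
            self.lookahead = self.chars.pop(0)
        return self.lookahead

    def getchar(self):
        if self.lookahead is not None:
            c, self.lookahead = self.lookahead, None
            return c
        return self.chars.pop(0)

inp = Input("AB")
print(inp.peek())      # 'A' -- lookahead does not consume
print(inp.getchar())   # 'A' -- the peeked character is not lost
print(inp.getchar())   # 'B'
```

Without the cache, the peeked character would be consumed and discarded, which is exactly the symptom of the original bug where READ ignored the first character.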

## Assembly Optimizations

Here we’ll cover the details of optimizing the assembly size. More details for the methods used in the original SectorLISP assembly code are available at the original SectorLISP blog post, https://justine.lol/sectorlisp2/.

### Smaller Jump Instruction Encodings

This is a method used in the pull request by @jart, the author of the original SectorLISP. Conditional jumps in x86 are encoded in different instruction sizes depending on the size of the jump's displacement. When the displacement fits in one byte, i.e. it is between -128 and 127, the instruction fits in two bytes, instead of four bytes when the displacement is larger. The pull request by @jart exploits this by reordering the functions within the assembly code, shrinking the displacements of the conditional jump instructions related to READ and PRINT.
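The size rule can be sketched as a small helper (based on the standard x86 Jcc encodings, not on code from the fork):

```python
# Encoded size of an x86 conditional jump as a function of its
# displacement: short jcc is a 1-byte opcode + rel8 (2 bytes);
# the near form in 16-bit code is a 2-byte opcode + rel16 (4 bytes).
def jcc_size(displacement):
    if -128 <= displacement <= 127:
        return 2    # short form, e.g. 74 xx (je rel8)
    return 4        # near form, e.g. 0f 84 xx xx (je rel16)

print(jcc_size(100))    # 2
print(jcc_size(-200))   # 4
```

Reordering functions so that related code sits close together pushes more displacements into the -128..127 range, saving two bytes per jump that shrinks.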

### Reducing Return Instructions using the Control Flow Structure

This is a method used in the original SectorLISP implementation, which is used in the fork as well. Consider the following example where a function A calls another function B and then immediately returns afterward:

A:      mov %ax,%si
        call B
        ret

B:      mov %si,%bp
        ret


The code can then be reduced by two instructions without changing the behavior, by concatenating A directly before B as follows:

A:      mov %ax,%si
;       slide
B:      mov %si,%bp
        ret


This way, even though there is no ret instruction in the A block, the control flow falls through into B, which does have a ret instruction. The same ret instruction is therefore shared by the two functions A and B. Calls to both A and B behave the same as in the previous code, with fewer instructions.

This method is used to implement READ and PRINT as extensions of .PutObject and GetToken, where some additional instructions run before the original functions. Reusing existing code this way kept the increase in program size rather small.

## Conclusion

I made a fork of SectorLISP that supports two new special forms, READ and PRINT, which provide a natural I/O interface useful for both the programmer and the user of a program. This led to the following findings:

• The fork allows for writing interactive programs for SectorLISP, such as games and REPLs of other programming languages such as a subset of BASIC.
• With the new special forms READ and PRINT, you can now design the interactions between the user and the computer, a feature supported by all of the other languages mentioned in the original SectorLISP blog post, including SectorFORTH, BootBasic, and BF.
• Adding all of these features only amounted to an extra 35 bytes in the binary, for a total of 469 bytes, or 471 bytes including the boot signature. When speaking of the binary footprint of a program, it is important for each program to share as many common features as possible. The fork of SectorLISP achieves this by supporting the I/O feature, and also shows that the program size can still be kept at least 22 bytes smaller than the other programming languages mentioned in the SectorLISP blog post.
• Update: As mentioned earlier, a pull request from @jart has brought the total program size down to 465 bytes, or 467 bytes including the boot signature. Thank you @jart for your contribution!
• Update (2022/4/6): The fork was merged into the original SectorLISP repo. Thanks for reviewing and merging it!

## Credits

The video for the interactive BASIC REPL was created by Justine Tunney. The new I/O fork of SectorLISP discussed in this post was first created by Hikaru Ikuta, and has received contributions from Justine Tunney. The SectorLISP project was started by Justine Tunney and built by the contributors to the project, along with the authors credited in the original SectorLISP blog post. I'd also like to thank Justine and Hannah Miller from the Rochester Institute of Technology for the fruitful discussion on improving this blog post.
