# LambdaLisp - A Lisp Interpreter That Runs on Lambda Calculus

*Woodrush’s Blog (Hikaru Ikuta), 2022-09-17*

LambdaLisp is a Lisp interpreter written as an untyped lambda calculus term. The input and output text is encoded into closed lambda terms using the Mogensen-Scott encoding, so the entire computation process solely consists of the beta-reduction of lambda calculus terms.

When run on a lambda calculus interpreter that runs on the terminal, it presents a REPL where you can interactively define and evaluate Lisp expressions. Supported interpreters include the binary lambda calculus interpreters Blc, tromp, and uni, the Universal Lambda interpreter clamb, and Lazy K interpreters.

Supported features are:

• Signed 32-bit integers
• Strings
• Closures, lexical scopes, and persistent bindings with let
• Object-oriented programming feature with class inheritance
• Reader macros with set-macro-character
• Access to the interpreter’s virtual heap memory with malloc, memread, and memwrite
• Call stack traces when an error is raised
• Garbage collection during macro evaluation

and much more.

LambdaLisp can be used to write interactive programs. Execution speed is fast as well - the number guessing game runs on the terminal with an almost instantaneous response.

Here is a PDF showing its entire lambda term, which is 42 pages long:

The embedded PDF may not show on mobile. The same PDF is also available at https://woodrush.github.io/lambdalisp.pdf.

Page 32 is particularly interesting, consisting entirely of ( characters.

## Overview

LambdaLisp is a Lisp interpreter written as a closed untyped lambda calculus term. It is written as a lambda calculus term ${\rm LambdaLisp} = \lambda x. \cdots$ which takes a string $x$ as an input and returns a string as an output. The input $x$ is the Lisp program and the user’s standard input, and the output is the standard output. Characters are encoded into lambda term representations of natural numbers using the Church encoding, and strings are encoded as a list of characters with lists expressed as lambdas in the Mogensen-Scott encoding, so the entire computation process solely consists of the beta-reduction of lambda terms, without introducing any non-lambda-type object. In this sense, LambdaLisp operates on a “truly purely functional language” without any primitive data types except for lambda terms.

Supported features are closures and persistent bindings with let, reader macros, 32-bit signed integers, a built-in object-oriented programming feature based on closures, and much more. LambdaLisp is tested by running code on both Common Lisp and LambdaLisp and comparing their outputs. The largest LambdaLisp-Common-Lisp polyglot program that has been tested is lambdacraft.cl, which runs the Lisp-to-lambda-calculus compiler LambdaCraft I wrote for this project, also used to compile LambdaLisp itself.

When run on a lambda calculus interpreter that runs on the terminal, LambdaLisp presents a REPL where you can interactively define and evaluate Lisp expressions. These interpreters automatically process the string-to-lambda encoding for handling I/O through the terminal.

Lisp has been described by Alan Kay as the Maxwell’s equations of software. In the same sense, I believe that lambda calculus is the particle physics of computation. LambdaLisp may therefore be a gigantic electromagnetic Lagrangian that connects the realm of human-friendly programming to the origins of the notion of computation itself.

## Usage

LambdaLisp is available on GitHub at https://github.com/woodrush/lambdalisp. Here we will explain how to try LambdaLisp right away on x86-64-Linux and other platforms such as Mac.

### Running the LambdaLisp REPL (on x86-64-Linux)

You can try the LambdaLisp REPL on x86-64-Linux by simply running:

git clone https://github.com/woodrush/lambdalisp
cd lambdalisp
make run-repl


The only requirement is cc, which should be installed by default. To try it on a Mac, please see the next section.

This will run LambdaLisp on SectorLambda, the 521-byte lambda calculus interpreter. The source code being run is lambdalisp.blc, which is the lambda calculus term shown in lambdalisp.pdf written in binary lambda calculus notation.

SectorLambda automatically takes care of the string-to-lambda I/O encoding to run LambdaLisp on the terminal. Interaction is done by writing LambdaLisp in continuation-passing style, allowing a Haskell-style interactive I/O to work on lambda calculus interpreters.

When building SectorLambda, Make runs the following commands to fetch its sources:

• Blc.S: wget https://justine.lol/lambda/Blc.S?v=2
• flat.lds: wget https://justine.lol/lambda/flat.lds

After running make run-repl, the REPL can also be run as:

( cat ./bin/lambdalisp.blc | ./bin/asc2bin; cat ) | ./bin/Blc


### Running the LambdaLisp REPL (on Other Platforms)

SectorLambda is x86-64-Linux exclusive. On other platforms such as a Mac, the following command can be used:

git clone https://github.com/woodrush/lambdalisp
cd lambdalisp
make run-repl-ulamb


This runs LambdaLisp on the lambda calculus interpreter clamb. The requirement for this is gcc or cc.

After running make run-repl-ulamb, the REPL can also be run as:

( cat ./bin/lambdalisp.ulamb | ./bin/asc2bin; cat ) | ./bin/clamb -u


LambdaLisp runs on various other lambda calculus interpreters as well. For instructions for other interpreters, please see the GitHub repo.

### Playing the Number Guessing Game

Once make run-repl is run, you can play the number guessing game with:

( cat ./bin/lambdalisp.blc | ./bin/asc2bin; cat ./examples/number-guessing-game.cl; cat ) | ./bin/Blc


If you ran make run-repl-ulamb, you can run:

( cat ./bin/lambdalisp.ulamb | ./bin/asc2bin; cat ./examples/number-guessing-game.cl; cat ) | ./bin/clamb -u


You can run the same script on Common Lisp. If you use SBCL, you can run it with:

sbcl --script ./examples/number-guessing-game.cl


## Examples

### Closures

The following LambdaLisp code runs right out of the box:

(defun new-counter (init)
  ;; Return a closure.
  ;; Use the let over lambda technique for creating independent and persistent variables.
  (let ((i init))
    (lambda () (setq i (+ 1 i)))))

;; Instantiate counters
(setq counter1 (new-counter 0))
(setq counter2 (new-counter 10))

(print (counter1)) ;; => 1
(print (counter1)) ;; => 2
(print (counter2)) ;; => 11
(print (counter1)) ;; => 3
(print (counter2)) ;; => 12
(print (counter1)) ;; => 4
(print (counter1)) ;; => 5


An equivalent JavaScript code is:

// Runs on the browser's console
function new_counter (init) {
  let i = init;
  return function () {
    return ++i;
  }
}

var counter1 = new_counter(0);
var counter2 = new_counter(10);

console.log(counter1()); // => 1
console.log(counter1()); // => 2
console.log(counter2()); // => 11
console.log(counter1()); // => 3
console.log(counter2()); // => 12
console.log(counter1()); // => 4
console.log(counter1()); // => 5


### Object-Oriented Programming With Class Inheritance

As described in Let Over Lambda, when you have closures, you get object-oriented programming for free. LambdaLisp has a built-in OOP feature implemented as predefined macros based on closures. It supports Python-like classes with class inheritance:

;; Runs on LambdaLisp
(defclass Counter ()
  (i 0)

  (defmethod inc ()
    (setf (. self i) (+ 1 (. self i))))

  (defmethod dec ()
    (setf (. self i) (- (. self i) 1))))

;; A class that inherits from Counter
(defclass Counter-with-init (Counter)
  (defmethod *init (i)
    (setf (. self i) i))

  (defmethod add (n)
    (setf (. self i) (+ (. self i) n))))

(defparameter counter1 (new Counter))
(defparameter counter2 (new Counter-with-init 100))

((. counter1 inc))

(setf (. counter1 i) 5)
(setf (. counter2 i) 500)


An equivalent Python code is:

class Counter ():
    i = 0

    def inc (self):
        self.i += 1
        return self.i

    def dec (self):
        self.i -= 1
        return self.i

class CounterWithInit (Counter):
    def __init__ (self, i):
        self.i = i

    def add (self, n):
        self.i += n
        return self.i

counter1 = Counter()
counter2 = CounterWithInit(100)

counter1.inc()

counter1.i = 5
counter2.i = 500


### More Examples

More examples can be found in the GitHub repo. The largest LambdaLisp program currently written is lambdacraft.cl, which runs the lambda calculus compiler LambdaCraft I wrote for this project, also used to compile LambdaLisp itself.

## Features

Key features are:

• Signed 32-bit integers
• Strings
• Closures, lexical scopes, and persistent bindings with let
• Object-oriented programming feature with class inheritance
• Reader macros with set-macro-character
• Access to the interpreter’s virtual heap memory with malloc, memread, and memwrite
• Call stack traces when an error is raised
• Garbage collection during macro evaluation

Supported special forms and functions are:

• defun, defmacro, lambda (&rest can be used)
• quote, atom, car, cdr, cons, eq
• +, -, *, /, mod, =, >, <, >=, <=, integerp
• read (reads Lisp expressions), print, format (supports ~a and ~%), write-to-string, intern, stringp
• let, let*, labels, setq, boundp
• progn, loop, block, return, return-from, if, cond, error
• list, append, reverse, length, position, mapcar
• make-hash-table, gethash (setf can be used)
• equal, and, or, not
• eval, apply
• set-macro-character, peek-char, read-char, ` , ,@ ' #\
• carstr, cdrstr, str, string comparison with =, >, <, >=, <=, string concatenation with +
• defun-local, defglobal, type, macro
• new, defclass, defmethod, ., field assignment by setf

## Tests

There are 2 types of tests written for LambdaLisp. The GitHub Actions CI runs these tests.

### Output Comparison Test

The files examples/*.cl run both on Common Lisp and LambdaLisp producing identical results, except for the initial > prompt printed by the REPL in LambdaLisp. This test first runs *.cl on both SBCL and LambdaLisp and compares their outputs.

The files examples/*.lisp are LambdaLisp-exclusive programs. The outputs of these files are compared with test/*.lisp.out.

### LambdaCraft Compiler Hosting Test

examples/lambdacraft.cl runs LambdaCraft, a Common-Lisp-to-lambda-calculus compiler written in Common Lisp, used to compile the lambda calculus source for LambdaLisp. The script defines a binary lambda calculus (BLC) program that prints the letter A and exits, and prints the BLC source code for the defined program.

The LambdaCraft compiler hosting test first executes examples/lambdacraft.cl on LambdaLisp, then runs the output BLC program on a BLC interpreter, and checks if it prints the letter A and exits.

### Experimental: Self-Hosting Test

This test is currently theoretical since it requires a lot of time and memory, and is not included in make test-all. This test extends the previous LambdaCraft compiler hosting test and checks whether the Common Lisp source code for LambdaLisp runs on LambdaLisp itself. Since the LambdaCraft compiler hosting test runs properly, this test should theoretically pass as well, although it requires a tremendous amount of memory and time. The test is run on the binary lambda calculus interpreter Blc.

One concern is whether the 32-bit heap address space used internally in LambdaLisp is enough to compile this program. This can be solved by compiling LambdaLisp with an address space of 64 bits or larger, which can be done simply by replacing the literal 32 (which appears only once in src/lambdalisp.cl) with 64, etc. Another concern is whether the execution hits Blc’s maximum term limit. This can be solved by compiling Blc with a larger memory limit, by editing the rule for $(BLC) in the Makefile.

## General Lambda Calculus Programming

Before introducing LambdaLisp-specific implementation details, we’ll cover some general topics about programming in lambda calculus.

### Handling I/O in Lambda Calculus

LambdaLisp is written as a function ${\rm LambdaLisp} = \lambda x. \cdots$ which takes one string as an input and returns one string as an output. The input string $x$ represents the Lisp program and the user’s standard input, and the output represents the standard output. A string is represented as a list of bits of its ASCII representation. In untyped lambda calculus, a method called the Mogensen-Scott encoding can be used to express a list of lambda terms as a pure untyped lambda calculus term, without the help of introducing a non-lambda-type object.

Bits are encoded as:

\begin{aligned} 0 &= \lambda x. \lambda y. x \\ 1 &= \lambda x. \lambda y. y \\ \end{aligned}

Lists are made using the list constructor and terminator ${\rm cons}$ and ${\rm nil}$:

\begin{aligned} {\rm cons} &= \lambda x. \lambda y. \lambda f. (f x y) \\ {\rm nil} &= \lambda x. \lambda y. y \end{aligned}

Note that $1 = {\rm nil}$. Under these rules, the bit sequence 0101 can be expressed as a composition of lambda terms:

$({\rm cons} ~ 0 ~ ({\rm cons} ~ 1 ~ ({\rm cons} ~ 0 ~ ({\rm cons} ~ 1 ~ {\rm nil}))))$

which is exactly the same as how lists are constructed in Lisp. This beta-reduces to:

\begin{aligned} \lambda f. &(f (\lambda x.\lambda y.x) \lambda g.(g (\lambda x.\lambda y.y) \\ &\lambda h.(h (\lambda x.\lambda y.x) \lambda i.(i (\lambda x.\lambda y.y) (\lambda x.\lambda y.y))))) \end{aligned}
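To make this concrete, here is a small Python sketch (illustrative only, not part of LambdaLisp) that mirrors these definitions with Python lambdas and decodes the resulting list back into bits:

```python
# Bits and list constructors as plain Python lambdas, mirroring the
# definitions above (a sketch for illustration, not part of LambdaLisp).
bit0 = lambda x: lambda y: x                   # 0 = λx.λy.x
bit1 = lambda x: lambda y: y                   # 1 = λx.λy.y
cons = lambda x: lambda y: lambda f: f(x)(y)   # cons = λx.λy.λf.(f x y)
nil  = lambda x: lambda y: y                   # nil = λx.λy.y (= 1)

# The bit sequence 0101 as a composition of lambda terms:
seq = cons(bit0)(cons(bit1)(cons(bit0)(cons(bit1)(nil))))

def decode_bits(lst):
    """Walk a Mogensen-Scott-encoded list, returning Python ints."""
    out = []
    # nil applied to a selector and a marker returns the marker;
    # a cons cell feeds its head and tail to the selector instead.
    while lst(lambda h: lambda t: lambda d: False)(True) is not True:
        out.append(lst(lambda h: lambda t: h)(0)(1))  # bit0 -> 0, bit1 -> 1
        lst = lst(lambda h: lambda t: t)              # move to the tail
    return out

print(decode_bits(seq))  # → [0, 1, 0, 1]
```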

Using this method, both the standard input and output strings can be entirely encoded into pure lambda calculus terms, letting LambdaLisp operate with beta reduction of lambda terms as its sole rule of computation, without the requirement of introducing any non-lambda-type object.

The LambdaLisp execution flow is thus as follows: you first encode the input string (Lisp program and stdin) as lambda terms, apply it to ${\rm LambdaLisp} = \lambda x. \cdots$, beta-reduce it until it is in beta normal form, and parse the output lambda term as a Mogensen-Scott-encoded list of bits (inspecting the equivalence of lambda terms is quite simple in this case since it is in beta normal form). This rather complex flow is supported exactly as is in 3 lambda-calculus-based programming languages: Binary Lambda Calculus, Universal Lambda, and Lazy K.

### Lambda-Calculus-Based Programming Languages

Binary Lambda Calculus (BLC) and Universal Lambda (UL) are programming languages with the exact same I/O strategy described above - a program is expressed as one pure lambda term that takes a Church-Mogensen-Scott-encoded string and returns a Church-Mogensen-Scott-encoded string. When the interpreters for these languages Blc and clamb are run on the terminal, the interpreter automatically encodes the input bytestream to lambda terms, performs beta-reduction, parses the output lambda term as a list of bits, and prints the output as a string in the terminal.

BLC and UL differ only in slight details of the method used for encoding the I/O. Otherwise, both of these languages follow the same principle, where lambda terms are the only available object type in the language.

In BLC and UL, lambda terms are written in a notation called binary lambda calculus. Details on the BLC notation are described in the Appendix.

### Lazy K

Lazy K is a language with the same I/O strategy as BLC and UL, except programs are written as SKI combinator calculus terms instead of lambda terms. The SKI combinator calculus is a system equivalent to lambda calculus, in which there are only 3 functions:

\begin{aligned} S &= \lambda x.\lambda y.\lambda z.(x z (y z)) \\ K &= \lambda x.\lambda y.x \\ I &= \lambda x.x \end{aligned}

Every SKI combinator calculus term is written as a combination of these 3 functions.
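These three definitions can be mirrored directly as Python lambdas (an illustrative sketch):

```python
# S, K, and I as Python lambdas, mirroring the definitions above.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = lambda x: x

# (S K K) behaves as the identity: S K K z = K z (K z) = z,
# so I is definable from S and K alone.
print(S(K)(K)(42))  # → 42
```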

Every SKI term can easily be converted to an equivalent lambda calculus term by simply rewriting the term with these rules. Very interestingly, the conversion also works the other way around - there is a consistent method to convert an arbitrary lambda term with an arbitrary number of variables to an equivalent SKI combinator calculus term. This equivalence with lambda calculus proves that SKI combinator calculus is Turing-complete.
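The lambda-to-SKI direction can be sketched with the classic bracket-abstraction algorithm. The AST shape and function names below are illustrative, not LambdaLisp’s actual tooling:

```python
# A minimal sketch of bracket abstraction, the classic algorithm for
# converting lambda terms to S/K/I terms (illustrative only).
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = lambda x: x

def free_in(v, t):
    if t[0] == "var": return t[1] == v
    if t[0] == "app": return free_in(v, t[1]) or free_in(v, t[2])
    return False  # S, K, I contain no free variables

def abstract(v, t):
    """Compute [v]t: a lambda-free term such that ([v]t) x = t[v := x]."""
    if t == ("var", v):
        return ("I",)
    if not free_in(v, t):
        return ("app", ("K",), t)
    # t must be an application here
    return ("app", ("app", ("S",), abstract(v, t[1])), abstract(v, t[2]))

def to_ski(t):
    if t[0] == "lam":
        return abstract(t[1], to_ski(t[2]))  # convert the body first
    if t[0] == "app":
        return ("app", to_ski(t[1]), to_ski(t[2]))
    return t

def run(t):
    """Evaluate a closed SKI term using the Python combinators above."""
    if t[0] == "app":
        return run(t[1])(run(t[2]))
    return {"S": S, "K": K, "I": I}[t[0]]

# λx.λy.x converts to S (K K) I, which behaves like K:
const = to_ski(("lam", "x", ("lam", "y", ("var", "x"))))
print(run(const)(1)(2))  # → 1
```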

Apart from the strong condition that only 3 predefined functions exist, the beta-reduction rules for the SKI combinator calculus are exactly identical to those of lambda calculus, so the computation flow and the I/O strategy for Lazy K are the same as for BLC and Universal Lambda - all programs can be written purely as SKI combinator calculus terms without the need of introducing any function other than S, K, and I. This allows Lazy K’s syntax to be astonishingly simple, where only 4 keywords exist - s, k, i, and ` for function application.

As mentioned in the original Lazy K design proposal, if BF captures the distilled essence of structured imperative programming, Lazy K captures the distilled essence of functional programming. It might as well be the assembly language of lazily evaluated functional programming. With its simple syntax and rules orchestrating a Turing-complete language, I find Lazy K to be a very beautiful language, and one of my all-time favorites.

LambdaLisp is written in these 3 languages - Binary Lambda Calculus, Universal Lambda, and Lazy K. In each of these languages, LambdaLisp is expressed as one lambda term or SKI combinator calculus term. Therefore, to run LambdaLisp, an interpreter for one of these languages is required. To put in a rather convoluted way, LambdaLisp is a Lisp interpreter that runs on another language’s interpreter.

### Interactive I/O in Lambda-Calculus-Based Languages

The I/O model described previously looks static - at first sight it seems as if the entire value of ${\rm stdin}$ needs to be supplied beforehand at the start of execution, making interactive programs impossible. However, this is not the case. The interpreter handles interactive I/O through the following clever combination of input buffering and lazy evaluation:

• The input string is buffered into memory. Its values are lazily evaluated - execution proceeds until the last moment an input must be referenced in order for beta reduction to proceed.
• The output string is printed as soon as the interpreter deduces the first characters of the output string.

As an example, consider the BLC program ${\rm ROT13}$ which initially prints a prompt In>, accepts standard input, then outputs the ROT13 encoding of the standard input. After the user starts the program, the interpreter’s beta-reduction proceeds as follows:

\begin{aligned} {\rm ROT13} ~ {\rm stdin} &= (\lambda s.{\rm Code} ~ s) ~ {\rm stdin} \\ &= ({\rm Code} ~ {\rm stdin}) \\ &= ({\rm cons} ~ \tilde I ~ ({\rm Code}_1 ~ {\rm stdin})) \\ &= ({\rm cons} ~ \tilde I ~ ({\rm cons} ~ \tilde n ~ ({\rm Code}_2 ~ {\rm stdin}))) \\ &= ({\rm cons} ~ \tilde I ~ ({\rm cons} ~ \tilde n ~ ({\rm cons} ~ \tilde > ~ ({\rm Code}_3 ~ {\rm stdin})))) \\ \end{aligned}

Here, $\tilde I$ is a lambda term representing the character I.

As ${\rm Code}$ is beta-reduced, it “weaves out” its output string In> on the way. This is somewhat akin to the central dogma of molecular biology, where ribosomes transcribe genetic information to polypeptide chains - the program is the gene, the interpreter is the ribosome, and the list of output characters is the polypeptide chain.

The interpreter continues its evaluation until further beta reduction is impossible without the knowledge of the value of ${\rm stdin}$, which happens at ${\rm Code}_3$. At this point, the string In> is shown on the terminal since its values are already determined and available.

Seeing the prompt In>, suppose that the user types the string “Hi”. The interpreter then buffers its lambda-encoded expression into the pointer that points to ${\rm stdin}$, making the evaluation proceed as:

\begin{aligned} r.h.s. &= ({\rm cons} ~ \tilde I ~ ({\rm cons} ~ \tilde n ~ ({\rm cons} ~ \tilde > ~ ({\rm cons} ~ \tilde U ~ ({\rm Code}_4 ~ {\rm stdin}))))) \\ &= ({\rm cons} ~ \tilde I ~ ({\rm cons} ~ \tilde n ~ ({\rm cons} ~ \tilde > ~ ({\rm cons} ~ \tilde U ~ ({\rm cons} ~ \tilde v ~ ({\rm Code}_5 ~ {\rm stdin})))))) \\ \end{aligned}

On the terminal, the interaction process would look like this:

In>[**Halts; User types string**]Hi[**User types return; Hi is buffered into stdin**]
Uv
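The observable behavior of this scheme can be imitated in Python with a generator (a rough sketch - the real interpreters achieve this at the level of beta reduction, not with generators):

```python
# A rough imitation of the lazy interactive I/O scheme using a Python
# generator: output characters are produced as soon as they are
# determined, and the input is consumed only at the moment a character
# of it is actually needed.
def rot13_program(stdin_chars):
    yield from "In>"             # the prompt needs no input to determine
    for c in stdin_chars:        # each iteration forces one lazy read
        if c.isalpha():
            base = ord('a') if c.islower() else ord('A')
            yield chr((ord(c) - base + 13) % 26 + base)
        else:
            yield c

print("".join(rot13_program(iter("Hi"))))  # → In>Uv
```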


## LambdaLisp Implementation Details

We will now cover LambdaLisp-specific implementation details.

### The Core Implementation Strategy

When viewed as a programming language, lambda calculus is a purely functional language. This leads to the following two basic programming strategies for LambdaLisp:

• In order to use global variables, the global state is passed to every function that affects or relies on the global state.
• In order to control the evaluation flow for I/O, continuation-passing style (CPS) is used.

Writing in continuation-passing style also helps the lambda calculus interpreter avoid re-evaluating the same term multiple times. This is very important when writing programs for the 521-byte binary lambda calculus interpreter Blc, as well as tromp and uni, since these interpreters seem to lack a memoization feature, although they readily have garbage collection. Writing in direct style gradually slows down the runtime execution speed, since memoization does not occur and the same computation is repeated multiple times. However, by carefully writing the entire program in continuation-passing style, the evaluation flow can be controlled so that the program only proceeds when the value under attention is in beta-normal form. In this situation, since values are already evaluated to their final form, memoization becomes unnecessary in the first place.

The continuation-passing style technique suddenly transforms Blc into a very fast and efficient lambda calculus interpreter. In fact, the largest program, lambdacraft.cl, runs the fastest and even the most memory-efficiently on Blc, using only about 1.7 GB of memory, while clamb uses about 5 GB. I suspect that the speed is due to an efficient memory configuration and the memory efficiency is due to the garbage collection feature. I was very surprised by how large a galaxy fits into the microcosmos of 521 bytes! The realization that continuation-passing-style programs run very fast on Blc was what made everything possible and what motivated me to set out on the journey of building LambdaCraft and LambdaLisp.

The difference between continuation-passing style and direct style is explained in the Appendix, with JavaScript code samples that run on the browser’s console.

### The Startup Phase - A Primitive Form of Asynchronous Programming

Right after LambdaLisp starts running, it runs the following steps:

• The string generator decompresses the “prelude Lisp code”, and also generates keyword strings such as quote.
• The initial prompt caret > is printed.
• The prelude Lisp code is silently evaluated (the results are not printed).
• The user’s initial input is evaluated and the results are printed.
• The interpreter enters the basic read-eval-print-loop.

The prelude Lisp code is LambdaLisp code that defines core functions such as defun and defmacro as macros. The prelude is hard-coded as a compressed string, which is decompressed by the string generator before being passed to the interpreter. The compression method is described later. The LambdaLisp core code written in LambdaCraft hard-codes only the essential and speed-critical functions. Other non-critical functions are defined through LambdaLisp code in the prelude. A lot of features that look like keywords, including defun and defmacro, are in fact defined as macros. Due to this implementation, it is in fact possible to override the definition of defun to something else using setq, but I consider that flexibility a feature.

There is in fact a subtle form of asynchronous programming in action in this startup phase. The prelude code takes a little time to evaluate since it contains a lot of code, so if the initial prompt caret > were shown only after evaluating the prelude, there would be a slightly stressful lag until the prelude finishes running.

To circumvent this, the initial prompt caret > is printed before the prelude is loaded. This allows the prelude code to be evaluated while the user is typing their initial Lisp code. Since the input is buffered in all of the lambda calculus interpreters, the input does not get lost even while the prelude is running in the background. If the user types very fast, there will be a short wait until the result of the initial input is shown, but if the prelude has already been loaded, the result will be shown right away. All later inputs are evaluated faster because the prelude is only read once at initialization. In a way, this is a primitive form of asynchronous programming, where processing the user input and the execution of some code are done concurrently.

### The Basic Evaluation Loop

After the prelude is loaded, the interpreter enters its basic read-eval-print loop.

As mentioned before, the basic design is that all state-affecting functions must pass the state as arguments, and basically all functions are written in CPS. This makes the core eval function have the following signature:

;; LambdaCraft
(defrec-lazy eval (expr state cont)
  ...)


Where expr is a Lisp expression, state is the global state, and cont is the continuation. The comment ;; LambdaCraft indicates that this is the source code for the LambdaLisp interpreter written in LambdaCraft.

The direct return type of eval is string, and not expr or state. This is because the entire program is expected to be a program that “takes a string and outputs a string”. This design also allows print debugging to be written very intuitively in an imperative style as discussed later.

Instead of using the direct return values, the evaluation results are “returned” to later processes through the continuation cont. cont is actually just a callback function that is called at the end of eval, with the evaluated expr and the new state as its arguments. For example, if the evaluation result is 42 and the renewed state is state, eval calls the callback function cont as

(cont result state) ;; Where result == 42 and state is the renewed state


in the final step of the code. Here, two values result and state are “returned” to the callback function cont and are used later in cont.

Having the direct return type of eval to be a string makes the implementation of exit very simple. It is implemented in eval as:

;; LambdaCraft
nil)


Here is how it works:

• Normally, every successive chain of computations is invoked by calling the callback function (the continuation). Here, since eval no longer calls a continuation when exit is called, no further computation happens and the interpreter stops there.
• eval’s direct return value is set to nil, which is a string terminator. This leaves nil at the end of the output string, completing the output string.
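This control flow can be sketched as a toy CPS evaluator in Python (illustrative only - the form names and state shape here are made up, not LambdaLisp’s actual code):

```python
# Toy CPS evaluator sketching the structure described above.
# eval_ directly returns the *output string*; evaluation results are
# passed onward via the continuation cont, and exit simply returns the
# string terminator without calling any continuation.
def eval_(expr, state, cont):
    kind = expr[0]
    if kind == "exit":
        return ""                        # "nil": ends the output string
    if kind == "lit":
        return cont(expr[1], state)      # pass the value to the callback
    if kind == "print":
        def after(value, new_state):     # continuation for the subexpression
            return str(value) + "\n" + cont(value, new_state)
        return eval_(expr[1], state, after)
    if kind == "progn":                  # evaluate forms in sequence
        if len(expr) == 1:
            return cont(None, state)
        def next_form(value, new_state):
            return eval_(("progn",) + expr[2:], new_state, cont)
        return eval_(expr[1], state, next_form)
    raise ValueError("unknown form: " + kind)

program = ("progn",
           ("print", ("lit", 42)),
           ("exit",),
           ("print", ("lit", 1)))        # never reached: exit cuts the chain

out = eval_(program, {}, lambda value, state: "")
print(repr(out))  # → '42\n'
```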

A similar implementation is used for read-expr, which exits the interpreter when there is no more text to read:

;; LambdaCraft
;; Exit the program when EOF is reached
(if (isnil stdin)
  nil)


### Basic Data Structures

The state is a 3-tuple that contains reg, heap and stdin. In lambda terms:

\begin{aligned} {\rm state} &:= {\rm cons3} ~ {\rm reg} ~ {\rm heap} ~ {\rm stdin} \\ &= (\lambda a. \lambda b. \lambda c. \lambda f. (f~a~b~c)) ~ {\rm reg} ~ {\rm heap} ~ {\rm stdin} \\ &= \lambda f. ~ (f ~ {\rm reg} ~ {\rm heap} ~ {\rm stdin}) \end{aligned}

reg is used to store values of global variables used internally by the interpreter. heap is a virtual heap memory used to store the let and lambda bindings caused by the code. stdin is the pointer to the standard input provided by the interpreter.

As with cons, state is a function that accepts a callback function and applies it to the values it is storing. Therefore, the contents of state can be extracted by passing a callback function to state:

;; LambdaCraft
(state
  (lambda (reg heap stdin)
    [do something with reg/heap/stdin]))


which is continuation-passing style code, since we are using callback functions that accept values and describe what to do with those values.
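The cons3 definition and the callback-style extraction above can be mirrored in Python (an illustrative sketch, with placeholder strings standing in for the actual values):

```python
# cons3 and the state 3-tuple as Python lambdas, mirroring the
# definitions above (reg/heap/stdin stand-ins are placeholder strings).
cons3 = lambda a: lambda b: lambda c: lambda f: f(a)(b)(c)

state = cons3("reg")("heap")("stdin")

# Extract the contents by passing a callback function to state:
result = state(lambda reg: lambda heap: lambda stdin: (reg, heap, stdin))
print(result)  # → ('reg', 'heap', 'stdin')
```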

expr is a Lisp expression. Expressions in LambdaLisp belong to one of the following 5 types: atom, cons, lambda, integer, and string. All expressions are a 2-tuple with a type tag and its value:

${\rm expr} = {\rm cons} ~ {\rm typetag} ~ {\rm value} = \lambda f. ~ (f ~ {\rm typetag} ~ {\rm value})$

The structure of ${\rm value}$ depends on the type. For all types, ${\rm typetag}$ is a selector for a 5-tuple:

\begin{aligned} {\rm type\_ atom} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. a \\ {\rm type\_ cons} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. b \\ {\rm type\_ lambda} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. c \\ {\rm type\_ integer} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. d \\ {\rm type\_ string} &= \lambda a.\lambda b.\lambda c.\lambda d.\lambda e. e \\ \end{aligned}

This way, we can do type matching by writing:

(typetag
  [code for atom case]
  [code for cons case]
  [code for lambda case]
  [code for integer case]
  [code for string case])


Type matching ensures that the ${\rm value}$ for each type is always processed correctly according to its type. Since each tag is a selector for a 5-tuple, the tag selects the code that will be executed next. Since everything is lazily evaluated, the code for the unselected cases will not be evaluated.

As in the case of state, the type and value can be extracted by passing a callback to expr:

;; LambdaCraft
(expr
  (lambda (dtype value)
    (dtype
      [do something with value])))
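Both patterns - the 5-tuple type tags and the callback-style extraction - can be sketched in Python (illustrative only; zero-argument thunks stand in for lazy evaluation):

```python
# Type tags as 5-tuple selectors, mirroring the definitions above.
# Python thunks (zero-argument lambdas) imitate lazy evaluation,
# so the unselected branches are never run.
type_atom    = lambda a: lambda b: lambda c: lambda d: lambda e: a
type_cons    = lambda a: lambda b: lambda c: lambda d: lambda e: b
type_lambda  = lambda a: lambda b: lambda c: lambda d: lambda e: c
type_integer = lambda a: lambda b: lambda c: lambda d: lambda e: d
type_string  = lambda a: lambda b: lambda c: lambda d: lambda e: e

cons = lambda x: lambda y: lambda f: f(x)(y)
expr = cons(type_integer)(42)            # expr = (cons typetag value)

def describe(expr):
    def with_parts(typetag):
        def with_value(value):
            branch = typetag(lambda: "atom")(
                             lambda: "cons")(
                             lambda: "lambda")(
                             lambda: "integer: " + str(value))(
                             lambda: "string")
            return branch()              # only the selected branch runs
        return with_value
    return expr(with_parts)

print(describe(expr))  # → integer: 42
```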


### Virtual Heap Memory (RAM)

The virtual heap memory (RAM) is expressed as a binary tree constructed by the tuple constructor cons. The idea of using a binary tree data structure for expressing a RAM unit is borrowed from irori’s Unlambda VM (in Japanese). The implementation of the binary tree is modified from this definition so that the tree could be initialized as nil.

One RAM cell can store a value of any size and any type - Lisp terms, strings, or even the interpreter’s current continuation. This is because the RAM can actually only store one type, lambdas, but everything in lambda calculus belongs to that one type.

Trees are constructed using the same constructor ${\rm cons} = \lambda x. \lambda y. \lambda f. (f x y)$ as lists. A list $L$ containing A B C can be written using ${\rm cons}$ as:

$L = ({\rm cons} ~ A ~ ({\rm cons} ~ B ~ (\rm cons ~ C ~ {\rm nil})))$

Using the same mechanism, a binary tree $T$ with leaves $A$, $B$, $C$, $D$ can be expressed using ${\rm cons}$ cells as follows:

$T = ({\rm cons} ~ ({\rm cons} ~ A ~ B) ~ ({\rm cons} ~ C ~ D))$

Every node where all of the leaves below it hold unwritten values has its branch trimmed and set to ${\rm nil}$. If all values with addresses of the form $1*$ are unwritten, the tree becomes:

$T = ({\rm cons} ~ ({\rm cons} ~ A ~ B) ~ {\rm nil})$

When the tree search function encounters ${\rm nil}$, it returns the integer zero (a list of $N$ consecutive $0$s). The tree grows past ${\rm nil}$ only when a value is written past that address. The initial state of the RAM is ${\rm root} = {\rm nil}$, which effectively initializes the entire $N$-bit memory space to zero without creating $2^N$ nodes. Afterwards, the RAM unit grows only approximately linearly as values are written, instead of growing exponentially with the machine’s bit size.
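The nil-trimmed tree behavior can be sketched in Python, with None playing the role of ${\rm nil}$ and nested pairs playing the role of ${\rm cons}$ cells (function names here are illustrative, not LambdaLisp’s internals):

```python
# A sketch of the nil-trimmed binary-tree RAM. None plays the role of
# nil; a missing subtree reads as 0, and a write grows the tree only
# along the path to the written address.
def tree_read(tree, addr):
    for bit in addr:
        if tree is None:
            return 0                     # nil: the whole subtree reads as zero
        tree = tree[bit]
    return 0 if tree is None else tree

def tree_write(tree, addr, value):
    if not addr:
        return value
    left, right = tree if tree is not None else (None, None)
    if addr[0] == 0:
        return (tree_write(left, addr[1:], value), right)
    return (left, tree_write(right, addr[1:], value))

ram = None                               # root = nil: all of memory reads 0
ram = tree_write(ram, (1, 0, 1), 42)
print(tree_read(ram, (1, 0, 1)))         # → 42
print(tree_read(ram, (0, 0, 0)))         # → 0 (untouched address)
```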

LambdaLisp uses a 32-bit address space for the RAM, which is specified here. The address space can be modified to an arbitrary integer by replacing the literal 32 which shows up only once in the source code with another Church-encoded numeral.

LambdaLisp exposes malloc and memory reading/writing to the user through the special forms malloc, memread and memwrite. malloc returns an integer indicating the pointer to the heap, which is initialized with nil. The pointer can be used with memread and memwrite to read and store LambdaLisp objects inside the interpreter’s heap cell. This can be used to implement C-style arrays, as demonstrated in malloc.lisp.

### Registers (Global Variables)

The register object reg uses the same binary tree data structure as the RAM, except reg uses variable-length addresses, while heap uses fixed-length addresses. The variable-length addresses make the address of each cell shorter, speeding up the read and write processes. reg is used to store global variables that are frequently read and written by the interpreter.

For example, we can let the register tree ${\rm reg}$ have the following structure:

The addresses for $A, B, C, D$ become $0$, $10$, $110$, and $111$, respectively.

LambdaLisp uses 7 registers which are defined here.
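As a sketch of the variable-length addressing (again a hypothetical JavaScript model, not the interpreter’s code), a register’s address simply stops at whatever depth its leaf sits, so frequently used registers can be given the shortest paths:

```javascript
// A model of the register tree: leaves sit at different depths,
// so addresses are variable-length bit strings.
// Addresses: A -> "0", B -> "10", C -> "110", D -> "111"
const tree = {
  left: "A",
  right: { left: "B", right: { left: "C", right: "D" } },
};

function regread(node, addr) {
  if (addr === "") return node;               // reached the leaf
  const next = addr[0] === "0" ? node.left : node.right;
  return regread(next, addr.slice(1));
}

console.log(regread(tree, "0"));    // "A"
console.log(regread(tree, "110"));  // "C"
```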

### The Prelude Generator and String Compression

As mentioned before, the prelude is hard-coded as text that is embedded into the LambdaLisp source. When embedding it as a lambda, the text is compressed into an efficient lambda term, optimized for the binary lambda calculus notation.

The prelude is generated by consecutively applying lots and lots of characters to a function called string-concatenator:

;; LambdaCraft
(def-lazy prelude
  (let
    (("a" ...)
     ("b" ...)
     ("c" ...)
     ...)
    (string-concatenator ... "n" "u" "f" "e" "d" "(" nil)))

(defrec-lazy string-concatenator (curstr x)
  (cond
    ((isnil x)
     curstr)
    (t
     (string-concatenator (cons x curstr)))))


The string concatenator is something close to a generator object in Python that:

• When called with a character, it pushes the character to the current stack curstr, and returns string-concatenator itself (with a renewed curstr).
• When called with nil, it returns the stocked curstr instead of returning itself.

To obtain the string (aaa), you can use string-concatenator as:

(string-concatenator ")" "a" "a" "a" "(" nil)


The characters are written in reverse order, since string-concatenator uses a stack to create strings.
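The behavior of string-concatenator can be sketched in JavaScript (a hypothetical model of the curried lambda term, with nil modeled as null):

```javascript
// A model of string-concatenator: called with a character, it pushes the
// character onto curstr and returns a renewed version of itself; called
// with nil, it returns the stocked string instead. Pushing onto a stack
// is why the characters must be supplied in reverse order.
const nil = null;

function stringConcatenator(curstr) {
  return function (x) {
    if (x === nil) return curstr;            // done: return the stocked string
    return stringConcatenator(x + curstr);   // push and return a renewed self
  };
}

// (string-concatenator ")" "a" "a" "a" "(" nil)
const s = stringConcatenator("")(")")("a")("a")("a")("(")(nil);
console.log(s); // "(aaa)"
```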

This helps a lot for compressing strings in binary lambda calculus notation. The let in the above code is a macro that expands to:

;; LambdaCraft
(def-lazy prelude
  ((lambda ("(")
     ((lambda (")")
        ((lambda ("a")
           (string-concatenator ")" "a" "a" "a" "(" nil))
         [definition of "a"]))
      [definition of ")"]))
   [definition of "("]))


In binary lambda calculus, the innermost expression encodes to

(string-concatenator ")" "a" "a" "a" "(" nil)
= apply apply apply apply apply string-concatenator ")" "a" "a" "a" "("
= 01 01 01 01 01 [string-concatenator] 110 10 10 10 1110


Notice that the letter “a” is encoded as 01 ... 10, which effectively takes only 4 bits. Similarly, “)” takes 5 bits and “(” takes 6 bits. Since apply doesn’t increase the De Bruijn indices no matter how many times it appears, every occurrence of the same character can be encoded into the same number of bits. Therefore, by encoding each character in the order of appearance, its BLC notation can be optimized to a short bit notation.

This part of the string generator can be seen as an interesting pattern on page 33 of lambdalisp.pdf: here you can see lots of variables being applied to a lambda term shown on the top line, which actually represents string-concatenator.

This consecutive application causes a lot of consecutive (s, which makes page 32 consist entirely of (s:

## Language Features

### The Memory Model for Persistent Bindings

let bindings are stored in the heap inside the interpreter’s state and passed on as persistent bindings. Each let binding holds its own environment stack, and each environment stack points to its lexical parent environment stack. For example, the following LambdaLisp code:

;; LambdaLisp
(let ((x 10) (y 20))
  (setq x 5)
  (let ((f (lambda ...)) (a '(a b c)))
    (print x)
    ...)
  (let ((g (lambda ...)) (b '(a b c)))
    (print b)
    ...))


induces the following memory tree (the node containing name = hello is not relevant here). The root node at the bottom contains bindings to basic functions initialized when running the prelude. This memory tree is expressed in the heap as shown in the second figure. There, the virtual RAM address space is shown as 16 bits for simplicity (the actual address space is 32 bits).

When a let binding occurs, the newest unallocated heap address is allocated by the malloc function, and the interpreter’s “current environment pointer” global variable contained in reg is rewritten to the newly allocated address. The same happens when a lambda is called, creating an environment binding the lambda’s arguments.

Each stack is initialized with a redirection pointer that points to its parent environment, shown at the bottom of the stack in the second figure. The bindings for each variable name are then pushed on top of the stack. On variable lookup, the lookup function first looks at the current environment’s topmost binding. If the target variable is not contained in the stack, the lookup reaches the redirection pointer at the bottom of the stack, where it runs the lookup again for the environment pointed to by the redirection pointer. The lookup process ends when the redirection pointer is nil, where it concludes that the target variable is not bound to any value in the target environment. The use of redirection pointers effectively models the environments’ tree structure.

For example, the x in (print x) in the example code invokes a lookup for x. The lookup function first looks at the binding stack at address 0x0002 and searches the stack until it reaches the redirection pointer to 0x0001. The lookup function then searches the binding stack at 0x0001, where it finds the newest binding of x, which is x = 5. Since new bindings are pushed onto the stack, they are found before old bindings (here, x = 10) are reached.

Variable rewriting with setq is done by pushing assignments onto the stack. The (setq x 5) first searches for the variable x, finds it at environment 0x0001, then pushes x = 5 on top of y = 20 :: x = 10 on the environment 0x0001.

The address 0x0000 represents the base environment. It first starts out by being initialized by the function and macro definitions in the prelude. Variables stored in the base environment behave as global variables. This is where setq and defglobal behave differently. When (setq x ...) is called, setq first searches for x from the environment tree. If it finds x somewhere in the tree, it rewrites the found x at that environment. If it doesn’t find x, it defaults to pushing a new variable x to the current surrounding environment. This way, setq will only affect known variables or local variables. On the other hand, defglobal pushes the bindings to address 0x0000 no matter where it is called. This way, the address 0x0000 can behave as a global environment. The macro defun is defined using defglobal so that it always writes to the global environment. The macro defun-local is defined using setq so that it writes in the local environment, allowing for Python and JavaScript-style local function definitions:

;; LambdaLisp
(defun f (x y)
  (defun-local helper (z)
    (* z z))
  (print helper) ;; => @lambda
  (+ (helper x) (helper y)))

(print helper) ;; => (), since it is invisible from the global environment
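The lookup and setq behavior described above can be sketched with the following JavaScript model (hypothetical; environments are plain objects holding a binding stack and a parent pointer, standing in for the heap addresses):

```javascript
// A model of environment stacks with redirection (parent) pointers.
const GLOBAL = { bindings: [], parent: null };   // plays the role of 0x0000

function lookup(env, name) {
  for (let e = env; e !== null; e = e.parent) {  // follow redirection pointers
    // Search newest-first, so new bindings shadow older ones.
    for (let i = e.bindings.length - 1; i >= 0; i--) {
      if (e.bindings[i].name === name) return e.bindings[i].value;
    }
  }
  return undefined;                              // unbound in this environment
}

function setq(env, name, value) {
  // Find the environment that already binds `name`:
  for (let e = env; e !== null; e = e.parent) {
    if (e.bindings.some(b => b.name === name)) {
      e.bindings.push({ name, value });          // push a newer binding there
      return;
    }
  }
  env.bindings.push({ name, value });            // unknown: bind locally
}

const env1 = { bindings: [{ name: "x", value: 10 }, { name: "y", value: 20 }],
               parent: GLOBAL };                 // plays the role of 0x0001
const env2 = { bindings: [], parent: env1 };     // plays the role of 0x0002

setq(env1, "x", 5);             // pushes x = 5 on top of y = 20 :: x = 10
console.log(lookup(env2, "x")); // 5, found via the redirection pointer
```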


### Garbage Collection

Although there is no garbage collection for let and setq bindings, there is a minimal form of GC for macro expansion. During the evaluation of a macro, two Lisp form evaluations occur: one for constructing the expanded form, and another for evaluating the expanded form in the calling environment. Once the macro has been expanded, we know for sure that the bindings it has used will not be used again. Therefore, macro bindings are allocated to the stack region, which has negative addresses growing downward from 0xFFFF, shown on the left half of the memory tree diagram and the heap diagram. The stack region is freed once the macro expansion finishes. This mechanism also supports nested macros.

In the memory tree in the first figure in the previous section, the bindings for macros look the same as the bindings for lambdas, since at evaluation time they are treated the same. In the second figure, the environments for the macro bindings are shown on the left side of the RAM, since they are allocated in the stack region.

Although the same garbage-collection feature could be implemented for lambdas, it causes problems for lambdas that return closures. If lambda bindings are created in the stack region, the environment of the returned closure will point to the stack region, which will be freed right after the closure is returned. This causes a problem when the returned closure refers to a variable defined in the global environment (for example, to basic macros defined in the prelude such as cond), since the lookup function will eventually start looking at the stack region, which could be occupied by an old environment or some other lambda’s environment, causing the risk for bugs. This could be circumvented by carefully writing the code to not shadow global variables in let bindings, but that would severely restrict the coding style, so I chose to allocate new bindings for each lambda call. The growing memory can be freed by implementing a mark-and-sweep garbage collection function, which is currently not supported by LambdaLisp.
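The two allocation disciplines can be sketched as follows (a hypothetical JavaScript model; the addresses are illustrative):

```javascript
// A model of the two regions: the heap grows upward and is never freed,
// while the stack region grows downward and is freed wholesale once a
// macro expansion finishes.
let heapTop = 0x0001;
let stackTop = 0xFFFF;

function heapAlloc()  { return heapTop++; }   // let bindings, lambda calls
function stackAlloc() { return stackTop--; }  // macro-expansion bindings

function expandMacro(expand) {
  const saved = stackTop;
  const result = expand();  // may call stackAlloc() freely, even for nested macros
  stackTop = saved;         // free the entire stack region in one step
  return result;
}

const a = expandMacro(() => stackAlloc());
const b = expandMacro(() => stackAlloc());
console.log(a === b);       // true: the region was reused after being freed
console.log(heapAlloc());   // 1: the heap, in contrast, is never recycled
```

Saving and restoring the stack pointer around each expansion is also why nested macros work: each inner expansion frees only its own slice of the stack region.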

### Macros

Macros are implemented as first-class objects in LambdaLisp. Both macros and lambdas are subtypes of the type lambda, each annotated with a subtype tag.

Macros are treated the same as lambdas, except for the following differences:

• The arguments are taken verbatim and are not evaluated before being bound to the argument variable names.
• In lambdas, the arguments are evaluated before being bound.
• Two evaluations happen during a macro evaluation. The first evaluation evaluates the macro body to build the macro-expanded expression. The expanded expression is then evaluated in the environment in which the macro was called.
• In lambdas, only one evaluation (the latter one) happens.
• The first evaluation always has the base environment 0x00000000 set as its parent environment so that expansion always happens in the global environment, as explained in the previous section.
• The second evaluation works the same as lambdas, where the environment where the macro/lambda was called is used for evaluation.
• The environment for the bound variables are stored in the stack region (negative addresses starting from 0xFFFFFFFF).
• In lambdas, they are stored in the heap region (starting from 0x00000001).

An anonymous macro can be made with the macro keyword as (macro (...) ...), in the exact same syntax as lambdas. In the prelude, defmacro is defined as the following macro:

;; LambdaLisp
(defglobal defmacro (macro (name e &rest b)
  `(defglobal ,name (macro ,e (block ,name ,@b)))))


### Object-Oriented Programming

In Let Over Lambda, it is mentioned that object-oriented programming can be implemented by using closures (Chapter 2). A primitive example is the counter example we’ve seen at the beginning:

(defun new-counter (init)
  ;; Return a closure.
  ;; Use the let over lambda technique for creating independent and persistent variables.
  (let ((i init))
    (lambda () (setq i (+ 1 i)))))

;; Instantiate counters
(setq counter1 (new-counter 0))
(setq counter2 (new-counter 10))

(print (counter1)) ;; => 1
(print (counter1)) ;; => 2
(print (counter2)) ;; => 11
(print (counter1)) ;; => 3
(print (counter2)) ;; => 12
(print (counter1)) ;; => 4
(print (counter1)) ;; => 5


LambdaLisp extends this concept and implements OOP as a predefined macro in the prelude. LambdaLisp supports the following Python-like object system with class inheritance:

(defclass Counter ()
  (i 0)

  (defmethod inc ()
    (setf (. self i) (+ 1 (. self i))))

  (defmethod dec ()
    (setf (. self i) (- (. self i) 1))))

;; (Counter-add and Counter-addsub are illustrative subclass names.)
(defclass Counter-add (Counter)
  (defmethod *init (i)
    (setf (. self i) i))

  (defmethod add (n)
    (setf (. self i) (+ (. self i) n))))

(defclass Counter-addsub (Counter-add)
  (defmethod *init (c)
    ((. (. self super) *init) c))

  (defmethod sub (n)
    (setf (. self i) (- (. self i) n))))

(defparameter counter1 (new Counter))
(defparameter counter2 (new Counter-add 0))     ;; illustrative instantiations
(defparameter counter3 (new Counter-addsub 0))

((. counter1 inc))
((. counter3 sub) 10000)

(setf (. counter1 i) 5)
(setf (. counter2 i) 500)
(setf (. counter3 i) 50000)


### Blocks

The notion of blocks in LambdaLisp is a feature borrowed from Common Lisp, close to loops that can be escaped with break in C and Java. A block creates a code block that can be escaped by running (return [value]) or (return-from [name] [value]).

For example:

;; LambdaLisp
(block block-a
  (if some-condition
      (return-from block-a some-value))
  ...
  some-other-value)


Here, when some-condition is true, the return-from lets the control immediately break from block-a, setting the value of the block to some-value. If some-condition is false, the program proceeds until the end, and the value of the block becomes some-other-value, which is the same behavior as progn. Nested blocks are also possible, as shown in examples/block.cl.

Since defun is defined to wrap its contents with an implicit block, you can write return-from statements with the function name:

;; LambdaLisp
(defun f (x)
  (if some-condition
      (return-from f some-value))
  (if some-condition2
      (return-from f some-value2)
      ...))


Here is the definition of defun in the prelude:

;; LambdaLisp
(defmacro defun (name e &rest b)
  `(defglobal ,name (lambda ,e (block ,name ,@b))))


In order to implement blocks, the interpreter keeps track of the name and the returning point of each block. This is done by preparing a global variable reg-block-cont in the register, used as a stack to push and pop pairs of names and returning points. Since LambdaLisp is written in continuation-passing style, the returning point is explicitly available as the callback function cont at any time in the eval function. Using this feature, when a block form appears, the interpreter first pushes the name and the current cont to the reg-block-cont global variable. The pushed cont is a continuation that expects the return value of the block to be applied as its argument. Whenever a (return-from [name] [value]) form is called, the interpreter searches the reg-block-cont stack for the specified [name]. Since the found cont expects the return value of the block as its argument, the block escape control flow is realized by applying [value] to cont after popping the reg-block-cont stack.
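Here is a hypothetical JavaScript sketch of this mechanism (in genuine CPS, applying a continuation never returns; an explicit return models that below):

```javascript
// A model of blocks: entering a block pushes a (name, cont) pair onto a
// stack modeling reg-block-cont; return-from pops until it finds the named
// block and applies the value to that continuation.
const blockConts = [];

function block(name, body, cont) {
  blockConts.push({ name, cont });
  body(value => {          // normal fall-through exit: pop our own entry
    blockConts.pop();
    cont(value);
  });
}

function returnFrom(name, value) {
  while (blockConts.length > 0) {
    const top = blockConts.pop();
    if (top.name === name) { top.cont(value); return; }
  }
  throw new Error("no such block: " + name);
}

const someCondition = true;
let result;
block("block-a", next => {
  if (someCondition) return returnFrom("block-a", "some-value");
  next("some-other-value");
}, v => { result = v; });
console.log(result); // "some-value"
```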

### Loops

A loop is a special form equivalent to while (true) in languages such as C, used to create infinite loops. A loop creates an implicit block with the name (), and can be exited by running (return) inside:

;; LambdaLisp
(defparameter i 0)
(loop
  (if (= i 10)
      (return))
  (print i)
  (setq i (+ i 1)))


Loops can also be exited by surrounding it with a block:

;; LambdaLisp
(defparameter i 0)
(block loop-block
  (loop
    (if (= i 10)
        (return-from loop-block))
    (print i)
    (setq i (+ i 1))))


Loops are implemented by passing a self-referenced continuation that runs the contents of loop again.
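This can be sketched in JavaScript as follows (a hypothetical model): the loop body receives a continuation that re-runs the body itself, and exiting means calling the implicit block’s escape continuation instead:

```javascript
// A model of loop: `again` is a self-referencing continuation that runs
// the body once more; `exit` models (return) escaping the implicit block.
function loop(body, exitCont) {
  const again = () => body(again, exitCont);
  again();
}

let i = 0;
const printed = [];
loop((again, exit) => {
  if (i === 10) return exit();  // (return)
  printed.push(i);              // (print i)
  i += 1;                       // (setq i (+ i 1))
  again();
}, () => {});
console.log(printed.join(" ")); // "0 1 2 3 4 5 6 7 8 9"
```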

### Error Invocation and Stack Traces

Some examples of situations when errors are invoked in LambdaLisp are:

• In read-expr:
• When an unexpected ) is seen
• In eval:
• When an unbound variable is being referenced
• In eval-apply (used when lambdas or macros are called):
• When the value in the function cell does not belong to a subtype of lambda

When an error occurs during eval, read-expr or any function, the interpreter does the following:

• It immediately stops what it’s doing
• It shows an error message
• It shows the function call stack trace
• It returns to the REPL, awaiting the user’s input

Immediately stopping the current task and returning to the REPL is implemented very simply thanks to continuation-passing style: when an error is invoked, instead of calling the continuation (the callback), the repl function is called. This simple trick implements error invocation.

Since invoking an error calls repl, and repl calls read-expr, eval, and eval-apply, the four functions read-expr, eval, eval-apply, and repl are mutually recursive. How mutually recursive functions are implemented in LambdaLisp is described in the next section.

The function call stack trace is printed by managing a call stack in one of the interpreter’s global variables. Every time a lambda or a macro is called, the interpreter pushes the expression that invoked the function call to the call stack. When the function call exits properly, the call stack is popped. When an error is invoked during a function call, the interpreter prints the contents of the call stack.

## Other General Lambda Calculus Programming Techniques

Below are some other general lambda calculus programming techniques used in LambdaLisp.

### Mutual Recursion

In LambdaCraft, recursive functions can be defined using defrec-lazy as follows:

;; LambdaCraft
(defrec-lazy fact (n)
  (if (<= n 0)
      1
      (* n (fact (- n 1)))))


defrec-lazy uses the Y combinator to implement anonymous recursion, a technique used to write self-referencing functions under the absence of a named function feature. Since LambdaLisp is based on macro expansion, when a self-referencing function is written using defun-lazy, the function body becomes infinitely expanded, causing the program to not compile. LambdaCraft shows an error message in this case. Using the Y combinator through defrec-lazy prevents this infinite expansion from happening.
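The idea can be sketched in JavaScript. Since JavaScript evaluates strictly, the sketch below uses the Z combinator, the eta-expanded variant of the Y combinator, but the principle of anonymous recursion is the same:

```javascript
// The Z combinator, a strict-evaluation variant of the Y combinator.
const Z = f => (x => f(v => x(x)(v)))(x => f(v => x(x)(v)));

// fact never names itself: Z hands it a reference to "itself" as `self`.
const fact = Z(self => n => (n <= 0 ? 1 : n * self(n - 1)));
console.log(fact(5)); // 120
```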

Things get more complex in the case of mutual recursion. In LambdaLisp, the functions read-expr, eval, eval-apply, and repl are mutually recursive, meaning that these functions call each other inside their definitions. Although using the normal Y combinator would still make the code compile in this case, it causes the same functions to be inlined over and over again, severely expanding the total output lambda term size.

The redundant inlining problem can be solved if each function holds a reference to the others. This can be done by implementing a multiple-function version of the Y combinator. The derivation of a fixed point combinator for mutual recursion is described very intuitively in Wessel Bruinsma’s blog post, A Short Note on The Y Combinator, in the section “Deriving the Y Combinator”. Below is a summary of the derivation process introduced in this post.

Suppose that we have two functions $f$ and $g$ that are mutually recursive. Since they reference each other in their definitions, $f$ and $g$ can be defined in terms of some functions $h_f$ and $h_g$, which take the definitions of $f$ and $g$ as their arguments:

\begin{aligned} f &:= h_f f g & (1)\\ g &:= h_g f g & (2)\\ \end{aligned}

$h_f$ and $h_g$ look something like:

\begin{aligned} h_f &= \lambda f'. \lambda g'. \lambda x. k_f [f',g',x] \\ h_g &= \lambda f'. \lambda g'. \lambda x. k_g [f',g',x] \\ \end{aligned}

Where $k_* [f',g',x]$ is a term containing $f', g'$ and $x$.

Now suppose that $f$ and $g$ could be written using some unknown functions $\hat f$ and $\hat g$ as:

\begin{aligned} f &:= \hat f \hat f \hat g \\ g &:= \hat g \hat f \hat g \end{aligned}

Plugging this into Equations (1) and (2),

\begin{aligned} \hat f \hat f \hat g &= h_f (\hat f \hat f \hat g) (\hat g \hat f \hat g ) \\ \hat g \hat f \hat g &= h_g (\hat f \hat f \hat g) (\hat g \hat f \hat g ) \\ \end{aligned}

which can be abstracted as

\begin{aligned} \hat f \hat f \hat g &= (\lambda x. \lambda y. h_f (x x y) (y x y)) \hat f \hat g \\ \hat g \hat f \hat g &= (\lambda x. \lambda y. h_g (x x y) (y x y)) \hat f \hat g \\ \end{aligned}

Comparing both sides, we have

\begin{aligned} \hat f &= \lambda x. \lambda y. h_f (x x y) (y x y) \\ \hat g &= \lambda x. \lambda y. h_g (x x y) (y x y) \\ \end{aligned}

which are closed-form lambda expressions. Plugging this into our definitions $f := \hat f \hat f \hat g$ and $g := \hat g \hat f \hat g$, we get the mutually recursive definitions of $f$ and $g$.
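To make the derivation concrete, here is the two-function fixed point transcribed into JavaScript, applied to a toy mutually recursive pair (an evenness/oddness checker, a hypothetical example; eta-expansions are added for strict evaluation, as with the Z combinator):

```javascript
// h_f and h_g take the definitions of f and g as their arguments:
const hEven = (f, g) => n => (n === 0 ? true  : g(n - 1));
const hOdd  = (f, g) => n => (n === 0 ? false : f(n - 1));

// f-hat = λx.λy. h_f (x x y) (y x y), and likewise for g-hat:
const evenHat = (x, y) => n => hEven(m => x(x, y)(m), m => y(x, y)(m))(n);
const oddHat  = (x, y) => n => hOdd (m => x(x, y)(m), m => y(x, y)(m))(n);

// f = f-hat f-hat g-hat,  g = g-hat f-hat g-hat:
const isEven = evenHat(evenHat, oddHat);
const isOdd  = oddHat(evenHat, oddHat);

console.log(isEven(10)); // true
console.log(isOdd(7));   // true
```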

A 4-function version of this mutual recursion setup is used in the init function in LambdaLisp to define repl, eval-apply, read-expr, and eval:

;; LambdaCraft
(let* read-expr-hat  (lambda (x y z w) (def-read-expr  (x x y z w) (y x y z w) (z x y z w) (w x y z w))))
(let* eval-hat       (lambda (x y z w) (def-eval       (x x y z w) (y x y z w) (z x y z w) (w x y z w))))
(let* eval-apply-hat (lambda (x y z w) (def-eval-apply (x x y z w) (y x y z w) (z x y z w) (w x y z w))))
(let* repl-hat       (lambda (x y z w) (def-repl       (x x y z w) (y x y z w) (z x y z w) (w x y z w))))
(let* repl       (repl-hat       read-expr-hat eval-hat eval-apply-hat repl-hat))
(let* eval-apply (eval-apply-hat read-expr-hat eval-hat eval-apply-hat repl-hat))
(let* eval       (eval-hat       read-expr-hat eval-hat eval-apply-hat repl-hat))


The functions *-hat and def-* correspond to $\hat *$ and $h_*$ in the derivation, respectively. The final functions repl, eval-apply, read-expr, and eval are defined in terms of these auxiliary functions.

The functions def-* are defined in a separate location. def-eval, used to define eval, is defined as follows:

;; LambdaCraft
(defun-lazy def-eval (read-expr eval eval-apply repl expr state cont)
...)


def-eval is defined as a non-recursive function using defun-lazy, taking the four mutually dependent functions read-expr, eval, eval-apply, and repl as its first four arguments, followed by its “actual” arguments expr, state, and cont. The first four arguments are filled in by the expression (def-eval (x x y z w) (y x y z w) (z x y z w) (w x y z w)) in the definition of eval-hat. By currying, this lets eval take only the 3 remaining unbound arguments, expr, state, and cont.

### Deriving ‘isnil’

isnil is one of the most important functions in cons-based lambda calculus programming, bearing the importance comparable to the atom special form in McCarthy’s original pure Lisp.

Consider the following function reverse that reverses a cons-based list:

;; LambdaCraft
(defrec-lazy reverse (l tail)
  (do
    (if-then-return (isnil l)
      tail)
    (<- (car-l cdr-l) (l))
    (reverse cdr-l (cons car-l tail))))


reverse is a recursive function that reverses a nil-terminated list made of cons cells. The base case used to end the recursion for reverse is when l is nil, decided by (isnil l).

Basically, any recursive function that takes a cons-based list must check if the incoming list is either a cons or a nil to write its base case. However, these data types have very different definitions:

\begin{aligned} {\rm cons} ~ A ~ B &:= \lambda f. (f ~ A ~ B) \\ {\rm nil} &:= \lambda x. \lambda y. y \end{aligned}

The function isnil must return ${\rm true} := \lambda x. \lambda y. x$ when its argument is ${\rm nil}$, and return ${\rm nil}$ (which doubles as false) when the argument is any ${\rm cons}$ cell. At first sight this seems impossible, since there is no general way to check the equivalence of two given lambda terms. Moreover, cons cells are not one fixed term but an entire class of lambda terms of the form $\lambda f. (f ~ A ~ B)$, with arbitrary contents $A$ and $B$.

While checking the equivalence of general lambda terms is impossible due to the halting problem (c.f. here), it becomes possible in special cases where the terms are known to have predefined shapes. This is the case for isnil, whose concrete definition can be derived as follows.

First observe that a ${\rm cons}$ cell is a function that takes one callback function and applies to it the two values that it holds. On the other hand, ${\rm nil}$ is a function that takes two functions as its argument. This difference makes the following sequence of applications to turn out differently for ${\rm cons}$ and ${\rm nil}$:

\begin{aligned} ({\rm cons} ~ A ~ B) ~ (\lambda a. \lambda b. x) ~ c &= (\lambda f. (f ~ A ~ B)) ~ (\lambda a. \lambda b. x) ~ c \\ &= x ~ c \\ {\rm nil} ~ (\lambda a. \lambda b. x) ~ c &= (\lambda x. \lambda y. y) ~ (\lambda a. \lambda b. x) ~ c \\ &= c \\ \end{aligned}

Here, we applied $(\lambda a. \lambda b. x)$ and $c$ to $({\rm cons} ~ A ~ B)$ and ${\rm nil}$, where $x$ and $c$ are free variables. Notice that the abstractions in $(\lambda a. \lambda b. x)$ are used to ignore the values contained in the ${\rm cons}$ cell.

These sequences of applications can directly be used to define ${\rm isnil}$. To implement ${\rm isnil}$, we want the latter expression to evaluate to ${\rm true}$. This can be done by setting $c := {\rm true}$.

It then remains to find an $x$ where $(x ~ c) = (x ~ {\rm true}) := {\rm nil}$. From here we immediately get $x := \lambda w. {\rm nil}$.

Therefore, ${\rm isnil}$ can be written by abstracting this process:

\begin{aligned} {\rm isnil} &= \lambda z. (z ~ (\lambda a. \lambda b. \lambda w. {\rm nil}) ~ {\rm true}) \\ &= \lambda z. (z ~ (\lambda a. \lambda b. \lambda w. \lambda x. \lambda y. y) ~ (\lambda x. \lambda y. x)) \\ \end{aligned}

which can be used as

\begin{aligned} {\rm isnil} ~ ({\rm cons} ~ A ~ B) &= {\rm nil} \\ {\rm isnil} ~ {\rm nil} &= {\rm true} \end{aligned}

The pattern $\lambda a. \lambda b. \lambda w. \lambda x. \lambda y. y$ is noticeable in many places in lambdalisp.pdf, since isnil is used many times in LambdaLisp.
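The derivation can be checked directly in JavaScript, with cons curried here so that the terms match the lambda notation exactly:

```javascript
// The lambda terms from the derivation, transcribed into JavaScript.
const TRUE = a => b => a;         // λx.λy.x
const NIL  = a => b => b;         // λx.λy.y (nil doubles as false)
const cons = (A, B) => f => f(A)(B);

// isnil = λz. (z (λa.λb.λw. nil) true)
const isnil = z => z(a => b => w => NIL)(TRUE);

console.log(isnil(cons(1, 2)) === NIL);  // true: isnil (cons A B) = nil
console.log(isnil(NIL) === TRUE);        // true: isnil nil = true
```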

### LambdaCraft’s ‘do’ Macro

The heavy use of CPS in LambdaLisp is supported by LambdaCraft’s do macro. Writing raw code in CPS produces heavily nested code, since CPS is based on callbacks. The do macro lets such nested code be written flat, in a syntax close to Haskell’s do notation for monads.

Here is the do notation in Haskell:

do
  (y, z) <- f x
  w <- g y z
  return w


A similar code can be written in LambdaCraft as:

(do
  (<- (y z) (f x))
  (<- (w) (g y z))
  w)


which is macro-expanded to:

(f x
  (lambda (y z)
    (g y z
      (lambda (w)
        w))))


where f, g is defined in CPS:

(defun-lazy f (x cont) ...)
(defun-lazy g (y z cont) ...)


Here, the cont represents the callback function, which is directly written as the inner lambdas.

The return type of eval is string, and not expr. This is because expr is “returned” with CPS, by applying it to the provided callback:

;; LambdaCraft
(eval expr state
  (lambda (eval-result state)
    [do something with eval-result and state]))


This is written using do as:

(do
  (<- (eval-result state) (eval expr state))
  [do something with eval-result and state])


The nice part about the do macro and having eval return strings is that it makes print debugging very intuitive. Since LambdaLisp is written in CPS, an arbitrary point in the eval function eventually becomes the head of the current evaluation. Therefore, at any point in the program, you can write

;; LambdaCraft
(eval expr state
  (lambda (eval-result state)
    (cons "a" (cons "b" (cons "c" [do something with eval-result and state])))))


which will make the entire expression eventually evaluate to

;; LambdaCraft
;; The outermost expression
(cons "a" (cons "b" (cons "c" ...)))


which will print “abc” in the console.

The previous code can be written in imperative style using do as

;; LambdaCraft
(do
  (<- (eval-result state) (eval expr state))
  (cons "a")
  (cons "b")
  (cons "c")
  [do something with eval-result and state])


which is virtually identical to writing (print "a") in an imperative language. Note that the default behavior of do is to nest the successive argument at the end of the list, starting from the last argument, and <- is a specially handled case by do.

(cons "a") can be replaced with a string printing function that accepts a continuation, such as:

;; LambdaCraft
(defun-lazy show-message (message state cont)
  (do
    (print-string message)
    (cons "\\n")
    (cont state)))


which will print message, print a newline, and proceed with (cont state).

This design was very helpful during debugging, since it let you track the execution flow using print debugging. This design and technique can be used in other general lambda-calculus based programs as well.

### Type Checking with Macro Call Signatures

Another large-scale programming technique is using macro signatures as a type-checking functionality.

Since all lambdas defined by def-lazy, defun-lazy or defrec-lazy are curried in LambdaCraft, there is no simple way to tell how many arguments a lambda takes at its call site. This is different for LambdaCraft macros defined by defmacro-lazy, since LambdaCraft macros are implemented as Common Lisp functions that run in the Common Lisp environment to expand the macro at compile time. Therefore, when a LambdaCraft macro is called with an excessive or insufficient number of arguments, it causes a Common Lisp error at compile time. This effectively works as a simple type checker for macro call signatures, which significantly helps the debugging process, letting you know when a macro is called with the wrong number of arguments. The Common Lisp call stack even tells you which line got the macro call wrong. It is therefore convenient to use as many macros as possible when writing programs in LambdaCraft.

Writing in terms of macros helps reduce the code size as well, since using macros can be seen as running compile-time beta reductions ahead of runtime. For example, while cons can be written as a function (defun-lazy cons (x y) (lambda (f) (f x y))), it can also be written as a macro (defmacro-lazy cons (x y) `(lambda (f) (f ,x ,y))), which expands to the beta-reduced form of applying the function-based definition. Either way, writing (cons p q) evaluates to the same result, except the function-based one requires extra beta-reduction steps at runtime, affecting the performance.

## Appendix

### JavaScript Examples of Continuation-Passing Style Code

CPS has the largest impact when extracting the values of a cons cell. This is illustrated in the JavaScript code below, which runs on a browser’s JavaScript console.

In direct style, destructuring the values of a cons cell is written as:

// Runs on the browser's JavaScript console
function car (x) {
return x(function (a, b) { return a; });
}
function cdr (x) {
return x(function (a, b) { return b; });
}

function cons (a, b) {
return function (f) {
return f(a,b);
};
}
function t (a, b) {
return a;
}
function nil (a, b) {
return b;
}

(function(x) {
return (function (y) {
return (function (z) {
return z(z, z);
})(car(y));
})(cdr(x));
})(cons(t, cons(nil, t)))


The last expression in this code first binds (cons t (cons nil t)) to x, and calculates (car (cdr x)). Running this on the browser’s console should return the function nil.

Here, car and cdr work by passing the following selector lambda terms to the pair, which take the pair’s two stored values and return the one in the desired position:

\begin{aligned} {\rm car} &= \lambda x. \lambda y. x \\ {\rm cdr} &= \lambda x. \lambda y. y \\ \end{aligned}

On the other hand, in continuation-passing style, the same code is written as:

// Runs on the browser's JavaScript console
function cons (a, b) {
return function (f) {
return f(a,b);
};
}
function t (a, b) {
return a;
}
function nil (a, b) {
return b;
}

cons(t, cons(nil, t))(            // cons returns a function that accepts a callback function. We pass a callback to it
function (tmp1, y){           // This function works as cdr; It takes the second value (and discards the first)
return y(                 // y == cons(nil, t) now. Inside, we write what we want to do with the y we receive via the callback. Here we pass another callback to the return value of the inner cons.
function (z, tmp2) {  // This function works as car; It takes the first value (and discards the second)
return z(z, z);   // z == nil now. nil selects the second value among its arguments, which here evaluates to nil.
}
);
}
)


Here, values are extracted without using car or cdr at all. It instead uses the fact that a cons cell is itself a function that takes a callback and applies both of its contents to it.

This significantly improves performance when reading the stdin, which is a list made from cons cells:

;; LambdaCraft
(stdin
  (lambda (c cdr-stdin)
    (if (=-bit c "(")
        ...)))


### The Binary Lambda Calculus Notation

The interpreters Blc, tromp, and uni run programs written in binary lambda calculus (BLC; also see here). The difference between binary lambda calculus and ordinary lambda calculus lies only in the notation, i.e. how lambda terms are written; everything else, including the rules of beta reduction, is the same. BLC’s notation is based on the De Bruijn notation, where variable names such as $x, y$ and $z$ are eliminated and replaced by integers describing the relative nesting depth between each variable and its binding lambda.

The De Bruijn notation works as follows. Consider for example the term $\lambda x. \lambda y. \lambda z. y$. Here, from the viewpoint of the term $y$, the lambda that binds $y$ is reached by hopping over 1 abstraction, $\lambda z$, so the De Bruijn index for this occurrence of $y$ is 1. We can therefore rewrite the term as

$\lambda x. \lambda y. \lambda z. y = \lambda x. \lambda y. \lambda z. 1$

and still recover its meaning. Similarly, the index for $x$ in $\lambda x. \lambda y. \lambda z. x$ would be 2, and for $z$ in $\lambda x. \lambda y. \lambda z. z$ would be 0, since no hops are required. This works in more complicated settings as well: for example, in $\lambda x. (x \lambda y. \lambda z.((x z) y))$ the index for $y$ is 1. $x$ occurs twice in this term, and each occurrence is encoded differently in this case. The innermost $x$ has index 2 since 2 hops are required past $\lambda z$ and $\lambda y$, but the outer $x$’s index is 0 since no hops are required to reach $\lambda x$ from the outermost $x$. We can then write

$\lambda x. (x \lambda y. \lambda z.((x z) y)) = \lambda x. (0 \lambda y. \lambda z.((2 0) 1))$

and we can still deduce which variable each integer corresponds to.
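This renaming can be mechanized. The following Python sketch converts a named term into De Bruijn form; the tuple-based term format is an assumption made here for illustration:

```python
def debruijn(term, env=()):
    """Convert a named lambda term to De Bruijn form.
    Terms (an assumed format): ("lam", var, body), ("app", f, x), or a variable name."""
    if isinstance(term, str):
        return env.index(term)      # number of lambdas hopped to reach the binder
    if term[0] == "lam":
        return ("lam", debruijn(term[2], (term[1],) + env))
    return ("app", debruijn(term[1], env), debruijn(term[2], env))

# λx.(x λy.λz.((x z) y))  ->  λ(0 λλ((2 0) 1))
named = ("lam", "x", ("app", "x",
         ("lam", "y", ("lam", "z", ("app", ("app", "x", "z"), "y")))))
assert debruijn(named) == ("lam", ("app", 0,
         ("lam", ("lam", ("app", ("app", 2, 0), 1)))))
```

Pushing each binder onto the front of the environment makes `env.index` count exactly the hops described above.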

We then notice that when written in De Bruijn indices, the variable names in the lambda declaration becomes entirely redundant. The expression can thus be rewritten as

\begin{aligned} \lambda x. (x \lambda y. \lambda z.((x z) y)) &= \lambda x. (0 \lambda y. \lambda z.((2 0) 1)) \\ &= \lambda (0 \lambda \lambda ((2 0) 1)) \end{aligned}

and it would still hold the same meaning.

We can simplify the notation more by writing function application $(A B)$ as ${\rm apply}~A~B$. Doing this we get:

\begin{aligned} \lambda x. (x \lambda y. \lambda z.((x z) y)) &= \lambda (0 \lambda \lambda ((2 0) 1)) \\ &= \lambda ~ {\rm apply} ~ 0 ~ \lambda \lambda ~{\rm apply} ~ {\rm apply} ~ 2 ~ 0 ~ 1 \end{aligned}

By assuming that ${\rm apply}$ always takes exactly 2 parameters, we can eliminate the need for writing parentheses to express ${\rm apply}$.

Binary lambda calculus then encodes this sequence as follows:

\begin{aligned} \lambda &= 00 \\ {\rm apply} &= 01 \\ i &= 1^{i+1}0 \\ \end{aligned}

We thus have

\begin{aligned} \lambda x. (x \lambda y. \lambda z.((x z) y)) &= \lambda ~ {\rm apply} ~ 0 ~ \lambda \lambda ~{\rm apply} ~ {\rm apply} ~ 2 ~ 0 ~ 1 \\ &= 00~01~10~00~00~01~01~1110~10~110 \end{aligned}

which completes the definition of the binary lambda calculus notation.
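A minimal encoder for this scheme might look like the following Python sketch (the tuple term format is an assumption for illustration, not Blc's actual data structure):

```python
def encode(term):
    """Encode a De Bruijn term into BLC bits. Terms are tuples (an assumed
    format): ("lam", body), ("app", f, x), or an int De Bruijn index."""
    if isinstance(term, int):
        return "1" * (term + 1) + "0"                    # index i -> 1^(i+1) 0
    if term[0] == "lam":
        return "00" + encode(term[1])                    # lambda -> 00
    return "01" + encode(term[1]) + encode(term[2])      # apply  -> 01

# λx.(x λy.λz.((x z) y)) == λ (0 λλ ((2 0) 1))
term = ("lam", ("app", 0, ("lam", ("lam", ("app", ("app", 2, 0), 1)))))
assert encode(term) == "00011000000101111010110"
```

The output matches the bitstream derived by hand above, with the spaces removed.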

The Blc interpreter (and the Universal Lambda interpreter) accepts this bitstream and parses it to a lambda term, which then applies beta reduction to execute the program.

Some more examples are

\begin{aligned} \lambda x. x &= \lambda 0 &= 00~10 \\ \lambda x. \lambda y. x &= \lambda \lambda 1 &= 00~00~110 \\ (\lambda x. x) (\lambda x. x) &= {\rm apply}~ \lambda 0 ~ \lambda 0 &= 01~00~10~00~10 \end{aligned}

The elegance of this notation is that it is a prefix code, so no delimiting characters are required - the spaces between the $0$ and $1$ can be removed in practice. Moreover, for any valid program $P$, there exists no valid program $PQ$ that starts with $P$ followed by a nonempty $Q$.
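The prefix property means a parser always knows exactly where a term ends. A Python sketch (terms are represented as tuples here, an assumption for illustration):

```python
def parse(bits, i=0):
    """Parse one BLC term starting at bits[i]; return (term, next_index)."""
    if bits[i] == "0":
        if bits[i + 1] == "0":                  # 00 -> lambda
            body, j = parse(bits, i + 2)
            return ("lam", body), j
        f, j = parse(bits, i + 2)               # 01 -> apply
        x, k = parse(bits, j)
        return ("app", f, x), k
    j = i                                       # 1^(k+1) 0 -> De Bruijn index k
    while bits[j] == "1":
        j += 1
    return j - i - 1, j + 1

# Because BLC is a prefix code, parsing stops exactly at the program's end;
# any trailing bits can be handed to the program as stdin:
term, end = parse("0010" + "110101")            # λ0 followed by extra input
assert term == ("lam", 0) and end == 4
```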

Due to the prefix code property, the interpreter can distinguish the boundary between the program and the stdin even if they are provided as concatenated byte streams. For example, when running lambdalisp.blc, we run:

cat lambdalisp.blc | ./asc2bin > lambdalisp.blc.bin
cat lambdalisp.blc.bin [filepath] | ./Blc


Normally, it is difficult or impossible to deduce the boundary between the binary lambdalisp.blc.bin and [filepath] since they are concatenated together with cat, but it is easily possible in BLC since all valid programs are prefix codes.

## The LambdaLisp Logo

• The rounded edges of the cube represent Lisp’s parentheses syntax.
• Alan Kay described Lisp in the quote, “Lisp isn’t a language, it’s a building material.” The cube is a depiction of a block, which represents the building blocks of software.

## Credits

LambdaLisp was written by Hikaru Ikuta. The lambda calculus term compiler LambdaCraft was written by Hikaru Ikuta, inspired by Ben Rudiak-Gould’s Scheme program Lazier, a compiler from lambda terms written in Scheme to Lazy K. The LambdaLisp logo was designed by Hikaru Ikuta. The 521-byte lambda calculus interpreter SectorLambda was written by Justine Tunney. The IOCCC 2012 “Most functional” interpreter was written by John Tromp. The Universal Lambda interpreter clamb and the Lazy K interpreter lazyk were written by Kunihiko Sakamoto.

# Building a Neural Network in Pure Lisp Without Built-In Numbers Using Only Atoms and Lists

2022-01-16 (https://woodrush.github.io/blog/posts/neural-networks-in-pure-lisp)

At the dawn of Lisp after its birth in 1958, Lisp was used as a language for creating advanced artificial intelligence. This project makes that a reality once again by implementing a neural network for pattern recognition, written in pure Lisp without built-in integers or floating-point numbers, that runs on the IBM PC model 5150.

## Building Neural Networks only using Symbolic Manipulation

SectorLISP is an amazing project where a fully functional Lisp interpreter is fit into the 512 bytes of the boot sector of a floppy disk. Since it works as a boot sector program, the binary can be written to a disk to be used as a boot drive, where the computer presents an interface for writing and evaluating Lisp programs, all running in the booting phase of bare metal on the 436-byte program. I have written another blog post on SectorLISP about extending SectorLISP to implement BASIC REPLs and games.

SectorLISP is implemented as a pure Lisp. Pure Lisp has no built-in types for integers or floating-point numbers, and supports only atoms and lists as data structures. Surprisingly, even with the lack of numbers, such a Lisp is Turing-complete, meaning that it is basically capable of any calculation that can be done on modern computers.

In this project, we implement a neural network that runs on SectorLISP. Since there are no features of built-in numbers, we have to reinvent the notion of numbers from scratch only by using symbolic manipulation. We first start off by constructing a fixed-point number calculation system based solely on list manipulations, and finally, implement matrix multiplication and activation functions using this fixed-point number system.

Since SectorLISP runs on the IBM PC model 5150, this implementation allows neural networks to run on the booting phase of vintage PCs.

## Running the Neural Network on Your Computer

The source code for the SectorLISP neural network, as well as the training and testing scripts used to obtain the model parameters, are available at my GitHub repository:

https://github.com/woodrush/sectorlisp-nn

Here I will describe the instructions for running the SectorLISP program to calculate predictions for custom digit images in detail. The instructions for training and evaluating the neural network to obtain the model parameters used for this network is available at the repository.

The available emulators are QEMU and the i8086 emulator Blinkenlights. I will also describe how to run SectorLISP on physical hardware, although with this method you must type the entire Lisp program into the computer by hand. In the emulators, you can either automatically load the code or paste it into the console.

### Running on QEMU

If you have QEMU installed, running the Lisp neural network on QEMU can be done with the following make procedure:

git clone https://github.com/woodrush/sectorlisp
cd sectorlisp
git checkout nn
git submodule update --init --recursive
cd test
make nn


This will start QEMU with SectorLISP loaded as the boot sector program, and will automatically type the Lisp program into the emulator’s console.

Due to the way the test script handles the text stream between the host PC and QEMU, it first takes 10 minutes to type the entire Lisp source code to the emulator’s console. After waiting for 10 minutes, the actual inference time only takes about 4 seconds, where the program will show a message on the screen indicating the predicted digit. The running time was measured using a 2.8 GHz Intel i7 CPU.

To input a custom 3x5 digit image, edit the following expression at the end of the program, ./sectorlisp-nn/nn.lisp, inside the sectorlisp-nn submodule:

(QUOTE
  ;; input
  )
(QUOTE (* * *
        * . .
        * * *
        . . *
        * * *))


### Running on Blinkenlights

Here are the instructions on running the network on the i8086 emulator Blinkenlights.

First, git clone the SectorLISP repository and make SectorLISP’s binary, sectorlisp.bin:

git clone https://github.com/jart/sectorlisp
cd sectorlisp
make


This will generate sectorlisp.bin under ./sectorlisp.

By building a fork of SectorLISP that supports I/O, additional messages indicating the input and the output will be printed. In this case, git checkout to the io branch by running git checkout io before running make. Since the source code for this project is backwards compatible with the main SectorLISP branch, the same code can be run on both versions.

Update (2022/4/6): The fork mentioned here was merged into the original SectorLISP repository. The features mentioned here can now be used without using the fork, and by using the original SectorLISP repository.

Next, download the Blinkenlights emulator:

curl https://justine.lol/blinkenlights/blinkenlights-latest.com >blinkenlights.com


You can then run SectorLISP by running:

./blinkenlights.com -rt sectorlisp.bin


On some Ubuntu systems, a graphics-related error may appear and the emulator may not start. In that case, first run the following command, which is available on the download page:

sudo sh -c "echo ':APE:M::MZqFpD::/bin/sh:' >/proc/sys/fs/binfmt_misc/register"


After starting Blinkenlights, expand the size of your terminal large enough so that the TELETYPEWRITER region shows up at the center of the screen. This region is the console used for input and output. Then, press c to run the emulator in continuous mode. The cursor in the TELETYPEWRITER region should move one line down. You can then start typing in text or paste a long code from your terminal into Blinkenlight’s console to run your Lisp program.

To run the neural network program, copy the contents of nn.lisp from the repository to your clipboard, and paste it inside the terminal into Blinkenlight’s console. After waiting for about 2 minutes, the result will be shown on the console. Note that it is important to copy the newline at the end of the program, which will trigger the turbo mode on Blinkenlights which makes it run significantly faster. In this case, the screen will initially show nothing after you paste the code, but you can confirm that it is running by checking the CPU usage of your computer. If the code shows up right away after pasting with the cursor right next to the final parentheses of the code, you may have not included the newline, which takes significantly more time since it does not run in turbo mode.

On Blinkenlights, it took 2 minutes from pasting the code to obtaining the final inference results. The running time was measured using a 2.8 GHz Intel i7 CPU.

To input a custom 3x5 digit image, edit the following expression at the end of the program:

(QUOTE
  ;; input
  )
(QUOTE (* * *
        * . .
        * * *
        . . *
        * * *))


### Running on Physical Hardware

You can also run SectorLISP on an actual physical machine if you have a PC with an Intel CPU that boots with a BIOS, and a drive such as a USB drive or a floppy disk that can be used as a boot drive. Note that when running the neural network program this way, you must type the entire program by hand into the console.

First, mount your drive to the PC you’ve built sectorlisp.bin on, and check:

lsblk -o KNAME,TYPE,SIZE,MODEL


Among the list of the hardware, check for the device name for your drive you want to write SectorLISP onto. After making sure of the device name, run the following command, replacing [devicename] with your device name. [devicename] should be values such as sda or sdb, depending on your setup.

Caution: The following command used for writing to the drive will overwrite anything that exists in the target drive’s boot sector, so it’s important to make sure which drive you’re writing into. If the command or the device name is wrong, it may overwrite the entire content of your drive or other drives mounted in your PC, probably causing your computer to be unbootable (or change your PC to a SectorLISP machine that always boots SectorLISP, which is cool, but is hard to recover from). Please perform these steps with extra care, and at your own risk.

sudo dd if=sectorlisp.bin of=/dev/[devicename] bs=512 count=1


After you have written your boot drive, insert the drive to the PC you want to boot it from. You may have to change the boot priority settings from the BIOS to make sure the PC boots from the target drive. When the drive boots successfully, you should see a cursor blinking in a blank screen, which indicates that you’re ready to type your Lisp code into bare metal.

We will now discuss the implementation details of this project.

## Training the Neural Network

We first start off by training a neural network on a modern computer using TensorFlow to get its model parameters. The parameters are then converted to 18-bit fixed-point numbers when loaded into the SectorLISP program.

### The Dataset

Training Dataset

Test Dataset

The entire dataset for training and testing this neural network is shown above. The input images are 3x5-sized binary monochrome images, which are converted to fixed-point vectors when being provided to the network.

The dataset, as well as the fully connected neural network model, were inspired by a blog post (in Japanese) about pattern recognition using neural networks, written by Koichiro Mori (aidiary).

The upper half is the training dataset that is used to train the neural network. The bottom half is the testing dataset, which is not shown at all to the network at training time, and will be shown for the first time when evaluating the neural network’s performance, to check if the digits for these newly shown images are predicted correctly.

In the final Lisp program, the input image is provided as follows:

(QUOTE
;; input
)
(QUOTE (* * *
* . .
* * *
. . *
* * *))


### The Model

The model for our neural network is very simple. It is a two-layered fully connected network with a ReLU activation function. In TensorFlow, it is written like this:

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(5, 3)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10),
])


The model and its implementation were referenced from the TensorFlow 2 quickstart for beginners entry from the TensorFlow documentation. As mentioned before, the fully-connected model was also inspired by a blog post (in Japanese) written by Koichiro Mori (aidiary).

This model takes a 3x5 image and outputs a 1x10 vector, where each element represents the log-confidence of each digit from 0 to 9. The final prediction result of the neural network is defined by observing the index that has the largest value in the output 1x10 vector.

Each fully connected layer contains two trainable tensors A and B, which are the coefficient matrix and the bias vector, respectively. This network thus consists of 4 model parameter tensors, A_1, B_1, A_2, and B_2, of sizes 15x10, 10x1, 10x10, and 10x1, respectively.

The Dropout function is included for inducing generalization and is only activated at training time.

We use the categorical cross-entropy loss and the Adam optimizer for training:

model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])


The model is then trained for 1000 epochs:

model.fit(data_x, data_y_category, epochs=1000, verbose=0)


After training, the model parameters are saved:

model.save("params.h5")


The model parameters A_1, B_1, A_2, and B_2 are contained in this file. Since the model parameters have the sizes 15x10, 10x1, 10x10, and 10x1, the total number of fixed-point numbers is 270. Since we are using 18 bits for each fixed-point number, the total number of bits for the model parameters of the entire neural network is 4860 bits.

Note that although fixed-point numbers are used in the final Lisp implementation, the training process uses 64-bit floating-point numbers. Since the number of layers and the matrix sizes were both small enough for truncating the precision, we were able to directly convert the trained floating-point model parameter values to fixed-point numbers when loading them into the Lisp implementation.

The training time for the neural network in TensorFlow was 6.5 seconds on a 6GB GTX 1060 GPU.

### Testing for Noise Resistance

The training accuracy was 100%, meaning that all of the 15 images in the training dataset are correctly predicted to the true digit.

The testing accuracy was 85%, meaning that 17 out of 20 newly seen images that were not shown at all during training were predicted correctly.

Here is the confusion matrix for the test dataset. For a 100% accuracy performance, the matrix becomes completely diagonal, meaning that the prediction results always match the ground truth labels. The three off-diagonal elements indicate the 3 prediction errors that occurred at test time.

Here are the 3 images that were not predicted correctly:

The predictions for these three test dataset images were 1, 3, and 4, respectively.

Since all of the other images were predicted correctly, this means that the neural network was able to correctly predict 85% of the unknown data that was never shown at training time. This capability of flexible generalization for newly encountered images is a core feature of neural networks.

## Implementing Neural Networks in Pure Lisp

“Lisp has jokingly been called ‘the most intelligent way to misuse a computer’. I think that description is a great compliment because it transmits the full flavor of liberation: it has assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.” – Edsger W. Dijkstra

Now that we have obtained the model parameters for our neural network, it’s time to build it into pure Lisp.

As explained in the SectorLISP blog post, SectorLISP does not have a built-in feature for integers or floating-point numbers. The only data structures that SectorLISP has are lists and atoms, so we must implement a system for calculating fractional numbers only by manipulating lists of atoms. Our goal is to implement matrix multiplication in fixed-point numbers.

The fixed-point number system used in this project is also available as a SectorLISP library at my numsectorlisp GitHub repo.

### The Number Representations

The number system for this project will be 18-bit fixed-point numbers, with 13 bits for the fractional part, 4 bits for the integer part, and 1 bit for the sign.

Here are some examples of numbers expressed in this fixed-point system:

(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 0    0 0 0 0 0)) ;; Zero
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 0    1 0 0 0 0)) ;; One
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 1    0 0 0 0 0)) ;; 0.5
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 1 0    0 0 0 0 0)) ;; 0.25
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 0    1 1 1 1 1)) ;; -1
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 0 1    1 1 1 1 1)) ;; -0.5
(QUOTE (0  0 0 0 0  0 0 0 0  0 0 1 1    1 1 1 1 1)) ;; -0.25
(QUOTE (1  1 1 1 1  1 1 1 1  1 1 1 1    1 1 1 1 1)) ;; Negative epsilon (-2^(-13))
;;     |----------------------------|  |------||-|
;;            Fractional part       Integer part \Sign bit
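As a cross-check, the same 18-bit two's-complement encoding can be sketched in Python; these helpers are illustrative and not part of the Lisp code:

```python
FRAC, INT, SIGN = 13, 4, 1
WIDTH = FRAC + INT + SIGN                 # 18 bits total

def to_fixed(x):
    """Encode x as an LSB-first, two's-complement, 18-bit list (13 fractional bits)."""
    n = round(x * (1 << FRAC)) % (1 << WIDTH)    # wrap negatives into two's complement
    return [(n >> i) & 1 for i in range(WIDTH)]

def from_fixed(bits):
    n = sum(b << i for i, b in enumerate(bits))
    if bits[-1] == 1:                     # the last element is the sign bit
        n -= 1 << WIDTH
    return n / (1 << FRAC)

assert to_fixed(1)[13] == 1               # the "one" bit of the integer part
assert from_fixed(to_fixed(-0.5)) == -0.5
```

Note that the bit at index 13 is the low bit of the integer part, matching the table above.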


We first start by making a half adder, which computes single-digit binary addition. A half adder takes two single-digit binary variables A and B and outputs a pair of variables S and C, representing the sum and the carry flag, respectively. The carry C occurs when both input digits are 1, requiring a carry into the next digit. Therefore, it can be written as:

(QUOTE
)
(QUOTE (LAMBDA (X Y)
  (COND
    ((EQ X (QUOTE 1))
     (COND
       ((EQ Y (QUOTE 1)) (CONS (QUOTE 0) (QUOTE 1)))
       ((QUOTE T) (CONS (QUOTE 1) (QUOTE 0)))))
    ((QUOTE T)
     (CONS Y (QUOTE 0))))))


Next we make a full adder. A full adder also computes single-digit binary addition, except it takes 3 variables including the carry digit, A, B, and C, and outputs the pair S and C for the sum and the carry flag. Including C will help to recursively compute multiple-digit addition in the next section. This can be written as:

(QUOTE
  )
(QUOTE (LAMBDA (X Y C)
  ((LAMBDA (HA1)
     ((LAMBDA (HA2)
        (CONS (CAR HA2)
              (COND
                ((EQ (QUOTE 1) (CDR HA1)) (QUOTE 1))
                ((QUOTE T) (CDR HA2)))))
      ;; Apply the half adder (here named addhalf) to the first sum and the carry
      (addhalf (CAR HA1) C)))
   (addhalf X Y))))


Now that we have constructed a full adder, we can recursively connect these full adders to construct a multiple-binary-digit adder. We first start off by constructing an adder for unsigned integers.

Addition is done by first adding the least significant bits, computing the sum and the carry, and then adding the next significant bits as well as the carry flag if it exists. Since the full adder does just this for each digit, we can recursively connect full adders together to make a multiple-digit adder:

(QUOTE
  ;; The output binary is in reverse order (the msb is at the end)
  ;; The same applies to the entire system
  )
(QUOTE (LAMBDA (X Y C)
  (COND
    ((EQ NIL X) Y)
    ((EQ NIL Y) X)
    ((QUOTE T)
     ((LAMBDA (XYC)
        (CONS (CAR XYC) (uaddnofc (CDR X) (CDR Y) (CDR XYC))))
      (addfull (CAR X) (CAR Y) C))))))


Here, X and Y are multiple-digit numbers such as (QUOTE (0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0)) ;; One, and C is a single-digit carry flag.

This is where the reverse-ordered binary list format becomes useful. Since addition is started by adding the least significant bits first, we can immediately extract this bit just by applying (CAR X) to the numbers.

The u stands for unsigned, addn means the addition of N (arbitrary) digits, of means that overflow is prevented, and c means that a carry flag is taken as an argument. Since overflow is prevented, the resulting sum may become one digit longer than the original inputs X and Y, instead of overflowing to zero. This is compensated later in other functions.
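The recursion can be cross-checked with a Python sketch over LSB-first bit lists (illustrative only; it assumes equal-length inputs, and appending the final carry is one reading of "overflow is prevented"):

```python
def addfull(x, y, c):
    """Full adder on single bits: returns (sum, carry)."""
    s = x ^ y ^ c
    carry = (x & y) | (x & c) | (y & c)
    return s, carry

def uaddnofc(xs, ys, c=0):
    """Ripple-carry addition of equal-length LSB-first bit lists.
    A final carry appends one extra bit, so the result may grow by one digit."""
    if not xs and not ys:
        return [c] if c else []
    s, carry = addfull(xs[0], ys[0], c)
    return [s] + uaddnofc(xs[1:], ys[1:], carry)

assert uaddnofc([1, 0, 1], [1, 1, 0]) == [0, 0, 0, 1]   # 5 + 3 = 8
```

Because the lists are LSB-first, each recursive step peels off the current digit with plain head access, just like `(CAR X)` in the Lisp version.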

Finally, to add two unsigned integers X and Y, we wrap uaddnofc with the carry flag initially set to 0, for unsigned integer addition:

(QUOTE
  )
(QUOTE (LAMBDA (X Y)
  (uaddnofc X Y (QUOTE 0))))


### Unsigned Integer Multiplication

Multiplication can be done similarly to addition, except we add multiple digits instead of one in each step. In multiplication, we recursively shift X by one bit at a time and add up the shifted values of X whose corresponding digit in Y is 1. When the digit in Y is 0, we add nothing. Shifting X to the right has the effect of multiplying the number by 2; note that the direction of the shift is reversed since the bit order is reversed.
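With ordinary machine integers instead of bit lists, the same shift-and-add recursion looks like this Python sketch (illustrative only):

```python
def umult(x, y):
    """Shift-and-add multiplication, mirroring the recursion described above."""
    if y == 0:
        return 0
    partial = x if y & 1 else 0              # add x only when y's current bit is 1
    return partial + umult(x << 1, y >> 1)   # shift x, consume one bit of y

assert umult(6, 7) == 42
```

Consing a 0 onto the front of an LSB-first list plays the role of `x << 1` here.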

Following this design, unsigned integer multiplication is implemented as follows:

(QUOTE
  ;; umultnof : Unsigned N-bit mult
  )
(QUOTE (LAMBDA (X Y)
  (COND
    ((EQ NIL Y) u0)
    ((QUOTE T)
     ;; Add X (or u0 when the current bit of Y is 0) to the shifted recursion
     (uaddnof
      (COND
        ((EQ (QUOTE 0) (CAR Y)) u0)
        ((QUOTE T) X))
      (umultnof (CONS (QUOTE 0) X) (CDR Y)))))))


Now we are ready to start thinking about fixed-point numbers. In fact, we have already implemented unsigned fixed-point addition at this point. This is because of the most significant feature of fixed-point numbers: addition and subtraction can be implemented exactly the same as for integers.

This is because fixed-point numbers can be thought of as integers with a fixed exponent bias 2^(-N). Since the fractional part of our system is 13 bits, the exponent bias is 2^(-13). Therefore, for two numbers A_fix and B_fix, we represent these numbers using underlying integers A and B, as A_fix == A * 2^(-13), B_fix == B * 2^(-13).

Now, when calculating A_fix + B_fix, the exponent 2^(-13) can be factored out, leaving (A+B)*2^(-13). Therefore, we can directly use unsigned integer addition for unsigned fixed-point addition.
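This factoring can be verified numerically with a quick Python check (the variable names are illustrative):

```python
FRAC = 13
A_fix, B_fix = 0.75, -0.25
A = int(A_fix * 2**FRAC)       # underlying integer of A_fix
B = int(B_fix * 2**FRAC)       # underlying integer of B_fix

# Integer addition of the underlying values gives the fixed-point sum exactly:
assert (A + B) / 2**FRAC == A_fix + B_fix
```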

### Unsigned Fixed-Point Multiplication

Multiplication is similar except that the exponent bias changes. For A_fix * B_fix in the previous example, the result becomes (A*B)*2^(-26), with a smaller exponent bias factor. Here, we have a gigantic number A*B compensated by the small exponent bias factor 2^(-26). Therefore, to adjust the exponent bias factor, we can divide A*B by 2^13, bringing the exponent bias factor back to 2^(-13). In this case, dividing by 2^13 means dropping the 13 least significant bits and keeping the rest.

In the case where the output number still has a bit length longer than A and B, the result has overflowed and cannot be captured by the number of bits in our system. This is a difference from floating-point numbers. For floating-point numbers, the most significant bit can always be preserved by moving around the decimal point. In fixed-point numbers, on the other hand, large numbers must have their significant bits discarded since the decimal point is fixed. Therefore, although it may seem odd to drop significant bits, this implementation yields the correct results.
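Using machine integers as the underlying representation, the multiply-then-rescale step can be sketched as (illustrative only):

```python
FRAC = 13

def fixmul(a, b):
    """Multiply underlying integers, then drop the 13 low bits to rescale."""
    return (a * b) >> FRAC

half = 1 << (FRAC - 1)        # underlying integer of 0.5
quarter = 1 << (FRAC - 2)     # underlying integer of 0.25
assert fixmul(half, quarter) == 1 << (FRAC - 3)   # underlying integer of 0.125
```

The right shift by 13 is exactly the `drop fracbitsize` step in the Lisp code below.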

Following this design, we can implement unsigned fixed-point multiplication as follows:

(QUOTE
  ;; ufixmult : Unsigned fixed-point multiplication
  )
(QUOTE (LAMBDA (X Y)
  (take u0 (+ u0 (drop fracbitsize (umultnof X Y))))))


u0 indicates the unsigned integer zero, and fracbitsize is a list of length 13 indicating the fraction part’s bit size.

u0 is added after dropping bits from the multiplication result, since the bit length may be shorter than our system after dropping the bits.

take and drop are defined as follows:

(QUOTE
  ;; take : Take a list of (len L) atoms from X
  )
(QUOTE (LAMBDA (L X)
  (COND
    ((EQ NIL L) NIL)
    ((QUOTE T) (CONS (CAR X) (take (CDR L) (CDR X)))))))
(QUOTE
  ;; drop : Drop the first (len L) atoms from X
  )
(QUOTE (LAMBDA (L X)
  (COND
    ((EQ NIL X) NIL)
    ((EQ NIL L) X)
    ((QUOTE T) (drop (CDR L) (CDR X))))))


### Negation

Now we will start taking the signs of the numbers into account.

In our fixed-point number system, negative numbers are expressed by taking the two’s complement of a number. Negation using two’s complement is best understood as taking the additive inverse of the number modulo 2^18. This yields a very simple implementation for negation:

(QUOTE
  ;; negate : Two's complement of int
  )
(QUOTE (LAMBDA (N)
  (take u0 (umultnof N umax))))


Here, umax is a number filled with 1, (QUOTE (1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1)). When the smallest positive number (QUOTE (1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)) is added to it, umax overflows to u0, which is filled with 0, meaning the number zero. Since negative numbers are numbers that become zero when added to their absolute value, umax represents the negative number with the smallest absolute value in our fixed-point number system.

Similarly, multiplying by umax yields a number with the same property where the number exactly overflows to zero with only one bit overflowing at the end. Since the addition function in fixed-point numbers is defined exactly the same as unsigned integers, this property means that the output of negate works as negation in fixed-point numbers as well. Therefore, this implementation suffices to implement negation in our number system.
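The umax trick can be cross-checked with machine integers (a Python sketch; masking with `& MASK` plays the role of `take u0`):

```python
WIDTH = 18
MASK = (1 << WIDTH) - 1
umax = MASK                      # all ones: the "negative epsilon"

def negate(n):
    """Negation by multiplying by umax (= -1 modulo 2^18), then keeping 18 bits."""
    return (n * umax) & MASK

one = 1 << 13                            # fixed-point 1.0
assert (negate(one) + one) & MASK == 0   # -1 + 1 overflows to exactly zero
assert negate(negate(5)) == 5            # negation is an involution
```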

### Signed Fixed-Point Subtraction

At this point, we can define our final operators for + and - used for fixed-point numbers:

(QUOTE
  ;; +
  )
(QUOTE (LAMBDA (X Y)
  (take u0 (uaddnof X Y (QUOTE 0)))))
(QUOTE
  ;; -
  )
(QUOTE (LAMBDA (X Y)
  (take u0 (uaddnof X (negate Y) (QUOTE 0)))))


Subtraction is implemented by adding the negated version of the second operand.

We will now see how signed multiplication is implemented.

### Signed Fixed-Point Multiplication

Signed fixed-point number multiplication is almost the same as unsigned ones, except that the signs of the numbers have to be managed carefully. Signed multiplication is implemented by reducing the operation to unsigned multiplication by negating the number beforehand if the operand is a negative number, and then negating back the result after multiplication. This simple consideration of signs yields the following implementation:

(QUOTE
  ;; *
  )
(QUOTE (LAMBDA (X Y)
  (COND
    ((< X u0)
     (COND
       ((< Y u0)
        (ufixmult (negate X) (negate Y)))
       ((QUOTE T)
        (negate (ufixmult (negate X) Y)))))
    ((< Y u0)
     (negate (ufixmult X (negate Y))))
    ((QUOTE T)
     (ufixmult X Y)))))


### Comparison

Comparison is first done by checking the sign of the numbers. If the signs of both operands are different, we can immediately deduce that one operand is less than another. In the case where the signs are the same for both operands, we subtract the absolute value of each operand and check if the result is less than zero, i.e., it is a negative number.

So we start with a function that checks if a number is negative:

(QUOTE
  ;; isnegative
  )
(QUOTE (LAMBDA (X)
  (EQ (QUOTE 1) (CAR (drop (CDR u0) X)))))


This can be done by simply checking if the sign bit at the end is 1, since we have defined to use two’s complement as the representation of negative numbers.

We can then use this to write our algorithm mentioned before:

(QUOTE
  ;; <
  )
(QUOTE (LAMBDA (X Y)
  (COND
    ((isnegative X)
     (COND
       ((isnegative Y) (isnegative (- (negate Y) (negate X))))
       ((QUOTE T) (QUOTE T))))
    ((QUOTE T)
     (COND
       ((isnegative Y) NIL)
       ((QUOTE T) (isnegative (- X Y))))))))
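The same sign-dispatch logic can be sketched with machine integers in Python (illustrative; here `isnegative` checks the top bit of the 18-bit word):

```python
WIDTH = 18
MASK = (1 << WIDTH) - 1

def isnegative(n):
    return (n >> (WIDTH - 1)) & 1 == 1    # the top bit is the sign bit

def negate(n):
    return (-n) & MASK

def less(x, y):
    """Sign-dispatching comparison over 18-bit two's-complement words."""
    if isnegative(x):
        if isnegative(y):
            # both negative: compare absolute values, reversed
            return isnegative((negate(y) - negate(x)) & MASK)
        return True                       # negative < nonnegative
    if isnegative(y):
        return False                      # nonnegative is never below a negative
    return isnegative((x - y) & MASK)     # both nonnegative: check the sign of x - y
```

For example, `less(negate(3), negate(1))` holds because |1| - |3| comes out negative.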


Comparison in the other direction is done by simply reversing the operands:

(QUOTE
  ;; >
  )
(QUOTE (LAMBDA (X Y)
  (< Y X)))


### Division by Powers of Two

Although division for general numbers can be tricky, dividing by powers of two can be done by simply shifting the bits by the exponent of the divisor:

(QUOTE
  ;; << : Shift X by Y_u bits, where Y_u is in unary.
  ;;      Note that since the bits are written in reverse order,
  ;;      this works as division and makes the input number smaller.
  )
(QUOTE (LAMBDA (X Y_u)
  (+ (drop Y_u X) u0)))


As mentioned in the comment, shifting left becomes division since we are using a reverse order representation for numbers.

### ReLU

At this point, we can actually implement our first neural-network-related function, the rectified linear unit (ReLU). Despite its intimidating name, it behaves like numpy’s clip function with a lower bound: numbers below a threshold are clipped to the threshold value. For ReLU, the threshold is zero, so it can be implemented by simply checking the input’s sign and returning zero if it is negative:

(QUOTE
  ;; ReLUscal
  )
(QUOTE (LAMBDA (X)
  (COND
    ((isnegative X) u0)
    ((QUOTE T) X))))


ReLUscal takes scalar inputs. This is recursively applied inside ReLUvec which accepts vector inputs.

### Vector Dot Products

At this point, we have finished implementing all of the scalar operations required for constructing a fully-connected neural network! From now on we will write functions for multiple-element objects.

The most simple one is the dot product of two vectors, which can be written by recursively adding the products of the elements of the input vectors:

(QUOTE
  ;; ================================================================
  ;; vdot : Vector dot product
  )
(QUOTE (LAMBDA (X Y)
  (COND
    (X (+ (* (CAR X) (CAR Y))
          (vdot (CDR X) (CDR Y))))
    ((QUOTE T) u0))))


Here, vectors are simply expressed as a list of scalars. The vector (1 2 3) can be written as follows:

(QUOTE ((0  0 0 0 0  0 0 0 0  0 0 0 0    1 0 0 0 0)
        (0  0 0 0 0  0 0 0 0  0 0 0 0    0 1 0 0 0)
        (0  0 0 0 0  0 0 0 0  0 0 0 0    1 1 0 0 0)))


Vector addition works similarly except we construct a list instead of calculating the sum:

(QUOTE
  )
(QUOTE (LAMBDA (X Y)
  (COND
    (X (CONS (+ (CAR X) (CAR Y)) (vecadd (CDR X) (CDR Y))))
    ((QUOTE T) NIL))))


### Vector-Matrix Multiplication

Surprisingly, we can jump to vector-matrix multiplication right away once we have vector dot products. We first implement matrices as a list of vectors. Since each element in a matrix is a vector, we can write vector-matrix multiplication by recursively iterating over each element of the input matrix:

(QUOTE
  ;; vecmatmulVAT : vec, mat -> vec : Vector V times transposed matrix A
  )
(QUOTE (LAMBDA (V AT)
  ((LAMBDA (vecmatmulVAThelper)
     (vecmatmulVAThelper AT))
   (QUOTE (LAMBDA (AT)
     (COND
       (AT (CONS (vdot V (CAR AT)) (vecmatmulVAThelper (CDR AT))))
       ((QUOTE T) NIL)))))))


An important property of this function is that the input matrix must be transposed before calculating the correct result. Usually, V @ A where @ is matrix multiplication is defined by multiplying the rows of V with the columns of A. Taking the columns of A is expensive in our Lisp implementation since we have to manage all of the vector elements in A at once in one iteration. On the other hand, if we transpose A before the multiplication, all of the elements in each column become aligned in a single row which can be extracted at once as a single vector element. Since we already have vector-vector multiplication, i.e., vector dot products defined, this way of transposing A beforehand blends in nicely with our function. The name vecmatmulVAT emphasizes this fact by writing AT which means A transposed.
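The transposed-matrix convention can be sketched in Python (illustrative; plain lists of small integers stand in for the Lisp vectors of fixed-point numbers):

```python
def vdot(x, y):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def vecmatmul_vat(v, at):
    """v @ A, where 'at' is A pre-transposed: each row of 'at' is a column of A."""
    return [vdot(v, row) for row in at]

v = [1, 2]
a = [[1, 2, 3],
     [4, 5, 6]]                          # 2x3 matrix A
at = [list(col) for col in zip(*a)]      # transpose once, up front
assert vecmatmul_vat(v, at) == [9, 12, 15]   # equals v @ A
```

Transposing once up front turns every column access into a cheap row access, which is exactly why the Lisp code stores A_1 and A_2 pre-transposed.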

### Matrix-Matrix Multiplication

Using vector-matrix multiplication, matrix-matrix multiplication can be implemented right away, by iterating over the matrix at the first operand:

(QUOTE
;; matmulABT : mat, mat -> mat : Matrix A times transposed matrix B
)
(QUOTE (LAMBDA (A BT)
  ((LAMBDA (matmulABThelper)
     (matmulABThelper A))
   (QUOTE (LAMBDA (A)
     (COND
       (A (CONS (vecmatmulVAT (CAR A) BT) (matmulABThelper (CDR A))))
       ((QUOTE T) NIL)))))))


Similar to vecmatmulVAT, the second operand matrix B is transposed as BT in this function.

Note that we actually do not use matrix-matrix multiplication in our final neural network, since the first operand is always a flattened vector, and each subsequent function also yields a vector.

### Vector Argmax

Taking the argmax of the vector, i.e., finding the index of the largest value in a vector can simply be implemented by recursive comparison:

(QUOTE
;; vecargmax
)
(QUOTE (LAMBDA (X)
  ((LAMBDA (vecargmaxhelper)
     (vecargmaxhelper (CDR X) (CAR X) () (QUOTE (*))))
   (QUOTE (LAMBDA (X curmax maxind curind)
     (COND
       (X (COND
            ((< curmax (CAR X))
             (vecargmaxhelper (CDR X) (CAR X) curind (CONS (QUOTE *) curind)))
            ((QUOTE T)
             (vecargmaxhelper (CDR X) curmax maxind (CONS (QUOTE *) curind)))))
       ((QUOTE T) maxind)))))))


A similar recursive function is img2vec, which transforms the `*`/`.` notation of the input image into ones and zeros:

(QUOTE
;; img2vec
)
(QUOTE (LAMBDA (img)
  (COND
    (img (CONS (COND
                 ((EQ (CAR img) (QUOTE *)) 1)
                 ((QUOTE T) u0))
               (img2vec (CDR img))))
    ((QUOTE T) NIL))))


Here, the variable 1 is bound to the fixed-point number one in the source code.

### The Neural Network

We are finally ready to define our neural network! Following the model, our network can be defined as a chain of functions as follows:

(QUOTE
;; nn
)
(QUOTE (LAMBDA (input)
  ((LAMBDA (F1 F2 F3 F4 F5 F6 F7 F8)
     (F8 (F7 (F6 (F5 (F4 (F3 (F2 (F1 input)))))))))
   (QUOTE (LAMBDA (X) (img2vec X)))
   (QUOTE (LAMBDA (X) (vecmatmulVAT X A_1_T)))
   (QUOTE (LAMBDA (X) (vecadd X B_1)))
   (QUOTE (LAMBDA (X) (ReLUvec X)))
   (QUOTE (LAMBDA (X) (vecmatmulVAT X A_2_T)))
   (QUOTE (LAMBDA (X) (vecadd X B_2)))
   (QUOTE (LAMBDA (X) (vecargmax X)))
   (QUOTE (LAMBDA (X) (nth X digitlist))))))


This represents a chain of functions from the input to the nth argument of digitlist, which is a list of atoms of the digits, (QUOTE (0 1 2 3 4 5 6 7 8 9)).

Here, A_1_T, B_1, A_2_T, and B_2 are the model parameters obtained from the training section, converted to our fixed-point number system.

## Results

Now let’s try actually running our Lisp neural network! We will use the i8086 emulator Blinkenlights. Instructions for running the program in this emulator are described in the running the neural network on your computer section.

Let’s first try giving the network the following image of the digit 5:

(QUOTE (* * *
        * . .
        * * *
        . . *
        * * *))


It turns out like this: The network correctly predicts the digit shown in the image!

Although the original network was trained in an environment where 64-bit floating-point numbers were available, our system of 18-bit fixed-point numbers was also capable of running this network with the same parameters truncated to fit in 18 bits.

### New Unseen Input with Noise

Now let’s try giving another digit:

(QUOTE (* * .
        . . *
        . * *
        * . .
        * * *))


Notice that this image appears neither in the training set nor in the test dataset. The network has therefore never seen this image before; this is the very first time it sees it.

Can the network correctly predict the digit shown in this image? The results were as follows: The network predicted the digit correctly!

Even for images it had never seen before, the neural network was able to interpret digit images correctly, having only been shown a handful of examples. This is the magic of neural networks!

Therefore, in a way, we have taught a Lisp interpreter running on the IBM PC model 5150 what digits are, only by providing example pictures of digits. Of course, the accumulation of knowledge through training the network was done on a modern computer, but that knowledge was handed down to a 512-byte program capable of running on vintage hardware.

### Statistics

The training time for the neural network in TensorFlow was 6.5 seconds on a 6GB GTX 1060 GPU. The training was run on a 15-image dataset for 1000 epochs.

Here are the inference times of the neural network run in the emulators:

| Emulator | Inference Time | Runtime Memory Usage |
|----------|----------------|----------------------|
| QEMU     | 4 seconds      | 64 kiB               |

The emulation was done on a 2.8 GHz Intel i7 CPU. When run on a 4.77 MHz IBM PC, I believe it should run 590 times slower than in QEMU, which is roughly 40 minutes.

The memory usage including the SectorLISP boot sector program, the S-expression stack for the entire Lisp program, and the RAM used for computing the neural network fits in 64 kiB of memory. This means that this program is capable of running on the original IBM 5150 PC.

## Closing Remarks

It was very fun building a neural network from the bottom up, using only first principles of symbolic manipulation. This is what it means for a programming language to be Turing-complete - it can do essentially anything that any other modern computer is capable of.

As mentioned at the beginning of this post, Lisp was used as a language for creating advanced artificial intelligence after its birth in 1958. 60 years later in 2018, Yoshua Bengio, Geoffrey Hinton, and Yann LeCun received the Turing Award for establishing the foundations of modern Deep Learning. In a way, using a Turing-complete Lisp interpreter to implement neural networks revisits this history of computer science.

## Credits

The neural network for SectorLISP and its fixed-point number system discussed in this blog post were implemented by Hikaru Ikuta. The SectorLISP project was started by Justine Tunney and built by its contributors, who are credited in the original SectorLISP blog post. The i8086 emulator Blinkenlights was created by Justine Tunney. The neural network diagram was created using diagrams.net. The training and testing dataset, as well as the fully connected neural network model, were inspired by a blog post (in Japanese) written by Koichiro Mori (aidiary) from DeNA. The TensorFlow implementation of the model follows the "TensorFlow 2 quickstart for beginners" entry in the TensorFlow documentation.

# A Lisp Interpreter Implemented in Conway’s Game of Life

2022-01-12 · https://woodrush.github.io/blog/posts/lisp-in-life

Lisp in Life is a Lisp interpreter implemented in Conway’s Game of Life.

The entire pattern is viewable on the browser here.

To the best of my knowledge, this is the first time a high-level programming language was interpreted in Conway’s Game of Life.

## Running Lisp on the Game of Life

Lisp is a language with a simple and elegant design, with an extensive ability to express sophisticated ideas as simple programs. Notably, its powerful macro feature can be used to modify the language’s syntax, allowing programs to be written in a highly flexible way. For example, macros can introduce new programming paradigms to the language, as demonstrated in object-oriented-like.lisp (which can actually be evaluated by the interpreter, although complex programs take quite a long time to finish running), where a structure and syntax similar to classes in Object-Oriented Programming are constructed. Despite this expressibility, Lisp is the world’s second-oldest high-level programming language, introduced in 1958 and preceded only by Fortran.

Conway’s Game of Life is a cellular automaton proposed in 1970. Despite having a very simple set of rules, it is known to be Turing-complete. Lisp in Life demonstrates this fact in a rather straightforward way.

How can a simple system allow human thoughts to be articulated and expanded? With the expressibility of Lisp running on the simple basis of Conway’s Game of Life, Lisp in Life provides one answer to this question.

### Input and Output

The Lisp program is provided by editing certain cells within the pattern to represent the ASCII encoding of the Lisp program. The pattern directly reads this text, evaluates it, and writes out the results. You can also load your own Lisp program into the pattern and run it. The standard output is written at the bottom end of the RAM module, where it can be easily located and directly examined in a Game of Life viewer. The Lisp implementation supports lexical closures and macros, allowing one to write Lisp programs in a Lisp-like taste, as far as the memory limit allows.

The Lisp interpreter is written in C. Using the build system for this project, you can also compile your own C11-compatible C code and run it on Conway’s Game of Life.

### Previous Work

As previously mentioned, to the best of my knowledge, this is the first time a high-level programming language was interpreted in Conway’s Game of Life.

The entry featuring universal computers on LifeWiki has a list of computers created in the Game of Life. Two important instances not mentioned in that entry are the Quest For Tetris (QFT) computer, created by the authors of the QFT project, and APGsembly, created by Adam P. Goucher. All of these works are designed to run an assembly language, and none are designed to interpret a high-level language per se.

An example of a compiled high-level language targeting the Game of Life is Cogol, from the QFT project. Cogol programs are compiled to the assembly language QFTASM, which runs on the QFT architecture; the code must first be compiled to QFTASM before it can run there.

In Lisp in Life, a modified version of the QFT architecture is first created for improving the pattern’s runtime. Modifications include introducing a new cascaded storage architecture for the ROM, new opcodes, extending the ROM and RAM address space, etc. The Lisp source code is then written into the computer’s RAM module as its raw binary ASCII format. The Conway’s Game of Life pattern directly reads, parses, and evaluates this Lisp source code to produce its output. This feature of allowing a Conway’s Game of Life pattern to evaluate a high-level programming language expressed as a string of text is a novel feature that was newly achieved in this project.

## Video

Here is a YouTube video showing Lisp in Life in action.

## Screenshots

An overview of the entire architecture.

An overview of the CPU and its surrounding modules. On the top are the ROM modules, with the lookup module on the right and the value modules on the left. On the bottom left is the CPU. On the bottom right is the RAM module.

This pattern is the VarLife version of the architecture. VarLife is an 8-state cellular automaton defined in the Quest For Tetris (QFT) Project, which is used as an intermediate layer to create the final Conway’s Game of Life pattern. The colors of the cells indicate the 8 distinct states of the VarLife rule.

The architecture is based on Tetris8.mc in the original QFT repository. Various modifications were made to make the pattern compact, such as introducing a new lookup table architecture for the ROM, removing and adding opcodes, and expanding the ROM and RAM address space.

The Conway’s Game of Life version of the architecture, converted from the VarLife pattern. What appears to be a single cell in this image is actually an OTCA metapixel, zoomed out to be shown 2048 times smaller.

A close-up view of a part of the ROM module in the Conway’s Game of Life version. Each pixel in the previous image is actually the square-shaped structure shown in this image. These structures are OTCA metapixels, which can be seen here in the On and Off meta-states. The OTCA metapixel is a special Conway’s Game of Life pattern that can emulate cellular automata with customized rules. The original VarLife pattern is simulated this way so that it can run in Conway’s Game of Life.

The OTCA metapixel simulating Life in Life can be seen in this wonderful video by Phillip Bradbury: https://www.youtube.com/watch?v=xP5-iIeKXE8

A video of the RAM module in the VarLife rule in action, with the computer showing the results of the following Lisp program:

(define mult (lambda (m n)
  (* m n)))

(print (mult 3 14))


The result is 42, shown in binary ASCII format (0b110100 and 0b110010, i.e. the characters "4" and "2"), read in bottom-to-top order.

As shown in this image, the standard output of the Lisp program gets written at the bottom end of the RAM module, and can be directly viewed in a Game of Life viewer. This repository also contains scripts that run on Golly to decode and view the contents of the output as strings.

## How is it Done?

The Lisp interpreter, written in C, is compiled to an assembly language for a CPU architecture implemented in the Game of Life, which is a modification of the computer used in the Quest For Tetris (QFT) project. The compilation is done using an extended version of ELVM (the Esoteric Language Virtual Machine). The Game of Life backend for ELVM was implemented by myself.

Generating a small enough pattern that runs in a reasonable amount of time required a lot of effort. This required optimizations and improvements in every layer of the project; a brief summary would be:

• The C Compiler layer - adding the computed goto feature to the C compiler, preserving variable symbols to be used after compilation, etc.
• The C layer (the Lisp interpreter) - using a string hashtable and binary search for Lisp symbol lookup, minimization of stack region usage with union memory structures, careful memory region map design, etc.
• The QFTASM layer - writing a compiler optimizer to optimize the length of the assembly code
• The VarLife layer (the CPU architecture) - creating a lookup table architecture for faster ROM access, expanding the size and length of the RAM module, adding new opcodes, etc.
• The Game of Life layer - Hashlife-specific optimization

A more detailed description of the optimizations done in this project is available in the Implementation Details section.

### Conversion from VarLife to Conway’s Game of Life

VarLife is an 8-state cellular automaton defined in the Quest For Tetris (QFT) Project. It is used as an intermediate layer to generate the final Conway’s Game of Life pattern; the computer is first created in VarLife, and then converted to a Game of Life pattern.

When converting VarLife to Conway’s Game of Life, each VarLife cell is mapped to an OTCA Metapixel (OTCAMP). The conversion from VarLife to the Game of Life is done in a way so that the behavior of the states of the VarLife pattern matches exactly with the meta-states of the OTCA Metapixels in the converted Game of Life pattern. Therefore, it is enough to verify the behavior of the VarLife pattern to verify the behavior of the Game of Life pattern.

Due to the use of OTCA Metapixels, each VarLife cell becomes extended to a 2048x2048 Game of Life cell, and 1 VarLife generation requires 35328 Game of Life generations. Therefore, the VarLife patterns run significantly faster than the Game of Life (GoL) version.

Additional details on VarLife are available in the Miscellaneous section.

## Pattern Files

| Program | VarLife Pattern | Conway’s Game of Life Pattern |
|---|---|---|
| print.lisp | QFT_print.mc | QFT_print_metafied.mc |
| lambda.lisp | QFT_lambda.mc | QFT_lambda_metafied.mc |
| printquote.lisp | QFT_printquote.mc | QFT_printquote_metafied.mc |
| factorial.lisp | QFT_factorial.mc | QFT_factorial_metafied.mc |
| z-combinator.lisp | QFT_z-combinator.mc | QFT_z-combinator_metafied.mc |
| backquote-splice.lisp | QFT_backquote-splice.mc | QFT_backquote-splice_metafied.mc |
| backquote.lisp | QFT_backquote.mc | QFT_backquote_metafied.mc |
| object-oriented-like.lisp | QFT_object-oriented-like.mc | QFT_object-oriented-like_metafied.mc |
| primes-print.lisp | QFT_primes-print.mc | QFT_primes-print_metafied.mc |
| primes.lisp | QFT_primes.mc | QFT_primes_metafied.mc |

Pattern files preloaded with various Lisp programs are available here. Detailed statistics such as the running time and the memory consumption are available in the Running Times and Statistics section.

The patterns can be simulated on the Game of Life simulator Golly.

The VarLife patterns can be simulated on Golly as well. To run the VarLife patterns, open Golly’s File -> Preferences -> Control and check the “Your Rules” directory. Open the directory, and copy https://github.com/woodrush/QFT-devkit/blob/main/QFT-devkit/Varlife.rule into it.

## Descriptions of the Lisp Programs

• object-oriented-like.lisp: This example creates a structure similar to classes in Object-Oriented Programming, using closures.

• The class has methods and field variables, where each instance carries distinct and persistent memory locations of their own. The example instantiates two counters and concurrently modifies the value held by each instance.
• New syntaxes for instantiation and method access, (new classname) and (. instance methodname), are introduced using macros and functions.

The Lisp interpreter’s variable scoping and macro features are powerful enough to manage this complex memory management, and even to provide a new syntax supporting the target paradigm.

• printquote.lisp: A simple demonstration of macros.

• factorial.lisp: A simple demonstration of recursion with the factorial function.

• z-combinator.lisp: Demonstration of the Z Combinator to implement a factorial function using anonymous recursion.

• backquote-splice.lisp: Implements the backquote macro used commonly in Lisp to construct macros. It also supports the unquote and unquote-splice operations, each written as ~ and ~@.

• primes.lisp: Prints a list of prime numbers up to 20. This example highlights the use of the while syntax.

The contents of print.lisp is quite straightforward - it calculates and prints the result of 3 * 14. backquote.lisp and primes-print.lisp are similar to backquote-splice.lisp and primes.lisp, mainly included for performance comparisons. backquote.lisp doesn’t implement the unquote-splice operation, and demonstrates some more examples. primes-print.lisp reduces the number of list operations to save memory usage.

## Details of the Lisp Interpreter

### Special Forms and Builtin Functions

• define
• if
• quote
• car, cdr
• cons
• list
• atom
• print
• progn
• while
• lambda, macro
• eval
• eq
• +, -, *, /, mod, <, >

### Lexical Closures

This Lisp interpreter supports lexical closures. The implementation of lexical closures is powerful enough to write an object-oriented-like code as shown in object-oriented-like.lisp, where classes are represented as lexical closures over the field variables and the class methods.

### Macros

This Lisp interpreter supports macros. A Lisp macro can be thought of as a function that receives code and returns code. Following this design, macros are treated exactly the same as lambdas, except that a macro takes its arguments as raw S-expressions and evaluates the result twice (the first time to build the expression, and the second time to actually evaluate the built expression).

## Running Times and Statistics

VarLife Patterns

| Lisp Program and Pattern (VarLife) | #Halting Generations (VarLife) | Running Time (VarLife) | Memory Usage (VarLife) |
|---|---|---|---|
| print.lisp [pattern] | 105,413,068 (exact) | 1.159 mins | 5.0 GiB |
| lambda.lisp [pattern] | 700,000,000 | 2.966 mins | 12.5 GiB |
| printquote.lisp [pattern] | 800,000,000 | 3.424 mins | 12.5 GiB |
| factorial.lisp [pattern] | 1,000,000,000 | 5.200 mins | 17.9 GiB |
| z-combinator.lisp [pattern] | 1,700,000,000 | 9.823 mins | 23.4 GiB |
| backquote-splice.lisp [pattern] | 4,100,000,000 | 20.467 mins | 27.5 GiB (max.) |
| backquote.lisp [pattern] | 4,100,000,000 | 21.663 mins | 27.5 GiB (max.) |
| object-oriented-like.lisp [pattern] | 4,673,000,000 | 22.363 mins | 27.5 GiB (max.) |
| primes-print.lisp [pattern] | 8,880,000,000 | 27.543 mins | 27.5 GiB (max.) |
| primes.lisp [pattern] | 9,607,100,000 | 38.334 mins | 27.5 GiB (max.) |

Conway’s Game of Life (GoL) Patterns

| Lisp Program and Pattern (GoL) | #Halting Generations (GoL) | Running Time (GoL) | Memory Usage (GoL) |
|---|---|---|---|
| print.lisp [pattern] | 3,724,032,866,304 | 382.415 mins | 27.5 GiB (max.) |
| lambda.lisp [pattern] | 24,729,600,000,000 | 1372.985 mins | 27.5 GiB (max.) |
| printquote.lisp [pattern] | 28,262,400,000,000 | 1938.455 mins | 27.5 GiB (max.) |
| factorial.lisp [pattern] | 35,328,000,000,000 | 3395.371 mins | 27.5 GiB (max.) |
| z-combinator.lisp [pattern] | 60,057,600,000,000 | - | - |
| backquote-splice.lisp [pattern] | 144,844,800,000,000 | - | - |
| backquote.lisp [pattern] | 144,844,800,000,000 | - | - |
| object-oriented-like.lisp [pattern] | 165,087,744,000,000 | - | - |
| primes-print.lisp [pattern] | 313,712,640,000,000 | - | - |
| primes.lisp [pattern] | 339,399,628,800,000 | - | - |

Common Statistics

| Lisp Program | #QFT CPU Cycles | QFT RAM Usage (Words) |
|---|---|---|
| print.lisp | 4,425 | 92 |
| lambda.lisp | 13,814 | 227 |
| printquote.lisp | 18,730 | 271 |
| factorial.lisp | 28,623 | 371 |
| z-combinator.lisp | 58,883 | 544 |
| backquote-splice.lisp | 142,353 | 869 |
| backquote.lisp | 142,742 | 876 |
| object-oriented-like.lisp | 161,843 | 838 |
| primes-print.lisp | 281,883 | 527 |
| primes.lisp | 304,964 | 943 |

The running times for each program are shown above. The Hashlife algorithm used for the simulation requires a lot of memory in exchange for its speedups. The simulations were run on a 32GB-RAM computer, with Golly’s memory usage limit set to 28000 MB and the default base step set to 2 (both configurable from the preferences). Memory usage was measured with Ubuntu’s activity monitor; “(max.)” marks where the maximum permitted memory was used. The number of CPU cycles and the QFT memory usage were obtained by running the QFTASM interpreter on the host PC. The QFT memory usage counts the number of RAM addresses that were written at least once, measured in words, which are 16 bits in this architecture.

All of the VarLife patterns can actually be run on a computer. The shortest running time is about 1 minute for print.lisp. A sophisticated program such as object-oriented-like.lisp can even run in about 22 minutes.

On the other hand, the Game of Life patterns take significantly more time than the VarLife patterns, but for short programs it can be run in a moderately reasonable amount of time. For example, print.lisp finishes running in about 6 hours in the Game of Life pattern. As mentioned in the “Conversion from VarLife to Conway’s Game of Life” section, since the Game of Life pattern emulates the behavior of the VarLife pattern using OTCA Metapixels, the behavior of the Game of Life patterns can be verified by running the VarLife patterns.

## Tests

There are tests to check the behavior of the Lisp interpreter: one checks the QFTASM-compiled Lisp interpreter using the QFTASM interpreter, and another checks the GCC-compiled Lisp interpreter on the host PC. To run these tests, use the following commands:

git submodule update --init --recursive # Required for building the source

make test             # Run the tests for the QFTASM-compiled Lisp interpreter, using the QFTASM interpreter
make test_executable  # Run the tests for the executable compiled by GCC


Running make test requires Hy, a Clojure-like Lisp implemented in Python available via pip install hy. Some of the tests compare the output results of Hy and the output of the QFTASM Lisp interpreter.

The tests were run on Ubuntu and Mac.

## Building from Source

This section explains how to load the Lisp interpreter (written in C) to the Game of Life pattern, and also how to load a custom Lisp program into the pattern to run it on Game of Life.

Please see build.md from the GitHub repository.

## Implementation Details

This section describes the implementation details for the various optimizations for the QFT assembly and the resulting Game of Life pattern.

### The C Compiler layer

• Added the computed goto feature to ELVM
• This was merged into the original ELVM project.
• Modified the compiler to preserve and output memory address symbols and program address symbols, for their usage in the compiler optimization tool in the QFTASM layer
• This makes it possible to use memheader.eir, so that symbols used in the C source can be referenced in the ELVM assembly layer using the same variable symbols.

### The ELVM Assembly layer

• Wrote the QFTASM backend for ELVM
• This was merged into the original ELVM project.
• Added further improvements to the QFTASM backend:
• Let the ELVM assembly’s memory address space match QFT’s native memory address space
• Originally, the ELVM assembly had to convert its memory address every time when a memory access occurs.
• Support new opcodes added in the improved QFT architecture

### The C layer (the implementation of the Lisp interpreter)

#### Usage of binary search and hashtables for string representations and comparisons

By profiling the GCC-compiled version of the Lisp interpreter, it was found that the string table lookup process was a large performance bottleneck, leaving considerable room for optimization.

The optimized string lookup process is as follows. First, when the Lisp parser accepts a symbol token, it creates a 4-bit hash of the string from the checksum of its ASCII representation. The hash points into a hashtable holding the roots of binary search trees for string comparison. Each node in a tree holds a symbol token’s string and two child nodes, for the tokens that come before and after it in alphabetical order. When a query symbol token arrives in the parsing phase, the node with a matching token is returned, or a new node for the token is added to the tree if the token does not exist yet. This allows each distinct symbol in the S-expression to have a distinct memory address.

In the interpretation phase, since each distinct symbol has a distinct memory address, and every string required for the Lisp program has already been parsed, string comparison can be done by simply comparing the memory addresses of the tokens. Since the interpreter only uses string equality operations for string comparison, checking for integer equality suffices, speeding up the interpretation phase. And since the hash key is 4 bits long, the 16 separate trees save up to 4 comparisons per lookup compared to using a single binary tree.

#### Usage of jump hash tables for the special form evaluation procedure searches

There are 17 distinct procedures for evaluating the special forms in the Lisp interpreter: define, if, quote, car, cdr, cons, atom, print, progn, while, {lambda, macro}, eval, eq, {+, -, *, /, mod}, {<, >}, list, and lambda/macro invocation (when the token is not a special form). Using an if statement to find the corresponding procedure for a given token amounts to a linear search over token comparisons. To speed up this search, a hash table is used for jumping to the corresponding procedures. Since the memory addresses for the special forms can be determined before parsing the Lisp program, all of the symbols for the special forms have fixed memory addresses. Therefore, the hash key can be created by subtracting an offset from the symbol’s memory address, pointing into a hashtable created near the register locations. This hashtable is provided in memheader.eir. When the hash key is larger than the region of this hashtable, the symbol is not a special form, so the evaluation jumps to the lambda/macro invocation procedure.

#### Usage of 2-bit headers to represent value types

The Lisp implementation has 3 distinct value types, ATOM, INT, and LAMBDA. Each value only consumes one QFT byte of memory; the ATOM value holds the pointer to the symbol’s string hashtable, the INT value holds the signed integer value, and LAMBDA holds a pointer to the Lambda struct, as well as its subtype information, of either LAMBDA, MACRO, TEMPLAMBDA and TEMPMACRO. (The TEMPLAMBDA and TEMPMACRO subtypes are lambda and macro types that recycles its argument value memory space every time it is called, but is unused in the final lisp programs.) Since the RAM’s address space is only 10 bits, there are 6 free bits that can be used for addresses holding pointers. Therefore, the value type and subtype information is held in these free bits. This makes the integer in the Lisp implementation to be a 14-bit signed integer, ranging from -8192 to 8191.

#### Minimization of Stack Region Usage

Since the C compiler used in this project does not have memory optimization features, this has to be done manually within the C source code. This led to the largest reason why the interpreter’s source code seems to be obfuscated.

One of the largest bottlenecks for memory access was stack region usage. Every time a stack region memory access occurs, the assembly code performs memory address offset operations to access the stack region. This does not happen when accessing the heap memory, since there is only one heap region used in the entire program, so the pointers for global variables can be hard-coded by the assembler. Therefore, it is favorable optimization-wise to use the heap memory as much as possible.

One way to make use of this fact is to use as many global variables as possible. Since registers and common RAM memory share the same memory space, global variables can be accessed at a speed comparable to registers. (However, since the physical location of a RAM memory slot within the pattern affects the I/O signal arrival time, and the registers have the smallest RAM addresses, i.e. they are closest to the CPU unit, the registers still have the fastest memory access times.)

Another method of saving memory was to use union memory structures to minimize stack region usage. In the C compiler used in this project, every time a new variable is introduced in a function, the function’s per-call stack region usage is increased to fit all of the variables, even when two variables never appear at the same time. Therefore, using the fact that some variables never appear simultaneously, unions are used for every occurrence of such variables so that they share a region within the stack space. This minimized the stack region usage. Since the stack region is only 233 hextets large (1 byte in the QFT RAM is 16 bits), this allowed a larger number of nested function calls, especially nested calls of eval, which evaluates the S-expressions. Since S-expressions have a list structure, and eval becomes nested when lambdas are called in the Lisp program, this optimization was significant for allowing more sophisticated Lisp programs to run in the architecture.

### The QFTASM layer

The QFT assembly generated by the C compiler has a lot of room for optimization. I therefore created a compiler optimization tool to reduce the QFTASM assembly size.

#### Constant folding

Immediate constant expressions such as ADD 1 2 destination is folded to a MOV operation.

#### MOV folding

The QFT assembly code can be split into subregions by jump operations, such that:

• Each subregion doesn’t contain any jump operations
• Each subregion ends with a jump operation
• Every jump operation in the assembly is guaranteed to jump to the beginning of a subregion, and never to the middle of any subregion

The last guarantee, that jumps never occur in the middle of a subregion, is provided by the C compiler. The ELVM assembly’s program counter is designed so that it increases only when a jump instruction appears. This makes an ELVM program counter point to a sequence of multiple instructions instead of a single instruction. Since the ELVM assembly uses the ELVM program counter for its jump instructions, it is guaranteed that the jump instructions in the QFT assembly never jump to the middle of any subregion, and always jump to the beginning of a subregion.

In each subregion, the dependency graph for the memory address is created. If a memory address becomes written but is later overwritten without becoming used in that subregion at all, the instruction to write to that memory address is removed. Since it is guaranteed that jump operations never jump to the middle of any subregion, it is guaranteed that the overwritten values can be safely removed without affecting the outcome of the program. The MOV folding optimization makes use of this fact to remove unnecessary instructions.

This folding process is also done with dereferences: if a dereferenced memory address is written, the address is later overwritten without being used, and the dereference source is not overwritten at any point during this process, the instruction writing to the dereferenced memory address is removed.

#### Jump folding

If the destination of a conditional or fixed-destination jump instruction points to another jump instruction with a fixed destination, the jump destination is folded to the latter jump instruction’s destination.

A similar folding is done when a fixed jump instruction points to a conditional jump instruction, where the fixed jump instruction is replaced by the latter conditional jump instruction.
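A sketch of the fixed-destination case (hypothetical instruction tuples; the conditional-jump cases described above are omitted for brevity):

```python
# Jump-threading sketch: if a fixed jump targets another fixed jump,
# redirect it to the final destination, following chains and guarding
# against jump cycles.

def fold_jumps(program):
    folded = []
    for ins in program:
        if ins[0] == "JMP":
            target = ins[1]
            seen = set()
            # Follow chains of fixed jumps until a non-jump is reached.
            while program[target][0] == "JMP" and target not in seen:
                seen.add(target)
                target = program[target][1]
            folded.append(("JMP", target))
        else:
            folded.append(ins)
    return folded

prog = [("JMP", 2), ("NOP",), ("JMP", 3), ("NOP",)]
print(fold_jumps(prog))
# -> [('JMP', 3), ('NOP',), ('JMP', 3), ('NOP',)]
```

Instruction 0 originally jumped to instruction 2, which itself jumps to 3, so its destination is folded straight to 3.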

### The Varlife layer (the computer architecture)

#### Created a lookup table structure for the ROM module

In this image of the CPU and its surrounding modules, the two modules on the top are the ROM modules. The original ROM module had one table, with the memory address as the key and the instruction as the value. I recreated the ROM module to add a lookup table layer, where each distinct instruction (not just the opcode, but the entire instruction including the operand values) is assigned a distinct serial integer key. The ROM module on the right accepts a program counter address and returns the instruction key for that program counter. The module on the left accepts the instruction key and returns the actual bits of the instruction as the output. This allows dictionary compression to be performed on the ROM data, saving a lot of space. Since the instructions are 45 bits and the instruction keys are only 10 bits, the instruction key table is roughly 1/4 the width of the original ROM module. And although the ROM is 3223 words long for the entire Lisp interpreter, there are only 616 distinct instructions, so the instruction value table only needs to be 616 ROM units high, substantially reducing the overall size of the ROM module.

The ROM module features another form of compression, where the absence of cells is used to represent 0-valued bits within an instruction. Below is a close-up look at the ROM value module. Notice that some cells on the left are absent, even though the table would be expected to have a rectangular shape. This is because absent cells do not emit any signals, hence effectively emitting 0-valued bits as the output. To exploit this fact, all of the instructions are sorted alphabetically at table creation time, so that instructions beginning with runs of zeroes are located higher in the table (further from the signal source). This allows the maximum number of cells to be replaced with absent units representing 0-valued bits. In fact, the no-op instruction is represented as all zeroes, so all of its units in the value module are replaced by absent cells. The no-op instruction appears many times immediately after jump operations, because the QFT architecture has a branch delay when invoking a jump instruction, requiring a no-op instruction to compensate for the delay.
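The dictionary-compression scheme can be sketched as follows (instruction strings and sizes are illustrative; only the two-table structure comes from the text):

```python
# Dictionary-compression sketch for the ROM: map each distinct
# instruction to a serial key, producing a key table (indexed by program
# counter) and a value table (indexed by key).

def compress_rom(rom):
    value_table = []        # distinct instructions, indexed by key
    keys = {}               # instruction -> serial key
    key_table = []          # program counter -> key
    for ins in rom:
        if ins not in keys:
            keys[ins] = len(value_table)
            value_table.append(ins)
        key_table.append(keys[ins])
    return key_table, value_table

rom = ["NOP", "ADD 1 2 3", "NOP", "MLZ 0 1 2", "NOP"]
key_table, value_table = compress_rom(rom)
print(key_table)     # -> [0, 1, 0, 2, 0]
print(value_table)   # -> ['NOP', 'ADD 1 2 3', 'MLZ 0 1 2']
```

Looking up value_table[key_table[pc]] recovers the original instruction for any program counter, which is exactly what the two-module ROM does in hardware.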

#### Added new optimized instructions to the ALU, and removed unused ones

I removed the AND, OR, SL (shift left), SRL (shift right logical), and SRA (shift right arithmetic) opcodes, and added the SRU (shift right unit) and SRE (shift right eight) opcodes to the architecture. Since there already were opcodes for XOR (bitwise xor) and ANT (bitwise and-not), AND and OR, which were not used much in the interpreter, could be replaced by these opcodes. The bitshift operations had significantly larger patterns than the other opcodes, more than 10 times larger. These were reduced to fixed-size shift operations, which could be implemented in the same size as the other opcodes. Since a shift left can be replaced by repeatedly adding a value to itself, effectively multiplying by powers of 2, that opcode could be safely removed. The main reason the original bitshift units were large was that the shift amounts depended on values in the RAM; converting a binary value to a physical (in-pattern) shift amount required a large pattern, whereas shifting by a fixed amount can be implemented by a significantly simpler one. The shift right eight instruction is mainly used for reading the standard input, where each ASCII character in the input string is packed into one 16-bit RAM memory address.

This resulted in a total of exactly 8 opcodes: ANT, XOR, SRE, SRU, SUB, ADD, MLZ, and MNZ. Since these fit in 3 bits, the opcode field of the instruction was reduced by 1 bit. Also, since the RAM address space is 10 bits, the third operand of an instruction is always a RAM write destination, and the first operand can be arranged to always be a RAM read address, an additional 6*2=12 bits could be shaved off the instruction length. Altogether, this reduced the ROM word size from 58 to 45 bits, cutting nearly 1/4 of the original instruction size.
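The arithmetic of this reduction can be checked directly (assuming the two address operands each shrank from 16 to 10 bits, which is what the 6*2=12 figure suggests):

```python
# Word-size arithmetic from the text: dropping 1 opcode bit plus 6 bits
# from each of the two RAM-address operands shrinks each ROM word.
OLD_WORD = 58
opcode_saving = 1        # 8 opcodes fit in 3 bits instead of 4
address_saving = 6 * 2   # two operands become 10-bit RAM addresses
print(OLD_WORD - opcode_saving - address_saving)  # -> 45
```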

#### Extended the ROM and RAM address space from 9,7-bit to 12,10-bit

The original QFT architecture had ROM and RAM address spaces of 9 and 7 bits. I extended them to 12 and 10 bits, respectively. This was not as straightforward a task as it first seemed, since the signal arrival timings between the modules had to be carefully adjusted for the signals to line up correctly. This involved reverse-engineering and experimenting with undocumented VarLife pattern units used in the original QFT architecture. The same held when redesigning other parts of the architecture.

#### Reducing the Standard Input Size

Since each byte of the RAM module can be ordered arbitrarily in the CPU’s architecture, the RAM is arranged so that the standard output is written at the very bottom of the RAM module, and proceeds upwards. Therefore, the contents of the RAM can easily be observed in a Game of Life viewer by directly examining the bottom of the RAM module.

Since the RAM has 16 bits of memory per address, each address can fit two ASCII-encoded characters. The standard input is therefore read out two characters per address. For the standard output, one character is written per address for aesthetic reasons, so that the characters can be observed more easily when examining the pattern directly in a Game of Life viewer. Also, so that the standard output proceeds upwards within the RAM module pattern, the memory pointer for the standard output proceeds backwards in the memory space, while the pointer for the standard input proceeds forwards.
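A minimal sketch of this packing scheme (the byte order within a 16-bit word is an assumption here, not taken from the source):

```python
# Packing sketch: two ASCII characters per 16-bit RAM word for the
# standard input, matching the text's description. The low byte is
# assumed to hold the first character of each pair.

def pack_input(s):
    words = []
    for i in range(0, len(s), 2):
        lo = ord(s[i])
        hi = ord(s[i + 1]) if i + 1 < len(s) else 0
        words.append((hi << 8) | lo)
    return words

def unpack_word(w):
    chars = [chr(w & 0xFF)]
    if w >> 8:
        chars.append(chr(w >> 8))
    return "".join(chars)

words = pack_input("AB")
print(words)                  # -> [16961]  (0x4241)
print(unpack_word(words[0]))  # prints AB
```

A shift-right-by-eight, like the SRE opcode described earlier, is what extracts the second character from such a word.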

### The Game of Life layer

Optimizing the Game of Life layer mainly revolved around understanding the Macrocell format for representing and saving Game of Life patterns, and the Hashlife algorithm. The Macrocell format uses quadtrees and memoization for compressing repeated patterns. Since the final Game of Life pattern is an array of OTCA metapixels which are 2048x2048 large, and even has repeated patterns in the VarLife layer (meaning that there are repeated configurations of OTCA metapixels), this compression reduces the file size for the QFT pattern significantly. The best example that let me understand the Macrocell format was an example provided by Adam P. Goucher in this thread in Golly’s mailing list.

The Hashlife algorithm also uses quadtrees and memoization to speed up the Game of Life simulations. This algorithm makes use of the fact that the same pattern in a same time frame influences only a fixed extent of its surrounding regions, hence allowing for memoization.

As for optimization, I first noticed that the QFT pattern had a 1-pixel-high pattern concatenated to the entire pattern. The original QFT pattern in the original QFT repository was carefully designed to be composed of 8x8-sized pattern units, so most of the pattern can be represented by 8x8 tiles. However, the 1-pixel-high pattern at the top creates an offset that shifts the pattern away from this 8x8 grid, so the pattern has fewer repeated tiles when interpreted from the corner of its bounding box, causing the memoization to work inefficiently. I therefore tried placing a redundant cell (which does not interfere with the rest of the pattern) to realign the entire pattern to its 8x8 grid, which in fact slightly reduced the resulting Macrocell file size. Although I didn't compare the running times, since the Hashlife algorithm memoizes repeated patterns as well, I expect this optimization to at least slightly contribute to the performance of the simulation.

Another optimization was improving the metafier script used to convert VarLife patterns to Game of Life (MetafierV3.py). The original script used a square region to fit the entire pattern when creating the quadtree representation. However, since the Lisp in Life VarLife pattern is 968 pixels wide but 42354 pixels high, it tried to allocate a 65536x65536-sized integer array, which was prohibitively large to run. I modified the script to use a rectangular region, where absent regions of the quadtree are represented as absent cells. Although this is very straightforward with knowledge of the Macrocell format, it was difficult at first, before I became familiar with the algorithms surrounding the Game of Life.
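As a rough illustration of why this node sharing matters (a hypothetical miniature of the Macrocell/Hashlife quadtree idea, not the actual metafier code):

```python
# Hash-consing sketch behind the Macrocell format: identical subpatterns
# (including fully empty ones) collapse into a single shared quadtree
# node, so a tall, repetitive pattern stores far fewer nodes than cells.

from functools import lru_cache

@lru_cache(maxsize=None)
def node(nw, ne, sw, se):
    # lru_cache canonicalizes: equal child tuples yield the same object.
    return (nw, ne, sw, se)

def build(grid, x, y, size):
    if size == 1:
        return grid[y][x]
    h = size // 2
    return node(build(grid, x, y, h), build(grid, x + h, y, h),
                build(grid, x, y + h, h), build(grid, x + h, y + h, h))

# A 4x4 grid whose four 2x2 quadrants are identical:
grid = [[1, 0, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0]]
build(grid, 0, 0, 4)
print(node.cache_info().currsize)  # -> 2 (one shared 2x2 node, one root)
```

Sixteen cells are represented by only two distinct nodes, which is the same effect that makes the repeated OTCA metapixel configurations compress so well.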

### Memory Region Map and the Phases of Operation

The memory region map is carefully designed to save space. This is best described with the operation phases of the interpreter.

#### Phase 0: Precalculations

Various precalculations are done after the interpreter starts running. The construction of the string interning hashtable for reserved atoms such as define, quote, etc. is done in this phase. For the GCC-compiled interpreter, some variables that are defined in the QFT memory header are defined in the C source.

Since the outcome of these precalculations is always the same for any incoming Lisp program, this phase is done on the host PC, and the results are saved as ramdump.csv at QFTASM compile time. The results are then pre-loaded into the RAM when the VarLife and Game of Life patterns are created. This saves some CPU cycles when running the interpreter.

As explained earlier, the QFT architecture holds register values in the RAM. There are 11 registers, which are placed in the addresses from 0 to 10.

The reserved values in the image include strings such as reserved atoms and the destinations of the jump hashtable used for evaluation. The rest of the region is used for storing global variables in the interpreter’s C source code.

#### Phase 1: Parsing

The Lisp program provided from the standard input is parsed into S-expressions, which is written into the heap region.

Notice that the string interning hashtables are created at the later end of the stack region. This is because these hashtables are only used during the parsing phase, and can be overwritten during the evaluation phase. For most Lisp programs, including the ones in this repository, the stack region does not grow far enough to overwrite these values. This arrangement makes room for three growing memory regions during the parsing phase: the stack region used for nested S-expressions, the heap region which stores the parsed S-expressions, and the string interning hashtables, which grow as new strings are detected within the Lisp program. Newly detected strings such as variable names in the Lisp program are also written into the heap region.

The heap region is also designed to overwrite the standard input as the program is parsed. Since older parts of the program can be discarded once they are parsed, this naturally frees the standard input region, saving a lot of space after parsing. The standard input also gets overwritten by the standard output if the output is long enough. Due to this design, however, long programs may fail to parse, since the input may be overwritten too far and get deleted before it is parsed. A workaround is to use indentation, which places the program further ahead in memory and prevents it from being overwritten by the growing heap region. For all of the programs included in this repository, this is not an issue and the programs parse successfully.

#### Phase 2: Evaluation

By this time, all of the contents of the stack region, as well as everything ahead of the head of the heap region, can be overwritten in later steps. Note that an issue similar to the one with the standard input arises with the standard output: when too many Lisp objects are created at runtime, the heap may overwrite the existing standard output, or simply exceed the heap region and proceed into the stack region. Since the heap region is connected to the later end of the stack region, this may be safe if the standard output is carefully handled, but the interpreter will eventually start overwriting values in the stack region if the heap continues to grow.

### Miscellaneous

#### How can a 2-state OTCA Metapixel emulate the behavior of an 8-state VarLife pattern?

This is one of the most interesting ideas in the original QFT project, and what makes the QFT architecture possible. As explained in the original QFT post, the 8 states of VarLife are actually a mixture of 4 different birth/survival rules with binary states. This means that each VarLife cell can only transition between two fixed states, and the birth/survival rule for that cell never changes at any point in time. Moreover, the OTCA Metapixel is designed so that each metapixel can carry its own birth/survival rule. Therefore, each VarLife cell can be encoded into an OTCA Metapixel by specifying its birth/survival rule and its binary state. This means that the array of OTCA Metapixels in the metafied pattern is actually a mixture of metapixels with different birth/survival rules, arranged in a way that makes the computation possible.
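A minimal sketch of this factorization (the state numbering and rule names here are invented for illustration; only the 4-rules-times-2-states structure comes from the text):

```python
# Decoding sketch: treating each of the 8 VarLife states as a
# (rule, alive) pair: 4 birth/survival rules times 2 binary states.
# The rule component of a cell is fixed forever; only the binary
# alive/dead component changes during the simulation.

RULES = ["B1/S", "B2/S", "B12/S1", "B1/S12"]  # illustrative names only

def decode(state):
    rule, alive = divmod(state, 2)
    return RULES[rule], alive

print(decode(5))  # -> ('B12/S1', 1)
```

The metafier then emits an OTCA Metapixel configured with the rule component, in the on or off meta-state given by the alive component.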

#### Halting Time

After the program counter is set to 65535 and the program exits, no more ROM and RAM I/O signals appear anywhere in the module. This makes the VarLife pattern completely stationary, with every subsequent generation identical. Defining this as the halting time of the calculation, the pattern for print.lisp halts at exactly 105,413,068 VarLife generations.

The halting time for the Game of Life patterns is defined similarly in terms of the meta-states of the OTCA Metapixels. Since OTCA Metapixels never become stationary, the Game of Life states do not become stationary after the halting time, but the meta-states of the OTCA Metapixels do.

For the VarLife pattern of print.lisp, by generation 105,387,540, the value 65535 gets written to the program counter. At generation 105,413,067, the last signal is just one step from disappearing, and from generation 105,413,068 onwards, the pattern is completely stationary, with every generation identical to the last. In the Game of Life version, since the OTCA Metapixels continue running indefinitely, the pattern does not become completely stationary, but the meta-states of the OTCA Metapixels do, since the pattern is an emulation of the VarLife pattern. Note that the halting times for programs other than print.lisp are just sufficient numbers of generations, not exact values.

The required number of generations per CPU cycle depends on many factors, such as the ROM and RAM addresses and the types of opcodes involved, since the arrival times of the I/O signals depend on these factors as well. The number of generations required for a program to halt therefore differs between programs. For example, print.lisp has a rate of 23822.16 generations per CPU cycle (GpC), but z-combinator.lisp has a rate of 28870.81 GpC, and primes-print.lisp has 31502.43 GpC. 23822.16 GpC is in fact insufficient for z-combinator.lisp to finish running, and 28870.81 GpC is insufficient for primes-print.lisp.
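As a quick sanity check on these figures (an estimate derived only from the two numbers quoted above for print.lisp, assuming cycles are simply generations divided by the GpC rate):

```python
# Back-of-the-envelope check: dividing the halting generation of
# print.lisp by its generations-per-cycle rate gives the implied
# number of CPU cycles executed.
halting_generation = 105_413_068
generations_per_cycle = 23822.16
print(round(halting_generation / generations_per_cycle))  # -> 4425
```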

#### Miscellaneous Screenshots

The ALU unit in the CPU. From the left are the modules for the ANT, XOR, SRE, SRU, SUB, ADD, MLZ, and MNZ opcodes.

The SRE and the SRU opcodes were newly added for this project.

## Credits

The CPU architecture used in this project was originally created by the members of the Quest For Tetris (QFT) project, and was later optimized and modified by Hikaru Ikuta for the Lisp in Life project. The VarLife cellular automaton rule was also defined by the members of the QFT project. The metafier for converting VarLife patterns to Conway’s Game of Life patterns was written by the members of the QFT project, and was later modified by Hikaru Ikuta to support the pattern size of the Lisp in Life architecture. The assembly language for the QFT architecture, QFTASM, was also originally designed by the members of the QFT project, and was later modified by Hikaru Ikuta for this project for achieving a feasible running time. The Lisp interpreter was written by Hikaru Ikuta. The compilation of the interpreter’s C source code to the ELVM assembly is done using an extended version of 8cc written by Rui Ueyama from Google. The compilation from the ELVM assembly to QFTASM is done by an extended version of ELVM (the Esoteric Language Virtual Machine), a project by Shinichiro Hamaji from Preferred Networks, Inc. The Game of Life backend for ELVM was written by Hikaru Ikuta, and was later further extended by Hikaru for the Lisp in Life project.

# Extending SectorLISP to Implement BASIC REPLs and Games

2022-01-12T11:01:35+09:00 · https://woodrush.github.io/blog/posts/sectorlisp-io

SectorLISP is an amazing project where a fully functional Lisp interpreter is fit into the 512 bytes of the boot sector of a floppy disk. Since it works as a boot sector program, the binary can be written to a disk and used as a boot drive, where the computer presents an interface for writing and evaluating Lisp programs, all running on bare metal during the boot phase, in a 436-byte program. As it hosts the Turing-complete language of Lisp, I was in fact able to write a BASIC interpreter in 120 lines of SectorLISP code, which evaluates BASIC programs embedded as expressions within the Lisp code, shown in the screenshot above.

When I first saw SectorLISP and got it to actually run on my machine, I was struck with awe by how such a minimal amount of machine code could be used to open up the vast ability to host an entire programming language. You can write clearly readable programs which the interpreter will accurately evaluate to the correct result. I find it beautiful how such a small program is capable of interpreting a form of human thought and generating a sensible response that contains the meaning encapsulated in the inquired statement.

## The Issue - Designing Interactions

After writing various programs for SectorLISP, there was a particular thought that came into my mind. Even after writing the BASIC interpreter, I felt that there was one very important feature that could significantly enhance the capabilities of SectorLISP - that is, the ability to accept feedback from the user depending on the program’s output, by designing the interaction between the user and the computer.

The prime example of this is games. Games can be played on a computer because the player can react to the output of the computer. Of course, even with pure functions as in SectorLISP, it's still possible to create a game if we make the user run the same program again every time it demands a new input. The entire history of user inputs can be expressed as a list in the program, the input and output states can be passed through the course of the entire program, and the program can stop whenever a required input is missing, showing its accumulated outputs. However, such an interface requiring repeated inputs is rather inconvenient for the user: inconvenient in the same sense that IF is less convenient than COND, and that lambdas taking only one argument are less convenient than lambdas taking any number of arguments, both of which exist to make the experience of interacting with SectorLISP as simple and natural as possible.

When you think about it, the reason why computers are such a powerful device used almost everywhere in our lives today, is because they can be redesigned into an entirely different tool for an arbitrary purpose. The computer is then no longer a tool that is used only by the programmer, but can be used by anybody to run its applications. The transition from ENIAC to the dawn of the personal computing era was possible since computers became capable of general tasks other than computing equations, such as writing and saving documents for a business. Today, computers are being used for creating artwork, for playing games, for communicating with others, to only give a few examples. The entire history of computers is shaped by what new tasks computers became capable of, which is inseparable from the means of interaction between the human and the computer.

At the heart of the diverse applications for computers is the language used to program them. This is why programming languages capable of designing interactions are special - once a computer is programmed, it can leave the hands of the programmer and lie in the hands of the user, who interacts with it in a newly designed way.

As a matter of fact, all of the other languages mentioned in the SectorLISP blog post support I/O functionality. SectorFORTH has the key and emit instructions, which read a keystroke from the user and print a character to the console. BootBasic has the input and print instructions, where input stores a user input into a variable. Even BF has the instructions , and . for arbitrary user text input and output. @rdebath has in fact made a text adventure game written entirely in BF.

Although the goal of SectorLISP is set in the realm of pure functions, I thought that it would be a massive gain if it were able to handle I/O and still have a smaller program size than the other languages mentioned in the SectorLISP blog post. In the context of comparing the binary footprint of programs, it would be a better comparison if all of the programs under discussion had even more functionalities in common. All of this could be achieved if we could construct a version of SectorLISP that is capable of handling user input and outputs that still has a small program size.

## The Solution

What could we do to empower SectorLISP with the puzzle piece of interaction? What is a natural way of implementing I/O? To answer this, I created a fork of SectorLISP that supports two new special forms, READ and PRINT. These two special forms are the counterparts for the , and . instructions in BF. READ accepts an arbitrary S-Expression from the user, and PRINT prints the value of the evaluated argument to the console. PRINT also prints a newline when called with no arguments as (PRINT).

The fork is available here: https://github.com/woodrush/sectorlisp/tree/io

Update (2022/4/6): The fork was merged into the original SectorLISP repository. Thanks for reviewing and merging it!

Adding all of these features only amounted to an extra 35 bytes in the binary, for a total of 469 bytes, or 471 bytes including the boot signature. This is still 22 or more bytes smaller than the two former champions of minimal languages that fit in a boot sector mentioned in the SectorLISP blog post, SectorFORTH (491 bytes) and BootBasic (510 bytes). The rather minimal increase was achievable since most of the code for handling input and output was already available from the REPL's functionality. This fork successfully shows that adding an I/O feature to SectorLISP still allows it to have a smaller binary footprint than the two former champions.

Update: Thanks to a pull request by @jart, the author of the original SectorLISP, we’re down to 465 bytes or 467 bytes including the boot signature. Thank you @jart for your contribution! The details of the assembly optimizations including the one used in this pull request are discussed in the Assembly Optimizations section.

## Usage

To run the SectorLISP fork, first git clone and make SectorLISP’s binary, sectorlisp.bin:

git clone https://github.com/woodrush/sectorlisp
cd sectorlisp
git checkout io
make


Update (2022/4/6): Since the fork was merged into the original SectorLISP repository, you could now checkout https://github.com/jart/sectorlisp instead for using these features.

This will generate sectorlisp.bin under ./sectorlisp.

Next, download the Blinkenlights emulator:

curl https://justine.lol/blinkenlights/blinkenlights-latest.com >blinkenlights.com


You can then run SectorLISP by running:

./blinkenlights.com -rt sectorlisp.bin


In some cases on Ubuntu, a graphics-related error may appear and the emulator may not start. In that case, first run the following command, available on the download page:

sudo sh -c "echo ':APE:M::MZqFpD::/bin/sh:' >/proc/sys/fs/binfmt_misc/register"


After starting Blinkenlights, expand the size of your terminal large enough so that the TELETYPEWRITER region shows up at the center of the screen. This region is the console used for input and output. Then, press c to run the emulator in continuous mode. The cursor in the TELETYPEWRITER region should move one line down. You can then start typing in text or paste a long code from your terminal into Blinkenlight’s console to run your Lisp program.

### Running on Physical Hardware

You can also run SectorLISP on an actual physical machine if you have a PC with an Intel CPU that boots with a BIOS, and a drive such as a USB drive or a floppy disk that can be used as a boot drive. First, mount your drive to the PC you’ve built sectorlisp.bin on, and check:

lsblk -o KNAME,TYPE,SIZE,MODEL


Among the list of the hardware, check for the device name for your drive you want to write SectorLISP onto. After making sure of the device name, run the following command, replacing [devicename] with your device name. [devicename] should be values such as sda or sdb, depending on your setup.

Caution: The following command used for writing to the drive will overwrite anything that exists in the target drive’s boot sector, so it’s important to make sure which drive you’re writing into. If the command or the device name is wrong, it may overwrite the entire content of your drive or other drives mounted in your PC, probably causing your computer to be unbootable. Please perform these steps with extra care, and at your own risk.

sudo dd if=sectorlisp.bin of=/dev/[devicename] bs=512 count=1


After you have written your boot drive, insert the drive to the PC you want to boot it from. You may have to change the boot priority settings from the BIOS to make sure the PC boots from the target drive. When the drive boots successfully, you should see a cursor blinking in a blank screen, which indicates that you’re ready to type your Lisp code into bare metal.

## Applications

Here we present examples to showcase the capabilities of READ and PRINT.

### Games

A major example of interactive programs is games. I created a simple number guessing game that works on the fork of SectorLISP.

Here is a screenshot of the game in action, run in Blinkenlights: Here is the text shown in the console:

(LET ' S PLAY A NUMBER GUESSING GAME. I ' M THINKING OF A CERTAIN NUMBER BETWEEN
1 AND 10. SAY A NUMBER, AND I ' LL TELL YOU IF IT ' S LESS THAN, GREATER THAN,
OR EQUAL TO MY NUMBER. CAN YOU GUESS WHICH NUMBER I ' M THINKING OF?)
(PLEASE INPUT YOUR NUMBER IN UNARY. FOR EXAMPLE, 1 IS (*) , 3 IS (* * *) , ETC.)
NUMBER>(* * *)
(YOUR GUESS IS LESS THAN MY NUMBER.)
NUMBER>*
(PLEASE INPUT YOUR NUMBER IN UNARY. FOR EXAMPLE, 1 IS (*) , 3 IS (* * *) , ETC.)
NUMBER>(* * * * * * * *)
(YOUR GUESS IS GREATER THAN MY NUMBER.)
NUMBER>


We can see that the game is able to produce interactive outputs based on the feedback from the user, which is an essential feature for creating games. Note that there is also robust input handling in action, where in the second input NUMBER>*, the user writes an invalid input *, which is not a list. The game can handle such inputs without crashing.

The code is available at https://github.com/woodrush/sectorlisp-examples/blob/main/lisp/number-guessing-game.lisp.

### Extended Lisp REPL - Transforming the Language Itself

The I/O feature can be used to transform the SectorLISP language itself as well. As an example, I made an extended Lisp REPL where macro, define, progn, as well as print and read are all implemented as new special forms.

Here is an example session of the program:

REPL>(define defmacro (quote (macro (name vars body)
( (define (~ name) (quote (macro (~ vars) (~ body))))))))
=>(macro (name vars body) ( (define (~ name) (quote (macro (~ vars) (~ body))))
))

REPL>(defmacro repquote (x)
( (quote ((~ x) (~ x)))))
=>(macro (x) ( (quote ((~ x) (~ x)))))

REPL>(repquote (1 2 3))
=>((1 2 3) (1 2 3))

REPL>


The code is available at https://github.com/woodrush/sectorlisp-examples/blob/main/lisp/repl-macro-define.lisp.

In the example above, the user first uses the backquote macro to define defmacro as a new macro, then uses defmacro to define a new macro, repquote. These newly added features allow an interaction that is much closer to those in modern Lisp dialects.

In the code, these additional user inputs are included at the end of the code, which can be directly pasted into the console. However, we can look at this another way: by writing the REPL code as a header, we have effectively transformed the syntax of the language itself, introducing new special forms which were not present in the original interface. The DEFINE special form is also introduced in SectorLISP's friendly branch, which adds some extra bytes. With READ and PRINT, we can instead build these new features on top of the interface as software, saving a lot of program space.

### Interactive BASIC REPL

As a final example for drastically modifying the means of user interactions, I made an interactive BASIC interpreter written in the I/O SectorLISP. It runs a subset of BASIC with the instructions LET, IF, GOTO, PRINT, REM, and the infix operators +, -, %, and <=. Integers are expressed in unary as a list of atoms, such as (1 1 1).

Here is a screenshot of the final results, run in Blinkenlights: @jart has created a video of it running on Blinkenlights (Thank you @jart!):

The code is available at https://github.com/woodrush/sectorlisp-examples/blob/main/lisp/basic-repl.lisp.

In this example, SectorLISP no longer presents an interface for evaluating Lisp expressions, but provides a new interface for recording and evaluating BASIC programs, transforming SectorLISP into an entirely different application. This highlights how programming languages can be used to redesign computers into tools for arbitrary purposes - using this SectorLISP program, users can now interact with the computer in a new way using the BASIC language.

Although it is indeed possible to run this evaluator as a static program, as in the code shown at the beginning, the new program is able to hide and encapsulate the details of the underlying Lisp program by presenting a new interface. In the static version, the evaluator must also be entirely retyped to evaluate a new BASIC program, which is a major difference in terms of interaction. This shows how features as simple as READ and PRINT can be used to create a powerful application with the language. In a way, SectorLISP now works as a minimal operating system, and programs within it such as this REPL work as applications that extend the capabilities of the underlying OS.

## Implementation Details

Let’s look at some details for dealing with I/O.

### Sequential Execution - Defining PROGN using Pre-existing Features

First of all, side effects are inseparable from the notion of sequential execution. Although lambda bodies in SectorLISP can only contain one expression, there is in fact an already built-in way to naturally manage sequential execution: you can pass expressions as the arguments of a lambda to have them executed sequentially!

For example, the following program allows the execution of three consecutive PRINTs:

((LAMBDA () NIL)
 (PRINT (QUOTE A))
 (PRINT (QUOTE B))
 (PRINT (QUOTE C)))


Here, each PRINT statement is passed as an argument to the empty lambda expression (LAMBDA () NIL), and all of them are executed in the order of appearance. This is possible because EVLIS evaluates all of the arguments before calling PAIRLIS to bind the values to the variables, so every expression gets evaluated in order regardless of the number of arguments the lambda expects.
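This evaluation order can be sketched with a small Python model of EVLIS and PAIRLIS (an assumed simplification of the mechanism, not the interpreter's actual code):

```python
# Assumed, simplified model of SectorLISP's EVLIS + PAIRLIS.
log = []

def PRINT(x):
    log.append(x)       # the side effect: print to the console
    return None         # return value is undefined/unused

def apply_lambda(params, body, args):
    # EVLIS: evaluate *every* argument expression, in order,
    # before PAIRLIS binds values to parameters.
    values = [arg() for arg in args]
    env = dict(zip(params, values))   # surplus values are dropped
    return body(env)

# ((LAMBDA () NIL) (PRINT (QUOTE A)) (PRINT (QUOTE B)) (PRINT (QUOTE C)))
result = apply_lambda((), lambda env: None,
                      [lambda: PRINT("A"),
                       lambda: PRINT("B"),
                       lambda: PRINT("C")])
print(log)      # ['A', 'B', 'C'] -- all three ran, in order
print(result)   # None, playing the role of NIL
```

The key point the model captures is that the arity mismatch is harmless: all three argument expressions are evaluated for their side effects, and the unused values are simply discarded.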

Since this empty lambda can be used anywhere with an arbitrary number of expressions, you can name it PROGN and use it as follows:

((LAMBDA (PROGN)
   (PROGN (PRINT (QUOTE A))
          (PRINT (QUOTE B))
          (PRINT (QUOTE C))))
 (QUOTE (LAMBDA () NIL)))


Note that PROGN always returns NIL instead of the last expression in the sequence, which differs from the behavior of most Lisp dialects. To extract a value from a PROGN sequence, you can use repeated lambda arguments as follows:

((LAMBDA (PROGN)
   (PROGN (PRINT (QUOTE A))
          (PRINT (QUOTE B))
          (PRINT (QUOTE C))
          (QUOTE D)))
 (QUOTE (LAMBDA (X X X X) X)))
;; Returns (QUOTE D)


Note that the return value of PRINT is designed to be undefined to save program space. This does not become a problem, as will be discussed later.

You can use CONS instead of PROGN as well for the same purpose:

(CDR (CDR (CDR
  (CONS (PRINT (QUOTE A))
        (CONS (PRINT (QUOTE B))
              (CONS (PRINT (QUOTE C))
                    (QUOTE D)))))))


These tools are enough to handle sequential execution and to extract values from the executed expressions.

When I first came up with the PROGN solution, it felt as if SectorLISP had been waiting for sequential execution to be used. Although pure expressions, as in the original SectorLISP implementation, do not require this feature, it was a nice realization that it had already been built into SectorLISP so naturally. It is also pleasing that the syntax is the same as in modern Lisp dialects, with the only difference being that it always returns NIL instead of the final value, which can still be worked around using the methods discussed earlier.

### Comments inside PROGN

Since all of the values inside PROGN are discarded after execution, you can write comments inside a PROGN block, at the expense of some RAM in the string-interning region and a few extra evaluations:

((LAMBDA (
PROGN ;;
)
(PROGN ;; (QUOTE - THIS PRINTS 3 CONSECUTIVE LETTERS.)
(PRINT (QUOTE A))
(PRINT (QUOTE B))
(PRINT (QUOTE C))))
(QUOTE (LAMBDA () NIL))
NIL)


Here, the variable ;; is bound to NIL and placed inside PROGN. Since ;; immediately evaluates to NIL and is discarded, it does not affect any relevant state of the interpreter or the program. Because ;; does not actually comment out the following statement in SectorLISP, the comment body after it is enclosed in a QUOTE form to prevent it from being executed, so that its result is also discarded after evaluation.

Also, note that the closing parentheses of the outer lambda are preceded by some extra newlines, to prevent text editors from treating the parentheses ) as part of the ;; comment. This format is used in the number guessing program as well.

### Loops using Recursion

Although this is not a newly added feature, it is worth noting that loops can be implemented as recursion, by calling a function within itself. In the number guessing game example, the functions MAIN and GAMELOOP are called within themselves to be executed an arbitrary number of times. This combined with PROGN provides a natural way for writing sequential programs.
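This pattern can be sketched in Python as follows (`game_loop` is a hypothetical stand-in for GAMELOOP, not the actual Lisp code): the function simply calls itself until the exit condition holds, which is exactly how a loop is expressed without any loop primitive.

```python
# Loop via recursion: the function re-invokes itself until the
# answer is guessed, accumulating output into a transcript list.
def game_loop(guesses, answer, transcript):
    guess = guesses[0]
    if guess == answer:
        transcript.append("CORRECT")
        return
    transcript.append("WRONG")
    game_loop(guesses[1:], answer, transcript)   # "loop" by self-call

transcript = []
game_loop([1, 5, 3], 3, transcript)
print(transcript)   # ['WRONG', 'WRONG', 'CORRECT']
```

In the Lisp version, the recursive call sits inside a PROGN so that the prompt, the user input, and the next iteration happen in sequence.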

### Print Debugging

The PRINT feature is not only convenient for the user of the program, but in fact provides a helpful interface for the programmer as well: it allows for print debugging, i.e. checking the values that occur at runtime. Even in a language with a syntax as comfortable as Lisp's, the most experienced programmer would have a difficult time debugging a large program if the internal states and variables could not be observed at runtime.

This can be done by simply wrapping the expression with a predefined DEBUG function:

((LAMBDA (DEBUG)
...
(DEBUG EXPR)
...
)
(QUOTE (LAMBDA (X) (CDR (CONS (PRINT X) X)))))


The extra wrapper function DEBUG is needed because the return value of PRINT is designed to be undefined to save program size.

The art of writing a program always comes with the act of deleting and revising it, by observing its behavior and internal states. Print debugging is a simple yet powerful interface that is a de facto requirement for writing large programs. Such an interface is comparable to the choice of implementing COND in SectorLISP instead of IF, which usually induces a more obfuscated program structure. I myself heavily used print debugging to write the BASIC interpreter, as well as the version that runs in the original SectorLISP, which I wrote and debugged in the I/O SectorLISP fork.

### Return Values of PRINT

As mentioned earlier, PRINT is designed to return an undefined value to save program size. Since values passed to PRINT can be extracted using DEBUG, and PRINT can be used inside PROGN where values are discarded, having PRINT return an undefined value was not a problem in any of the examples discussed above. Running a bare PRINT expression in the REPL also didn't print any unwanted strings in the console, so I consider this property safe to manage in most use cases. Running various PRINT expressions in the REPL turns out like this:

(PRINT (QUOTE A))
A
(PRINT (PRINT (QUOTE A)))
ANIL
AAA


Notice that the result of the first expression is slightly odd: the REPL is supposed to show the return value of PRINT in addition to its effect of printing A in the console, but nothing extra is printed. In the second expression, the nested PRINT turns out to return NIL, which is printed after A as a return value by the REPL. This phenomenon should not occur in well-written large programs, as long as the program is written so that the return values of PRINT are never referenced, which is the natural result if they are all executed inside PROGN.

READ is much safer, since by definition it must produce a valid return value regardless of its context. At first, there was a bug where the first character was ignored by READ, but it was fixed by caching the lookahead character from the user input inside GetChar, as fixed in 162969d (the latest version uses the %bp register instead of %fs, as fixed in 1af3db7).
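The idea behind the fix can be sketched as follows, assuming a simple model of one-character lookahead (the class and method names here are illustrative; the real fix lives in the GetChar assembly routine):

```python
# Assumed model of lookahead caching: peeking at the next character
# stores it in a one-slot cache, so a later read does not lose it.
class Input:
    def __init__(self, text):
        self.chars = list(text)
        self.lookahead = None   # the cached lookahead character

    def peek(self):
        if self.lookahead is None:
            self.lookahead = self.chars.pop(0)
        return self.lookahead

    def getchar(self):
        if self.lookahead is not None:
            c, self.lookahead = self.lookahead, None
            return c
        return self.chars.pop(0)

inp = Input("AB")
print(inp.peek())      # 'A' -- lookahead does not consume
print(inp.getchar())   # 'A' -- the peeked character is not lost
print(inp.getchar())   # 'B'
```

Without the cache, the peeked character would be consumed and discarded, which is exactly the symptom of the original bug where READ ignored the first character.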

## Assembly Optimizations

Here we’ll cover the details of optimizing the assembly size. More details for the methods used in the original SectorLISP assembly code are available at the original SectorLISP blog post, https://justine.lol/sectorlisp2/.

### Smaller Jump Instruction Encodings

This is a method used in the pull request by @jart, the author of the original SectorLISP. Conditional jumps in x86 are encoded in different instruction sizes depending on the size of the jump's displacement. When the displacement fits in one byte, i.e. it is between -128 and 127, the instruction fits in two bytes, instead of four bytes when the displacement is larger. The pull request by @jart exploits this by reordering the functions within the assembly code, shrinking the displacements of the conditional jump instructions related to READ and PRINT.
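The size rule can be sketched as a small helper (based on the standard x86 Jcc encodings, not on code from the fork):

```python
# Encoded size of an x86 conditional jump as a function of its
# displacement: short jcc is a 1-byte opcode + rel8 (2 bytes);
# the near form in 16-bit code is a 2-byte opcode + rel16 (4 bytes).
def jcc_size(displacement):
    if -128 <= displacement <= 127:
        return 2    # short form, e.g. 74 xx (je rel8)
    return 4        # near form, e.g. 0f 84 xx xx (je rel16)

print(jcc_size(100))    # 2
print(jcc_size(-200))   # 4
```

Reordering functions so that related code sits close together pushes more displacements into the -128..127 range, saving two bytes per jump that shrinks.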

### Reducing Return Instructions using the Control Flow Structure

This is a method used in the original SectorLISP implementation, which is used in the fork as well. Consider the following example where a function A calls another function B and then immediately returns afterward:

A:      mov %ax,%si
        call B
        ret

B:      mov %si,%bp
        ret


The code can then be reduced by two instructions without changing the behavior, by concatenating A directly before B as follows:

A:      mov %ax,%si
;       slide
B:      mov %si,%bp
        ret


This way, even though there is no ret instruction in the A block, the control flow falls through into B, which does have a ret instruction. The same ret instruction is therefore shared by the two functions A and B. Calls to both A and B behave the same as in the previous code, with fewer instructions.

This method is used to implement READ and PRINT as extensions of .PutObject and GetToken, where some additional instructions run before the original functions. Reusing existing code this way kept the increase in program size rather small.

## Conclusion

I made a fork of SectorLISP that supports two new special forms, READ and PRINT, which provide a natural I/O interface useful for both the programmer and the user of a program. This led to the following findings:

• The fork allows for writing interactive programs for SectorLISP, such as games and REPLs of other programming languages such as a subset of BASIC.
• With the new special forms READ and PRINT, you can now design the interactions between the user and the computer, a feature supported by all of the other languages mentioned in the original SectorLISP blog post, including SectorFORTH, BootBasic, and BF.
• Adding all of these features only amounted to an extra 35 bytes in the binary, for a total of 469 bytes, or 471 bytes including the boot signature. When speaking of the binary footprint of a program, it is important for each program to share as many common features as possible. The fork of SectorLISP achieves this by supporting the I/O feature, and also shows that the program size can still be kept at least 22 bytes smaller than the other programming languages mentioned in the SectorLISP blog post.
• Update: As mentioned earlier, a pull request from @jart has brought the total program size down to 465 bytes, or 467 bytes including the boot signature. Thank you @jart for your contribution!
• Update (2022/4/6): The fork was merged into the original SectorLISP repo. Thanks for reviewing and merging it!

## Credits

The video for the interactive BASIC REPL was created by Justine Tunney. The new I/O fork of SectorLISP discussed in this post was first created by Hikaru Ikuta, and has received contributions from Justine Tunney. The SectorLISP project was started by Justine Tunney and built by the contributors to the project, along with the authors credited in the original SectorLISP blog post. I'd also like to thank Justine and Hannah Miller from the Rochester Institute of Technology for the fruitful discussion on improving this blog post.
