Saturday, December 27, 2008

Simple lisp interpreters in C, Java and Erlang


Overview

I have already covered the lisp interpreters in Erlang and Java. I have created another example that is based on C. So, you have three different implementations in three different languages. Also, these interpreters are based on Peter Norvig's JScheme implementation. That is a fourth project that you can compare and contrast.

Peter Norvig created JScheme, a Lisp Scheme dialect written in Java. It was created in 1998 and it has since been forked, extended and used quite extensively in other modern projects for the web and elsewhere. The C lisp interpreter is based on the JScheme interpreter. The JScheme code is copyrighted to Peter Norvig This is only a modifications to make our Lisp even lighter
than it was originally. Norvig's most recent implementation of JScheme contains 1905 lines of Java code. This implementation contains 2000 lines of C code wrapped in additional Javadoc comments, including implementations of list, hashtable datastructures.

There is a lot that can be added to Norvig's implementation. Add full support for a R5RS Scheme Implementation. Add better input/output support. Copy interesting functions from Arc or Common Lisp. Write a Scheme compiler. In fact, the source code in my implementation and Norvig's is fairly straightforward,

C implementation

To Compile:
type 'make' at the command line prompt
To Run:
type 'make run' to run the tests or ./octanec

For Example:

test.lisp contains the following:
-------------
(+ 4 5 6)
(+ 6 6 6)
(+ 1 2 3 )
(+ 1 (+ 1 (* 1 1)))
-------------

./octanec test.lisp

Example Output

...
...
trace: eval[t1] - instance of procedure calltrace: eval - instance of String: data=[+]
trace: eval - instance of non-pair =>
trace: eval - instance of non-pair =>
trace: eval - instance of non-pair =>
at apply id= 27 4.000000 x==>97125c0 16
at plus 27

test eval [result after eval]: object=9712918 type=16 [15.000000]
...
...

Object Oriented C

The code is written with an object oriented approach. There is one generic object definition that contains data for different types.
octane_object.h

/*
 * The default object type for the scheme parser.
 * The obj_type determines.
 * 
 * If the struct is of a particular type 'E.g. String'
 * then we will need reference that particular pointer.
 *
 * String = string_data;
 * Char   = char_data;
 */
typedef struct Object_struct {
    int    obj_type;    
    char   *string_data;
    double double_data;
    int    boolean_data;
    char   char_data;

    struct Pair_struct      *pair_data;
    struct Procedure_struct *proc_data;
} Object;

To create a new string object, we would do something like the following:
struct Object_struct *new_empty_object(int obj_type) {
    struct Object_struct *new_obj = (struct Object_struct *) malloc(sizeof(struct Object_struct));
    new_obj->obj_type = obj_type;
    return new_obj;
}
...
...
...

struct Object_struct *new_str_object(char *string_data) {
    struct Object_struct* obj = new_object(OBJ_STRING_TYPE, string_data, 0, 0);
    return obj;
}

With Java, Compare and Contrast
The C implementation is an almost exact copy of the Java implementation. Of course, with the C implementation, we created the list, hashtable, and buffer libraries that are available in Java as well as some other modifications. But, the logic is essentially the same as the JScheme and mini scheme code. For example, here is the C and Java version of the lisp 'read' routine.
/**
 * Read and return a Scheme expression, or EOF.
 */
struct Object_struct *read(struct Octane_struct *octane) {
    struct Object_struct *token;
    char *str;
    printf("trace: read()\n");

    token = next_token(octane);
    if (token == NULL) {
        return NULL;
    }
    if (token->obj_type == OBJ_STRING_TYPE) {
        str = token->string_data;
        if (strcmp("(", str) == 0) {
            return read_tail(octane);
        } else if (strcmp(")", str) == 0) {
            printf("WARN: Extra ')' ignored.");
            return read(octane);
        } else {
            return token;
        } // End of the if - else    } else {
        return token;       
    } // End of the if - else}

And the Java version...
    /**
     * Read and return a Scheme expression, or EOF.
     */
    public Object read() {
        System.out.println("trace: read()");
        try {
            Object token = nextToken();
            System.out.println("trace: read() token=" + token);
            if (token == "(") {
                return readTail();
            } else if (token == ")") {
                System.out.println("WARN: Extra ')' ignored.");
                return read();
            } else {
                return token;
            } // End of the if - else        } catch (IOException e) {
            System.out.println("WARN: On input, exception: " + e);
            return EOF;
        } // End try - catch    }


References

Full Source for all three implementations, all_simple_lisp
Description of the Java Implementation

Description of the Erlang Implementation

http://jvmnotebook.googlecode.com/files/mini_jscheme.tar.gz

Bugs

In the load_scheme function, there is an error with the use of 'strcmp', change to the following.
load_scheme():
...
    if (object->obj_type == OBJ_STRING_TYPE) {
                if (strcmp(object->string_data, OCT_EOF) == 0) {
                    printf("exiting read loop (EOF)\n");
                    return;
                }
            }
...

-----------
Addendum(2) - Simple RPN Calculator.

With very little effort, I changed the lisp interpreter into a concatenative/stack language interpreter. In the update, instead of parsing the lisp syntax, the parser detects the various tokens and places or pops values off the stack. With Lisp, the main data structure is the 'list', with stack languages the core data structure mainly consists of a LIFO stack.

"In Reverse Polish notation the operators follow their operands; for instance, to add three and four, one would write "3 4 +" rather than "3 + 4". If there are multiple operations, the operator is given immediately after its second operand; so the expression written "3 − 4 + 5" in conventional infix notation would be written "3 4 − 5 +" in RPN: first subtract 4 from 3, then add 5 to that. An advantage of RPN is that it obviates the need for parentheses that are required by infix"

http://en.wikipedia.org/wiki/Reverse_Polish_notation

Here is an example source file (test.joy):
10 20 +
;; Result after call should be 30
4 * dup *

The values 10 and 20 are the first values put on the stack, these numbers are input parameters to the '+' operation.  The result of '30' is placed onto the stack.  Then the value of 4 is placed on the stack. The '*' multiplication operator is called with 4 and 30 as parameters.  And so on.  Here is the output from these operations:
Usage: octanec <source file>
trace: [EVAL RESULT]:  object=661ef8 type=16 [10.000000]

+- STACK -- top ---------+
| id:0 10.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662060 type=16 [20.000000]

+- STACK -- top ---------+
| id:0 20.000000
| id:1 10.000000
+--------- bottom -------+
        $ function input => (+)
+- STACK -- top ---------+
| id:0 20.000000
| id:1 10.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662160 type=16 [30.000000]

+- STACK -- top ---------+
| id:0 30.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662228 type=16 [4.000000]

+- STACK -- top ---------+
| id:0 4.000000
| id:1 30.000000
+--------- bottom -------+
        $ function input => (*)
+- STACK -- top ---------+
| id:0 4.000000
| id:1 30.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662328 type=16 [120.000000]

+- STACK -- top ---------+
| id:0 120.000000
+--------- bottom -------+
        $ function input => (dup)
trace: [EVAL RESULT]:  object=662328 type=16 [120.000000]

+- STACK -- top ---------+
| id:0 120.000000
| id:1 120.000000
+--------- bottom -------+
        $ function input => (*)
+- STACK -- top ---------+
| id:0 120.000000
| id:1 120.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662570 type=16 [14400.000000]

+- STACK -- top ---------+
| id:0 14400.000000
+--------- bottom -------+
        $ function input => (print_stack)
+- STACK -- top ---------+
| id:0 14400.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662570 type=16 [14400.000000]

+- STACK -- top ---------+
| id:0 14400.000000
+--------- bottom -------+
building (EOF)
;;*************************************;; Result after call should be = '14400.000000';; In Joy, we get the following as well:;; 10 20 + 4 * dup *.;;*************************************

Most of the modifications from the original Lisp interpreter code were made to the apply function. Instead of constructing a linked-list pair structure, all operations are made against the main stack:
...
...
    if (minArgs == 2) {
        switch (idNumber) {
        case PLUS:
        case MINUS:
        case DIVIDE:
        case TIMES:
            compute_res = stack_compute(idNumber, main_stack);
            if (compute_res != NULL) {
                stack_push(main_stack, compute_res);
            }
            return compute_res;
        case SWAP:
            // swap      :   X Y  ->   Y X            // Interchanges X and Y on top of the stack.            last_obj  = stack_pop(main_stack);
            last_obj2 = stack_pop(main_stack);
            stack_push(main_stack, last_obj);
            stack_push(main_stack, last_obj2);
            return last_obj2;
        default:
            printf("Invalid id number=%d\n", idNumber);
            error("Internal error: unknown primitive: ");
            return NULL;
        }; // End of Switch    } else if (minArgs == 1) {
        switch(idNumber) {
        case DUP:
            // dup      :   X  ->   X X            // Pushes an extra copy of X onto stack.            last_obj = stack_pop(main_stack);
            stack_push(main_stack, last_obj);
            stack_push(main_stack, last_obj);
            return last_obj;
        case ID:
            last_obj = stack_pop(main_stack);
            stack_push(main_stack, last_obj);
            return last_obj;
        case POP:
            // pop      :   X  ->            // Removes X from top of the stack.            last_obj = stack_pop(main_stack);
            return last_obj;
        default:
            printf("Invalid id number=%d\n", idNumber);
            error("Internal error: unknown primitive: ");
            return NULL;
        }; // End of Switch...
...

More RPN/Concatenative language resources:

http://www.latrobe.edu.au/philosophy/phimvt/joy.html

http://haskellnotebook.googlecode.com/files/simple_rpncalc.tar.gz

http://math-services.appspot.com/lang2.jsp
-----------

No comments: