Contact: zeng  @  zegraph.com      Last update: 20 January 2010

ZeScript Language

Introduction

ZeScript is a small, simple, embeddable, and thread-safe scripting language with C-like syntax. It is a by-product of the search for a scripting language for ZeGraph. Of the many wonderful languages out there, Lua and C-Talk have been used. Both have their merits but lack features needed to make calling functions of user objects, or applying operators to them, feel natural. For example, redefining operators for user objects was limited in Lua; and while more could be done in C-Talk, it did not have Lua's mechanism for calling functions as if they were methods of a user object.

ZeScript is light and fast -- the whole executable, including the standard library, is about 200 K. The language is easy to learn too, especially for those with C experience.

ZeScript is easy to extend. The sources for dynamic link libraries of Excel COM (excel.cpp), EXPAT (expat.cpp), SQLITE (sqlite.cpp), PostgreSQL (pgsql.cpp), MySQL (mysql.cpp), netCDF (netcdf.cpp), HDF (hdf.cpp), wxWidget (wdget.cpp), and IIS extension (zsiis.cpp) speak for themselves. Not only can you add primitive functions for specified types of user objects, you can also assign a primitive function to redefine operator behavior for any type of user object. The source code for the matrix library shows how to make operators work on data arrays of char, short, integer, float, and double, and how to define __get and __set functions to access matrix values through the array access expressions of ZeScript.

ZeScript supports primitive object types of null, boolean, integer, real, string, hash array, and user. Upon those, a structured object type, i.e., a class, may be defined to encapsulate variables of primitive objects and script functions.

ZeScript uses re2c to generate its token scanner, resulting in very fast script parsing.

Variable and Object Types

ZeScript variables are dynamic and are created or updated by assignment. The object type that a variable represents may be null, boolean, integer, real, string, hash array, class, or user object. A user object holds a pointer created by a user's primitive function. For example,

/******************************************************
 *   Like C, contents between slash-star and star-slash
 * are comments, and a long comment like this one may
 * extend over multiple lines.
 *
 *  As shown below, anything after // becomes a
 * one-line comment.
 ******************************************************/

a = 135;                     // a is integer type
a = 0x07FF;                  // hexadecimal equivalent of 2047
a = 135.0;                   // a is now real type
b = true;                    // b is boolean type
c = false;                   // c is boolean type
d = e = f = null;            // d, e, f are set to null
s = "I love ZeScript.";      // s is string type

An expression must end with a semicolon ";".

Internally, a boolean of true is equivalent to an integer of 1 and false to 0. A real is a double-precision floating-point number and an integer is 64 bits (32 bits prior to version 2.0). A string, i.e., text between a pair of ' or " marks, may contain C escape characters. In a ' quoted string, \' can be used to include ' in the string; the same applies to \" in a " quoted string. Since \ is used for escaping, it must be written as \\ in a string. A string may also span multiple lines, but such a long string must not exceed 8 KB.
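
As a brief illustration of the quoting rules above, the following sketch uses only the escapes described in this section. The variable names and the file path are made up for the example.

q1 = 'It\'s a single-quoted string';     // \' embeds ' in a ' quoted string
q2 = "She said \"hello\".";              // \" embeds " in a " quoted string
p  = "C:\\data\\points.csv";             // each \\ stands for one \ character
t  = "line one\nline two";               // \n is the C escape for a new line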

A hash array created by the operator [] contains a collection of key-value pairs, of which the key is a string and the value can be an object of any type.

a = ["name"="Jiye Zeng", "Age"="Secret"];  // , is used to separate array items

When the key is omitted, the string equivalent of position index counting from zero will be used, e.g.,

a = [1, 3, 5];
// is the same as
a = ["0" = 1, "1" = 3, "2" = 5];

Note that ["name"=..., ...] is not the same as [name=..., ...]: While "name" in the former is a string, name in the latter is a variable that may represent a string or any other type of object.

Assignment

An assignment expression sets the right to the left, or defines a new variable if the left does not exist. The left gets a copy of the right if the right is of type null, boolean, integer, real, string, or array; and it gets a reference to the right for class and user types unless the __copy function is defined for a class or registered as a primitive function for a user object. In that case, the __copy function will be called with the user object as the first parameter and the object resulting from the right expression as the second parameter.
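
The copy-versus-reference rule can be sketched as follows. The Counter class is hypothetical and defines no __copy function, so the reference behavior described above is assumed to apply to it; the comments restate the rule and are not output from a test run.

class Counter {
    n = 0;
}

a = [1, 2, 3];
b = a;              // arrays are copied on assignment
b[0] = 100;         // changes only the copy; a[0] is still 1

p = new Counter;
q = p;              // class objects are assigned by reference
q.n = 5;            // p and q refer to the same object, so p.n is also 5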

The array creation expression may be used for multiple variable assignment:

[a, b, c] = [1, 2, 3];        // a=1, b=2, c=3
[a, b, c] = [1, 2];           // a=1, b=2, c=null
[a, b, c] = [1, "a"=2, 3];    // a=1, b=null, c=3
[a, b, c] = 1;                // a=1, b=1, c=1

Assigning to multiple variables works as follows:

  1. Items in the left array list must be variable names. If the :: operator is used to indicate a global variable, that variable must be defined in an ancestor of the expression.
  2. If the right object is not an array, assign it to all variables.
  3. Otherwise, get objects from the array on the right using the variables' positions on the left as keys, and assign the obtained objects to the corresponding variables.

This feature offers a convenient way to receive multiple values returned as an array from a function.
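
For instance, a function might return two values in an array and the caller might unpack them in one expression. The function name minmax and its arguments are invented for this sketch.

function minmax(x, y)
{
    if (x < y) return [x, y];
    return [y, x];
}

[lo, hi] = minmax(7, 3);    // lo=3, hi=7
csv(lo, hi);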

Array Access

The expressions expr.expr and expr[expr] are called array access or member access.

Objects in an array may be set or got through their keys, e.g.,

a.addr = "123 Street";      // a new key-value pair will be created
                            // if a is an array and the addr key
                            // does not exist.

a["addr"] = "123 Street";   // same as above

b = a.addr;                 // b is now "123 Street"
b = a["addr"];              // same as above

When the key of an item is an integer, the integer's string equivalent is used as the key, e.g.,

a = [1, 3, 5];   // create array
b = a[0];        // b contains 1
a[1] = 10;       // now a contains [1, 10, 5]

Array access returns null if the array does not have the key.

When an array contains only numerical values and all keys are positional, it will be treated like a collection of numbers by operators and functions, e.g.,

a = [1, 2.1, 3.5, 5];
a++;             // a-array now contains [2, 3.1, 4.5, 6];
b = sin(a);      // b-array contains [sin(2), sin(3.1), sin(4.5), sin(6)]

Multiple keys are also allowed in getting and setting values:

a = [10, 11, 12, 13, "hi"];   // create an array
b = a[1,4];                   // b becomes an array containing [11,"hi"]
a[0,1] = [100,200];           // now a contains [100,200,12,13,"hi"]
a[2] = b;                     // now a contains [100,200,[11,"hi"],13,"hi"]

Multiple assignment to array items works as follows:

  1. Expressions inside [] of the left expression must produce strings or integers to represent array keys.
  2. If the right object is not an array, assign it to all keys.
  3. Otherwise, get objects from the right array using the positions of the key expressions on the left as keys, and assign the obtained objects to the keys of the left array.
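
A short sketch of rules 2 and 3, using made-up keys and values. It assumes that keys which do not yet exist are created, as with single-key assignment.

a = [0, 0, 0];
a[0, 1, 2] = 7;             // right side is not an array: a is now [7, 7, 7]
a["x", "y"] = [10, 20];     // right side is an array: a.x is 10 and a.y is 20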

Array Access for String

The array access expression a[...] may also be used to get and set the characters of a string, e.g.,

s = "ABCDEFG";    // create a string
a = s[1];         // a contains "B"
s[0] = a;         // now s contains "BBCDEFG"
b = s[1,3];       // b contains "BD"
s[1,3] = "A";     // now s contains "BACAEFG"
s[1,3] = "XYZ";   // now s contains "BXCYEFG"
s[5] = 90;        // now s contains "BXCYEZG"

Accessing sub-string through the array access expression works as follows:

  1. Expressions inside [] of the left expression must produce integers in the range of 0 to the left string length to represent indices of characters in the left string. The positions of those expressions are used as indices for finding characters in the right string.
  2. The right object must be either a string or an integer. An integer is treated as the decimal ASCII code of a character.
  3. Find the character in the right string and assign it to the left string. If an index reaches or exceeds the right string length when finding characters, the index is taken modulo that length.
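
The wrap-around in rule 3 might be pictured with the sketch below. The strings are arbitrary, and the results in the comments are derived from the rules as stated rather than from a test run.

s = "ABCDEFG";
s[0, 1, 2, 3] = "xy";    // positions 0..3 index into "xy" as 0, 1, 0, 1
                         // after wrapping, so s becomes "xyxyEFG"
s[6] = 33;               // 33 is the ASCII code of "!": s becomes "xyxyEF!"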

Array Access for User Object

For a user object, the expression

a = user.member;

will call the __get primitive function registered for that user type with the user object as the first parameter and a string object of "member" as the second parameter. And

user.member = expr;

will call the __set primitive function with the user object as the first parameter, a string object of "member" as the second parameter, and the object resulting from the right expression as the third parameter.

Similarly,

a = user[expr,...];

will call the __get primitive function with the user object as the first parameter followed by the objects resulting from the expressions inside []. And

user[expr,...] = expr;

will call the __set primitive function with the user object as the first parameter followed by the objects resulting from the expressions inside [] and the object from the right expression as the last parameter.

A * may be used inside [] to represent null. That is, u[i,*] is equivalent to u[i,null]. Refer to the API reference for more details.
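
Assuming the matrix library mentioned in the introduction is loaded, the mapping from access expressions to primitive calls might look like the sketch below. The comments only restate the calling convention described in this section; what the matrix library's __get and __set actually do with the indices is not specified here, and the member name nrow is a hypothetical example.

load("matrix.dll");

A = matrix("double", 10, 10);

v = A[2, 3];        // calls __get with A, 2, 3
A[2, 3] = 1.5;      // calls __set with A, 2, 3, 1.5
r = A[2, *];        // calls __get with A, 2, null
n = A.nrow;         // calls __get with A, "nrow"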

Range and Option

A range expression, such as a:b or a:b:c, produces an array of two or three numbers, e.g.,

r1 = 1:10;          // r1 is [1, 10]
r2 = 1:10:2;        // r2 is [1, 10, 2]

When a range expression is used in array creation, it means to fill the array with numbers from the first number to the second. The increment step is 1 for the a:b form and is the third number for the a:b:c form. For example,

a = [1:11];        // a is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

When such an expression is used as an index in accessing elements of an array, string, or user object, it means to get the elements with indices from the first number to the second with an increment of the third; all numbers resulting from the range expression must be integers. A null or * may be used in place of the second number to mean the last index. For example,

b = a[0:*:2];      // b is [1, 3, 5, 7, 9, 11]
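
The a:b:c form can be sketched the same way. The values in the comments follow from the rules as stated above and have not been checked against a particular ZeScript build.

a = [1:10:2];      // fill from 1 to 10 in steps of 2: [1, 3, 5, 7, 9]
c = [0:20:5];      // [0, 5, 10, 15, 20]
d = c[1:3:2];      // items at indices 1 and 3: [5, 15]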

A range expression may appear as the right operand of an option expression:

a = b > c ? b+1 : b+c;

This is equivalent to

if (b > c)
    a = b + 1;
else
    a = b + c;

The "+" Operator

In addition to its normal outcome when operands are integer or real, the operator is extended to null, string, and array as follows:

  1. When an operand is null, the result is the other operand object, of any type.
  2. When an operand is a string, the other operand object is converted to a string and concatenated with the string; the result is a string.
  3. When an operand is an array, the operator is applied to each array element according to rules 1 and 2; the result is an array.

For a class object, the class must define an "__add" function to accept the operator; and for a user object, its operator function must handle the operator if the operator is expected to work on the user type.
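
The three rules for null, string, and array operands can be illustrated with a small sketch. The values are arbitrary, and the results in the comments are derived from the rules above rather than from a test run.

a = null + 5;            // rule 1: a is 5
s = "count=" + 10;       // rule 2: s is "count=10"
v = [1, 2, 3] + 10;      // rule 3 with integer elements: [11, 12, 13]
w = ["a", "b"] + "!";    // rule 3 with rule 2 per element: ["a!", "b!"]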

Loop and Control

ZeScript supports three loop structures (while, do, for):

n = 0;
while (n >= 0) {
    n++;
    if (n == 50) continue;        // skip this number
    if (n > 100) break;           // jump out of the loop
    csv(n);
}

/////////////////////////////////////

n = 0;
do {
    n++;
    if (n == 50) continue;        // skip this number
    if (n > 100) break;           // jump out of the loop
    csv(n);
} while (n >= 0);

/////////////////////////////////////

for (i = 0; i < 100; i++) {
    if (i == 10) continue;        // skip the rest when i=10
    csv(i);
}

And three control/redirect structures (if, switch, goto):

a = 1;
b = 2;
c = 3;

/////////////////////////////////////
if (a > 0) {
    csv(a);
    // more expressions may follow.
}

/////////////////////////////////////

if (a > b) csv(a);        // {} is optional if there is only one expression
else       csv(b);        // after "if" or "else".

/////////////////////////////////////

if (a > b) {
    csv(b);
}
else if (a > c) {
    csv(c);
}
else {
    csv(a);
}

/////////////////////////////////////

switch (flag) {
    case C1:
        csv(C1);
        break;
    case C2:
        csv(C2);
        goto 100;
    default:
        csv("default");
}

case 100:
    ....

The switch argument and the expressions after case and goto must evaluate to integer or string. A case marks where switch starts execution according to the value of switch argument. The default expression is mandatory inside a switch block.

A case may also be used in a function or module for goto redirection. That is, the goto expression above will make execution jump to case 100. Although goto is not recommended in common programming practice, it nevertheless provides convenient redirection in certain situations.

Module and Function

ZeScript code in a file comprises a module. You may define variables and functions in a module. A module may import modules to access functions defined in other modules, e.g.,

////////// hello.zs //////////////////


n = 100;


function hello(a, b, c)
{
    csv(a, b, c, n, "Hello!");
}

////////// main module ///////////////

import hello;

hello(1, 2, 3);    // call function in hello.zs

The import expression or command is like the include macro in C. Because it is processed by the ZeScript virtual machine at compile time, it may be placed anywhere in a script. You may also use the import() function to load a module dynamically. While a module imported by the import command persists throughout the existence of the importer, a module imported by the import function is removed when the function-returned variable goes out of scope.

When a module name has any operator characters, the module name must be quoted. For example,

import "hello.zs";               // import hello.zs
import "my-module/say-hello";    // import say-hello.zs in the my-module subdirectory

Importing looks for script files in the current directory first, and then in subdirectories of lib, cls, or cgi; or other subdirectories included internally in the ZeScript virtual machine.

As shown above, a function is declared by the keyword "function" followed by the function name, arguments in (), and expressions in {}. An argument may have a default value. When an argument's default value is not set, null is assumed implicitly. In a function call, parameters are passed to arguments at corresponding positions, but when an assignment expression is used, the object resulting from the right will be set to the argument that has the same name as the left.

function f1(a, b=1, c=2.1, d="hi!", e=[1, 2, 3])
{
    csv(a, b, c, d, e);
}

f1();   // call f1 with no arguments
        // output: null, 1, 2.100000, hi!, [0=1, 1=2, 2=3]

f1(a=1, 2, e="Hi!", d=[1, 2, 3]);    // call f1 with positional and assignment arguments
                                     // output: 1, 2, 2.100000, [0=1, 1=2, 2=3], Hi!

Objects of null, boolean, integer, real, and string are passed to functions by value; and objects of array, class, and user are passed by reference.
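
A sketch of the difference, with a hypothetical function bump. The final comment restates the passing rule above and is not output from a test run.

function bump(n, arr)
{
    n++;             // changes only the local copy of the integer
    arr[0] = 99;     // changes the caller's array
}

x = 1;
v = [1, 2, 3];
bump(x, v);
csv(x, v[0]);        // x is still 1; v[0] is now 99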

Calling a script function recursively is allowed:

function Ack(m, n)
{
    if (m == 0) {
        return n + 1;
    }
    if (n == 0) {
        return Ack(m - 1, 1);
    }
    return Ack(m - 1, Ack(m, n - 1));
}


csv(Ack(3, 4));

A function may be declared inside another function. In that case, the inner function may access the private variables of the parent function. For example,

function f(a, b)
{
    a += 10;

    // more expressions here

    ff();    // call function defined internally

    function ff()
    {
        csv(a, b);  // access variable of parent function 
    }
}

Primitive Functions

Primitive functions in a dynamic link library may be loaded to the global primitive function table of ZeScript by any module. For example:

load("my.dll");       // load primitive functions in my.dll

hello();              // suppose hello() is a primitive function in my.dll

Primitive functions take precedence over script functions in a function call. That is, when a loaded primitive function has the same name as a script function, the primitive function will be used.

Object Method

The expression:

object.method(...);

calls the function that belongs to the object.

When object is a class type (more discussions later), a script function named method must be declared in the class.

When object is any other type, a primitive function named method must be registered for that type (see API reference) and the library containing the function must be loaded. The primitive function will always get the object as the first parameter. Because you can use the same function name for different types of objects, the expression is like calling class methods in C++.

Call Back

ZeScript implements two call-back mechanisms: calling script functions from a primitive function (see API reference), and using a script function as an argument of another script function, as shown below:

function f1(x, y)
{
    return [x, y];
}

function f2(x, y)
{
    return [x*10, y*100];
}

function dynamic1(f, x, y)
{
    // call is a built-in function
    // and f is a pointer to a script function
    csv(call(f, x, y));
}

function dynamic2(name, x, y)
{
    // func is a built-in function
    // and name is a script function name
    f = func(name);
    csv(call(f, x, y));
}

f = func("f1");
dynamic1(f, 1, 10);

f = func("f2");
dynamic1(f, 1, 10);

dynamic2("f1", 1, 10);

dynamic2("f2", 1, 10);

Class

A class is a structured object declared at the module level and must be instantiated before being used. A class is like a special module in that class-level variables are shared by inside functions, but protected from functions of other modules or classes. The difference is that each instance of a class has its own variable context while a module has only one variable context.

A very special feature of ZeScript's class is that operator functions may be defined to process such an expression as a + b. For example,

class Point {

    cx = 0;    // initialize class level variables
    cy = 0;
    cz = 0;

    function set(x, y, z)
    {
        ::cx = x; ::cy = y; ::cz = z;
    }

    function add(x, y, z)
    {
        cx += x; cy += y; cz += z;
    }

    function csv()
    {
        csv(cx, cy, cz);
    }

    // this is an operator function for +
    function __add(a, b)
    {
        c = new Point;
        c.cx = a.cx + b.cx;
        c.cy = a.cy + b.cy;
        c.cz = a.cz + b.cz;
        return c;
    }
}

a = new Point;          // create a point
a.set(10, 10, 10);     // call a's function

b = new Point;
b.cx += 5;             // access b's variable directly

c = a + b;              // because a is a class, call a's
                        // operator function for "+".
                        // c is now a class object.
c.csv();

For binary operators, ZeScript will call the operator function of the operand with the higher rank in the order of null, boolean, integer, real, string, array, class, and user. In the above example, if a is a number and b is a class, b's __add() function will be called with b as the first argument and a as the second. A class may redefine the operator functions listed below:

__neg(a) <==> -a            __not(a) <==> !a           __cmpl(a) <==> ~a
__incr(a) <==> a++          __decr(a) <==> a--
__mul(a,b) <==> a * b       __add(a,b) <==> a + b
__div(a,b) <==> a / b       __div2(a,b) <==> b / a
__mod(a,b) <==> a % b       __mod2(a,b) <==> b % a
__sub(a,b) <==> a - b       __sub2(a,b) <==> b - a
__le(a,b) <==> a <= b       __lt(a,b) <==> a < b
__ge(a,b) <==> a >= b       __gt(a,b) <==> a > b
__eq(a,b) <==> a == b       __ne(a,b) <==> a != b
__and(a,b) <==> a & b       __or(a,b) <==> a | b        __xor(a,b) <==> a ^ b
__nn(a,b) <==> a && b       __oo(a,b) <==> a || b
__rsh(a,b) <==> a >> b      __lsh(a,b) <==> a << b
__mul_eq(a,b) <==> a *= b   __div_eq(a,b) <==> a /= b   __mod_eq(a,b) <==> a %= b
__add_eq(a,b) <==> a += b   __sub_eq(a,b) <==> a -= b
__rsh_eq(a,b) <==> a >>= b  __lsh_eq(a,b) <==> a <<= b
__and_eq(a,b) <==> a &= b   __or_eq(a,b) <==> a |= b    __xor_eq(a,b) <==> a ^= b
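
As a further sketch in the style of the Point class above, a class might define a unary and a comparison operator function. The Vec2 class and its members are hypothetical; only the operator function names are taken from the list above, and __eq here assumes both operands are Vec2 objects.

class Vec2 {

    x = 0;
    y = 0;

    // operator function for unary -
    function __neg(a)
    {
        c = new Vec2;
        c.x = -a.x;
        c.y = -a.y;
        return c;
    }

    // operator function for ==
    function __eq(a, b)
    {
        return a.x == b.x && a.y == b.y;
    }
}

p = new Vec2;
p.x = 3;
p.y = 4;

q = -p;              // calls the __neg function with p
csv(q.x, q.y);
csv(q == p);         // calls the __eq function with q and p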

Overwriting Operators

All operators of ZeScript may be redefined to act on any type of user object. Please refer to the API page and the source code of the matrix library to see how to achieve that.

load("matrix.dll");

A = matrix("double", 10, 10);       // 10x10 double matrix
A.fill(1, 1);                       // A contains numbers from 1 to 100
A *= 10;                            // A contains numbers from 10 to 1000;
A += 1;                             // A contains numbers from 11 to 1001;

Variable Scope

A variable defined in a module is accessible only to expressions, functions, and classes declared in that module; a variable defined in a class is accessible only to expressions and functions declared in that class; and a variable defined in a function is only accessible to expressions and functions declared in that function.

Each module has its own variable context; each instance of class has its own variable context; and each function executes in its own variable context.

When an expression tries to get the value of a variable, it starts searching in the function, class, or module that the expression belongs to and then, if that fails, continues searching in the ancestors of the function or class. But when the :: operator is used, the initial search starts in the owner of the function or class.

When an expression tries to set the value of a variable and the variable does not exist in the function, class, or module that the expression belongs to, a new variable will be defined locally. But when the :: operator is used, the expression tries to find the variable in the owner of the function or class that defines the variable and sets the value on it.

The following example shows the concept of variable scope:

a = 0;
b = 0;

function f()
{
    a = 10;
    ::a = 100;
    csv(a, b, ::a);
    ff();

    function ff()
    {
        a = 1;
        ::b = -1;
        csv(a, ::a, ::b);
    }
}

csv(a, b);
f();
csv(a, b);

Debugging

You can put csv(...) and return anywhere in a script to stop execution at that point and check variable values. Alternatively, you can use a variable as a flag and insert trace(flag) in your code. If flag=true, the trace function will display variable values and halt execution until the Enter key is pressed. When evaluating a string as script, you can also set the trace flag to true to show variable values. Here is a script example that evaluates user input interactively:

/*********************************************
 * Interactive program in ZeScript
 *********************************************/

exec("cls");

csv("ZeScript version 2.3 by Jiye Zeng\n");
csv("Commands:");
csv("    @run    -- execute script");
csv("    @who    -- show variables");
csv("    @clear  -- clear screen");
csv("    @reset  -- clear script and screen");
csv("    @script -- show script");
csv("\nAn input not starting with @ is treated as script code.\n");

script = "";

while(1) {
    try {
        s = input("zs>");

        // When control-C is used to interrupt the program,
        // s will not be string.
        // So checking s is necessary.
        if (!isstring(s)) return;

        if (size(s) < 1) continue;

        if (s[0] == "@") {
            s = trim(s);
            switch(s) {
            case "@run":
                // execute script
                ret = eval(script, false);
                if (ret != null) csv(ret);
                break;
            case "@who":
                // execute and show variables
                ret = eval(script, true);
                if (ret != null) csv(ret);
                break;
            case "@reset":
                // reset script code and clear screen
                script = "";
                exec("cls");
                break;
            case "@clear":
                // clear screen only
                exec("cls");
                break;
            case "@script":
                // show script code
                csv(script);
                break;
            default:
                csv("invalid command "+s);
                break;
            }
        }
        else {
            script += s;
        }
    }
    catch(msg) {
        csv(msg);
    }
}

try {...} catch (error) {...}

When ZeScript is used in a server, allowing any ZeScript runtime error to interrupt the service may not be desirable. The try-catch feature of ZeScript may be used to catch and process the error message, e.g.,

try {
    a++;
    ...
}
catch (error) {
    cgi_error(error);
}

Reserved Keywords

addpath   break     case      catch     class
continue  default   do        else      false
for       function  goto      if        import
new       null      return    switch    true
try       while

Tips

Use a++ instead of a = a + 1, a *= b instead of a = a * b, and so on for efficiency. Try to re-use variable names so that memory allocated for unused variables will be released immediately.