
This document defines how a MonkeyScript interpreter will handle character encodings within script files.

Preemptive notes

Inside MonkeyScript, the interpreter first executes a core monkeyscript.js file, which handles the actual execution of a program's script files. This file is guaranteed to be pure US-ASCII and to contain no invalid JS at the start of the file.

SpiderMonkey has a slight bug in it. If you save the following file as UTF-8 and try to execute it:

// These two are the same character
print("♥" === "\u2665");
print("♥".length);
print("\u2665".length);

Then SpiderMonkey (by default) will print false, 3, and 1, since the heart is a 3-byte UTF-8 character, while other engines (Rhino, V8, and JavaScriptCore) will all print true, 1, and 1.

For the MonkeyScript environment this should be fixed so that strings read from files are properly read according to the file's encoding and translated into proper UTF-16.
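As a rough illustration of the difference, the sketch below contrasts treating each byte of the file as one character (which gives the length of 3 seen above) with decoding the UTF-8 bytes into a UTF-16 string. The helper name decodeUtf8 is hypothetical and not part of MonkeyScript, and the decoder is deliberately minimal: it does not reject overlong forms or encoded surrogates.

```javascript
// Hypothetical helper (not a MonkeyScript API): decode an array of UTF-8
// bytes into a JavaScript string, which is stored as UTF-16 code units.
function decodeUtf8(bytes) {
  var out = "";
  var i = 0;
  while (i < bytes.length) {
    var b = bytes[i];
    var codePoint, extra;
    // The lead byte tells us how many continuation bytes follow.
    if (b < 0x80) { codePoint = b; extra = 0; }
    else if ((b & 0xE0) === 0xC0) { codePoint = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) === 0xE0) { codePoint = b & 0x0F; extra = 2; }
    else if ((b & 0xF8) === 0xF0) { codePoint = b & 0x07; extra = 3; }
    else { throw new Error("invalid UTF-8 lead byte at offset " + i); }

    if (i + extra >= bytes.length) {
      throw new Error("truncated UTF-8 sequence at offset " + i);
    }
    for (var k = 1; k <= extra; k++) {
      var c = bytes[i + k];
      if ((c & 0xC0) !== 0x80) {
        throw new Error("invalid UTF-8 continuation byte at offset " + (i + k));
      }
      codePoint = (codePoint << 6) | (c & 0x3F);
    }
    i += extra + 1;

    if (codePoint > 0xFFFF) {
      // Characters outside the BMP become a UTF-16 surrogate pair.
      codePoint -= 0x10000;
      out += String.fromCharCode(0xD800 + (codePoint >> 10),
                                 0xDC00 + (codePoint & 0x3FF));
    } else {
      out += String.fromCharCode(codePoint);
    }
  }
  return out;
}

// The heart character as it is stored in a UTF-8 file.
var heartBytes = [0xE2, 0x99, 0xA5];

// One character per byte: three characters, none of them the heart.
var naive = String.fromCharCode.apply(null, heartBytes);

// Decoded as UTF-8: a single UTF-16 code unit, U+2665.
var decoded = decodeUtf8(heartBytes);
```

With this decoding step, decoded === "\u2665" holds and decoded.length is 1, which matches what the other engines report.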

Interpreting source code

MonkeyScript uses Python's PEP 263 as a reference, along with suggestions from ServerJS members to default to UTF-8, which is already the de facto default in other formats such as XML.

A new type of error, EncodingError, is defined. monkeyscript.js should die and print a proper message when it catches one thrown from its core eval(); this also gives programs an option to handle bad script files without just killing the program.
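One possible shape for this, given purely as a sketch: the constructor arguments, the fileName property, and the runGuarded helper below are assumptions made for illustration and are not defined by this document. It shows a program catching an EncodingError from a bad script file while letting every other error propagate.

```javascript
// Assumed shape of EncodingError; only the name comes from this document.
function EncodingError(message, fileName) {
  this.name = "EncodingError";
  this.message = message;
  this.fileName = fileName; // hypothetical property
  this.stack = (new Error(message)).stack;
}
EncodingError.prototype = Object.create(Error.prototype);
EncodingError.prototype.constructor = EncodingError;

// Hypothetical helper: run a loader function and report encoding problems
// through `report` instead of letting them kill the program.
function runGuarded(run, report) {
  try {
    run();
    return true;
  } catch (e) {
    if (e instanceof EncodingError) {
      report(e.name + ": " + e.message + " (" + e.fileName + ")");
      return false;
    }
    throw e; // not an encoding problem, so do not swallow it
  }
}

// Usage: the loader here stands in for a call that executes a script file.
var reported = [];
var succeeded = runGuarded(
  function () {
    throw new EncodingError("unknown encoding 'X-FOO'", "plugin.js");
  },
  function (line) { reported.push(line); }
);
```

Checking with instanceof keeps the handler narrow, so syntax errors and runtime errors in the script still surface as usual.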

When executing a script file using exec() the interpreter should make a number of checks:

This definition takes many factors and possibilities into account and gives flexibility in how things are handled.

# -*- coding: ISO-8859-1 -*-

# vim: set fileencoding=ISO-8859-1 :

# coding=ISO-8859-1

#!/usr/bin/monkeyscript
# -*- coding: ISO-8859-1 -*-

// -*- coding: ISO-8859-1 -*-
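The declaration forms listed above could be recognized with a pattern adapted from the one in PEP 263. The following is a minimal sketch under stated assumptions: that, as in PEP 263, only the first two lines of the file are examined; that both # and // are accepted as comment openers (inferred from the last example); and that a file without a declaration falls back to UTF-8. The names CODING_RE and detectDeclaredEncoding are hypothetical.

```javascript
// Adapted from the PEP 263 pattern; accepting "//" as well as "#" is an
// assumption based on the examples above.
var CODING_RE = /^[ \t\f]*(?:#|\/\/).*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)/;

// Hypothetical helper: `headText` is the start of the file read as ASCII,
// which is safe because the declaration itself only uses ASCII characters.
function detectDeclaredEncoding(headText) {
  // Assumption borrowed from PEP 263: only the first two lines count.
  var lines = headText.split(/\r\n|\r|\n/).slice(0, 2);
  for (var i = 0; i < lines.length; i++) {
    var match = CODING_RE.exec(lines[i]);
    if (match) {
      return match[1];
    }
  }
  return "UTF-8"; // assumed default when nothing is declared
}

var fromEmacsStyle = detectDeclaredEncoding("# -*- coding: ISO-8859-1 -*-\n");
var fromShebangFile = detectDeclaredEncoding(
  "#!/usr/bin/monkeyscript\n# -*- coding: ISO-8859-1 -*-\nprint(1);\n"
);
var fromPlainFile = detectDeclaredEncoding("print(1);\n");
```

The vim form matches as well, because "fileencoding=" ends in "coding=". What the interpreter does with an unsupported encoding name is left out of this sketch.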


2010-07-23 04:25