This assignment requires you to build a compiler for a simple
while-language which compiles programs into a form that can be
evaluated by a stack-based interpreter, together with documentation.
You are also required to produce and present details of the formal specification of the compiler, including the regular expressions for the tokeniser and BNF grammar that is implemented by the parse. Ideally the parser should have options to display the internal representation of parsed expressions in a easily readable form (e.g. using graphviz or similar). The final program should produce sensible error messages if given ill-formed expressions. Well-formed expressions should be compiled into Forth code, which should be demonstrated to run correctly on the gForth interpreter.
As part of your report on the assignment, you are expected to provide a specification of the languages and translations that are involved in your expression compiler. These should include:
If you extend the system to deal with other features, such as floating point values, then these should also be described.
Formulate regular expressions that define what tokens are recognised. These should be structured definitions. The top level should be the regular expression that matches any token. The next level should be the names of the main token types.
You should state the syntax of the programs that your system recognises (the source expressions) using BNF or EBNF notation. The formulation should be expressed in terms of the tokens you have defined.
Provide a formalisation of the target language in (E)BNF.
Give a declarative description of how the source language is to be translated in the target language.
This should be a recursive definition stated mathematically, or in clear English.
Your software should include distinct parts for the tokeniser, parser,
and virtual-machine code generation. You should demonstrate the
compilation of while-programs into the Forth language, and the
execution of the generated code on the gForth interpreter. The implementation should be described in the report.
For the implementation, you may wish to start with the first CE305 assignment as a starting point, and then
You can also try using tools to generate the lexer and parser, such as ANTLR, JavaCC, SableCC and Coco/R. In the latter case, you will be expected to select and learn about the tools by yourself. Your report should then contain a description of how the tools were applied.
Hand-coded systems will be assessed for coding quality. Systems produced using tools will be assessed for the effectiveness with which those tools were used.
Some marks will be awarded for compiler features such as helpful and informative errors messages, which trap errors that might otherwise cause run-time errors, and support for floating-point expressions as well as integers.
If you are feeling adventurous, you may wish to attempt to implement functions and function calls. Function calls will have to push their arguments on to the stack. Function definitions will then use those arguments (either directly, or through Forth's local variable naming). Your type checking would need to be modified to avoid run-time errors if there are parameter mismatches between function calls and function definitions.
Program code will be assessed for its modularity. A clear separation of the different aspects of the system (tokenisation, parsing, type-checking, and Forth generation) is expected.
java -jar MyCompiler
or
java CE305.MyCompiler
The electronic submission should include the report — containing the specification and description of the implementation — and the implementation itself — both source files and compiled program. Instructions for running the program as submitted, should be included.
Ideally the report should be submitted as a properly formatted PDF
file. Instructions for how to start the program from the command line
should be given in a readme.txt file.
Please include a sample program that your compiler can compile.
This should demonstrate the key features that are supported. (For
example, if you language supports IF THEN ELSE, and WHILE,
with complex Booleans, as well as "floats" and "ints", these should
feature in your sample program.)
Remember:
java -jar MyCompiler
or
java CE305.MyCompiler
Please also include this information in a readme.txt file with your submission.
Make absolutely certain that the program as submitted can be run from the command-line as described. Try to ensure your program runs with JRE version 6 (or 1.6).
Do not forget to include your name and registration number in your report and in your submitted source files.
You will be expected to demonstrate your prototype program(s) in the lab session preceeding the submission (Tuesday, 13th December, CSEE Lab 3, 12–2pm).
Note: If the functionality of the submitted version differs significantly from the demonstrated prototype, this should be mentioned in the submitted report.
| Scheme provided for indicative guidance only | |
|---|---|
| Specification (40%) | |
| Definition of Tokens | 10 |
| Grammar of Expressions | 15 |
| Description of Translation | 15 |
| Implementation (50%) | |
| Basic operation | 20 |
| Modularity | 10 |
| Coding quality and/or use of tools | 10 |
| Extended features | 10 |
| Documentation (10%) | |
| Clarity of documentation | 10% |
THIS IS SUBJECT TO REVISION
Date: Spring Term 2010/11
HTML generated by org-mode 6.21b in emacs 23