CE305 Languages and Compilers

Table of Contents

Description

The aim of this module is to introduce the students to formal languages and compilers.

Staff

Lecturer
Dr C Fox
Email
foxcj (non-Essex users should add @essex.ac.uk to create full e-mail address),
Telephone
01206 872770 (General Office)

Schedule of Lectures and Labs

This module is taught during the Autumn Term. There are two hours of lectures and two hours of laboratory sessions. The lab sessions start in the second week of term (Week 3). The schedule of topics given here is provisional.

Lectures

Two one-hour lectures per week. Monday 1pm (5A.118) and 3pm (5A.330)

Labs

One two-hour lab per week\newline Tuesday 2pm–4pm (CSEE Lab 3) starting in Week 3.

Lecture Topics

WeekProvisional Topic
2Background. Tokenisation. Regular Expressions
3Finite State Machines (NFA and DFA). Translations.
4Language Theory. Context Free Grammars. BNF.
5Parsing (top-down). Ambiguity. Parsing (bottom-up).
6Concrete v Abstract Syntax. Static Semantic Analysis. Types. Symbol Tables.
7Scoping. Hierarchical symbol tables. Small Compiler assignment.
8Types. Type checking. Inference with types.
9Code generation.Intermediate languages. Stack machines. Three-address code.
10Local and global optimisations. Liveness analysis.
11Runtime organisation. Activation Records. Calling conventions. Heap. Recursion.

Lab Schedule

WeekProvisional Topic
2
3Start work on Expression Analyser and Forth familiarisation
4Produce regex and CFG rules. Create token streams. Think about error handling.
5Representation of grammar rules and implementation of parsing algorithms
6Demonstrate prototype expression analyser
7Demonstrate final analyser prototypes. Work towards Small Compiler.
8Continue work on small compiler, and understanding of target language.
9You should aim to have assignemnt statements working in the assignment.
10Demonstrate prototype small compiler
11Demonstrate final small compiler

Lab Activities

Initial Lab Session

You should start on building the Expression Analyser for Assignment One. You may wish to refamiliarise yourself with the second CE204 assignment and the initial program code you were given in CE204 and as well as your submitted solution. You should also have a go at running the gForth interpreter, which is the target of both the expression analyser and the compiler project. (The CE305 notes on the gForth interpreter contain some simple exercises to try).

Second Lab Session

Things to try:

  1. Write the regular expressions for tokens of (i) integer arithmetic and (ii) floating point arithmetic, assuming the Forth representation of floating point numbers.
  2. Write context free grammar rules for (i) conventional (infix) integer arithmetic, and (ii) postfix integer arithmetic.
  3. Adapt the program for CE204, or write/adapt your own program, to accept an integer expression (infix) and then print out the string of tokens.
  4. Have a go at adapting the program to accept infix floating point expressions, printing out the string of tokens.
  5. What happens if you enter an expression that uses inappropriate tokens? Try to adapt or extend the program so that sensible error messages are produced.

Note the potential ambiquity for some tokens, such as - and +. Both of these can occur as binary operators (23+32, 32-23), and as (unary) signs, where they are normally considered to be part of the number (e.g. -23 and +32). They may also prefix an expression (as in -(2+3), which could be intepreted as an abbreviation of 0-(2+3)).

There are further ambiguities to consider with floating point numbers. Compare 1e1 + -23e-32 with 1e1 + -23e -32, and with 1e1 + - 23e-32, for example. Note that some of these are syntactically or semantically ill-formed, but it is not up to the tokeniser to deal with such matters.

Note: You may wish to try to simplify the development of your assignment using a "compiler-compiler" tool, such as ANTLR. Now might be a good time to start investigating such tools.

Third Lab

  1. You can continue the work on tokenisation, but also consider the question of how you will build your parser.
  2. The first step is to specify a grammar, which you may already have done.
  3. Note that you can build your parser even if you have not got your tokeniser working. You just have to test your program with manually generated token streams.
  4. If you are using tools, such as ANTLR or COCO/R, then you should work through the tutorials on writing a grammar, and invoking the tools to build the parser.
  5. If you are building on the CE204 code you need to think how you might adapt and extend this so that you can show your token stream, visualise the parse tree, deal with floating point expressions, and give meaningful error messages.
  6. If you are writing the parser from scratch, then you should consider using shift-reduce parsing.
    • This requires you to implement a shift operator, that can push a token on to the stack, and a reduce operation, which checks whether the symbols at the top of the stack match the RHS of a grammar rule, and if so, replaces those items with the LHS of the matching rule. You also need to implement an operation to initialise the parser, and a function or method that can identify that a string of tokens has been accepted.
    • Ideally the primitive operations should be implemented in a way that "encapsulates" the underlying data-structures, so that you can change how you store and access the stack, rules, and (ultimately) parse tree without having to rewrite other parts of the program. This allows you to refine your data-structures at a later date, in the event that you start off with something relatively naive.
    • Calculating tables for LR(k) parsers by hand is error prone. A simpler solution is to implement shift-reduce parsing more directly.
    • There are ways of encoding the grammar rules which make it easy to match "handles" on the stack with the RHS of the rules. One option is to think about the rules in reverse, so conceptually x –> a b c is thought of as (< c, b, a >, x). If you have a c at the top of the stack, then you start matching subsequent items on the stack against the remaining items in what was the RHS of the rule, in the given order.
    • You can start off building a recogniser. Once that is working you can turn it into a parser that constructs the syntax tree.

Fourth Lab

Continue work on the first assignment.

Fifth Lab

  1. You will be expected to demonstrate your program that implements the expression analyser for Assignment One.
  2. You should start work on the Small Compiler for the second assignment. You might want to experiment with variables, conditionals and loops in Forth, as discussed in the previous lecture.

Sixth Lab

You should continue work on the Small Compiler for the second assignment. You might want to experiment with variables, conditionals and loops in Forth, as discussed in the lectures for week 21.

Note that gForth requires that control statements only occur within "compiled" Forth, which means they must occur within the context of a definition.

   <Forth global variable declarations go here>
   : program
   <Body of Forth program code goes here>
   ;
   program 

Seventh Lab

If you are stuck, try to make sure your tokeniser and parser are working with assignment statements. If you cannot get declarations working, then you can test your programs by declaring some default variables automatically in the compiler's output, as shown below.

   variable a
   variable b
   variable c
   : program
   <output of compiler goes here>
   ;
   program 

Once these are working, you can start on if-then conditionals. While-loops should then be fairly easy. If you are stuck on if-then-else then look at the notes on the "dangling else" problem. Then you can go back and sort out variable declaration and "type-checking".

Lecture Notes

  1. PDF of Slides as used in the lectures
  2. Printable PDF of Slides same content as the lecture slides, but formatted for A4 printing
  3. Additional rough hand-written notes on

    These should be read in conjunction with your own notes of the lectures.

Possible revision topics

Tokenisation and parsing
nature, differences, techniques, examples, parse trees, regular expressions, problems (e.g. left recursion) and their solutions.
Automata Theory
relationship between regular expressions and DFA and NFAs, relationship between context free grammars and stack automata.
Symbol Tables and Type Checking
reason, techniques for buildding symbol tables, using symbol tables, using type-checking inference rules.
Runtime
different kinds of run-time architectures, their strengths and weaknesses.
Static Analysis
what is it, examples (e.g. liveness), relevance to compiler implementation.

Assessment

Split: 40 percent Coursework Mark and 60 percent Exam Mark.

Coursework

Forty percent of the mark for the module is based on coursework.

Details to be confirmed

ItemTitleSetDueWeightDeadlineLab Demonstration
1Expression Analyserw4w615%Wednesday, 9th November 2011Prototype: w6, Final: w7
2Small Compilerw8w1125%Wednesday, 14th December 2011Prototype: w11

Examination

Sixty percent of the mark for the module is awarded through a two-hour written examination which is sat during Summer Examination period.

The following is provisional!

  • TBC: The 2012 exam paper consists of four questions.
  • TBC: You should attempt all four questions.
  • The questions are not of equal weight.
  • Percentages in brackets are provided to give an indiciation of the total marks for the paper that will be allocated.

Bibliography

Author: CSEE, University of Essex <foxcj_AT_essex.ac.uk>

Date: Autumn Term 2011/12

HTML generated by org-mode 6.21b in emacs 23