Exercises

Chapter 1 - Introduction

Explain the terms
- Translator
- Source language – object language
- Compiler
- Interpreter
- Assembler
- Preprocessor/precompiler
Why are translators necessary?
Describe, using an explanatory diagram, the most important parts of a compiler. The description should be based on function and input/output.
What is a pass (for translators)? Why are passes divided up in some cases and which factors affect this separation?
What is a “compiler-compiler” (functionality and input/output)?

Chapter 2 - A Simple Syntax-Directed Translator

Questions on this chapter can be found together with those of the next chapter.

Chapter 3 - Lexical Analysis

What is the function of a “scanner”? What type of input/output is there?
Why separate lexical analysis from syntax analysis?
Define “alphabet”, “string”, “string length”, “empty string”, “concatenation of two strings” and “string exponent”.
Define “language”, “concatenation” (“product”) of two languages, “closure” and “positive closure” of a language.
Let $A = \lbrace a, b \rbrace$ and $B = \lbrace 0, 1 \rbrace$. What are $AB$, $A^*$ and $A^+$?
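As a hint for the two questions above, the language operations can be tried out on small finite sets. The following is a minimal sketch, not a reference solution: the helper names `concat` and `closure_up_to` are made up for illustration, and because $A^*$ and $A^+$ are infinite, the closure is only enumerated up to a chosen maximum string length.

```python
def concat(L1, L2):
    """Concatenation (product) of two languages: every x in L1 followed by every y in L2."""
    return {x + y for x in L1 for y in L2}


def closure_up_to(L, max_len, positive=False):
    """Strings of L* (or of L+ when positive=True) whose length is at most max_len."""
    # L* contains the empty string; L+ contains it only if L itself does.
    result = set() if positive else {""}
    frontier = {""}
    while frontier:
        # Extend every string found in the previous round by one more string of L.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result


A = {"a", "b"}
B = {"0", "1"}
print(sorted(concat(A, B)))
print(sorted(closure_up_to(A, 2)))
print(sorted(closure_up_to(A, 2, positive=True)))
```

Comparing the last two printed sets shows the only difference between the closure and the positive closure of $A$: the empty string.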
How do you define a regular expression of a language? Provide some examples.
Define the concepts of
- deterministic finite state automaton
- transition graph – transition table
What is the difference between a deterministic finite-state automaton and a non-deterministic one?
Explain the connection between scanner, regular expressions, finite state automata and transition graph/transition table.
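One way to see the connection is a table-driven scanner, where the transition table of a deterministic finite automaton is stored as data and a small driver loop walks it. The sketch below is only an illustration under assumed token classes: identifiers (a letter followed by letters or digits) and unsigned integers. The names `TRANSITIONS`, `ACCEPTING`, `char_class` and `scan` are hypothetical and not taken from the lab skeleton.

```python
def char_class(ch):
    """Map a character to the input symbol used by the transition table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Transition table of the DFA: state -> input symbol -> next state.
TRANSITIONS = {
    "start": {"letter": "in_id", "digit": "in_int"},
    "in_id": {"letter": "in_id", "digit": "in_id"},
    "in_int": {"digit": "in_int"},
}

# Accepting states and the token class each one recognizes.
ACCEPTING = {"in_id": "ID", "in_int": "INT"}


def scan(text):
    """Split text into (token class, lexeme) pairs, taking the longest match each time."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, start = "start", pos
        # Follow transitions for as long as the table has an entry.
        while pos < len(text):
            next_state = TRANSITIONS[state].get(char_class(text[pos]))
            if next_state is None:
                break
            state = next_state
            pos += 1
        if state not in ACCEPTING:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append((ACCEPTING[state], text[start:pos]))
    return tokens


print(scan("count1 42 x"))
```

The regular expressions for the token classes determine the automaton, the automaton determines the table, and only the table changes when the token classes change; the driver loop stays the same.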
How a lexical analyser is designed and implemented has been covered in the lab assignment on lexical analysis. You must know all the stages of designing a scanner.

Chapter 4 - Syntax Analysis

What is a grammar and why has it been introduced? Explain in this context
- BNF-grammar
- rule (production)
- terminal and non-terminal symbols
- vocabulary
- languages which are generated by a grammar
Explain the concepts of
- derivation
- direct derivation
- leftmost and rightmost derivation
- sentential form
- sentence
- parse tree
What is an ambiguous grammar? Why should you not use one of these when describing syntax?
Construct an ambiguous grammar for logical expressions:

Logical expressions consist of operators, parentheses and variables. The logical operators are, in order of priority, (highest first) NOT, AND, OR, EQU (equivalence) and IMP (implication).

Assume that the above priority rules and left associativity apply. Write the corresponding unambiguous context-free grammar. What does the grammar look like if right associativity applies? Also derive the two expressions below using the left associative grammar.
Derive and draw the corresponding parse tree for the unambiguous grammar in question 4 for the strings

a OR b AND NOT c

(x IMP y) EQU z AND NOT u
The first (optional) lab assignment on grammars and formal languages contains several assignments relevant to this chapter.
What is a parser (syntax analyser)?
Define
- canonical derivation
- left- and right-sentential form
- handle
- canonical reduction sequence
Explain the difference between top-down and bottom-up parsing with respect to the construction of the parse tree and the derivation sequence.
Given the following context-free grammar with the start symbol $S$ (the rules are numbered 1 to 4):
- (1) S → a A S
- (2) S → b
- (3) A → c A S
- (4) A → ε
A top-down parser finds the rules in the order 1 3 4 2 1 4 2.

What was the input sequence? In what order would a bottom-up parser have found (reduced) the rules for the corresponding input?
How does a “shift-reduce” parser work?
A “shift-reduce” parser often uses a stack for parsing. Show how a parse, of the same input and using the same grammar as in question 10 above, works. Show how the stack, input and the action taken change during the analysis.
Explain how a “top-down parser with backup” works by showing how the syntax tree is successively built up and how branches are pruned on failures. Exemplify using the grammar $G(<{onestring}>)$ and the sentence $aeeb\$$ .
Why is left recursion a problem for top-down analysis? Give two different ways to rewrite a grammar so as to eliminate left recursion.
Why should a top-down syntax analyser in a compiler never back up during analysis (4 reasons)?
Explain briefly how a recursive descent parser works.
Write a recursive descent parser for the grammar in question 10 above.
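As a starting point, here is a minimal sketch of a recursive descent parser, assuming the grammar is (1) S → a A S, (2) S → b, (3) A → c A S, (4) A → ε, that is, reading the last symbol of rule 3 as the nonterminal S. The class and method names (`Parser`, `parse_S`, `parse_A`, `parse`) are illustrative choices. Each nonterminal becomes one procedure, and one symbol of lookahead selects the rule.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.rules = []  # numbers of the rules applied, in top-down order

    def peek(self):
        """Current lookahead symbol, or '$' at the end of the input."""
        return self.text[self.pos] if self.pos < len(self.text) else "$"

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r} at position {self.pos}")
        self.pos += 1

    def parse_S(self):
        if self.peek() == "a":    # rule 1: S -> a A S
            self.rules.append(1)
            self.match("a")
            self.parse_A()
            self.parse_S()
        elif self.peek() == "b":  # rule 2: S -> b
            self.rules.append(2)
            self.match("b")
        else:
            raise ParseError(f"expected 'a' or 'b' at position {self.pos}")

    def parse_A(self):
        if self.peek() == "c":    # rule 3: A -> c A S
            self.rules.append(3)
            self.match("c")
            self.parse_A()
            self.parse_S()
        else:                     # rule 4: A -> empty; errors surface in parse_S
            self.rules.append(4)


def parse(text):
    """Parse text and return the rule numbers in the order they were applied."""
    parser = Parser(text)
    parser.parse_S()
    if parser.pos != len(text):
        raise ParseError(f"unexpected input at position {parser.pos}")
    return parser.rules


print(parse("ab"))
```

No backtracking is needed under this reading of the grammar, because the alternatives of S start with different terminals and the empty alternative of A is chosen exactly when the lookahead is not c.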
What is an LR parser? How does it work? State some advantages and disadvantages of the method.
What is a “configuration” (in connection with the LR technique)? Show how it changes depending on the parsing action (4 cases).
There are two parse tables in the LR technique. Which are they and what do they contain? What does the parse routine look like?
Show which steps an LR parser goes through for the sentence (id + id)*id$. Use the table on page 252 in the course book (page 219 in the first edition).
Explain and put into the correct context
- viable prefix
- LR(0) item
- canonical LR(0) collection
- augmented grammar
- shift-reduce conflict / reduce-reduce conflict
Given a canonical LR(0) collection, how are the tables for an SLR(1) parser generally constructed?
Write the SLR(1) tables for the grammar (the rules are numbered 1 to 3):
- (1) S → 1 S 1
- (2) S → 0 S 0
- (3) S → 2
What is the difference between the different classes of LR parsers:

LR(0) – SLR(1) – LALR(1) – LR(1) – … – LR(k)?
Show another way of representing the parse tables than in matrix form.

Chapter 5 - Syntax-Directed Translation

Explain these concepts
- syntax-directed translation schema
- semantic routine – semantic rule
- semantic stack
How do you call semantic rules in top-down and bottom-up syntax analysis?
Write a syntax-directed translation schema which evaluates arithmetic expressions directly. This sort of schema is usually known as an “attributed translation grammar”. Show how such a translation can be implemented in an LR parser environment, with a semantic stack.
Write a syntax-directed translation schema which generates quadruples from while-statements and if-statements. Which attributes are needed?

Chapter 6 - Intermediate-Code Generation

Why is the source code translated to an internal form instead of directly generating machine code?
What does the general form for the following representations look like?
- infix
- postfix (reverse Polish notation)
- abstract syntax trees
- quadruples
- triples
Compare and provide some advantages and disadvantages of triples and quadruples.
Given an abstract syntax tree, how is it transformed to reverse Polish notation in a simple way?
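A hint: a post-order traversal visits both operands before their operator, which is exactly the order of reverse Polish notation. The sketch below assumes a hypothetical tree encoding in which an inner node is a tuple `(operator, left, right)` and a leaf is a plain name or number; only binary operators are handled.

```python
def to_postfix(node):
    """Post-order traversal: left operand, right operand, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        return to_postfix(left) + to_postfix(right) + [op]
    return [node]  # a leaf: a variable name or a constant


# Tree for a + b * c, with * bound tighter than +.
tree = ("+", "a", ("*", "b", "c"))
print(" ".join(str(tok) for tok in to_postfix(tree)))
```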
Show how expressions in reverse Polish notation can be evaluated (interpreted and values calculated) by using a stack. What is it that you put on the stack?
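One possible evaluator is sketched below, assuming the expression is already split into a list of tokens, that all operators are binary, and that variables are looked up in a dictionary `env`. The names `eval_postfix`, `OPS` and `env` are illustrative. What goes on the stack are operand values, never operators.

```python
import operator

# Binary operators supported by this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_postfix(tokens, env=None):
    """Evaluate a postfix token list; env maps variable names to values."""
    env = env or {}
    stack = []
    for tok in tokens:
        if tok in OPS:
            # The right operand was pushed last, so it is popped first.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif isinstance(tok, (int, float)):
            stack.append(tok)       # a constant: push its value
        else:
            stack.append(env[tok])  # a variable: push its current value
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_postfix([2, 3, 4, "*", "+"]))
```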

Translate the following statements to postfix, abstract syntax tree, quadruples and triples:

z := (x+20) * 15 - (y+2)/a

if a < 10 then
   if b < 5 then x := a else x := a + b
else x := a - b

while x < 10 do
begin f := f + 10 * x; x := x - 1; end

Chapter 7 - Run-Time Environments

Why is there usually a symbol table in a compiler?
Give examples of data which should be found in a symbol table for a language like Pascal or Algol 60.
State two different ways of storing names in the symbol table.
How does the symbol-table entry for an array with a fixed number of dimensions differ from the entry for an array with a variable number of dimensions?
Lists, trees and hash coded tables are three common storage forms for symbol tables. Discuss the advantages and disadvantages of these forms of storage. Is there any one answer as to which method should be used? Why?
Describe a tree-structured symbol table. Show how it is constructed when storing the symbols D G A B E F C in this order in it. You could for instance use a binary tree and sort on the position of the letters in the alphabet.
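To check a hand-drawn tree, the insertion procedure can be sketched as an unbalanced binary search tree keyed on the symbol name. This is only one possible organisation; the names `Node`, `insert`, `inorder` and `build` are hypothetical, and a real symbol table would also store attributes (type, address, block level) in each node.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.left = None   # names that sort before this one
        self.right = None  # names that sort after this one


def insert(root, name):
    """Insert name into the tree rooted at root and return the (possibly new) root."""
    if root is None:
        return Node(name)
    if name < root.name:
        root.left = insert(root.left, name)
    elif name > root.name:
        root.right = insert(root.right, name)
    return root  # an already present name leaves the tree unchanged


def inorder(root):
    """Names in alphabetical order, read off by an in-order traversal."""
    if root is None:
        return []
    return inorder(root.left) + [root.name] + inorder(root.right)


def build(names):
    root = None
    for name in names:
        root = insert(root, name)
    return root


tree = build("DGABEFC")
print(tree.name, inorder(tree))
```

The shape of the tree depends on the insertion order, which is worth keeping in mind when discussing the worst-case search time of this storage form.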
Describe a hash coded symbol table. Show it graphically by putting some names in the table. Provide examples of how the hash function can be calculated.

Assume we have a hash coded symbol table. Given the program sequence below

    BEGIN
        ADAM, BERTIL : INTEGER;
        BEGIN
            ADAM, CAESAR : REAL;
        END;
        BEGIN
            BERTIL, DAVID : BOOLEAN
L1:             BEGIN
                    CAESAR, ERIK : INTEGER
L2:             END
        END
L3: END

Show what the hash table, symbol table and the block table will look like at L1, L2 and L3.

Show some different ways of how arrays can be stored during execution (consider both fixed and variable arrays). What is a “dope vector”?
The following excerpt of a program is written in some Algol-like language:
```
begin
   procedure odd(m, n);
   begin
   int i := 0;
     n := m + 1;
     i := m;
   end ;
   (* ... *)
   i := 2;
   odd(i+1, i);
end
```
(All variables are integer types).

Describe what happens when odd is called and what value $i$ has after the call in these cases
- call by reference
- call by value
- call by value/result
- call by name
What is a “thunk”? What is it used for?
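As an illustration, a thunk can be imitated with a parameterless closure that re-evaluates the actual parameter expression in the caller's environment every time the formal parameter is used. The sketch below uses Jensen's device, a different example from the `odd` procedure above; the names `sum_by_name`, `assign_i`, `term` and `env` are made up for the sketch.

```python
# The caller's environment, holding the variable i.
env = {"i": 0}


def assign_i(value):
    """Thunk for assigning to the by-name parameter i."""
    env["i"] = value


def term():
    """Thunk for the actual parameter expression i * i."""
    return env["i"] * env["i"]


def sum_by_name(set_index, expr, lo, hi):
    """Jensen's device: expr is re-evaluated after every change of the index."""
    total = 0
    for k in range(lo, hi + 1):
        set_index(k)     # assignment to the by-name index parameter
        total += expr()  # each use of the parameter calls the thunk again
    return total


print(sum_by_name(assign_i, term, 1, 3))
```

With call by value the expression would have been evaluated once, before the call, so the loop would add the same number three times.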
What is meant by static memory allocation? What are the advantages? Give examples of languages where such memory allocation is applied.
When do you need dynamic memory allocation?
What is meant by stack allocation? When must this type of allocation be used?

Explain in this context the concepts of
- activation record
- display
Give examples of languages which use stack allocation.
What is meant by heap allocation? When is it needed? Explain these concepts
- fragmentation
- garbage collection
How is memory allocation performed in FORTRAN? What does an activation record contain? How are procedure calls and returns performed?
Describe what is normally contained in an activation record for Algol.
One of the problems with memory assignment in Algol is its pure block structure. Show what happens to an activation record (and the rest of the stack) when entering and leaving a block, especially when there are variable array structures.

Given a procedure ole and in it a declared procedure dole which is called by ole, i.e.

        PROCEDURE ole ...
L1:     BEGIN
            PROCEDURE dole
L2:         END of dole
            ...
            dole
            ...
        END of ole

Describe the appearance of the stack at L1 and L2. How is the new display set up?

Explain how, from the dole procedure in the previous example, you can address declared variables outside ole, in ole and in dole. What function have the displays in the addressing?

Chapter 8 - Code Generation

Compilers normally generate one of three types of object code. Which are they? What are the advantages and disadvantages of generating the various types?
What three main problems are there with code generation?
Code generation from tree structures. Describe an algorithm which marks nodes in a tree with their register needs during code generation. Describe an algorithm which generates code from a tree whose nodes are marked with such register needs.
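For the first half of the question, one common labelling scheme is the Ershov numbering: a leaf needs one register, and an inner node needs the larger of its children's needs, or one more if both children need the same amount. The sketch below covers only this labelling step, not the code generation step, and assumes binary operators and a hypothetical tuple encoding `(operator, left, right)` with plain names as leaves.

```python
def label(node):
    """Return the number of registers needed to evaluate node without spilling."""
    if not isinstance(node, tuple):
        return 1  # a leaf must be loaded into one register
    _, left, right = node
    need_left, need_right = label(left), label(right)
    if need_left == need_right:
        # Both subtrees need the same amount: one extra register holds
        # the first result while the second subtree is evaluated.
        return need_left + 1
    # Otherwise evaluate the more demanding subtree first.
    return max(need_left, need_right)


# Tree for (a - b) + e * (c + d).
tree = ("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))
print(label(tree))
```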
What is meant by “peephole optimization”? Give some examples of code improvements which can be done using this technique.
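A minimal sketch of the idea follows, using a made-up three-field instruction format: `("LOAD", reg, var)`, `("STORE", var, reg)`, `("ADD", reg, operand)` and `("MUL", reg, operand)`. Both the instruction set and the function name `peephole` are assumptions for illustration, and the code is assumed to be straight-line with no labels between the instructions.

```python
def peephole(code):
    """One pass over the code with a window of at most two instructions."""
    out = []
    for instr in code:
        opcode = instr[0]
        # Algebraic simplification: adding 0 or multiplying by 1 has no effect.
        if opcode == "ADD" and instr[2] == 0:
            continue
        if opcode == "MUL" and instr[2] == 1:
            continue
        # Redundant load: the register was just stored to this variable,
        # so it already holds the value (valid only without a label in between).
        if opcode == "LOAD" and out and out[-1] == ("STORE", instr[2], instr[1]):
            continue
        out.append(instr)
    return out


code = [
    ("LOAD", "R0", "a"),
    ("ADD", "R0", 0),
    ("STORE", "b", "R0"),
    ("LOAD", "R0", "b"),
    ("MUL", "R0", 1),
    ("ADD", "R0", "c"),
]
for instr in peephole(code):
    print(instr)
```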

Chapter 9 - Machine-Independent Optimizations

What is meant by
- algorithm optimization
- local optimization
- loop optimization
- global optimization
What is a “basic block”?
Describe briefly and give examples of the following optimizations within a “basic block”
- constant folding
- elimination of common sub-expressions
- reduction in strength
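As an illustration of the first item only, constant folding can be sketched on expression trees: any operator whose operands are both known constants is replaced by its value at compile time. The tuple encoding `(operator, left, right)` and the names `fold` and `FOLDABLE` are assumptions made for the sketch, and only `+`, `-` and `*` are folded so that division by zero is left for run time.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return an equivalent tree in which constant subexpressions are evaluated."""
    if not isinstance(node, tuple):
        return node  # a variable name or a constant
    op, left, right = node
    left, right = fold(left), fold(right)  # fold the operands first
    both_constant = isinstance(left, (int, float)) and isinstance(right, (int, float))
    if op in FOLDABLE and both_constant:
        return FOLDABLE[op](left, right)
    return (op, left, right)


# x + 2 * 3 keeps the variable but folds the constant product.
print(fold(("+", "x", ("*", 2, 3))))
```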
Describe briefly the loop optimizations below
- moving loop invariants
- eliminating induction variables
What should you take into consideration before putting optimization routines into a compiler?

Chapter X - Bootstrapping

What is a compiler-compiler (functionality and input/output)?
What is meant by cross-compilation?
What is meant by “bootstrapping”?

State a neat and “easy” (it is never easy) way to implement a new language, say Pascal, on a machine $A$ . The compiler should be written in the language itself, here Pascal. Assume that machine $A$ has some high-level language, say FORTRAN, that you should start from.

Show what you should do, with the least possible work, to implement the language above (here Pascal) on another machine $B$ .

Draw figures in connection with the exercises above.