CI-97 6. Synthesis: An Overview

[Figure: compiler structure annotated with synthesis topics. The analysis (frontend) maps the source program via lexical analysis (token sequence), syntactic analysis (abstract program tree), and semantic analysis to the intermediate language. The synthesis (backend) comprises transformation/optimization (program analysis, optimizing transformations, data-flow analysis (DFA), automatic parallelization), code generation (code and storage mapping, code selection, register allocation), peephole optimization of the abstract machine program, and assembly into the target program.]

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 97
Objectives: Relate synthesis topics to compiler structure
In the lecture:
* This chapter addresses only a selection of synthesis topics.
* Only a rough idea is given for each topic.
* The topics are treated completely in the lecture "Compiler II".
--------------------------------------------------------------------------------

CI-98 Optimization

Objective: Reduce run-time and/or code size of the program without changing its effect.
Eliminate redundant computations, simplify computations.

Input: Program in intermediate language
Task: Analysis (find redundancies), apply transformations
Output: Improved program in intermediate language

Program analysis: static properties of program structure and execution;
safe, pessimistic assumptions where input and dynamic execution paths are not known.

Context of analysis:
* Expression: local optimization
* Basic block: local optimization
* Control-flow graph (procedure): global intra-procedural optimization
* Control-flow graph, call graph: global inter-procedural optimization

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 98
Objectives: Overview of optimization
In the lecture:
* Program analysis computes safe assumptions at compile time about the execution of the program.
* The larger the analysis context, the better the information.
* Conventionally this phase is called "optimization", although in most cases a formal optimum cannot be defined or achieved with practical effort.
Suggested reading: Kastens / Übersetzerbau, Section 8
--------------------------------------------------------------------------------

CI-99 Optimizing Transformations

Name of transformation:                          Example for its application:
* Algebraic simplification of expressions        2*3.14   x+0   x*2   x**2
* Constant propagation                           x = 2; ... y = x * 5;
* Common subexpressions                          x = a*(b+c); ... y = (b+c)/2;
* Dead variables (redundant assignments)         x = a + b; ... x = 5;
* Copy propagation (redundant copy assignments)  x = y; ... ; z = x;
* Dead code (unreachable code)                   b = true; ... if (b) x = 5; else y = 7;
* Code motion                                    if (c) x = (a+b)*2; else x = (a+b)/2;
* Function inlining                              int Sqr (int i) { return i * i; }
* Loop-invariant code                            while (b) { ... x = 5; ... }
* Induction variables in loops                   i = 1; while (b) { k = i*3; f(k); i = i+1; }

Analysis checks the preconditions for safe application of each transformation;
more applications become possible if the preconditions are analysed in larger contexts.

Interdependences: The application of one transformation may enable or inhibit the application of another. The order of transformations is relevant.

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 99
Objectives: Get an idea of important transformations
In the lecture:
* Some transformations are explained.
* The preconditions are discussed for some of them.
Suggested reading: Kastens / Übersetzerbau, Section 8.1
Assignments:
* Apply some transformations in a given example program.
Questions:
* Which of the transformations need to analyze paths through the program?
* Give an example for a pair of transformations such that an application of the first one enables an application of the second.
--------------------------------------------------------------------------------
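To illustrate how several of these transformations interact, the following C fragment (a made-up example, not from the slides) shows a function before and after constant propagation, common-subexpression elimination, and dead-assignment elimination have been applied by hand:

    /* Before optimization (hypothetical example). */
    int before(int a, int b, int c) {
        int x, y, z;
        x = 2;                  /* only used by the dead y = x * 5   */
        y = x * 5;              /* dead: y is reassigned before use  */
        z = a * (b + c);        /* b+c is a common subexpression ... */
        y = (b + c) / 2;        /* ... reused here                   */
        x = z + y;
        return x;
    }

    /* After constant propagation, CSE, and dead-assignment elimination. */
    int after(int a, int b, int c) {
        int t = b + c;          /* common subexpression computed once */
        return a * t + t / 2;
    }

Note the interdependence: only after y = (b + c) / 2 makes y = x * 5 dead does x = 2 become dead as well, so the order of the eliminations matters.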
CI-100 Analysis in Compilers

[Figure: the compiler structure of CI-97 annotated with analysis artifacts. The frontend computes the syntactic structure, the program entities, their properties and relations. The optimization and code generation phases compute the control-flow graph, data-flow information, use-def relations, the data dependency graph, the dominator tree and loops, and the call graph.]

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 100
Objectives: See some methods of program analysis
In the lecture: Give brief explanations of the methods
--------------------------------------------------------------------------------

CI-101 Example for a Control-flow Graph

Intermediate code with basic blocks [Muchnick, p. 172]:

     1      receive m
     2      f0 <- 0
     3      f1 <- 1
     4      if m <= 1 goto L3
     5      i <- 2
     6  L1: if i <= m goto L2
     7      return f2
     8  L2: f2 <- f0 + f1
     9      f0 <- f1
    10      f1 <- f2
    11      i <- i + 1
    12      goto L1
    13  L3: return m

Basic blocks: B1 = instructions 1-4, B2 = 13, B3 = 5, B4 = 6, B5 = 7, B6 = 8-12.

Control-flow graph: entry -> B1; B1 -> B2, B3; B3 -> B4; B4 -> B5, B6; B6 -> B4; B2 -> exit; B5 -> exit.

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 101
Objectives: Example for a control-flow graph
In the lecture:
* The control-flow graph represents the basic blocks and their branches.
* See Lecture "Modellierung", Mod-4.27 ("Programmablaufgraphen")
--------------------------------------------------------------------------------

CI-102 Data-Flow Analysis

Data-flow analysis (DFA) provides information about how the execution of a program may manipulate its data.
Many different problems can be formulated as data-flow problems, for example:
* Which assignments to variable v may influence a use of v at a certain program position?
* Is a variable v used on any path from a program position p to the exit node?
* The values of which expressions are available at program position p?

Data-flow problems are stated in terms of
* paths through the control-flow graph and
* properties of basic blocks.

Data-flow analysis provides information for global optimization.

Data-flow analysis does not know
* input values provided at run-time,
* branches taken at run-time.
Its results are to be interpreted pessimistically.

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 102
Objectives: Goals and ability of data-flow analysis
In the lecture:
* The topics on the slide are explained.
* Examples for the use of DFA information are given.
* Examples for pessimistic information are given.
Suggested reading: Kastens / Übersetzerbau, Section 8.2.4
Questions:
* What's wrong about optimistic information?
* Why can pessimistic information be useful?
--------------------------------------------------------------------------------

CI-103 Specification of a DFA Problem

Specification of reaching definitions:
* Description: A definition d of a variable v reaches the beginning of a block B if there is a path from d to B on which v is not assigned again.
* It is a forward problem.
* The meet operator is union.
* The analysis information in the sets are assignments at certain program positions.

2 equations for each basic block B:

    Out(B) = Gen(B) ∪ (In(B) − Kill(B))
    In(B)  = ∪ { Out(h) | h ∈ pred(B) }

* Gen(B) contains all definitions d: v = e; in B, such that v is not defined after d in B.
* Kill(B): if v is assigned in B, then Kill(B) contains all definitions d: v = e; in blocks different from B, such that B has a definition of v.

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 103
Objectives: Get an idea of DFA problems
In the lecture: Explain how DFA problems are specified by a set of equations.
--------------------------------------------------------------------------------
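A minimal sketch of solving these equations by fixed-point iteration, applied to the example of CI-101: the eight definitions d1..d8 (the assignments to m, f0, f1, i, f2, f0, f1, i at lines 1, 2, 3, 5, 8, 9, 10, 11) are represented as bits of an unsigned integer. The Gen and Kill sets are hand-derived assumptions from that code, and the simple round-robin loop stands in for a proper worklist:

    #include <stdio.h>

    enum { NB = 6 };                      /* blocks B1..B6                   */

    /* Bit k stands for definition d(k+1); sets derived from CI-101. */
    static const unsigned gen_[NB]  = { 0x07, 0x00, 0x08, 0x00, 0x00, 0xF0 };
    static const unsigned kill_[NB] = { 0x60, 0x00, 0x80, 0x00, 0x00, 0x0E };

    /* preds[b] lists the predecessor blocks of B(b+1), 0-terminated. */
    static const int preds[NB][3] = {
        {0},        /* B1: only entry */
        {1, 0},     /* B2 <- B1       */
        {1, 0},     /* B3 <- B1       */
        {3, 6, 0},  /* B4 <- B3, B6   */
        {4, 0},     /* B5 <- B4       */
        {4, 0},     /* B6 <- B4       */
    };

    int main(void) {
        unsigned in[NB] = {0}, out[NB] = {0};
        int changed = 1;
        while (changed) {                 /* iterate to the fixed point     */
            changed = 0;
            for (int b = 0; b < NB; b++) {
                unsigned i = 0;
                for (int k = 0; preds[b][k]; k++)   /* meet operator: union */
                    i |= out[preds[b][k] - 1];
                unsigned o = gen_[b] | (i & ~kill_[b]);
                if (i != in[b] || o != out[b]) changed = 1;
                in[b] = i; out[b] = o;
            }
        }
        for (int b = 0; b < NB; b++)
            printf("B%d: In = %02X  Out = %02X\n", b + 1, in[b], out[b]);
        return 0;
    }

Because reaching definitions is a forward problem with meet operator union, the iteration starts from empty sets and grows them monotonically until nothing changes.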
CI-104 Call Graphs for Object-Oriented Programs

The call graph is reduced to a set of reachable methods using the class hierarchy and the static type of the receiver expression in the call:

If a method F-p is reachable, and it contains a dynamically bound call v.m(...) where T is the static type of v, then every method m that is inherited by T or by a subtype of T is also reachable, and arcs go from F-p to them.

[Figure: a class hierarchy (root class A with methods m and p; subclasses B and C overriding m; further subclasses D, E, F, with F overriding p and E and G overriding m) and the call graph for F-p containing v.m(...) with static type F of v. Only the m methods inherited by F or its subtypes (e.g. E-m) remain successors of F-p; methods of unrelated classes (e.g. G-m) are eliminated.]

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 104
Objectives: See a typical object-oriented analysis
In the lecture:
* Dynamically bound method calls contribute significantly to the cost of object-oriented programs.
* Static resolution as far as possible is very effective.
--------------------------------------------------------------------------------

CI-105 Code Generation

Input: Program in intermediate language

Tasks:
* Storage mapping: properties of program objects (size, address) in the definition module
* Code selection: generate an instruction sequence, optimizing the selection
* Register allocation: use of registers for intermediate results and for variables

Output: abstract machine program, stored in a data structure

Design of code generation:
* analyze properties of the target processor
* plan the storage mapping
* design at least one instruction sequence for each operation of the intermediate language

Implementation of code generation:
* Storage mapping: a traversal through the program and the definition module computes sizes and addresses of storage objects
* Code selection: use a generator for pattern matching in trees
* Register allocation: methods for expression trees, basic blocks, and for CFGs

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 105
Objectives: Overview on design and implementation
In the lecture:
* Identify the 3 main tasks.
* Emphasize the role of design.
Suggested reading: Kastens / Übersetzerbau, Section 7
--------------------------------------------------------------------------------
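A minimal sketch of the design rule "at least one instruction sequence per operation of the intermediate language", for a hypothetical three-address IR and an invented RISC-like target; the opcode names, the instruction formats, and the emitter interface are assumptions, not the book's notation:

    #include <stdio.h>

    typedef enum { IR_ADD, IR_MUL, IR_LOAD, IR_STORE } IrOp;
    typedef struct { IrOp op; int dst, src1, src2; } IrInstr;

    /* One fixed instruction sequence per IR operation; operands are
       symbolic registers to be allocated later (see CI-110).        */
    static void emit(const IrInstr *i) {
        switch (i->op) {
        case IR_ADD:   printf("add  r%d, r%d, r%d\n", i->dst, i->src1, i->src2); break;
        case IR_MUL:   printf("mul  r%d, r%d, r%d\n", i->dst, i->src1, i->src2); break;
        case IR_LOAD:  printf("load (r%d,%d), r%d\n", i->src1, i->src2, i->dst); break;
        case IR_STORE: printf("store r%d, (r%d,%d)\n", i->src1, i->dst, i->src2); break;
        }
    }

    int main(void) {
        /* IR for x = a + b, with a, b, x at offsets 8, 12, 16 off r6. */
        IrInstr prog[] = {
            { IR_LOAD,  1, 6,  8 },
            { IR_LOAD,  2, 6, 12 },
            { IR_ADD,   3, 1,  2 },
            { IR_STORE, 6, 3, 16 },
        };
        for (unsigned k = 0; k < sizeof prog / sizeof prog[0]; k++)
            emit(&prog[k]);
        return 0;
    }

An optimizing code selector replaces such a fixed mapping by tree pattern matching, choosing among several sequences per operation by cost, as CI-109 illustrates.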
CI-106 Storage Mapping

Objective: for each storable program object compute storage class, relative address, and size

Implementation: use properties in the definition module, traverse the defined program objects

Design the use of storage areas:
* code storage: program code
* global data: linked for all compilation units
* run-time stack: activation records for function calls
* heap: storage for dynamically allocated objects, garbage collection
* registers: addressing of storage areas (e.g. stack pointer); function results, arguments; local variables, intermediate results (register allocation)

Design the type mapping ... (C-29)

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 106
Objectives: Design the mapping of the program state onto the machine state
In the lecture: Explain storage classes and their use
Suggested reading: Kastens / Übersetzerbau, Section 7.2
--------------------------------------------------------------------------------

CI-107 Run-Time Stack

The run-time stack contains one activation record for each active function call.
An activation record provides storage for the local data of a function call (see C-31).

Nested functions (nested classes and objects): the static predecessor chain links the accessible activation records, the closure of a function.

[Figure: function h (float a;) contains nested function q (int i;), which contains nested function r (int b; b = i + 1;); q calls itself conditionally and calls r. The run-time stack holds one activation record per active call (h, q, q, ..., r), each linked to its static predecessor; records are pushed and popped with call and return.]

Requirement: The closure of a function must still be on the run-time stack when the function is called.

Languages without recursive functions (FORTRAN) do not need a run-time stack.
Optimization: activation records of non-recursive functions may be allocated statically.
Parallel processes, threads, and coroutines each need a separate run-time stack.

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 107
Objectives: Understand the concept of run-time stacks
In the lecture: The topics on the slide are explained. Examples are given.
* Explain static and dynamic links.
* Explain nesting and closures.
* Different language restrictions ensure that the necessary closures are on the run-time stack.
Questions:
* How do C, Pascal, and Modula-2 obey the requirement on stack discipline?
* Why do threads need a separate run-time stack?
--------------------------------------------------------------------------------

CI-108 Code Sequences for Control Statements

A code sequence defines how a control statement is transformed into jumps and labels.
Several variants of code sequences may be defined for one statement.

Example: while (Condition) Body

    M1: Code (Condition, false, M2)
        Code (Body)
        goto M1
    M2:

Variant:

        goto M2
    M1: Code (Body)
    M2: Code (Condition, true, M1)

Meaning of the Code constructs:
Code (S): generate code for statements S
Code (C, true, M): generate code for condition C such that it branches to M if C is true; otherwise control continues without branching

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 108
Objectives: Concept of code sequences for control structures
In the lecture:
* Explain the code sequence for while statements.
* Explain the transformation of conditions.
* Discuss the two variants.
* Develop a code sequence for for statements.
Questions:
* What are the advantages of each alternative?
* Give a code sequence for do-while statements.
--------------------------------------------------------------------------------
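A minimal sketch of an emitter for the first while code sequence above; new_label, code_condition, and code_body are hypothetical stand-ins for a compiler's real emitter interface, and conditions and bodies are passed as plain strings for brevity:

    #include <stdio.h>

    static int next_label = 0;
    static int new_label(void) { return ++next_label; }

    /* Code(C, on, M): branch to label M if condition c evaluates to on. */
    static void code_condition(const char *c, int on, int target) {
        printf("    if %s%s goto M%d\n", on ? "" : "not ", c, target);
    }

    static void code_body(const char *stmts) {
        printf("    %s\n", stmts);        /* placeholder for Code(Body)   */
    }

    /* Code sequence: M1: Code(C, false, M2); Code(Body); goto M1; M2:    */
    static void code_while(const char *cond, const char *body) {
        int m1 = new_label(), m2 = new_label();
        printf("M%d:\n", m1);
        code_condition(cond, 0, m2);
        code_body(body);
        printf("    goto M%d\n", m1);
        printf("M%d:\n", m2);
    }

    int main(void) {
        code_while("i <= n", "sum = sum + i; i = i + 1;");
        return 0;
    }

The variant with the condition at the loop end executes one jump fewer per iteration, which is why many compilers prefer it.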
CI-109 Example for Code Selection

[Figure: the expression tree for the assignment ... = a[i].s; covered by instruction patterns in two ways. A naive cover, which loads every address and constant offset separately, costs 6 instructions:

    load (R6,8), R1
    add  R6,R1,R2
    load 6,R3
    add  R2,R3,R4
    load (R4,12),R5
    store R5, ...

A better cover folds the constant offsets into the addressing modes and costs 3 instructions:

    load (R6,8), R1
    add  R6,R1,R2
    store (R2,18),...
]

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 109
Objectives: Get an idea of code selection by tree patterns
In the lecture:
* Show the application of patterns.
* Explain code costs.
--------------------------------------------------------------------------------

CI-110 Register Allocation

Use of registers:
* intermediate results of expression evaluation
* reused results of expression evaluation (CSE)
* contents of frequently used variables
* parameters of functions, function result (cf. register windowing)
* stack pointer, frame pointer, heap pointer, ...

The number of registers is limited - for each register class: address, integer, floating point.

Register allocation aims at reducing
* the number of memory accesses,
* spill code, i.e. instructions that store and reload the contents of registers.

Specific allocation methods for different context ranges:
* expression trees (Sethi, Ullman)
* basic blocks (Belady)
* control-flow graphs (graph coloring)

Useful technique: defer register allocation until a later phase; use an unbound set of symbolic registers instead.

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 110
Objectives: Overview on register allocation
In the lecture: Explain the use of registers for different purposes.
Suggested reading: Kastens / Übersetzerbau, Section 7.5
--------------------------------------------------------------------------------

CI-111 Example for Graph Coloring

[Figure: a CFG (blocks B1-B6) annotated with definitions and uses of the variables a to f, and the interference graph derived from the live ranges: one node per variable, an edge between two variables that are alive at the same program point. A coloring of the graph yields a register assignment.]

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 111
Objectives: Get an idea of register allocation by graph coloring
In the lecture:
* Explain the example.
* Refer to lecture "Modellierung" Mod-4.21
Suggested reading: Kastens / Übersetzerbau, Section 7.5.4, Fig. 7.5-6
Assignments:
* Apply the technique for another example.
Questions:
* Why is variable b in block B5 alive?
--------------------------------------------------------------------------------
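A minimal sketch of coloring an interference graph greedily; the 6-variable graph below is a made-up assumption, not the one from slide CI-111, and production allocators use Chaitin/Briggs-style simplification with spilling rather than this fixed node order:

    #include <stdio.h>

    enum { NV = 6 };                          /* variables a..f          */

    /* Adjacency matrix of the interference graph (symmetric):
       interferes[v][w] = 1 iff v and w are alive at the same time.     */
    static const int interferes[NV][NV] = {
        /*        a  b  c  d  e  f */
        /* a */ { 0, 0, 1, 1, 0, 1 },
        /* b */ { 0, 0, 0, 1, 1, 0 },
        /* c */ { 1, 0, 0, 1, 0, 1 },
        /* d */ { 1, 1, 1, 0, 0, 0 },
        /* e */ { 0, 1, 0, 0, 0, 0 },
        /* f */ { 1, 0, 1, 0, 0, 0 },
    };

    int main(void) {
        int color[NV];
        for (int v = 0; v < NV; v++) {
            int used = 0;                     /* bitset of neighbor colors */
            for (int w = 0; w < v; w++)
                if (interferes[v][w]) used |= 1 << color[w];
            int c = 0;
            while (used & (1 << c)) c++;      /* smallest free color       */
            color[v] = c;                     /* color = register number   */
        }
        for (int v = 0; v < NV; v++)
            printf("%c -> R%d\n", 'a' + v, color[v]);
        return 0;
    }

If the number of colors exceeds the available registers, some variable must be spilled to memory, which is exactly the spill code that allocation tries to minimize.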
CI-112 Code Parallelization

The target processor executes several instructions in parallel. The compiler arranges the instruction sequence for the shortest execution time: instruction scheduling.

Principles of parallelism in processors:
* Data parallel processor (vector processor): all functional units FU0 ... FU31 execute the same instruction on individual data (SIMD);
      for i := 0 to 31 do c[i] := a[i] + b[i];
  is one instruction. Task: analyze and transform loops.
* Parallel functional units FU1, FU2, FU3, ... (superscalar, VLIW): the instruction sequence is parallelized.
* Pipeline processor (stages S3, S2, S1): sequential code is scheduled for pipelining.

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 112
Objectives: 3 abstractions of processor parallelism
In the lecture:
* Explain the abstract models,
* relate them to real processors,
* explain the instruction scheduling tasks.
Suggested reading: Kastens / Übersetzerbau, Section 8.5
Questions:
* What has to be known about instruction execution in order to solve the instruction scheduling problem in the compiler?
--------------------------------------------------------------------------------

CI-113 Software Pipelining

A technique for the parallelization of loops. A single loop body does not exhibit enough parallelism => sparse schedule.

Idea of software pipelining: the transformed loop body executes several loop iterations in parallel; the iterations are shifted in time => compact schedule.
Prologue and epilogue contain the initiation and finalization code.

Technique:
1. Construct the DDG for the loop body, without and with dependencies into later iterations.
2. Find a schedule such that loop iterations can begin with a short initiation interval (II).
3. Construct the new loop, prologue, and epilogue.

[Figure: loop iterations shifted in time by the initiation interval II; the overlapping middle parts form the new loop body, the leading and trailing parts the prologue and epilogue.]

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 113
Objectives: Increase parallelism in loops
In the lecture:
* Explain the underlying idea
Questions: Explain:
* The shorter the initiation interval, the greater the parallelism and the more compact the schedule.
* The transformed loop contains each instruction of the loop body exactly once.
--------------------------------------------------------------------------------

CI-114 Loop Parallelization

Compilation steps:
* nested loops operating on arrays, sequential execution of the iteration space:

      DECLARE B[0..N, 0..N+1]
      FOR I := 1 .. N
        FOR J := 1 .. I
          B[I,J] := B[I-1,J] + B[I-1,J-1]
        END FOR
      END FOR

* analyze data dependencies: data-flow, definition and use of array elements
* transform loops, keeping the data dependencies intact
* parallelize inner loop(s): map onto a field or vector of processors (see the sketch after these notes)
* map arrays onto processors such that many accesses are local; transform index spaces

[Figure: the triangular iteration space of the loop nest, its dependence vectors, and the transformed index spaces.]

--------------------------------------------------------------------------------
Lecture Compiler I WS 2001/2002 / Slide 114
Objectives: Overview on regular loop parallelization
In the lecture: Explain
* application area: scientific computations,
* goals: execute inner loops in parallel with efficient data access,
* transformation steps.
--------------------------------------------------------------------------------
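In the example above, every iteration of the inner loop reads only row I-1 and writes row I, so for a fixed I the inner iterations are independent. A minimal sketch in C of parallelizing the inner loop (assuming OpenMP and a shared-memory multiprocessor; the slide targets processor fields with an explicit data mapping, which this does not model):

    #include <stdio.h>

    #define N 8

    int main(void) {
        static double B[N + 1][N + 2] = { { 1.0 } };   /* B[0][0] = 1 */

        for (int i = 1; i <= N; i++) {
            /* All reads go to row i-1, all writes to row i: the inner
               iterations are independent and may execute in parallel. */
            #pragma omp parallel for
            for (int j = 1; j <= i; j++)
                B[i][j] = B[i - 1][j] + B[i - 1][j - 1];
        }
        printf("B[%d][1] = %g\n", N, B[N][1]);
        return 0;
    }

The outer loop carries the dependence (row I needs row I-1) and therefore stays sequential; a loop transformation that changes the iteration order must keep exactly this dependence intact.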