Examples in Program Analysis

U. Kastens
University of Paderborn

Assume you have just been allocated to the quality assurance group of the software company you are working in. The company has specific style guide for writing C programs. It will be your task to check whether the programs developed in the company obey to that guide. You start your work by constructing a tool that analyses C programs for certain properties required by the style guide. In order to get started you use a simplified grammar for a C subset, and pick 4 requirements from different chapters of the style guide.

The grammar is given in the last section of this document.

1 Dangling Else
1.1 Solution
1.2 Further Questions
1.3 Output files
2 Switch Case Labels
2.1 Solution
2.2 Further Questions
2.3 Output files
3 Loop Enumeration
3.1 Solution
3.2 Further Questions
3.3 Output files
4 Variable Usage
4.1 Solution
4.2 Further Questions
4.3 Output files
5 Structuring
5.1 Specification
5.2 Output of files

1 Dangling Else

The else part of if statements is optional. Nested if statements like the following are errorprone:
dangling.in[1]==
{   
  int i, z;
  i = read();
  if (i>=0)
  if (i<10) z = 5+i;
  else      z = 0;
  print (z);
}
This macro is attached to a product file.
C defines that the else belongs to the inner if (i<10). The programmer might have intended to associate it to the outer if (i>=0). Hence the style guide requires to avoid such situations, e.g. by additional curly braces. Your analysis tool shall emit a warning if such a dangling else occurs:
dangling.out[2]==
"dangling.in", line 5:3 WARNING: beware of dangling else!
This macro is attached to a product file.
The following questions may help to focus your attention on the significant aspects of this problem:

1. Which are the adjacent tree contexts that will cause the warning?

2. In which context is the warning issued, and what is its precondition?

1.1 Solution

Fill in the description of your solution here.
dangling else computations[3]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 4.

1.2 Further Questions

1. Did you get messages on missing computations? If so, explain why.

2. Reconsider your solution when you have learned about SYMBOL computations. How can you use them to simplify your solution?

1.3 Output files

dangling.lido[4]==
dangling else computations[3]
This macro is attached to a product file.

2 Switch Case Labels

Switch statements sometimes span over a wide range of case label values, and only a few case labels are stated for the non-default cases. A poor C compiler might then produce a large jump table with only a few non-default entries. Hence the style guide requires not to waste storage by sparse switch cases.

You decide to suggest to your manager that this requirement should be removed, because today's C compiler automatically translate sparse switch cases into cascades of conditional jumps.

In order to strengthen your argument by figures, your analysis tool will compute the number of the case labels in a switch statement and the range of their values.

For a program like

switch.in[5]==
{ int i, z;
  i = read();
  switch (i)
  {
   case 0:   z = 1;
             break;
   case 99:
   case 100: z = 2;
             break;
   default:  z = 0;
  }
  print (z);
}
This macro is attached to a product file.
the tool will produce the output:
switch.out[6]==
switch in line 3 has 3 case labels in the range 0 to 100
This macro is attached to a product file.
The following questions may help to focus your attention on the significant aspects of this problem:

1. How do you access the case label values from the switch statement context?

2. Which functions do you need to combine the values?

3. How do you handle nested switch statements?

2.1 Solution

Fill in the description of your solution here.
Switch case label computation[7]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 9.
Cpp macro definitions[8]==

/* Fill in your cpp macros here. */

This macro is invoked in definition 10.

2.2 Further Questions

1. Does your solution work for nested switch statements? Explain why.

2. Explain why the f0 functions are needed (look at the grammar).

3. Can you make any assumption about the order in which the functions are called?

4. Reconsider your solution when you have learned about SYMBOL computations. How can you use them to simplify your solution?

2.3 Output files

switch.lido[9]==
Switch case label computation[7]
This macro is attached to a product file.
switch.head[10]==
Cpp macro definitions[8]
This macro is attached to a product file.

3 Loop Enumeration

The program analyzer shall derive information about for loops in the program: The loops are counted from left to right throughout the program. For each loop its number, its nesting depth, and its line number is to be printed.

For example the following program

loops.in[11]==
{
  int i, k, j, s;
  s = 0;
  for (i=1; i<10; i=i+1)
    for (j=i; j<10; j=j+1)
      s = s + j;
  for (k=0; k<5; k=k+2)
    s = s - k;
  print (s);
}
This macro is attached to a product file.
shall produce output like
loops.out[12]==
for loop number 1 in line 4 has depth 1
for loop number 2 in line 5 has depth 2
for loop number 3 in line 7 has depth 1
This macro is attached to a product file.
The following questions may help to focus your attention on the significant aspects of this problem:

What is the computational pattern for left-to-right enumeration? In which context and with which value do you start the enumeration?

What is the computational pattern for nesting depth? How is the recursive nesting being founded?

3.1 Solution

Decompose your solution into 3 parts: Loop enumeration, loop nesting depth, and printing the information.

Fill in the description of your loop enumeration solution here.

Loop enumeration[13]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 16.
Fill in the description of your loop nesting depth solution here.
Loop nesting depth[14]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 16.
Fill in the description of your printing solution here.
Loop information[15]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 16.

3.2 Further Questions

Reconsider your solution when you have learned about SYMBOL computations. Which of the computations can be turned into SYMBOL computations? Explain why.

3.3 Output files

loops.lido[16]==
Loop enumeration[13]
Loop nesting depth[14]
Loop information[15]
This macro is attached to a product file.

4 Variable Usage

The style guide requires that a program must not have variables which are never used. Furthermore programs must not rely upon default initialisation. Hence each variable must be assigned to before it is used.

You decide to suggest to your manager that the programmers should use lint before they release their programs. Inspite of that you try to let your analysis tool check these requirements, and compare the results with those of lint.

For the program

vars.in[17]==
{ int a, b, c, d, e;
  int f;
  int a;

  a = read (x);
  b = a + c;
  c = a * c;
  e = d + a;
  print (e);
}
This macro is attached to a product file.
your analysis tool shall produce messages like
vars.out[18]==
"vars.in", line 5:13 ERROR: variable is not defined
"vars.in", line 1:7 ERROR: variable is multiply defined
"vars.in", line 2:7 WARNING: variable is never used
"vars.in", line 3:7 ERROR: variable is multiply defined
"vars.in", line 5:13 WARNING: variable is used before set
"vars.in", line 6:11 WARNING: variable is used before set
"vars.in", line 7:11 WARNING: variable is used before set
"vars.in", line 8:7 WARNING: variable is used before set
This macro is attached to a product file.
The following questions may help to focus your attention on the significant aspects of this problem:

1. Decompose the whole task into subproblems: a name analysis task and several single tasks which need to compute a property of variables each. Solve the tasks separately.

2. How do you solve the name analysis task?

3. For each property subtask answer the following questions: What are the property values? In which contexts is the property to be set? In which contexts is the property to be read? What is the sequencing relation between setting and reading the property.

4. Discuss the write before read subproblem for the statement c = a * c;.

4.1 Solution

Name Analysis:

Fill in the description of your solution here.

Name analysis module instance[19]==
/* Fill in your module instantiation here. */
This macro is invoked in definition 29.
Name analysis module roles[20]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 30.
Check Variable Definitions:

Fill in the description of your solution here.

Undefined variables[21]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 30.
Multiple definitions property[22]==

/* Fill in your PDL specification here. */

This macro is invoked in definition 31.
Definition state macros[23]==

/* Fill in cpp macro definitions here */

This macro is invoked in definition 32.
Multiple definitions[24]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 30.
Used Variables:

Fill in the description of your solution here.

Used property[25]==

/* Fill in your PDL specification here. */

This macro is invoked in definition 31.
Uses of variables[26]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 30.
Read before Write:

Fill in the description of your solution here.

Is written property[27]==

/* Fill in your PDL specification here. */

This macro is invoked in definition 31.
Variable access computation[28]==

/* Fill in your Lido specification here. */

This macro is invoked in definition 30.

4.2 Further Questions

1. Do you know a library module role that supports the check for undefind variables?

2. Do you know a library module that supports the check for multiple definitions?

3. How did you solve the problem that a variable use in the right-hand side of an assignment has to be considered before the left-hand side? Do you see alternative solutions?

4.3 Output files

vars.specs[29]==
Name analysis module instance[19]
This macro is attached to a product file.
vars.lido[30]==
Name analysis module roles[20]
Undefined variables[21]
Multiple definitions[24]
Uses of variables[26]
Variable access computation[28]
This macro is attached to a product file.
vars.pdl[31]==
Multiple definitions property[22]
Used property[25]
Is written property[27]
This macro is attached to a product file.
vars.head[32]==
Definition state macros[23]
This macro is attached to a product file.

5 Structuring

5.1 Specification

The structure of the example language is specified by the following concrete grammar productions.

A program is a block that consists of possibly empty sequences of declarations and statements.

Program[33]==
Prog    ::= Block .
Block   ::= '{' Decl* Stmt* '}' .
This macro is invoked in definition 44.
A declaration defines one or more variables of type int. Each declaration is terminated by a ';'. The symbol VarDecl distinguishes this defining occurrence of Ident from other occurrences. Several identifiers are separated by ','.
Declarations[34]==
Decl    ::= 'int' VarDecls ';' .
VarDecls::= VarDecl // ',' .
VarDecl ::= Ident .
This macro is invoked in definition 44.
There are several forms of statements: assignments, conditionals, switch statements, break statements, for loops, blocks, and expression statements. Statements, except blocks and switch statements are terminated by ';'.
Statements[35]==
Stmt    ::= Block 
          / Expr ';'
          / 'break' ';'
          / Assign ';'
          / ForStmt .
This macro is invoked in definition 44.
An assignment has the usual notation. The Assign symbol is introduced, because for loops contain assignments without terminating ';'.
Assignments[36]==
Assign  ::= VarUse '=' Expr .
VarUse  ::= Ident .
This macro is invoked in definition 44.
for statements consist of the initial assignment, the iteration condition, the assignment executed after each iteration, and the iterated statement.
For loop[37]==
ForStmt ::= 'for' '(' Assign ';' Expr ';' Assign ')' Stmt .
This macro is invoked in definition 44.
The else branch of an if statement is optional. The dangling else ambiguity is resolved such that an else branch is bound to the innermost if statement. The $'else' modification instructs the parser generator not to reduce that production if an else token follows.
If statement[38]==
Stmt    ::= 'if' '(' Expr ')' Stmt $'else' .
Stmt    ::= 'if' '(' Expr ')' Stmt 'else' Stmt .
This macro is invoked in definition 44.
Switch statements have the same structure as in C.
Switch statement[39]==
Stmt    ::= SwStmt .
SwStmt  ::= 'switch' '(' Expr ')' '{' Case* '}' .
Case    ::= CaseLab Stmt* .
CaseLab ::= 'case' CaseLit ':' .
CaseLab ::= 'default' ':' .
CaseLit ::= Number .
This macro is invoked in definition 44.
The following expression syntax specifies precedence and associativity of operators: MulOpr have highest precedence, CmpOpr have lowest precedence. AddOpr and MulOpr are left-associative, and CmpOpr are not associative.
Expressions[40]==
Expr    ::= Sum CmpOpr Sum / Sum .
Sum     ::= Sum AddOpr Fact / Fact .
Fact    ::= Fact MulOpr Operand / Operand .
CmpOpr  ::= '<' / '>' / '==' / '!=' / '<=' / '>=' .
AddOpr  ::= '+' / '-' .
MulOpr  ::= '*' / '/' .
Operand ::= '(' Expr ')' .
This macro is invoked in definition 44.
The precedence and associativity rules are only relevant for the construction of the tree. The then constructed tree need not distinguish between Expr, Sum, Fact, and Operand. They are all represented by an Expr node. Similarly all operators are represented by an Opr node in the tree:
Equivalent nonterminals[41]==
Expr    ::= Sum Fact Operand .
Opr     ::= CmpOpr AddOpr MulOpr .
This macro is invoked in definition 45.
Operands are integral numbers, identifiers that denote access of a variable value, or calls of predefined functions. The latter may have an arbitrary number of arguments.
Operands[42]==
Operand ::= Number .
Operand ::= VarUse .
Operand ::= Ident '(' Params ')' .
Params  ::= .
Params  ::= Expr // ',' .
This macro is invoked in definition 44.
The notation for numbers, identifiers, and comments are specified as in C using canned token specifications:
Tokens[43]==
Number: C_INTEGER
Ident:  C_IDENTIFIER
        C_COMMENT
This macro is invoked in definition 46.

5.2 Output of files

tree.con[44]==
Program[33]
Declarations[34]
Statements[35]
Assignments[36]
For loop[37]
If statement[38]
Switch statement[39]
Expressions[40]
Operands[42]
This macro is attached to a product file.
tree.sym[45]==
Equivalent nonterminals[41]
This macro is attached to a product file.
tree.gla[46]==
Tokens[43]
This macro is attached to a product file.