Compiler and Static Analyzer

Compiler:

  1. Source code
  2. Scanner(Lexical Analysis, generate tokens), use regular expression
  3. Parser(Syntax Analysis, generate AST), use Context-Free Grammar
  4. Type Checker(Semantic Analysis, generate decorated AST), use Attribute Grammar
  5. Translator(generate IR, 3AC), Static Analysis at this step (code optimization).
  6. Code Generator(generate machine code)

AST v.s. IR

ASTvsIR

IR: 3-Address Code (3AC)

3AC: at most one operator on the right of an instruction.

t2 = a + b + 3 t1 = a + b; t2 = t1 + 3

3AC contains at most 3 addresses, one address can be: 1.name of a variable; 2.Constant; 3.Temporary generated by compiler.

3AC in Real: Soot

JVM中四种方法调用:

  1. invokespecial: call constructor, call superclass method, call private method
  2. invokevirtual: instance methods call (virtual dispatch)
  3. invokeinterface: cannot optimization, checking interface implementation
  4. invokestatic: call static methods

Java7: invokedynamic Java static typing, dynamic language run on JVM

method signature: class name + return type + method name + parameter types

clinit method called when class loaded

Static Singel Assignment (SSA)

SSA中所有的assignment会赋给一个新的变量名,保证每个变量在SSA中只被赋值一次。

SSA

遇到分支时,会生成一个新的变量名,并利用phi函数来表示控制流。

SSA-controlFlow

Why SSA?

  1. flow-insensitive analysis can get some flow-sensitive information by SSA.
  2. define-use pairs can be easily found in SSA.

Why not SSA?

  1. too many variables
  2. inefficiency, too many cp instructions

Basic Blocks (BB)

BB: maximal sequence of instructions that can be executed continuously, can be entered only at the bening and exited only at the end.

算法思路: goto指令所在语句与goto指令的目标语句是BB划分的边界。

leader: first instruction in P; goto target of P; next instruction after a goto instruction.

Control Flow Graphs (CFG)

How to build CFG on top of BBs?

  1. There is an edge from BB A to BB B if and only if: a. A中存在goto指令从A跳转到B b. B is a direct succeesor of A and A最后一条语句不是无条件goto

  2. replace jump to instruction lables by jump to BBs

  3. Add Entry and Exit BBs.