Compiler and Static Analyzer
Compiler:
- Source code
- Scanner(Lexical Analysis, generate tokens), use regular expression
- Parser(Syntax Analysis, generate AST), use Context-Free Grammar
- Type Checker(Semantic Analysis, generate decorated AST), use Attribute Grammar
- Translator(generate IR, 3AC), Static Analysis at this step (code optimization).
- Code Generator(generate machine code)
AST v.s. IR
IR: 3-Address Code (3AC)
3AC: at most one operator on the right of an instruction.
t2 = a + b + 3 → t1 = a + b; t2 = t1 + 3
3AC contains at most 3 addresses, one address can be: 1.name of a variable; 2.Constant; 3.Temporary generated by compiler.
3AC in Real: Soot
JVM中四种方法调用:
- invokespecial: call constructor, call superclass method, call private method
- invokevirtual: instance methods call (virtual dispatch)
- invokeinterface: cannot optimization, checking interface implementation
- invokestatic: call static methods
Java7: invokedynamic → Java static typing, dynamic language run on JVM
method signature: class name + return type + method name + parameter types
clinit method called when class loaded
Static Singel Assignment (SSA)
SSA中所有的assignment会赋给一个新的变量名,保证每个变量在SSA中只被赋值一次。
遇到分支时,会生成一个新的变量名,并利用phi函数来表示控制流。
Why SSA?
- flow-insensitive analysis can get some flow-sensitive information by SSA.
- define-use pairs can be easily found in SSA.
Why not SSA?
- too many variables
- inefficiency, too many cp instructions
Basic Blocks (BB)
BB: maximal sequence of instructions that can be executed continuously, can be entered only at the bening and exited only at the end.
算法思路: goto指令所在语句与goto指令的目标语句是BB划分的边界。
leader: first instruction in P; goto target of P; next instruction after a goto instruction.
Control Flow Graphs (CFG)
How to build CFG on top of BBs?
-
There is an edge from BB A to BB B if and only if: a. A中存在goto指令从A跳转到B b. B is a direct succeesor of A and A最后一条语句不是无条件goto
-
replace jump to instruction lables by jump to BBs
-
Add
Entry
andExit
BBs.