8/12/2019 11.1.Deobfuscation Presentation
Deobfuscation: Reverse Engineering Obfuscated Code
by Sharath K. Udupa, Saumya K. Debray and Matias Madou
Presented by Fabrizio Steiner
Overview
Introduction
Obfuscation Transformations
  Basic Control Flow Flattening
  Enhancements
    Interprocedural Data Flow
    Artificial Blocks and Pointers
Deobfuscation
  Cloning
  Static Path Feasibility Analysis
  Combining Static and Dynamic Analysis
Experiments
Conclusion
Introduction: What is Obfuscated Code?
Obfuscation is used to prevent reverse engineering.
Obfuscated code is very hard to read and to understand.
Important for newer environments such as .NET or Java.
An obfuscator takes a block of code and produces unreadable code from it.
A deobfuscator tries to reverse the obfuscation (using static and dynamic analysis).
Why obfuscate .NET code?
Introduction
Goals of obfuscation
improve software security
hard to reverse engineer code
protects the owner's intellectual property
Could also be used to hide malicious software (to prevent its detection).
Introduction
Raises 2 questions:
What sort of techniques are useful for understanding obfuscated code?
What are the weaknesses of current code obfuscation techniques, and how can we address them?
Please keep these two questions in mind during the presentation!
We will see, for example, the weakness of control flow flattening.
Obfuscating Transformations
2 classes of transformations
1. surface obfuscation (focuses on syntax)
2. deep obfuscation (focuses on program structure)
Surface obfuscation
Makes the code harder for a human to understand
Has no effect on automated analyses that work out what the code does
Variable renaming: easily undone by applying a parser that resolves the variable references.
Deep obfuscation
Changes the program's control flow and data references
Affects the efficacy of semantic tools for reverse engineering
Working around deep obfuscation is much harder than working around surface obfuscation.
We only look at the harder one. The deep obfuscations discussed here are taken from Chenxi Wang's dissertation.
Basic Control Flow Flattening
Aims to obscure the control flow logic
By flattening the control flow graph, all basic blocks get the same set of predecessors and successors.
Control flow during execution is guided by a dispatcher variable. Each basic block assigns to it the value that selects the next block to execute.
A switch block uses the dispatcher variable to jump to that block.
Example: Basic Control Flow Flattening
int f(int i, int j)
{
    int res = -1;
    if (i < j) {
        res = i;
    } else {
        if (i == j) {
            res = 0;
        } else {
            res = j;
        }
    }
    return res;
}
[Figure: control flow graph of f]
  A: res = -1; i < j ?   (Y -> B, N -> C)
  B: res = i             (-> F)
  C: i == j ?            (Y -> D, N -> E)
  D: res = 0             (-> F)
  E: res = j             (-> F)
  F: return res
Example: Basic Control Flow Flattening (flattened)
[Figure: flattened control flow graph of f. Init sets x = 0; the dispatcher block switch (x) selects one of the blocks below, and every block except F returns to the switch.]
  case 0 -> A: res = -1; x = (i < j) ? 1 : 2
  case 1 -> B: res = i; x = 5
  case 2 -> C: x = (i == j) ? 3 : 4
  case 3 -> D: res = 0; x = 5
  case 4 -> E: res = j; x = 5
  case 5 -> F: return res
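The flattened graph can be written down directly as C. The following is a minimal sketch, not the output of any particular obfuscator: the names f_original and f_flattened are chosen here for illustration, the comparison in block C is assumed to be ==, and the loop around the switch stands in for the edges that lead every block back to the dispatcher.

```c
/* Original function from the example. */
int f_original(int i, int j) {
    int res = -1;
    if (i < j) {
        res = i;
    } else if (i == j) {
        res = 0;
    } else {
        res = j;
    }
    return res;
}

/* Hand-flattened version: every block goes back to the switch, and the
   dispatcher variable x selects the block that runs next. */
int f_flattened(int i, int j) {
    int res = 0;
    int x = 0;                                        /* Init */
    for (;;) {
        switch (x) {                                  /* dispatcher */
        case 0: res = -1; x = (i < j) ? 1 : 2; break; /* A */
        case 1: res = i;  x = 5; break;               /* B */
        case 2: x = (i == j) ? 3 : 4; break;          /* C */
        case 3: res = 0;  x = 5; break;               /* D */
        case 4: res = j;  x = 5; break;               /* E */
        case 5: return res;                           /* F */
        default: return res;                          /* not reached */
        }
    }
}
```

Both functions return the same value for every input, but in f_flattened the order of the blocks is only visible through the values assigned to x.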
Enhancements
We will now discuss 2 enhancements of basic control flow flattening.
They make it more difficult to deobfuscate the code.
Enhancement 1: Interprocedural Data Flow
In basic control flow flattening, the values assigned to the dispatcher variable appear within the function itself.
The control flow is not obvious,
but it can be reconstructed by examining the values assigned to the dispatcher variable.
This requires only intraprocedural analysis.
Enhancement 1: Interprocedural Data Flow
Improvement: make the dispatch values depend on interprocedural information.
Idea: use a global array to pass the dispatch values.
Every call site writes the values into the array starting at a random offset. Every call site uses a different random offset.
The obfuscated code assigns values from the array to the dispatch variable.
The locations accessed and the contents of these locations aren't evident from examining the callee code.
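Example: Interprocedural Data Flow
Callers:
  int A[...];  // global array of indices
  int w;       // offset into array A

  caller 1:
    w = random1
    A[w] = 3; A[w+1] = 1; A[w+2] = 2; A[w+3] = 4; A[w+4] = 5
    call f(i,j)
  caller 2:
    w = random2
    A[w] = 3; A[w+1] = 1; A[w+2] = 2; A[w+3] = 4; A[w+4] = 5
    call f(i,j)
[Figure: flattened control flow graph of the callee f. Init sets x = 0; the dispatcher block switch (x) selects one of the blocks below.]
  case 0 -> A: res = -1; x = (i < j) ? A[w+1] : A[w+2]
  case 1 -> B: res = i; x = A[w+4]
  case 2 -> C: x = (i == j) ? A[w] : A[w+3]
  case 3 -> D: res = 0; x = A[w+4]
  case 4 -> E: res = j; x = A[w+4]
  case 5 -> F: return res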
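As a rough illustration of this enhancement, the sketch below writes the callee and one call site in C. Several things are assumptions made for the example: the array size A_SIZE, the helper call_f_at_offset (which stands in for the code at a call site), and the fixed offsets used in place of random1 and random2.

```c
#define A_SIZE 64

int A[A_SIZE];   /* global array of dispatch values */
int w;           /* offset into A, chosen by each call site */

/* Callee: the dispatch values are read from A[w..w+4] instead of being
   written as constants, so they cannot be seen by looking at f alone. */
int f(int i, int j) {
    int res = 0;
    int x = 0;
    for (;;) {
        switch (x) {
        case 0: res = -1; x = (i < j) ? A[w + 1] : A[w + 2]; break; /* A */
        case 1: res = i;  x = A[w + 4]; break;                      /* B */
        case 2: x = (i == j) ? A[w] : A[w + 3]; break;              /* C */
        case 3: res = 0;  x = A[w + 4]; break;                      /* D */
        case 4: res = j;  x = A[w + 4]; break;                      /* E */
        case 5: return res;                                         /* F */
        default: return res;                                        /* not reached */
        }
    }
}

/* Hypothetical call site: store the dispatch values at the given offset
   (offset + 4 must stay below A_SIZE), then call f. */
int call_f_at_offset(int offset, int i, int j) {
    w = offset;
    A[w] = 3; A[w + 1] = 1; A[w + 2] = 2; A[w + 3] = 4; A[w + 4] = 5;
    return f(i, j);
}
```

Calling call_f_at_offset(7, ...) and call_f_at_offset(40, ...) models two call sites with different offsets. An analysis restricted to f sees only loads from A, so it has to follow the data flow through the callers to recover the block order.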
Enhancement 2: Artificial Blocks and Pointers
Extends Enhancement 1 by adding artificial blocks
Artificial blocks
Some of them are never executed
This is difficult to determine with static analysis, because the branch targets are indirect and computed dynamically
Indirect loads and stores (through pointers) are added to these unreachable blocks. They confuse static analysis about which dispatch values are actually taken.
Enhancement 2: Artificial Blocks and Pointers
How it works
Add 2 artificial blocks (written B' and B'' here) for every basic block B.
B' will be executed; its indirect assignments through pointers set the dispatch variable.
B'' also contains indirect assignments but is never executed. It exists only to confuse a static analyzer.
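Example: Artificial Blocks and Pointers
[Figure: flattened function f before artificial blocks are added. Init sets x = 0; the dispatcher S: switch (x) selects one of the blocks below.]
  case 0 -> A: a = 1; x = (i < j) ? 1 : 2
  case 1 -> B: a = j; x = 3
  case 2 -> C: i = i - 1; a = a*i; x = (i > 0) ? 2 : 3
  case 3 -> D: return a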
Example: Artificial Blocks and Pointers (after the transformation)
[Figure: the same function f with artificial blocks and pointers added. Declarations: int a, b, c, *p, *q. Init sets x = 0; the dispatcher S: switch (x) now has ten cases, 0 to 9.]
  The original blocks now read their dispatch values from b and c:
    A: a = 1; x = (i < j) ? b : c
    B: a = j; x = b
    C: i = i - 1; a = a*i; x = (i > 0) ? b : c
    D: return a
  The artificial blocks assign b and c through the pointers p and q, using statements such as
    p = &b; *p = 3; q = &c; *q = 4
  and set x directly to constants (for example x = 1, x = 5, x = 6, x = 8).
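A small C sketch can make the idea concrete. It is a simplified illustration and not a transcription of the figure: the case numbering, the values stored through p and q, and the names g_plain and g_obfuscated are all chosen for this example. Each original block is preceded by an artificial block that is executed and sets b and c through pointers, and two further artificial blocks (cases 2 and 8) are never executed.

```c
/* Unobfuscated reference: same behaviour as blocks A to D. */
int g_plain(int i, int j) {
    int a = 1;                       /* A */
    if (i < j) {
        a = j;                       /* B */
    } else {
        do {                         /* C */
            i = i - 1;
            a = a * i;
        } while (i > 0);
    }
    return a;                        /* D */
}

/* Flattened version with artificial blocks and pointer stores. */
int g_obfuscated(int i, int j) {
    int a = 0, b = 0, c = 0;
    int *p, *q;
    int x = 0;
    for (;;) {
        switch (x) {
        case 0: p = &b; *p = 3; q = &c; *q = 5; x = 1; break; /* executed, prepares A */
        case 1: a = 1; x = (i < j) ? b : c; break;            /* A */
        case 2: p = &b; *p = 9; q = &c; *q = 8; x = 1; break; /* never executed */
        case 3: p = &b; *p = 7; x = 4; break;                 /* executed, prepares B */
        case 4: a = j; x = b; break;                          /* B */
        case 5: p = &b; *p = 5; q = &c; *q = 7; x = 6; break; /* executed, prepares C */
        case 6: i = i - 1; a = a * i;
                x = (i > 0) ? b : c; break;                   /* C */
        case 7: return a;                                     /* D */
        case 8: q = &c; *q = 2; p = &b; *p = 6; x = 6; break; /* never executed */
        default: return a;                                    /* not reached */
        }
    }
}
```

A static analysis that cannot rule out cases 2 and 8 has to assume that b may also hold 9 or 6 and c may hold 8 or 2, which adds spurious successors to blocks A, B and C.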
Deobfuscation
We now consider some methods for reverse engineering obfuscated code.
Obfuscation inserts spurious execution paths into programs,
in order to introduce bogus information during program analysis.
In the figure, 1 is the original control flow path and 2 is the spurious control flow path.
Path 2 introduces imprecision into program analysis at the points where execution paths join.
[Figure: blocks A and B, with path 1 (original) and path 2 (spurious) both running from the exit of A to the entry of B.]
Deobfuscation
Results of forward analyses (e.g. reaching definitions) are tainted at the entry of B.
Results of backward analyses (e.g. liveness analysis) are affected at the exit of A.
To address this problem, one could clone portions of the program.
Cloning
Clone some parts of the program in such a way that spurious paths no longer join the original paths.
Applying cloning creates a new block B'.
This improves forward dataflow information.
Backward dataflow information is not improved: at the exit of A it is still possible to take the spurious path.
[Figure: after cloning, paths 1 and 2 leave A and end in separate copies of the block, B and B'.]
Cloning (2)
The goal of deobfuscation is to identify spurious paths.
But where should we apply cloning? We don't know which paths are spurious.
=> Simple approach: clone every block where multiple paths join.
If the obfuscator is known => the cloning can be improved.
Example: Cloning
[Figure: before cloning, a single block S branches to the blocks A, B and C. After cloning, S is replaced by the copies S1, S2 and S3, each drawn with its own successors A, B and C.]
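Why cloning helps a forward analysis can be shown with a tiny constant-propagation lattice in C. This is an illustrative sketch under assumptions: the type Fact, the function meet and the two scenario functions are names invented here, and the dispatch values 0 and 5 are borrowed from the flattening example (Init sets x = 0, block B sets x = 5).

```c
/* Knowledge about one variable at a program point. */
typedef enum { UNDEF, CONST, VARYING } Kind;
typedef struct { Kind kind; int value; } Fact;

Fact constant(int v) { Fact f = { CONST, v }; return f; }

/* Merge the facts arriving over two incoming edges at a join point. */
Fact meet(Fact a, Fact b) {
    Fact varying = { VARYING, 0 };
    if (a.kind == UNDEF) return b;
    if (b.kind == UNDEF) return a;
    if (a.kind == CONST && b.kind == CONST && a.value == b.value) return a;
    return varying;
}

/* Without cloning: one switch block S joins the edge from Init (x = 0)
   and the edge from block B (x = 5), so x is no longer a known constant. */
Fact x_at_shared_switch(void) {
    return meet(constant(0), constant(5));
}

/* With cloning: each predecessor gets its own copy of S, so the copy
   sees exactly the value its single predecessor assigned. */
Fact x_at_cloned_switch(Fact from_predecessor) {
    Fact undef = { UNDEF, 0 };
    return meet(undef, from_predecessor);
}
```

In the cloned copy that follows block B, x is known to be 5, so the only possible switch target is F and the edges to the other blocks can be discarded for that copy.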
Static Path Feasibility Analysis
Constraint-based static analysis to determine whether an (acyclic) execution path is feasible or not.
Given: an execution path $\pi$ with a set of live variables $\bar{x}$ at its entry.
Construct a constraint $C_\pi$ such that $(\exists \bar{x})\, C_\pi$ is unsatisfiable if, for all executions of the program, $\pi$ is never executed.
In that case $\pi$ is unfeasible.
Static Path Feasibility Analysis (2)
There are many ways to construct the constraint.
We take arithmetic operations into account.
Information is propagated along a single path, not along all execution paths.
Each instruction is named $I_k$; the value of a variable $x$ after instruction $I_k$ is written $x_k$.
An unknown value is written $\bot$.
Static Path Feasibility Analysis Rules
Assignment: $I_k \equiv x = y$ => $C_k \equiv x_k = y_j$, where $I_j$ is the most recent instruction that defined $y$.
Arithmetic: $I_k \equiv x = y \oplus z$ => $C_k \equiv x_k = f_\oplus(y_i, z_j)$, where $f_\oplus$ expresses the semantics of the operation $\oplus$.
If the semantics is not known, or either $y_i$ or $z_j$ is unknown, then $C_k \equiv x_k = \bot$.
Indirection: pointers can be modeled at different levels.
Static Path Feasibility Analysis Rules (2)
Branches: $I_k \equiv$ if e goto L, for some boolean expression $e$:
  $C_k \equiv e$ if $I_k$ is a taken branch in $\pi$;
  $C_k \equiv \neg e$ if $I_k$ is not taken in $\pi$.
Unconditional branches are treated as $e \equiv \mathit{true}$.
Other: the analysis is aborted and the branch is assumed to be feasible.
The constraint $C_\pi$ is constructed as the conjunction of the constraints $C_k$ of all instructions on the path.
A constraint solver can then determine whether the path is feasible or not.
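Example: Static Path Feasibility Analysis Rules
[Figure: control flow graph]
  B0: (1) x = 1
      (2) y = 2
      (3) if (u > 0) goto B1   (otherwise B2)
  B1: (4) z = x + y            (-> B3)
  B2: (5) z = x - y            (-> B3)
  B3: (6) if (z > 0) goto B5   (otherwise B4)
  B4, B5: successors of B3
$\pi = B_0 \rightarrow B_2 \rightarrow B_3 \rightarrow B_5$
$(\exists u_0)[x_1 = 1 \wedge y_2 = 2 \wedge u_0 \le 0 \wedge z_5 = x_1 - y_2 \wedge z_5 > 0]$
=> unfeasible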
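The path is unfeasible because x = 1 and y = 2 force z = 1 - 2 = -1, which contradicts the taken branch z > 0, whatever value u has. The C sketch below replays this reasoning for the example path. It is a deliberate simplification: constant folding along the single path stands in for a real constraint solver, and the names Val, Verdict and check_path_B0_B2_B3 are invented for this illustration.

```c
#include <stdbool.h>

/* A value along the path: known == false models the unknown value. */
typedef struct { bool known; int value; } Val;

typedef enum { PATH_MAY_BE_FEASIBLE, PATH_UNFEASIBLE } Verdict;

static Val known_val(int v) { Val r = { true, v }; return r; }
static Val unknown_val(void) { Val r = { false, 0 }; return r; }

/* x = y - z: the result is unknown if either operand is unknown. */
static Val sub_val(Val y, Val z) {
    return (y.known && z.known) ? known_val(y.value - z.value)
                                : unknown_val();
}

/* Constraint for "if (v > 0) goto L" given whether the path takes the
   branch. Returns false only when the constraint is definitely violated. */
static bool branch_gt0_consistent(Val v, bool taken) {
    if (!v.known) return true;      /* cannot be refuted: keep the path */
    return (v.value > 0) == taken;
}

/* Paths B0 -> B2 -> B3 -> B5 (ends_in_B5 == true) and
   B0 -> B2 -> B3 -> B4 (ends_in_B5 == false). u is unknown at entry. */
Verdict check_path_B0_B2_B3(bool ends_in_B5) {
    Val u0 = unknown_val();
    Val x1 = known_val(1);                            /* (1) x = 1 */
    Val y2 = known_val(2);                            /* (2) y = 2 */
    if (!branch_gt0_consistent(u0, false))            /* (3) not taken */
        return PATH_UNFEASIBLE;
    Val z5 = sub_val(x1, y2);                         /* (5) z = x - y */
    if (!branch_gt0_consistent(z5, ends_in_B5))       /* (6) */
        return PATH_UNFEASIBLE;
    return PATH_MAY_BE_FEASIBLE;
}
```

With ends_in_B5 set to true the second branch constraint fails and the path is reported as unfeasible; the variant that ends in B4 is kept as possibly feasible.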
Combining Static and Dynamic Analysis
Static analysis is inherently conservative.
The set of paths obtained from static deobfuscation is a superset of the actual paths.
Dynamic analysis, on the other hand, can't consider all possible input values.
What about combining them?
Start with an underapproximated set of control flow paths from dynamic analysis.
Combining Static and Dynamic Analysis (2)
Use static analysis to add paths that could be taken.
It is also possible to use static analysis first and dynamic analysis second.
But the resulting set could still contain too many or too few paths.
The authors chose dynamic analysis first, then static analysis.
- Suppose we know a way to determine all paths taken during execution.
- Mark these paths and propagate information only along these paths.
Combining Static and Dynamic Analysis (3)
Conventional static analysis is the degenerate case in which all paths are marked.
Improvement for the analyzer: mark the paths that are taken during dynamic analysis.
Propagate dataflow information along these paths.
If a branch is reached where only one outgoing path is marked and the outcome can't be uniquely determined, add the untaken branch to the marked (taken) set.
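The marking rule can be sketched in C as a small fixpoint loop. Everything in this sketch is an assumption made for illustration: the Node layout, the function names, the tiny example graph, and in particular the field outcome, which stands in for whatever the dataflow propagation was able to conclude about each branch.

```c
#include <stdbool.h>

#define MAX_NODES 8

typedef struct {
    int  succ[2];    /* successor ids: [0] branch taken, [1] not taken; -1 if absent */
    bool marked[2];  /* edge seen in a dynamic trace, or added by the analysis */
    int  outcome;    /* static result: 1 always taken, 0 never taken, -1 undetermined */
} Node;

/* Starting from the dynamically marked edges, add the unmarked edge of
   every reached two-way branch whose outcome is undetermined.
   Expects n <= MAX_NODES. Returns the number of edges added. */
int extend_marked_edges(Node *g, int n, int entry) {
    bool reached[MAX_NODES] = { false };
    bool changed = true;
    int added = 0;

    reached[entry] = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < n; v++) {
            if (!reached[v]) continue;
            bool two_way = g[v].succ[0] >= 0 && g[v].succ[1] >= 0;
            if (two_way && g[v].marked[0] != g[v].marked[1] &&
                g[v].outcome == -1) {
                g[v].marked[0] = true;      /* mark the untaken edge too */
                g[v].marked[1] = true;
                added++;
                changed = true;
            }
            for (int k = 0; k < 2; k++) {   /* follow marked edges only */
                int s = g[v].succ[k];
                if (s >= 0 && g[v].marked[k] && !reached[s]) {
                    reached[s] = true;
                    changed = true;
                }
            }
        }
    }
    return added;
}

/* Example: the trace took 0 -> 1 -> 3. Branch 0 is undetermined, so the
   edge 0 -> 2 is added; branch 1 is statically always taken, so the
   edge 1 -> 4 stays unmarked. */
int demo_added_edges(void) {
    Node g[5] = {
        { {  1,  2 }, { true,  false }, -1 },
        { {  3,  4 }, { true,  false },  1 },
        { { -1, -1 }, { false, false }, -1 },
        { { -1, -1 }, { false, false }, -1 },
        { { -1, -1 }, { false, false }, -1 },
    };
    return extend_marked_edges(g, 5, 0);
}
```

Marking every edge up front would turn this into the conventional, fully conservative analysis; starting from the trace keeps the marked set close to the paths that were actually observed.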
Conclusions
Code obfuscation has been proposed by a number of researchers.
It relies on the theoretical difficulty of reasoning statically about certain kinds of program properties.
The paper shows that a combination of static and dynamic analysis bypasses much of the effect of obfuscators.
Control flow flattening, which is used in commercial obfuscators, can be removed in a relatively straightforward way.
Questions?
Thank you for your attention