Rizin
unix-like reverse engineering framework and cli tools
|
ESIL stands for 'Evaluable Strings Intermediate Language'. It aims to describe a Forth-like representation for every opcode. Those representations can be evaluated in order to emulate code. Each element of an esil expression is separated by a comma. The VM can be described as this:
while ((word=haveCommand())) { if (word.isKeyword()) { esilCommands[word](esil); } else { esil.push (evaluateToNumber(word)); } nextCommand(); }
The esil commands are operations that pop values from the stack, performs some calculations and pushes the result in the stack (if any). They aim to cover all common operations done by CPUs, permitting to do binary operations, memory peeks and pokes, spawning a syscall, etc.
[0x00000000]> e asm.esil = true
An opcode is translated into a comma separated list of ESIL expressions.
xor eax, eax -> 0,eax,=,1,zf,=
Memory access is defined by brackets.
mov eax, [0x80480] -> 0x80480,[],eax,=
Default size is the destination of the operation. In this case 8bits, aka 1 byte.
movb $0, 0x80480 -> 0,0x80480,=[1]
Conditionals are expressed with the '?' char at the beginning of the expression. This checks if the rest of the expression is 0 or not and skips the next expression if doesn't matches. $
is the prefix for internal vars.
cmp eax, 123 -> 123,eax,==,$z,zf,= jz eax -> zf,?{,eax,eip,=,}
So.. if you want to run more than one expression under a conditional, you'll have to write it
zf,?{,eip,esp,=[],eax,eip,=,$r,esp,-=,}
The whitespace, newlines and other chars are ignored in esil, so the first thing to do is:
esil = r_str_replace (esil, " ", "", true);
Syscalls are specially handled by '$' at the beginning of the expression. After that char you have an optional numeric value that specifies the number of syscall. The emulator must handle those expressions and 'simulate' the syscalls. (r_esil_syscall
)
As discussed on irc, current implementation works like this:
a,b,- b - a a,b,/= b /= a
This approach is more readable, but it's less stack-friendly
NOPs are represented as empty strings. Unknown or invalid instructions
Syscalls are implemented with the '0x80,$' command. It delegates the execution of the esil vm into a callback that implements the syscall for a specific kernel.
Traps are implemented with the <trap>,<code>,$$
command. They are used to throw exceptions like invalid instructions, division by zero, memory read error, etc.
Here's a list of some quick checks to retrieve information from an esil string. Relevant information will be probably found in the first expression of the list.
indexOf('[') -> have memory references indexOf("=[") -> write in memory indexOf("pc,=") -> modifies program counter (branch, jump, call) indexOf("sp,=") -> modifies the stack (what if we found sp+= or sp-=?) indexOf("=") -> retrieve src and dst indexOf(":") -> unknown esil, raw opcode ahead indexOf("$") -> accesses internal esil vm flags indexOf("$") -> syscall indexOf("$$") -> can trap indexOf('++') -> has iterator indexOf('--') -> count to zero indexOf("?{") -> conditional indexOf("LOOP") -> is a loop (rep?) equalsTo("") -> empty string, means: nop (wrong, if we append pc+=x)
Common operations:
CPU flags are usually defined as 1 bit registers in the RReg profile. and sometimes under the 'flg' register type.
ESIL VM have an internal state flags that can are read only and can be used to export those values to the underlying CPU flags. This is because the ESIL vm defines all the flag changes, while the CPUs only update the flags under certain conditions or specific instructions.
Those internal flags are prefixed by the '$' character.
What to do with them? What about bit arithmetic if use variables instead of registers?
TODO
ESIL specifies that the parsing control-flow commands are in uppercase. Bear in mind that some archs have uppercase register names. The register profile should take care to not reuse any of the following:
3,SKIP - skip N instructions. used to make relative forward GOTOs 3,GOTO - goto instruction 3 LOOP - alias for 0,GOTO BREAK - stop evaluating the expression STACK - dump stack contents to screen CLEAR - clear stack
Usage example:
cx,!,?{,BREAK,},esi,[1],edi,[1],^,!,?{,BREAK,},esi,++,edi,++,cx,--,LOOP
Those are expressed with the 'TODO' command. which acts as a 'BREAK', but displaying a warning message describing which instruction is not implemented and will not be emulated.
For example:
fmulp ST(1), ST(0) => TODO,fmulp ST(1),ST(0)
As an example implementation of ESIL analysis for the AVR family of microcontrollers there is a avr_op
function in /librz/analysis/p/analysis_avr.c
which contains information on how the instructions are expressed in ESIL and other opcode information such as cycle counts per instruction:
Variables d, r and k refer to "destination", "register" and "(k)onstant", respectively. They are used later on by ESIL string formatting function like for instance:
r_strbuf_setf (&op->esil, "0x%x,r%d,=", k, d);
Which in this case corresponds to the LDI (LoaD with immediate) instruction in AVR. As an example, the above ESIL string template will translate into the following when reversing in Rizin:
0x00000080 30e0 0x0,r19,= ; LDI Rd,K. load immediate
Or in non-ESIL format:
0x00000080 30e0 ldi r19, 0x00 ; LDI Rd,K. load immediate
Looking at other architectures which already have mature ESIL support such as x86 can help in understanding the syntax and conventions of Rizin's ESIL.
To ease esil parsing we should have a way to express introspection expressions to extract the data we want. For example. We want to get the target address of a jmp.
The parser for the esil expressions should be implemented in an API to make it possible to extract information by analyzing the expressions easily.
ao~esil,opcode
opcode: jmp 0x10000465a esil: 0x10000465a,rip,=
We need a way to retrieve the numeric value of 'rip'. This is a very simple example, but there will be more complex, like conditional ones and we need expressions to get:
It is important for emulation to be able to setup hooks in the parser, so we can extend the parser to implement the analysis without having to write the parser again and again. This is, every time an operation is going to be executed we call a user hook which can be used to determine if rip is changing or if the instruction updates the stack.
Later, at this level we can split that callback into several ones to have an event based analysis api that may be extended in js like this:
esil.on('regset', function(){.. esil.on('syscall', function(){esil.regset('rip'
we have already them. see hook_flag_read()
hook_execute()
hook_mem_read()
...
Other operations that require bindings to external functionalities to work. In this case r_ref
and r_io
. This must be defined when initializing the esil vm.
Out ax, 44 44,ax,:ou
Mov eax, ds:[ebp+8] Ebp,8,+,:ds,eax,=