Working on an efficient generic shellcode detection engine and verifying results with randomly generated input, I've effectively ended up fuzzing different open source disassembler libraries. The disassembler library of choice for my current project is libdasm because of its comparatively long history and public domain license. But writing a sound and complete x86 disassembler is obviously not a trivial task due to the complex nature of the x86 instruction set.
libdasm used to disassemble certain floating point instructions incorrectly, but that was simply caused by an off-by-three error in the opcode lookup tables (three NULL rows missing), so the fix was comparatively easy.
What I stumbled across today seems not to be an opcode-specific issue but a bug in instruction decoding itself. When libdasm disassembles an instruction with a 16-bit address size prefix, it decodes the address immediate incorrectly:
    [~] Verifying shellcode candidate offset 8eb0f0
    008fe0f0[ 67a02232e830] > mov al,[0x30e83222]
    008fe0f6[ 61] > popa
    008fe0f7[ f9] > stc
    008fe0f8[ ff4038] > inc [eax+0x38]
    008fe0fb[ b269] > mov dl,0x69
    008fe0fd[ 52] > push edx
    008fe0fe[ 3f] > aas
    008fe0ff[ 5e] > pop esi
    008fe100[ 1a3dc31168aa] > sbb bh,[0xaa6811c3]
    008fe106[ 59] > pop ecx
    008fe107[ 9c] > pushf
    008fe108[................] <
The instruction at the virtualized guest's memory address 008fe0f0 is not decoded correctly:
67 is the previously mentioned 16-bit address size prefix
a0 is the opcode for mov al, moffs8
2232 is the 16-bit address that should be interpreted as the operand
e830 does not belong to this instruction
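The byte breakdown above can be sketched in a few lines of Python. This is only an illustration of how the 67 prefix changes the width of the address operand, not libdasm's actual code; the function name `decode_mov_al_moffs8` and its parameters are made up for this example.

```python
import struct

ADDR_SIZE_PREFIX = 0x67  # address-size override prefix
MOV_AL_MOFFS8 = 0xA0     # opcode of mov al, moffs8


def decode_mov_al_moffs8(code, default_addr_bits=32):
    """Hypothetical helper: decode `mov al, moffs8`.

    Returns (instruction length, absolute address operand).
    """
    pos = 0
    addr_bits = default_addr_bits
    if code[pos] == ADDR_SIZE_PREFIX:
        # The prefix toggles between 32-bit and 16-bit addressing.
        addr_bits = 16 if default_addr_bits == 32 else 32
        pos += 1
    if code[pos] != MOV_AL_MOFFS8:
        raise ValueError("not a mov al, moffs8 instruction")
    pos += 1
    # The address operand is little-endian and as wide as the address size.
    fmt, size = ("<H", 2) if addr_bits == 16 else ("<I", 4)
    (address,) = struct.unpack_from(fmt, code, pos)
    return pos + size, address


length, address = decode_mov_al_moffs8(bytes.fromhex("67a02232e830"))
print(length, hex(address))  # 4 0x3222
```

Ignoring the prefix amounts to always taking the four-byte branch, which is how the trailing e830 ends up inside the operand as 0x30e83222.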
Just like you should always consult a second doctor about exotic diseases, I gave udis86, a different disassembler library, a shot:
    $ udcli -noff -32 -s `python -c 'print 0x8eb0f0'` -c 10 shellcode/urandom.bin
    67a02232 a16 mov al, [0x3222]
    e83061f9ff call 0xfffffffffff96139
    40 inc eax
The mov instruction got disassembled correctly this time. And since e830 is no longer interpreted as part of mov's address immediate, it now correctly disassembles as the start of a call rel32 instruction. Unfortunately, udis86 is an x86-64-aware disassembler and internally sign-extends the operand to call to 64 bits, so the printed target is yet again incorrect for 32-bit code.
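To make the sign-extension problem concrete, here is a small sketch of how a call rel32 target is computed. The helper `call_rel32_target` is hypothetical, and the udcli case assumes that udcli starts counting at offset 0 (so the byte after the call is at offset 9). In 32-bit code the sum wraps at 32 bits; letting it wrap at 64 bits reproduces the value udcli printed.

```python
import struct


def call_rel32_target(next_ip, rel32_bytes, bits=32):
    """Hypothetical helper: target of `call rel32`.

    next_ip is the address of the instruction following the call;
    bits is the width at which the result wraps around.
    """
    (rel,) = struct.unpack("<i", rel32_bytes)  # signed little-endian displacement
    return (next_ip + rel) & ((1 << bits) - 1)


rel32 = bytes.fromhex("3061f9ff")  # operand bytes of e83061f9ff

print(hex(call_rel32_target(9, rel32, bits=32)))  # 0xfff96139
print(hex(call_rel32_target(9, rel32, bits=64)))  # 0xfffffffffff96139
```

The same arithmetic with the real addresses from the other listings in this post gives 0x7fe126d for a next instruction at 0x804b13d, and 0x894229 for one at 0x8fe0f9.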
So what does my CPU actually see and execute? Since this is part of virtualization / emulation code anyway, we can simply add a cc breakpoint to the block's prologue and step through it with gdb (omitting some junk):
    Program received signal SIGTRAP, Trace/breakpoint trap.
    (gdb) disas $eip, $eip+5
    => 0x0804b0c1: jmp 0x804b134
    (gdb) si
    (gdb) disas $eip, $eip+10
    Dump of assembler code from 0x804b134 to 0x804b13e:
    => 0x0804b134: addr16 mov 0x3222,%al
       0x0804b138: call 0x7fe126d
       0x0804b13d: inc %eax
    End of assembler dump.
    (gdb) si
    (gdb) si
    (gdb) disas $eip, $eip+10
    Dump of assembler code from 0x7fe126d to 0x7fe1277:
    => 0x07fe126d: Cannot access memory at address 0x7fe126d
So the CPU really sees a call instruction and tries to execute it. In this particular case, this would have been a devastating scenario: a privilege escalation vulnerability that allows arbitrary user input, likely shellcode, to break out of the virtualization isolation. For this specific approach to work correctly, all control flow modifying instructions like call must be emulated in software. If, however, we do not see such an instruction in the disassembly, we cannot handle it correctly.
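A rough way to see why the wrong instruction length hides the call: an opcode is only noticed if it sits on an instruction boundary. The sketch below is not the engine's real scanner. The helper `visible_calls` is invented for illustration, and it takes the instruction lengths from the listings above as given instead of decoding anything itself.

```python
CALL_REL32 = 0xE8  # opcode of call rel32


def visible_calls(code, lengths):
    """Hypothetical helper: offsets of call rel32 opcodes that fall on
    instruction boundaries, given the lengths a disassembler reported."""
    offsets = []
    pos = 0
    for length in lengths:
        if pos < len(code) and code[pos] == CALL_REL32:
            offsets.append(pos)
        pos += length
    return offsets


code = bytes.fromhex("67a02232e83061f9ff40")

# Lengths as reported by the unpatched libdasm: mov (6), popa (1), stc (1).
print(visible_calls(code, [6, 1, 1]))  # []
# Lengths as seen by the CPU: mov (4), call (5), inc (1).
print(visible_calls(code, [4, 5, 1]))  # [4]
```

With the six-byte mov, the e8 byte is swallowed as operand data, so nothing flags the block as containing a control flow instruction.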
After patching libdasm (which turned out to ignore address size prefixes for operand parsing entirely), the disassembly is correct:
    [*] 543 shellcode candidate offsets
    [~] Verifying shellcode candidate offset 8eb0f0
    008fe0f0[ 67a02232] > mov al,[0x3222]
    008fe0f4[................] <
    Emulating 008fe0f4: call 0x894229
    Emulating CALL instruction from 8fe0f9.
Lessons learned today: