The Mystery of the Duqu Framework

Igor Soumenkov
Kaspersky Lab Expert
Posted March 07, 15:58  GMT
Tags: Duqu

While analyzing the components of Duqu, we discovered an interesting anomaly in the main component that is responsible for its business logic, the Payload DLL. We would like to share our findings and ask for help identifying the code.

Code layout

At first glance, the Payload DLL looks like a regular Windows PE DLL file compiled with Microsoft Visual Studio 2008 (linker version 9.0). The entry point code is absolutely standard, and there is one function exported by ordinal number 1 that also looks like MSVC++ output. This function is called from the PNF DLL and it is actually the “main” function that implements all the logic of contacting C&C servers, receiving additional payload modules and executing them. The most interesting part is how this logic was programmed and what tools were used.

The code section of the Payload DLL is typical of a binary that was made from several pieces of code. It consists of “slices” of code that may have been initially compiled in separate object files before they were linked into a single DLL. Most of them can be found in any C++ program, like the Standard Template Library (STL) functions, run-time library functions and user-written code, except the biggest slice, which contains most of the C&C interaction code.


Layout of the code section of the Payload DLL file

This slice is different from others, because it was not compiled from C++ sources. It contains no references to any standard or user-written C++ functions, but is definitely object-oriented. We call it the Duqu Framework.

The Framework

Features

The code that implements the Duqu Framework has several distinctive properties:

  • Everything is wrapped into objects
  • Function table is placed directly into the class instance and can be modified after construction
  • There is no distinction between utility classes (linked lists, hashes) and user-written code
  • Objects communicate using method calls, deferred execution queues and event-driven callbacks
  • There are no references to run-time library functions, native Windows API is used instead

Objects

All objects are instances of some class; we identified 60 classes. Each object is constructed with a “constructor” function that allocates memory, fills in the function table and initializes members.


Constructor function for the linked list class.

The layout of each object depends on its class. Some classes appear to have binary compatible function tables, but there is no indication that they have any common parent classes (as in other OO languages). Furthermore, the location of the function table is not fixed: some classes have it at offset 0 of the instance, but some do not.


Layout of the linked list object. First 10 fields are pointers to member functions.

Objects are destroyed by corresponding “destructor” functions. These functions usually destroy all objects referenced by member fields and free any memory used.
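
As a rough illustration of the object model described above, here is a minimal sketch in plain C. It is not recovered Duqu code: every name in it (List, Node, list_new, push, count, destroy) is hypothetical, and it uses malloc/free where the Framework calls the native Windows API. It only shows the general pattern of a "constructor" that allocates the instance, stores the function table inside the instance itself and initializes the members, together with a matching "destructor".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is an illustration, not Duqu code. */
typedef struct Node {
    struct Node *next;
    int value;
} Node;

typedef struct List List;
struct List {
    /* The function table lives inside the instance itself. */
    int  (*push)(List *self, int value);
    int  (*count)(const List *self);
    void (*destroy)(List *self);
    /* Data members follow the function pointers. */
    Node *head;
    int   length;
};

/* Member function: prepends a value, returns 0 on success, -1 on failure. */
static int list_push(List *self, int value) {
    assert(self != NULL);
    Node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->value = value;
    n->next = self->head;
    self->head = n;
    self->length++;
    return 0;
}

static int list_count(const List *self) { return self->length; }

/* "Destructor": frees everything referenced by member fields, then the object. */
static void list_destroy(List *self) {
    Node *n = self->head;
    while (n != NULL) {
        Node *next = n->next;
        free(n);
        n = next;
    }
    free(self);
}

/* "Constructor": allocates memory, fills in the function table, initializes members. */
List *list_new(void) {
    List *self = malloc(sizeof *self);
    if (self == NULL) return NULL;
    self->push = list_push;
    self->count = list_count;
    self->destroy = list_destroy;
    self->head = NULL;
    self->length = 0;
    return self;
}
```

A caller would obtain an instance with list_new() and then go through the embedded table, for example l->push(l, 1), l->count(l) and finally l->destroy(l).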

Member functions can be referenced by the object’s function table (like “virtual” functions in C++) or they can be called directly. In most object-oriented languages, member functions receive the “this” parameter that references the instance of the object, and there is a calling convention that defines the location of the parameter: either in a register or on the stack. This is not the case for the Duqu Framework classes: they can receive the “this” parameter in any register or on the stack.


Member function of the linked list, receives “this” parameter on stack
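
Several of the properties listed so far can be imitated in ordinary C, which is one reason a hand-written object system is a plausible explanation. The sketch below is an assumption-laden illustration, not Duqu code: the Gauge class and all of its functions are invented. It shows a function table that does not start at offset 0, a member function that takes "this" as its second argument rather than its first, a table entry that is replaced after construction, and a member function that can be called either through the table or directly. Plain C cannot choose the register that carries "this", so that detail is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class; names are invented for illustration only. */
typedef struct Gauge Gauge;
struct Gauge {
    int level;                              /* data first: the table is not at offset 0 */
    int  (*read)(Gauge *self);              /* "this" passed as the first argument */
    void (*add)(int amount, Gauge *self);   /* "this" passed as the second argument */
};

int gauge_read(Gauge *self) { return self->level; }

/* Alternative implementation of read() that caps the reported value at 100. */
int gauge_read_clamped(Gauge *self) {
    return self->level > 100 ? 100 : self->level;
}

void gauge_add(int amount, Gauge *self) { self->level += amount; }

/* Initializes the members and fills in the per-instance function table. */
void gauge_init(Gauge *self) {
    assert(self != NULL);
    self->level = 0;
    self->read = gauge_read;
    self->add = gauge_add;
}

/* Replaces one table entry of a single instance after construction. */
void gauge_enable_clamping(Gauge *self) {
    self->read = gauge_read_clamped;
}
```

After gauge_enable_clamping(&g), a call through the table (g.read(&g)) reaches the clamped implementation, while a direct call to gauge_read(&g) still returns the raw level. Because the table belongs to the instance, other Gauge objects are unaffected.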

Event driven framework

The layout and implementation of objects in the Duqu Framework are definitely not native to the C++ that was used to program the rest of the Trojan. There is an even more interesting feature of the framework that is used extensively throughout the whole code: it is event driven.

There are special objects that implement the event-driven model:

  • Event objects, based on native Windows API handles
  • Thread context objects that hold lists of events and deferred execution queues
  • Callback objects that are linked to events
  • Event monitors, created by each thread context for monitoring events and executing callback objects
  • Thread context storage manages the list of active threads and provides access to per-thread context objects

This event-driven model resembles Objective C and its message passing features, but the code does not have any direct references to the language, nor does it look like it was compiled with any known Objective C compiler.


Event-driven model of the Duqu Framework

Every thread context object can start a “main loop” that looks for and processes new items in the lists. Most of the Duqu code follows the same principle: create an object, bind several callbacks to internal or external events and return. Callback handlers are then executed by the event monitor object that is created within each thread context.

Here is an example pseudocode for a socket object:

SocketObjectConstructor {
    NativeSocket = socket();
    SocketEvent = new MonitoredEvent(NativeSocket);
    SocketObjectCallback = new ObjectCallback(this, SocketEvent, OnCallbackFunc);
    connect(NativeSocket, ...);
}

OnCallbackFunc {
    switch (GetType(Event)) {
    case Connected: ...
    case ReadData: ...
    ...
    }
}
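
To make the create-bind-return pattern more concrete, here is a small, portable C sketch of the same idea. It is only an approximation under stated assumptions: all names (ThreadContext, context_bind, context_signal, context_run, SocketObject) are hypothetical, events are plain integers instead of native Windows handles, there is no real socket, and events are signalled manually rather than by the operating system.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CALLBACKS = 8, MAX_PENDING = 16 };

typedef enum { EV_CONNECTED, EV_READ_DATA } EventType;

typedef struct Event {
    int id;          /* stands in for a native event handle (assumption) */
    EventType type;
} Event;

typedef void (*Handler)(void *object, const Event *event);

/* Callback object: links an owner object and a handler to an event id. */
typedef struct Callback {
    void *object;
    int event_id;
    Handler handler;
} Callback;

/* Thread context: holds the bound callbacks and a queue of signalled events. */
typedef struct ThreadContext {
    Callback callbacks[MAX_CALLBACKS];
    size_t callback_count;
    Event pending[MAX_PENDING];
    size_t head, tail;   /* simple FIFO indices; no wrap-around in this sketch */
} ThreadContext;

void context_init(ThreadContext *ctx) {
    ctx->callback_count = 0;
    ctx->head = ctx->tail = 0;
}

/* Binds a handler to an event id; returns 0 on success, -1 if the table is full. */
int context_bind(ThreadContext *ctx, void *object, int event_id, Handler handler) {
    assert(ctx != NULL && handler != NULL);
    if (ctx->callback_count == MAX_CALLBACKS) return -1;
    Callback cb = { object, event_id, handler };
    ctx->callbacks[ctx->callback_count++] = cb;
    return 0;
}

/* Queues an event for later processing; returns -1 if the queue is full. */
int context_signal(ThreadContext *ctx, int event_id, EventType type) {
    if (ctx->tail == MAX_PENDING) return -1;
    ctx->pending[ctx->tail].id = event_id;
    ctx->pending[ctx->tail].type = type;
    ctx->tail++;
    return 0;
}

/* "Main loop": drains the queue and runs every callback bound to each event.
   Returns the number of handler invocations. */
int context_run(ThreadContext *ctx) {
    int dispatched = 0;
    while (ctx->head < ctx->tail) {
        const Event *ev = &ctx->pending[ctx->head++];
        for (size_t i = 0; i < ctx->callback_count; i++) {
            if (ctx->callbacks[i].event_id == ev->id) {
                ctx->callbacks[i].handler(ctx->callbacks[i].object, ev);
                dispatched++;
            }
        }
    }
    return dispatched;
}

/* Socket-like object that mirrors the pseudocode: construct, bind, return. */
typedef struct SocketObject {
    int event_id;
    int connected;
    int reads;
} SocketObject;

static void socket_on_event(void *object, const Event *event) {
    SocketObject *self = object;
    switch (event->type) {
    case EV_CONNECTED: self->connected = 1; break;
    case EV_READ_DATA: self->reads++; break;
    }
}

int socket_object_init(SocketObject *self, ThreadContext *ctx, int event_id) {
    self->event_id = event_id;
    self->connected = 0;
    self->reads = 0;
    return context_bind(ctx, self, event_id, socket_on_event);
}
```

In this sketch the constructor does no work beyond registering its callback. All state changes happen later, when context_run() dispatches queued events to the handlers bound to their ids, and events with no bound handler are simply dropped.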

Conclusions

  • The Duqu Framework appears to have been written in an unknown programming language.
  • Unlike the rest of the Duqu body, it's not C++ and it's not compiled with Microsoft's Visual C++ 2008.
  • The highly event-driven architecture points to code which was designed to be used in pretty much any kind of conditions, including asynchronous communications.
  • Given the size of the Duqu project, it is possible that the framework was written by a different team than the one that created the drivers and wrote the system infection and exploits.
  • The mysterious programming language is definitively NOT C++, Objective C, Java, Python, Ada, Lua and many other languages we have checked.
  • Compared to Stuxnet (entirely written in MSVC++), this is one of the defining particularities of the Duqu framework.

The Duqu Framework: What was that?

After having performed countless hours of analysis, we are 100% confident that the Duqu Framework was not programmed with Visual C++. It is possible that its authors used an in-house framework to generate intermediary C code, or they used another completely different programming language.

We would like to appeal to the programming community and ask anyone who recognizes the framework, toolkit or programming language that can generate similar code constructions to contact us or leave a comment on this blog post. We are confident that with your help we can solve this deep mystery in the Duqu story.


161 comments


Wladimir

2012 Mar 09, 15:39

Re: object oriented C

I agree. As I see it, the lack of rigid consistency suggests this is an object-oriented C framework with function pointers:

- Method pointers are in the instance, and are sometimes changed after construction, and can be at different offsets in the structure

- No inheritance, but structures might be binary compatible

- "this" pointer can be in different argument positions

- Explicit construction/destruction of objects, and allocation/deallocation of memory, no high level "magic" like GCs, VMs and JITs

Alas, it would have been fun to find a mystery programming language. Does the generated code match any of the known C compilers? And it's interesting that no use is made of libc, all are direct Win32 calls, so this may be some kind of framework for embedded systems.

Edited by Wladimir, 2012 Mar 09, 16:23

intointo

2012 Mar 09, 15:41

Magic software

The code looks similar to the one used in the Magic Software eDeveloper solution (now known as UniPaas).

SCooke

2012 Mar 09, 15:54

Re: That code looks familiar

It's easier to figure this out if you consider vendor sourcing. The work was probably done by a government. And, whether the software was sourced through a US agency or whether a US agency itself was the creator, the net result is the same: you're looking for a major GSA-contracted firm who A) has clearance, B) has a compiler team, C) has a track record of providing similar product to the US government, and D) has a compiler codebase that looks kind of unfamiliar and not mainstream.

The likely suspects fitting that set of criteria are IBM, Microsoft, SAS and SAIC. All the others (remnant AT&T, HP, remnant SGI... who am I forgetting?) incorporate a considerable amount of fairly recognizable shared compiler code in their offerings. Since you've disqualified Microsoft, my bet is on IBM.

I don't think it's SAS, because their compiler codebase is ancient. I don't think it's SAIC, because for them this would be a fairly difficult project. Three reasons why I think IBM.

First is that IBM has a library of bizarro options to select from. There's an internal HLASM-to-C frontend. There's all the CSet descendants. They've got research versions of damn near everything. (I'd try getting ahold of the ia32 version of CSet - probably hard to come by, but out there). They've also got a Windows source license, and if you were going to write a virus, that's always handy.

Second is that IBM has a history of doing projects like this. If there was a federal bid, they almost certainly would have been a bidder.

Third is that the project could have been run out of IBM Haifa. A number of the old IBM AV team probably either were there or ended up there, so it wouldn't be too far out of their wheelhouse. And if you wanted to build a state-sponsored virus, you'd almost certainly want to build it in a country who already has near-active hostilities with the intended target for the virus such that those acts of aggression don't become de facto acts of war for you.

If you want to dig into that, have someone from IBM wander through the employee-written and internal software libraries for all the preprocessor frontends for various languages and compiler backends that output to ia32. Probably none of that is inherently secret. I bet you'll find something that produces similar output.

Andreas Bogk

2012 Mar 09, 15:59

Definitely not a Lisp

I haven't seen much of the Duqu code except for what is posted here, and I would appreciate more to have a better opinion. But from what I see, I can say this definitely is not a Lisp or any related language (Python, JS, Ruby, younameit).

Clues: the presence of destructors (only makes sense in a language with manual memory management), lack of type bits in pointers (means probably dealing with static types instead of dynamic types).

Another clue to me is the lack of any calling convention different from the C calling convention. My bet is on a thin layer of preprocessing as a C frontend, giving some form of rudimentary OO functionality. Sort of like glibc, but different. Maybe even implemented using C macros.

Max

2012 Mar 09, 15:59

Have a try

Maybe javascript, or E language(http://www.dywt.com.cn)? just kidding :-)

eternity

2012 Mar 09, 16:09

Rational Rose compiler

Using oriented programming...
Old school.

igorsk

2012 Mar 09, 16:20

Simple Object Orientation (for C)

It seems someone over at reddit (http://www.reddit.com/r/ReverseEngineering/) hit the jackpot: the code snippets look _very_ similar to what this would produce:

http://daifukkat.su/wiki/index.php/SOO

There are a few other OO frameworks for C, but they don't match as well:
http://ooc-coding.sourceforge.net/
http://sooc.sourceforge.net/

acsMike

2012 Mar 09, 17:10

Re: Simple Object Orientation (for C)

If this is so, what benefits do you think the author was after?

ZuZ

2012 Mar 09, 17:55

Synon

Looks very much like Synon Code to me.

mrlozer

2012 Mar 09, 18:30

Another guess

It is ActionScript.

Robert M

2012 Mar 09, 18:50

Other C/C++ compiler?

Isn't it possible that the original language is still C/C++, but the code generation is done by something other than MSC++/Visual Studio?

Could it be the Intel C/C++ compiler (avail for eval from intel.com), Clang or some older version of a compiler e.g. gcc?

MMandrake

2012 Mar 09, 18:50

New Compiler

Probably it's LISP with a re-edited compiler that changed the syntax of the commands to new ones

jonwil

2012 Mar 09, 18:57

Re: Other C/C++ compiler?

I have seen how GCC works internally and its ABI (for a number of different versions) and I can confirm that the Duqu code is definitely not generated by GCC. I don't know how other C++ compilers work but the things I see in the ASM (like where the pointers to the functions go, the way the "this" pointer is passed etc) do not suggest C++ to me but something else entirely (such as the aforementioned "object-oriented" frameworks for C that exist).

We know that it has to be 32-bit Windows (and probably modern) and that it's not a payload for some embedded system because it's calling Windows APIs. We know that whatever it is, it is spitting out .obj files compatible with the Microsoft compiler.

More information is needed (such as any strings in the file or the ASM for the memory allocate/free functions or more about exactly which dlls this imports from) to truly figure this out IMO.

igorsk

2012 Mar 09, 19:07

Re: Other C/C++ compiler?

I'm 99% sure the machine code was generated by MSVC. It's something you get a feel with experience, but I can point out two things that are quite characteristic of MSVC: 1) it uses esi as the first candidate for temporary storage; 2) "pop ecx" instead of "add esp, 4".

MJB

2012 Mar 09, 19:16

Compiler list?

Is there a public list you are keeping of languages/compilers that you have been able to check against?

I'd also not read too much into the lack of compiler identification in the binaries. There are many ways to obscure the binary after compilation.

Many years ago when I was writing laptop tracking software, which was supposed to be as hidden as possible on the system, I'd write in Power Basic, which produces very tight binaries, allows in-line assembly, does everything dynamically, and generates binaries that only use the native Windows APIs instead of run-times, even when you use its most sophisticated built-in functions. We'd also encode all string constants (including the names of the APIs we called) as well as do all API calls via LoadLibrary, as part of the binary obfuscation. Last, just before putting together the installation package, every exe and dll (all binaries) was run through a filter which specifically stripped out everything that identified the compiler and language. The end result was that examining the binary without disassembling made it impossible to tell what APIs we were calling, what any string constants were, or what compiler we used. With disassembling, you'd get the strings and APIs, but not the compiler. (We also did other things seen in viruses to make the laptop tracking software hard to detect, and nearly impossible to remove once detected, etc.)

My point being that the binaries you are examining might have gone through some type of post compiler process to thwart attempts to backtrack its origin.

Edited by MJB, 2012 Mar 09, 19:32

Zarck

2012 Mar 09, 20:11

Forth ?

In Forth it is possible to create its own instructions, to add them to the core, to recompile core, program the robots, etc... http://en.wikipedia.org/wiki/Forth_(programming_language)

G

2012 Mar 09, 20:14

This is definitely brainfuck!

This is definitely brainfuck!
I just couldn't resist not to write this after reading all the comments. Guys here mentioned all the languages, I heard of.
Still I find this topic very interesting in terms, what language/tools were used there, Igor, please let us know the end of the story.

Again, sorry for the trolling :)

Andreas Bogk

2012 Mar 09, 20:58

Re: Forth ?

Nope, not a forth. I can tell by the pixels.

Andreas Bogk

2012 Mar 09, 20:59

Re: New Compiler

Manual memory management, no type tags: this is not a Lisp.

Chiloane RK

2012 Mar 09, 21:46

NASM

en.m.wikipedia.org/wiki/Netwide_Assembler

MMandrake

2012 Mar 09, 22:13

Re: Re: New Compiler

I had a HP48G (hewlett packard calculator) that used something similar for programming

FatherStorm

2012 Mar 09, 23:55

Compile PHP?

could it be compiled PHP with (newish) Traits? that would explain a global $this even in attached sections.

Igor Soumenkov

2012 Mar 10, 00:03

Re: Simple Object Orientation (for C)

SOO may be the correct answer! But there are still two things to figure out:
1) When was SOO C created? I see Oct 2010 in git - that's too late, Duqu was already out there.
2) If SOO is the toolkit, then the event-driven model was created by the authors of Duqu. Given the size of the framework-based code, they must have spent 1+ year making all things work correctly.

miki

2012 Mar 10, 00:08

Re: Re:

I think it's Tcl (Incr Tcl). It looks like assembler, but it is an object language.

Igor Soumenkov

2012 Mar 10, 00:11

Re: Re: Other C/C++ compiler?

igorsk, thanks for the hint. It turns out that almost the same code can be produced by the MSVC compiler for a "hand-made" C class. This means that a custom OO C framework is the most probable answer to our question.
We kept this (OO C) version as a "worst-case" explanation, because that would mean that the amount of time and effort invested in development of the Framework is enormous compared to other languages/toolkits.

miki

2012 Mar 10, 00:14

Tcl language

I think that it is the Tcl or Incr Tcl language. It looks like assembler, but it is object-oriented.

Painkiller

2012 Mar 10, 00:57

Maybe Pic language

It looks like the Pic language, but that's strange because this language is only for transistor programming...

aria.banacha

2012 Mar 10, 01:55

How about...

Haskell ?
Scala ?

juryben

2012 Mar 10, 02:34

Wild guess

I'm gonna take a wild guess and say it was coded in C and compiled with the WDK in the IDE, Visual Studio. The author could have enabled the /FAs flag, taken the .ASM file(s) and recompiled them with the WDK in Visual Studio.

And I can't see any author mixing and matching languages. I would say it's probably C, as that's just logical, and then converted to ASM.

Edited by juryben, 2012 Mar 10, 08:40

bl00d

2012 Mar 10, 02:40

What about quantum leaps framework ?
A year ago, when I was searching for an RTOS, I thought this was a pretty cool, small and innovative OS based on events, state machines and ...UML. I don't know much about it but you should look into it.
