The Mystery of the Duqu Framework

Igor Soumenkov
Kaspersky Lab Expert
Posted March 07, 15:58  GMT
Tags: Duqu

While analyzing the components of Duqu, we discovered an interesting anomaly in the main component that is responsible for its business logic, the Payload DLL. We would like to share our findings and ask for help identifying the code.

Code layout

At first glance, the Payload DLL looks like a regular Windows PE DLL file compiled with Microsoft Visual Studio 2008 (linker version 9.0). The entry point code is absolutely standard, and there is one function exported by ordinal number 1 that also looks like MSVC++. This function is called from the PNF DLL and it is actually the “main” function that implements all the logic of contacting C&C servers, receiving additional payload modules and executing them. The most interesting part is how this logic was programmed and what tools were used.

The code section of the Payload DLL is typical of a binary that was made from several pieces of code. It consists of “slices” of code that may have been initially compiled in separate object files before they were linked into a single DLL. Most of them can be found in any C++ program, like the Standard Template Library (STL) functions, run-time library functions and user-written code, except the biggest slice, which contains most of the C&C interaction code.


Layout of the code section of the Payload DLL file

This slice is different from others, because it was not compiled from C++ sources. It contains no references to any standard or user-written C++ functions, but is definitely object-oriented. We call it the Duqu Framework.

The Framework

Features

The code that implements the Duqu Framework has several distinctive properties:

  • Everything is wrapped into objects
  • Function table is placed directly into the class instance and can be modified after construction
  • There is no distinction between utility classes (linked lists, hashes) and user-written code
  • Objects communicate using method calls, deferred execution queues and event-driven callbacks
  • There are no references to run-time library functions, native Windows API is used instead

Objects

All objects are instances of some class; we identified 60 classes. Each object is constructed with a “constructor” function that allocates memory, fills in the function table and initializes members.


Constructor function for the linked list class.
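
As a rough illustration of the construction pattern described above, the following Python sketch models an object whose function table lives inside the instance itself rather than in a shared class structure. It is a conceptual model only: the names make_linked_list, ll_append, ll_count and ll_count_doubled are hypothetical, and nothing here reflects the actual binary layout or the unknown source language of the framework.

```python
# Conceptual model only: a "constructor" that allocates an instance,
# fills in a per-instance function table, and initializes members.
# All names below are hypothetical, chosen for illustration.

def ll_append(this, value):
    """Append a value to the list held by this instance."""
    this["items"].append(value)

def ll_count(this):
    """Return the number of stored items."""
    return len(this["items"])

def ll_count_doubled(this):
    """Alternative implementation, used below to patch one instance."""
    return 2 * len(this["items"])

def make_linked_list():
    """'Constructor': allocate the instance, then fill it in."""
    this = {}                      # allocate memory for the instance
    this["append"] = ll_append     # function table stored in the instance
    this["count"] = ll_count
    this["items"] = []             # initialize data members
    return this

a = make_linked_list()
b = make_linked_list()
a["append"](a, "x")                # "this" is passed explicitly
b["append"](b, "y")

# Because the table is per instance, it can be modified after
# construction without affecting other objects of the same "class".
a["count"] = ll_count_doubled
```

Patching the table of a leaves b untouched, which is the practical difference from a C++-style virtual table shared by all instances of a class.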

The layout of each object depends on its class. Some classes appear to have binary compatible function tables, but there is no indication that they have any common parent classes (as in other OO languages). Furthermore, the location of the function table is not fixed: some classes have it at offset 0 of the instance, but others do not.


Layout of the linked list object. First 10 fields are pointers to member functions.

Objects are destroyed by corresponding “destructor” functions. These functions usually destroy all objects referenced by member fields and free any memory used.

Member functions can be referenced by the object’s function table (like “virtual” functions in C++) or they can be called directly. In most object-oriented languages, member functions receive the “this” parameter that references the instance of the object, and there is a calling convention that defines the location of the parameter: either in a register or on the stack. This is not the case for the Duqu Framework classes. They can receive the “this” parameter in any register or on the stack.


Member function of the linked list; it receives the “this” parameter on the stack

Event driven framework

The layout and implementation of objects in the Duqu Framework is definitely not native to the C++ that was used to program the rest of the Trojan. There is an even more interesting feature of the framework that is used extensively throughout the whole code: it is event driven.

There are special objects that implement the event-driven model:

  • Event objects, based on native Windows API handles
  • Thread context objects that hold lists of events and deferred execution queues
  • Callback objects that are linked to events
  • Event monitors, created by each thread context for monitoring events and executing callback objects
  • Thread context storage manages the list of active threads and provides access to per-thread context objects

This event-driven model resembles Objective C and its message passing features, but the code does not have any direct references to the language, nor does it look like it was compiled with any known Objective C compiler.


Event-driven model of the Duqu Framework

Every thread context object can start a “main loop” that looks for and processes new items in the lists. Most of the Duqu code follows the same principle: create an object, bind several callbacks to internal or external events and return. Callback handlers are then executed by the event monitor object that is created within each thread context.

Here is an example pseudocode for a socket object:

SocketObjectConstructor {
    NativeSocket = socket();
    SocketEvent = new MonitoredEvent(NativeSocket);
    SocketObjectCallback = new ObjectCallback(this, SocketEvent, OnCallbackFunc);
    connect(NativeSocket, ...);
}
OnCallbackFunc {
    switch(GetType(Event)) {
    case Connected: ...
    case ReadData: ...
    ...
    }
}
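
The pseudocode above can be fleshed out into a small runnable model. The Python sketch below is an assumption-laden illustration, not a reconstruction of the framework: the class names Event, Callback, ThreadContext and SocketObject are hypothetical, events are simulated with in-memory queues instead of Windows handles, and no real network I/O takes place.

```python
from collections import deque

class Event:
    """Stand-in for an event object; the real ones wrap Windows handles."""
    def __init__(self, name):
        self.name = name
        self.pending = deque()   # event types signalled but not yet handled

    def signal(self, event_type):
        self.pending.append(event_type)

class Callback:
    """Links a handler function on some owner object to an event."""
    def __init__(self, owner, event, handler):
        self.owner, self.event, self.handler = owner, event, handler

class ThreadContext:
    """Holds the bound callbacks and a deferred execution queue."""
    def __init__(self):
        self.callbacks = []
        self.deferred = deque()

    def bind(self, callback):
        self.callbacks.append(callback)

    def defer(self, func):
        self.deferred.append(func)

    def run_once(self):
        """One pass of the 'main loop': deferred work first, then events."""
        handled = 0
        while self.deferred:
            self.deferred.popleft()()
            handled += 1
        for cb in self.callbacks:
            while cb.event.pending:
                cb.handler(cb.owner, cb.event.pending.popleft())
                handled += 1
        return handled

class SocketObject:
    """Mirrors the pseudocode: create, bind a callback, and return."""
    def __init__(self, context):
        self.log = []
        self.event = Event("socket")
        context.bind(Callback(self, self.event, SocketObject.on_callback))

    def on_callback(self, event_type):
        if event_type == "Connected":
            self.log.append("connected")
        elif event_type == "ReadData":
            self.log.append("read")

ctx = ThreadContext()
sock = SocketObject(ctx)
sock.event.signal("Connected")   # simulated; no real connection is made
sock.event.signal("ReadData")
ctx.defer(lambda: sock.log.append("deferred"))
count = ctx.run_once()
```

In this model the constructor does nothing beyond wiring up the callback; all actual work happens later, when the thread context's loop dispatches the queued items.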

Conclusions

  • The Duqu Framework appears to have been written in an unknown programming language.
  • Unlike the rest of the Duqu body, it's not C++ and it's not compiled with Microsoft's Visual C++ 2008.
  • The highly event driven architecture points to code which was designed to be used in pretty much any kind of conditions, including asynchronous communications.
  • Given the size of the Duqu project, it is possible that the framework was created by a different team than the one which created the drivers and wrote the system infection and exploits.
  • The mysterious programming language is definitively NOT C++, Objective C, Java, Python, Ada, Lua and many other languages we have checked.
  • Compared to Stuxnet (entirely written in MSVC++), this is one of the defining particularities of the Duqu framework.

The Duqu Framework: What was that?

After having performed countless hours of analysis, we are 100% confident that the Duqu Framework was not programmed with Visual C++. It is possible that its authors used an in-house framework to generate intermediary C code, or they used another completely different programming language.

We would like to make an appeal to the programming community and ask anyone who recognizes the framework, toolkit or programming language that can generate similar code constructions to contact us or drop us a comment on this blog post. We are confident that with your help we can solve this deep mystery in the Duqu story.


161 comments


Hans Adams

2012 Mar 13, 03:36

Re: Re: --- HLAs ---

Code generated looks like some comparable code of PDP10 generated by BLISS. Today's best approach might be the class of High Level Assemblers. Those were available for all major architectures, /36 to /390, PDP11 (C was a substitute of a former HLA!), PDP10, VAXen, 8086 (DeSmeth???), ...

Thirty (was: twenty-five) years ago I used one for the 8086 to implement device drivers. I had started with MASM, but the task became too complex to handle using common assemblers. I knew HLAs for "real computers", so I longed for something reasonable even for the 8086.

The objects themselves were implemented by complex macros, very similar to the early C++.

In times gone I saw a whole object oriented framework implemented in a kind of HLA for the 68k in a realtime application.

Hint: http://sourceforge.net/projects/hlav1/

best, adamsh

Edited by adamsh, 2012 Aug 13, 12:45

Mark838

2012 Mar 13, 05:22

It's not Eiffel

Hi IGOR

Very interesting reading. Our team here looked at this too. At first some people suggested it's Eiffel... It's not, we are 99% sure!
We rather agree that this is a custom OO C framework... we would put our bets on this! Good luck!

typsy

2012 Mar 13, 12:40

Google's Dart

The code looks very familiar. Just giving another option here, if anyone has time to take a look: Google's Dart language, which is similar to JavaScript.

http://dartr.com/

diskjunky

2012 Mar 13, 14:07

Feasibility of automated check

I've read through all the material given and the comments, and while there are some interesting avenues of investigation, the general consensus is one you've already come to: that this is a custom OO framework. This consensus is one given by many skilled individuals who are familiar with decompiled code.

While I don't know how feasible this is, may I suggest a different and wider scoped approach? If a simple tool could be provided that automatically checked chosen folders on a user's machine for PE files containing the code signature you're trying to identify, then you could post requests on hobbyist and professional programming boards, news forums etc. There are far more people out there willing to try and help than have the skills to delve into and identify decompiled code. Providing them with a tool to check their existing code bases will give you a much wider reach and would at the very least rule out some possibilities that you haven't considered yet.

Hobbyists and hobbyist programmers in particular (especially older ones) will have come into contact with a very large and varied code base. Asking them to check it for a matching signature could well raise a few flags and give directions for identifying the mystery code.

Just a thought. I've no idea how viable it is, however.
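
A minimal sketch of such a checking tool, written in Python under stated assumptions: the function names is_pe_file and scan_folder are hypothetical, the byte pattern is a placeholder and NOT a real Duqu signature, and a plain substring search is swapped in for whatever matching a real signature would need. Files are read whole into memory, which is acceptable for a sketch but not for very large binaries.

```python
import os
import struct

def is_pe_file(data):
    """Check the DOS 'MZ' magic and the PE signature it points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew, at offset 0x3C of the DOS header, locates the PE header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\0\0"

def scan_folder(root, pattern):
    """Return sorted paths of PE files under root that contain pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            if is_pe_file(data) and pattern in data:
                hits.append(path)
    return sorted(hits)

# Hypothetical placeholder bytes; NOT a real Duqu signature.
EXAMPLE_PATTERN = bytes.fromhex("deadbeef0badf00d")
```

A user would call scan_folder on a directory of their own binaries and report any hits; an empty result only rules out an exact byte match, not a related toolchain.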

david heath

2012 Mar 13, 14:34

looking wider

Having seen just about every programming environment suggested, I'd like to suggest a somewhat tangential environment.

We've seen the on-again-off-again-on-again suggestion that Duqu is related to Stuxnet (I'm still not convinced they're related, but that's by-the-by) and with that in mind, has anyone compared the code with the product of any of the SCADA / PLC development environments? Perhaps this is a payload directed at a similar target to that targeted by Stuxnet (North Korea, anyone?)

tomf

2012 Mar 13, 16:06

Re: Re: Re: It's most probably Lisp, inspired by Mosquito Lisp

Wes,

I have to agree with you, esp. the comment about Scheme. When I looked at the code snippets it definitely had the Scheme feel about it and was the first language that came to mind. When I was back in college we used that language as well as Lisp. It has some pretty interesting capabilities; one of them was the way it handled class objects. All objects were treated as first class objects. Wish I still had a copy of it as it was pretty good back in the day and was somewhat easier to use than Lisp or Forth.

tom

tomf

2012 Mar 13, 16:47

It still looks like a dialect of Scheme

Igor,

Back in the day I had both a DOS and a Windows version of Scheme. I used to look at the code after it was compiled and experiment with it. It treated all objects as first class, which seems to be what it is doing after looking at the snippets. When I first looked at the code, Scheme was the first language that came to mind as it had that feel and look of old. David Heath also had a good idea about some of the SCADA/PLC development tools. One of them to look at would be the Siemens Step 5/7 development environments, which allow the use of statement logic and compile the result into segments. I haven't explored it yet but I just might. I do like Wes's idea that it may be a Scheme derivative as the feel is there.

tom

mrlozer

2012 Mar 13, 17:47

Re: One more guess

Euphoria has this Euphoria-to-C compiler: http://www.rapideuphoria.com/e2c.htm

jonwil

2012 Mar 13, 18:07

Re: looking wider

It's making Windows API calls directly, so it has to be a Windows toolkit and not something for SCADA/PLC.

phlampe

2012 Mar 13, 18:12

Obfuscated ASM ?

It's been a very long time (like 15-20 years) since I wrote any C or C++, but I remember from those times contests of obfuscated C where the game was to write the most obscure piece of code possible (that would actually do something). I'm not in that kind of activity anymore, so I apologize if I'm stating something really obvious to this community.

It seems it's possible in ASM ( http://en.wikibooks.org/wiki/X86_Disassembly/Code_Obfuscation ): is it something that could have been done to that piece of code? Or maybe you already stripped those layers of obfuscation?

Regards,
Paul-Henri

jonwil

2012 Mar 13, 18:15

The facts

1. It's calling Windows APIs, so it has to be a Windows compiler (and not one for PLCs or SCADA or mainframes or anything else).
2. It was linked with Visual C++ code, therefore it has to be something that emits native x86 Visual C++ compatible object files (rules out Delphi, any of the .NET languages, Java, Javascript, PHP, Python, Perl, Ruby, Visual Basic, VBA, VBScript and others).

Depending on what the disassembly of the memory allocate/free functions shows, I am going to concur with others here and say "yes, this is a fake-OO framework for C" (it might even have been compiled with a C++ compiler and not a C compiler).

Woob

2012 Mar 13, 18:42

PL/1,PL/S II,PL/AS,PL/X

Many IBM operating systems are written in these languages. As previous users mentioned the similarities to code they had seen on IBM systems, I figured it might be worthwhile to take a look at these. IBM has a compiler for PL/1 that runs on Windows. The other languages are derived from PL/1 and are mostly used internally at IBM, so it wouldn't be a long shot to assume that they might have compilers for Windows for those too. I should also add that this is all speculation on my part. I don't have the knowledge required to properly read and analyze the code posted.

diskjunky

2012 Mar 13, 20:19

Re: Obfuscated ASM ?

It's possible, but looking at the naming conventions used in the disassembly, it looks more like a dedicated tool created the payload from an OO based language structure. Of course, there's nothing stopping someone creating a tool to deliberately obfuscate code to make it look like OO, but that's an order of magnitude more difficult than creating a straight compiler. And if you're going to the trouble of creating and maintaining a custom compiler, you're going to keep it pretty simple - which is probably why it looks 'old'. Older systems were a little more direct in their compiled code. This being an event-driven architecture and therefore capable of being used for multi and single-threaded applications (not quite but bear with me), a lot of effort went into making it. If it was a custom compiler and deliberately obfuscating code, it'd make it very hard to maintain and debug. The level of sophistication of the existing Stuxnet and Duqu code and their use of the VS C++ library suggests they were using an off-the-shelf compiler, albeit an obscure one.

Following Occam's razor, "The simplest explanation is usually the correct one" (actually it's not in all circumstances, but I digress), the 'obvious' answer is that a tool was used to compile from a standard OO language or variant thereof.

diskjunky

2012 Mar 13, 20:33

Re: The facts

There are various ways of adding runnable code to a PE file, ranging from linking at compile time to embedding as a runnable resource to even injecting code (think buffer overflow security exploits). One does not necessarily need to link to a compilable resource - although it's probably one of the easier ways.

Given the install base of SCADA systems, any runnable file would have to assume that the necessary runtime libraries were not available and must be included somewhere in the running PE file. This rules out all interpreted languages (non-natively compiled Java, BASIC, etc.), any language requiring an external runtime (all CLR based languages, VB 5/6, etc.), to name but a few. Some languages allow native compilation, e.g. Delphi and Java, but they have already been investigated, or so I understand.

dooqoo

2012 Mar 14, 00:49

Re: Re: Simple Object Orientation (for C)

I see a SourceForge project for SOOC which dates back at least 5 years http://sourceforge.net/projects/sooc/

Shalogrim

2012 Mar 14, 00:56

Re:

I stumbled upon this site: http://autodiff.piotrbania.com/get_function_listing.php?diff_id=84&module_id=167&np_module_id=168&function_rva=0x0001f9b8&os=1#

What is AutoDiff?

AutoDiff is a project which performs automated binary differential analysis between two executable files. This is especially useful for reverse engineering vulnerability patches and spotting other additional code updates. AutoDiff allows to find executable code similarities and differences among two executable files. Additionally it also includes some heuristics methods for matching variables (objects) between two executable files. AutoDiff is ultra fast, standalone tool. It was especially designed to diff Portable Executable files released by Microsoft every time in the security bulletin.

More about the AutoDiff story:

http://blog.piotrbania.com/2010/12/rebootless-windows-updates-ksplice-for.html

That's my contribution to the possibility list.

dcedilotte

2012 Mar 14, 01:55

Could be one of these.

Could it be made in L (http://www.bitmover.com/lm/L/L.html)
Or in Ceylon
Or in Rust (by Mozilla).

M-Boy

2012 Mar 14, 05:34

Brainstorming

For some reason - call it a hunch - I delved into the AI related programming languages after seeing the output.

First up was Lisp, specifically Common Lisp, but it seems it has been mentioned plenty already (under the assumption that dialects like Scheme and Clojure have also been tested). Maybe I am not too far off?

I lack deeper programming knowledge, but other AI programming languages like STRIPS, Planner and Prolog seem fundamentally too different to logically produce the same result. But then we have IPL, not that distant, and there is the IBM connection with IPL-V. But it feels way too legacy to be used today?

Then it felt like I had seen this code already, sometime, somewhere. And the only thing I could think of was FORTRAN, from my mother's studies way back. And considering the previously mentioned dates, Fortran 2003 could be of interest. Also considering the fact that mixing C++ and Fortran is not unheard of.

Edit:
Thought while trying to sleep - TCL?

Edited by M-Boy, 2012 Mar 14, 06:32

spikeysnack

2012 Mar 14, 11:02

I typed in "; lpMem" into google

and was then on a hunt for a win32 compiler ==> PowerBasic.

http://www.powerbasic.com/support/help/pbcc/index.htm#protected_mode_programming.htm

Looks to be very nearly it. They have a set of objects including LinkedList with seemingly the same API names as above.
Looking around their site, it seems to be a serious programming platform for heavy win32 COM, complete with inline assembler.

they have some of the following idioms:

CLASS MyClass
    INSTANCE MyVar AS LONG

    CLASS METHOD CREATE()
        ' Do initialization
    END METHOD

    CLASS METHOD Destroy()
        ' Do cleanup
    END METHOD

    INTERFACE MyInterface
        INHERIT IUNKNOWN
        METHOD MyMethod()
            ' Do things
        END METHOD
    END INTERFACE
END CLASS

------------------------------------------------
EnterCriticalSection ByVal VarPtr(dStatus())
For i = 0 To UBound(gSoundStatus)
.... do stuff to data members of array
Next
LeaveCriticalSection ByVal VarPtr(dStatus())
------------------------------------------------

#COMPILE DLL "EvServer.dll"

$EvIFaceGuid = GUID$("{00000098-0000-0000-0000-000000000002}")
$MyClassGuid = GUID$("{00000098-0000-0000-0000-000000000003}")
$MyIFaceGuid = GUID$("{00000098-0000-0000-0000-000000000004}")

INTERFACE Status $EvIFaceGuid AS EVENT
    INHERIT IUNKNOWN
    METHOD Done
END INTERFACE

CLASS MyClass $MyClassGuid AS COM
    INTERFACE MyMath $MyIFaceGuid
        INHERIT IUNKNOWN
        METHOD DoMath
            MSGBOX "Calculating..." ' Do some math calculations here
            RAISEEVENT Status.Done()
        END METHOD
    END INTERFACE

    EVENT SOURCE Status

END CLASS
------------------------------------------------

This or something similar -- many scientists use this kind of interface for programming experimental machines and automating processes. Thinking like a sneakypants, it would be a good way to take advantage of COM/win32 API 0-day exploits. TTF engine? .doc files -- très faux pas!

DarkArchon

2012 Mar 14, 14:21

OOOAC ?

Could the C framework be an extension of the "OOOAC" homemade framework? It implements a class system, inheritance and event management in ANSI C.

n

2012 Mar 14, 17:48

>they can receive “this” parameter in any register or in stack.
aggressive Whole Program Optimization?

I suggest checking these languages.
* XPCOM API -> pseudocode looks like this. but ABI is not standard xptcall.
* .Net (compiled to native code w/ Mono, LLVM CIL or GCC CIL back-end)
* Scheme-variant
* Haskell (GHC) -> it's not OO, so maybe not.
* OpenCOBOL -> IIRC COBOL 2002 has OO feature.
* Go Programming Language
* D Programming Language
* Vala

2esoskwahom4

2012 Mar 14, 17:52

sniffing from wrong direction, what does history tell you?

both As400tech and SCooke handed you the best hints.

A few years back I worked at East Fishkill long enough to meet eggs rubbing elbows with the 'black' GSA guys working down in Endicott and Watson (mostly the latter). The big topic at the time was exorbitantly high-priced memory being frantically consumed (we knew it was NSA, we realized later for upgrading Echelon to make its data more transparent for future TIA transactions) post-911.

A cyberop like this would inevitably end up at big blooze' shop for the reasons scooke mentions: NOTHING gets thrown away by Endicott's hacks (a somewhat frustrating problem for workers needing access to boxes), their library of tools is as incomprehensibly massive as it is old. Indeed, Watson has not infrequently sent researchers there first to get their feet wet.

This probably initiated at Watson under NSA aegis, followed by research of tools at Endicott's library, then a handover to Haifa after payload completion. It's unrecognizable because NSA would demand that; any self-respecting beemer hack would know to hit up Endicott's libraries to make it so.

That said, it might be a little naive thinking any ibm'er you ask is gonna be successful convincing one of the mustier Endicott hacks to pony up from their libraries. scooke is right none of it is officially secret - but it frequently is VERY proprietary for some of them. A handful of old Endicott hacks still spend more time there than at home. That should tell you something about their priorities. It's all who you know. 'n no, I don't.

andydude

2012 Mar 14, 22:55

haXe

Have you eliminated haXe from your list of options?

Igor Soumenkov

2012 Mar 15, 01:25

Re: OOOAC ?

The type system and the code look completely different from those in Duqu.

Igor Soumenkov

2012 Mar 15, 01:26

Re: haXe

Yes, we checked haXe, too.

david heath

2012 Mar 15, 13:42

Re: Re: looking wider

but many SCADAs (most?) run under Windows.

PLCs on the other hand don't have Windows anywhere near them.

rt15

2012 Mar 15, 14:54

Hand written asm

Like eyenot, I think that it is hand written asm.

That would explain the different locations of the function table and the different ways the "this" pointer is passed. Human "mistakes".

(Speaking of mistakes, shouldn't DeleteCriticalSection be called instead of InitializeCriticalSection in the destructor?)

That would also explain the non-usage of the C runtime, even though avoiding it is painful in some cases (for example, whereas CopyMemory is "part" of Win32, it is not an actual function exported by a Win32 DLL. You have to code it again or use msvcrt.dll memcpy or another implementation).

Using asm + Win32 would not be too strange for people searching for vulnerabilities in a TrueType font parsing engine.

Putting function tables in the instance is a naive and easy way of doing object oriented programming. Even a basic OO framework would have put function tables in a separate place with other class stuff (static fields...) and would have put a pointer to this in the instance.

The framework looks like HLA standard library or ObjAsm32. But it is none of these two.

Anyway, my main point is that disassembly appears very clean and simple to me. Something pretty hard to obtain with traditional compilers which are often adding some weird stuff (Less clear instructions doing the same thing, particular stack frames, alignment stuff, strange instructions order, "mov edi, edi"...)

david heath

2012 Mar 15, 17:23

An interesting aside

Of some passing interest is the comment from Ken of Caffeine Security suggesting some level of similarity between Duqu and the recent Linux malware Linux/Bckdr-RKC. He claims to have sent material to Kaspersky, but it may have fallen through the cracks.

http://caffeinesecurity.blogspot.com.au/2012/03/linuxbckdr-rkc-and-duqu-links-food-for.html

Kochise

2012 Mar 15, 18:25

Check-out older possibilities

http://www.sics.se/~adam/lwip/ : IP stack
http://openthreads.sourceforge.net/ : threading framework
http://directory.fsf.org/wiki/Lightweight_C++ : intermediate language (get the code using webarchive)
http://bellard.org/tcc/ : low-level compiler

Perhaps that's the toolchain used...

rt15

2012 Mar 15, 18:49

Re: An interesting aside

Ken just seems to have the feeling that there are similarities. And I really don't find anything obvious.

On one side:
Win32 OO programming.
On the other side:
Procedural programming with C runtime.
