
The Mystery of the Duqu Framework

Igor Soumenkov
Kaspersky Lab Expert
Posted March 07, 15:58  GMT
Tags: Duqu

While analyzing the components of Duqu, we discovered an interesting anomaly in the main component that is responsible for its business logic, the Payload DLL. We would like to share our findings and ask for help in identifying the code.

Code layout

At first glance, the Payload DLL looks like a regular Windows PE DLL file compiled with Microsoft Visual Studio 2008 (linker version 9.0). The entry point code is absolutely standard, and there is one function exported by ordinal number 1 that also looks like MSVC++. This function is called from the PNF DLL and it is actually the “main” function that implements all the logic of contacting C&C servers, receiving additional payload modules and executing them. The most interesting part is how this logic was programmed and what tools were used.

The code section of the Payload DLL is typical of a binary that was built from several pieces of code. It consists of “slices” of code that may have been initially compiled in separate object files before they were linked into a single DLL. Most of them can be found in any C++ program, like the Standard Template Library (STL) functions, run-time library functions and user-written code. The exception is the biggest slice, which contains most of the C&C interaction code.


Layout of the code section of the Payload DLL file

This slice is different from others, because it was not compiled from C++ sources. It contains no references to any standard or user-written C++ functions, but is definitely object-oriented. We call it the Duqu Framework.

The Framework

Features

The code that implements the Duqu Framework has several distinctive properties:

  • Everything is wrapped into objects
  • Function table is placed directly into the class instance and can be modified after construction
  • There is no distinction between utility classes (linked lists, hashes) and user-written code
  • Objects communicate using method calls, deferred execution queues and event-driven callbacks
  • There are no references to run-time library functions, native Windows API is used instead

Objects

All objects are instances of some class; we identified 60 classes. Each object is constructed with a “constructor” function that allocates memory, fills in the function table and initializes members.


Constructor function for the linked list class.

The layout of each object depends on its class. Some classes appear to have binary-compatible function tables, but there is no indication that they have any common parent classes (as in other OO languages). Furthermore, the location of the function table is not fixed: some classes have it at offset 0 of the instance, but some do not.


Layout of the linked list object. First 10 fields are pointers to member functions.
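
The object layout described above can be approximated in plain C. The following is only a minimal sketch under stated assumptions: the names (List, list_new, list_freeze and so on), the number of function pointers and the field order are hypothetical and were not recovered from the Duqu binary. It shows a “constructor” that allocates the instance, fills in a function table stored inside the instance itself, and initializes the data members. Because the table belongs to the instance, it can be changed after construction without affecting other objects of the same class.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical illustration: names and layout are assumptions,
   not taken from the Duqu Payload DLL. */
typedef struct Node {
    struct Node *next;
    int value;
} Node;

typedef struct List List;

struct List {
    /* Function table stored directly in the instance (no shared vtable). */
    int    (*push)(List *self, int value);
    size_t (*count)(const List *self);
    void   (*destroy)(List *self);
    /* Data members follow the function pointers. */
    Node  *head;
    size_t length;
};

/* Default push: prepend a node, return 0 on success, -1 on failure. */
static int list_push(List *self, int value) {
    assert(self != NULL);
    Node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->value = value;
    n->next = self->head;
    self->head = n;
    self->length++;
    return 0;
}

/* Alternative push that refuses every insertion. */
static int list_push_rejecting(List *self, int value) {
    (void)self;
    (void)value;
    return -1;
}

static size_t list_count(const List *self) {
    return self->length;
}

/* "Destructor": frees everything referenced by the members, then the object. */
static void list_destroy(List *self) {
    Node *n = self->head;
    while (n != NULL) {
        Node *next = n->next;
        free(n);
        n = next;
    }
    free(self);
}

/* "Constructor": allocates memory, fills in the function table,
   initializes members. */
List *list_new(void) {
    List *l = malloc(sizeof *l);
    if (l == NULL) return NULL;
    l->push = list_push;
    l->count = list_count;
    l->destroy = list_destroy;
    l->head = NULL;
    l->length = 0;
    return l;
}

/* Modify one instance's function table after construction. */
void list_freeze(List *self) {
    self->push = list_push_rejecting;
}
```

Calling list_freeze on one list changes only that instance: a later a->push(a, 2) returns -1, while a second list created with list_new keeps the default behavior. A C++ compiler would normally emit one shared virtual table per class instead, which is one reason a per-instance table stands out.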

Objects are destroyed by corresponding “destructor” functions. These functions usually destroy all objects referenced by member fields and free any memory used.

Member functions can be referenced by the object’s function table (like “virtual” functions in C++) or they can be called directly. In most object-oriented languages, member functions receive the “this” parameter that references the instance of the object, and there is a calling convention that defines the location of the parameter: either in a register or on the stack. This is not the case for the Duqu Framework classes: they can receive the “this” parameter in any register or on the stack.


Member function of the linked list, receives “this” parameter on stack

Event driven framework

The layout and implementation of objects in the Duqu Framework are definitely not native to the C++ that was used to program the rest of the Trojan. There is an even more interesting feature of the framework that is used extensively throughout the whole code: it is event driven.

There are special objects that implement the event-driven model:

  • Event objects, based on native Windows API handles
  • Thread context objects that hold lists of events and deferred execution queues
  • Callback objects that are linked to events
  • Event monitors, created by each thread context for monitoring events and executing callback objects
  • Thread context storage manages the list of active threads and provides access to per-thread context objects

This event-driven model resembles Objective C and its message passing features, but the code does not have any direct references to the language, nor does it look as if it was compiled with any known Objective C compiler.


Event-driven model of the Duqu Framework

Every thread context object can start a “main loop” that looks for and processes new items in the lists. Most of the Duqu code follows the same principle: create an object, bind several callbacks to internal or external events and return. Callback handlers are then executed by the event monitor object that is created within each thread context.

Here is example pseudocode for a socket object:

SocketObjectConstructor {
    NativeSocket = socket();
    SocketEvent = new MonitoredEvent(NativeSocket);
    SocketObjectCallback = new ObjectCallback(this, SocketEvent, OnCallbackFunc);
    connect(NativeSocket, ...);
}
OnCallbackFunc {
    switch (GetType(Event)) {
    case Connected: ...
    case ReadData: ...
    ...
    }
}
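
To make the “bind callbacks and return” pattern more concrete, here is a small, portable C sketch of a thread context with callback objects and a main loop. It is an illustration under stated assumptions, not a reconstruction: every name (ThreadContext, context_bind, context_signal, context_run, SocketObject) is hypothetical, and a simple in-memory queue stands in for the native Windows event handles that the real framework monitors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: all names and sizes are assumptions. */
#define MAX_CALLBACKS 8
#define MAX_PENDING   16

typedef enum { EV_CONNECTED, EV_READ_DATA } EventType;

/* Callback object: links an owner object and a handler to an event type. */
typedef struct Callback {
    void     *owner;
    EventType type;
    void    (*handler)(void *owner, EventType type);
} Callback;

/* Thread context: holds the bound callbacks and a queue of signalled events. */
typedef struct ThreadContext {
    Callback  callbacks[MAX_CALLBACKS];
    size_t    callback_count;
    EventType pending[MAX_PENDING];
    size_t    pending_head;
    size_t    pending_count;
} ThreadContext;

void context_init(ThreadContext *ctx) {
    ctx->callback_count = 0;
    ctx->pending_head = 0;
    ctx->pending_count = 0;
}

/* Bind a handler to an event type; returns 0 on success, -1 if full. */
int context_bind(ThreadContext *ctx, void *owner, EventType type,
                 void (*handler)(void *owner, EventType type)) {
    assert(handler != NULL);
    if (ctx->callback_count == MAX_CALLBACKS) return -1;
    Callback cb = { owner, type, handler };
    ctx->callbacks[ctx->callback_count++] = cb;
    return 0;
}

/* Queue an event (stand-in for a native event becoming signalled). */
int context_signal(ThreadContext *ctx, EventType type) {
    if (ctx->pending_count == MAX_PENDING) return -1;
    size_t tail = (ctx->pending_head + ctx->pending_count) % MAX_PENDING;
    ctx->pending[tail] = type;
    ctx->pending_count++;
    return 0;
}

/* "Main loop": drain the queue and run every callback bound to each event.
   Returns the number of handler invocations. */
size_t context_run(ThreadContext *ctx) {
    size_t calls = 0;
    while (ctx->pending_count > 0) {
        EventType type = ctx->pending[ctx->pending_head];
        ctx->pending_head = (ctx->pending_head + 1) % MAX_PENDING;
        ctx->pending_count--;
        for (size_t i = 0; i < ctx->callback_count; i++) {
            if (ctx->callbacks[i].type == type) {
                ctx->callbacks[i].handler(ctx->callbacks[i].owner, type);
                calls++;
            }
        }
    }
    return calls;
}

/* A socket-like object that only records which events it has seen. */
typedef struct SocketObject {
    int connected;
    int reads;
} SocketObject;

static void socket_on_event(void *owner, EventType type) {
    SocketObject *s = owner;
    switch (type) {
    case EV_CONNECTED: s->connected = 1; break;
    case EV_READ_DATA: s->reads++;       break;
    }
}

/* "Constructor": initialize members, bind callbacks and return. */
int socket_object_init(SocketObject *s, ThreadContext *ctx) {
    s->connected = 0;
    s->reads = 0;
    if (context_bind(ctx, s, EV_CONNECTED, socket_on_event) != 0) return -1;
    return context_bind(ctx, s, EV_READ_DATA, socket_on_event);
}
```

In this sketch the constructor does no work beyond registration; everything else happens when context_run dispatches queued events to the bound handlers, which mirrors the create, bind and return principle described above.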

Conclusions

  • The Duqu Framework appears to have been written in an unknown programming language.
  • Unlike the rest of the Duqu body, it's not C++ and it's not compiled with Microsoft's Visual C++ 2008.
  • The highly event-driven architecture points to code which was designed to be used in pretty much any kind of conditions, including asynchronous communications.
  • Given the size of the Duqu project, it is possible that the framework was created by a different team from the one that created the drivers and wrote the system infection and exploits.
  • The mysterious programming language is definitively NOT C++, Objective C, Java, Python, Ada, Lua and many other languages we have checked.
  • Compared to Stuxnet (entirely written in MSVC++), this is one of the defining particularities of the Duqu framework.

The Duqu Framework: What was that?

After having performed countless hours of analysis, we are 100% confident that the Duqu Framework was not programmed with Visual C++. It is possible that its authors used an in-house framework to generate intermediary C code, or they used another completely different programming language.

We would like to make an appeal to the programming community and ask anyone who recognizes the framework, toolkit or programming language that can generate similar code constructions to contact us or leave a comment on this blog post. We are confident that with your help we can solve this deep mystery in the Duqu story.


161 comments


Bildos

2012 Mar 07, 22:11
0
 

Hi Igor,
Good to know that you are still working on Duqu.
Good luck with investigation!


As400tech

2012 Mar 09, 00:03
0
 

That code looks familiar

The code you're referring to .. the unknown C++ looks like the older IBM compilers found in OS400 SYS38 and the oldest sys36.

The C++ code was used to write the tcp/ip stack for the operating system and all of the communications. The protocols used were the following x.21(async) all modes, Sync SDLC, x.25 Vbiss5 10 15 and 25. CICS. RSR232. This was a very small and powerful communications framework. The IBM system 36 had only a 300MB hard drive and one megabyte of memory, and the operating system came on diskettes.

This would be very useful in this virus. It can track and monitor all types of communications. It can connect to everything and anything.


SCooke

2012 Mar 09, 15:54
0
 

Re: That code looks familiar

It's easier to figure this out if you consider vendor sourcing. The work was probably done by a government. And, whether the software was sourced through a US agency or whether a US agency itself was the creator, the net result is the same: you're looking for a major GSA-contracted firm who A) has clearance, B) has a compiler team, C) has a track record of providing similar product to the US government, and D) has a compiler codebase that looks kind of unfamiliar and not mainstream.

The likely suspects fitting that set of criteria are IBM, Microsoft, SAS and SAIC. All the others (remnant AT&T, HP, remnant SGI... who am I forgetting?) incorporate a considerable amount of fairly recognizable shared compiler code in their offerings. Since you've disqualified Microsoft, my bet is on IBM.

I don't think it's SAS, because their compiler codebase is ancient. I don't think it's SAIC, because for them this would be a fairly difficult project. Three reasons why I think IBM.

First is that IBM has a library of bizarro options to select from. There's an internal HLASM-to-C frontend. There's all the CSet descendants. They've got research versions of damn near everything. (I'd try getting ahold of the ia32 version of CSet - probably hard to come by, but out there). They've also got a Windows source license, and if you were going to write a virus, that's always handy.

Second is that IBM has a history of doing projects like this. If there was a federal bid, they almost certainly would have been a bidder.

Third is that the project could have been run out of IBM Haifa. A number of the old IBM AV team probably either were there or ended up there, so it wouldn't be too far out of their wheelhouse. And if you wanted to build a state-sponsored virus, you'd almost certainly want to build it in a country who already has near-active hostilities with the intended target for the virus such that those acts of aggression don't become de facto acts of war for you.

If you want to dig into that, have someone from IBM wander through the employee-written and internal software libraries for all the preprocessor frontends for various languages and compiler backends that output to ia32. Probably none of that is inherently secret. I bet you'll find something that produces similar output.


srhubb

2012 Mar 13, 01:29
0
 

Re: Re: That code looks familiar

You forgot one other source for the military and intelligence community. UNISYS, formerly Burroughs (who supplied a lot of the contracted personnel) and Univac (Sperry+Univac) who supplied the bulk of the hardware and development software, for decades to the military and intelligence communities within our government (includes NSA, CIA, Army, Navy, Air Force, IRS, SSA, etc.).

It may be a tool developed by Sperry or Unisys as well as the ones you've mentioned.

An Old Univacer,
Srhubb


Igor Soumenkov

2012 Mar 09, 00:11
0
 

Re:

Thank you!


miki

2012 Mar 10, 00:08
0
 

Re: Re:

I think it's Tcl (Incr Tcl). It looks like assembler, but it is an object language.


Hans Adams

2012 Mar 13, 03:36
0
 

Re: Re: --- HLAs ---

Code generated looks like some comparable code of PDP10 generated by BLISS. Today's best approach might be the class of High Level Assemblers. Those were available for all major architectures, /36 to /390, PDP11 (C was a substitute of a former HLA!), PDP10, VAXen, 8086 (DeSmeth???), ...

Thirty (was: twenty-five) years ago I used one for the 8086 to implement device drivers. I had started with MASM, but the task became too complex to handle using common assemblers. I knew HLAs for "real computers", so I longed for something reasonable even for the 8086.

The objects themselves were implemented by complex macros, very similar to the early C++.

In times gone I saw a whole object oriented framework implemented in a kind of HLA for the 68k in a realtime application.

Hint: http://sourceforge.net/projects/hlav1/

best, adamsh

Edited by adamsh, 2012 Aug 13, 12:45


http

2012 Mar 07, 22:48
0
 

Language ideas

If it's Microsoft specific, did you check if it's this new F# thing or maybe some compiled .NET with included CLR, like for a compact mobile device? I don't know how that would look on the assembler level, but it's worth checking.


Igor Soumenkov

2012 Mar 09, 00:12
0
 

Re: Language ideas

It is definitely not a CLR based language. Native code only.


GeralltF

2012 Mar 11, 07:50
0
 

Re: Microsoft based, native code only

What about the IL2CPU compiler developed by the Cosmos team?
Only problem is that the library compiles pure CIL, so no platform invokes. But their X# feature allows embedding raw X86 operations.


Hesekiel

2012 Mar 07, 23:00
0
 

Its Iron Python

Made in Here


Igor Soumenkov

2012 Mar 09, 00:13
0
 

Re: Its Iron Python

No traces of the .NET framework or JIT.


Strangepork

2012 Mar 07, 23:43
0
 

Strange Guess

What about CPLEX LIB---> something that works with that.. C++ or Python, java, etc. I think Object C as well. (I think Iron Python does.) What about IBM-ILOG's Optimization Programming Language (OPL)

http://www-01.ibm.com/software/integration/optimization/cplex-optimization-studio/modeling/#libraries

http://www-01.ibm.com/software/websphere/images/OPL.jpg

IBM ILOG CPLEX is a tool for solving linear optimization problems, commonly referred to as Linear Programming (LP) problems, CPLEX also can solve several extensions to LP:
Network Flow problems, a special case of LP that CPLEX can solve much faster by exploiting the problem structure.

Interesting, good luck with investigation. Best of luck...


wspibis

2012 Mar 08, 00:56
0
 

Guess

D? ( http://dlang.org/ )


Igor Soumenkov

2012 Mar 09, 00:14
0
 

Re: Guess

We've tried D, too.


jpoupard

2012 Mar 08, 01:09
1
 

WEB Guess

Actually, it may look like libevent. It's an event-based communication library written in C.

Callback example from libevent:

static int
evhttp_method_may_have_body(enum evhttp_cmd_type type)
{
    switch (type) {
    case EVHTTP_REQ_POST:
    case EVHTTP_REQ_PUT:
    case EVHTTP_REQ_PATCH:
        return 1;
    case EVHTTP_REQ_TRACE:
        return 0;
    /* XXX May any of the below methods have a body? */
    case EVHTTP_REQ_GET:
    case EVHTTP_REQ_HEAD:
    case EVHTTP_REQ_DELETE:
    case EVHTTP_REQ_OPTIONS:
    case EVHTTP_REQ_CONNECT:
        return 0;
    default:
        return 0;
    }
}

cheers,

Edited by jpoupard, 2012 Mar 08, 01:45


Igor Soumenkov

2012 Mar 09, 00:40
0
 

Re: WEB Guess

The Duqu Framework shares many principles of libevent, but it is completely object-oriented, even all events and callbacks are wrapped in objects.
Some APIs that are called by the Duqu event monitor object are not present in sources of libevent.
Anyway, we should study the sources of libevent again, to be 100% sure. Thanks!


eyenot

2012 Mar 08, 01:19
1
 

HLA

I was also about to post the same thing: High-Level Assembly. There is a really nice HLA tool from a French author that makes HLA very easy. Using its "macros" feature you could spend a little time and have a coherent framework for the rest of your entire code that would closely resemble "object oriented" programming, but would just be very well structured assembly. It went by the name of SPASM, "Specific Assembly". It is now RosASM, and it is very highly structured and automated. The full details can be found here http://en.wikipedia.org/wiki/User:B2kguga/RosAsm

Edited by eyenot, 2012 Mar 08, 01:30


T,aliesin

2012 Mar 08, 01:27
1
 

Just a guess

Reading the description, the first thing that pops into my mind is "Could it be Common Lisp?". Common Lisp includes CLOS, an object system that supports multimethods and method combinations. Classes are similar to structures, but offer more dynamic features and multiple inheritance. Common Lisp supports first-class functions. For instance, it is possible to write functions that take other functions as arguments or return functions as well. This makes it possible to describe very general operations.


ksb

2012 Mar 08, 01:37
0
 

It might be Forth.

I used to write compilers for a living. Any well structured Forth program looks a lot like that code. The author may have built words (old-school builds/does words) to make function entry/exit compatible with other calling conventions. I did.

I doubt it is hand-coded x86, since it is clearly stack-based at the core, and x86 programmers tend to use more registers to pass parameters.


slew

2012 Mar 08, 01:49
0
 

erlang?

The language you are describing sounds a lot like Erlang. Someone with a telecom background might want to write this type of code in Erlang. However, Erlang is not an object oriented language, but a functional language with message passing.


D. Drummond

2012 Mar 08, 02:00
0
 

My first thought...

...was that it could be Vala (http://live.gnome.org/Vala), a high level object oriented language which is compiled to C, but there would surely be some link to GObject if it were.


Igor Soumenkov

2012 Mar 09, 00:42
0
 

Re: My first thought...

We tried Vala, too. Unfortunately, the generated code is completely different.


CZeng

2012 Mar 08, 02:55
0
 

Delphi?

*There are no references to run-time library functions, native Windows API is used instead.

Sounds a lot like Delphi where framework methods are simply mapped to the Windows API.


AbsentMindedProfessor

2012 Mar 08, 03:54
0
 

Well Intended Comments

They are asking for someone who has first-hand knowledge of the subject.

An objective 'guess' could be limited to two or three probables, one they've mentioned (ie. custom built compiler).

It would be fair to say that the Duqu Framework 'Team' are advanced developers with significant resources at their disposal. It would also be fair to say, without going into details, that anyone who directly knows the answer would be at extreme personal risk if they were to answer it.

amp


Julia

2012 Mar 08, 04:08
0
 

Close to Erlang?

I agree with slew, it's very, very similar to Erlang. But wouldn't Erlang have separate functions for each callback?


Wes Brown

2012 Mar 08, 06:49
0
 

Re: Close to Erlang?

Yep. Mosquito Lisp has characteristics of Erlang. See below comment.


thebill

2012 Mar 08, 04:47
0
 

Guess: Ada tasks?

I haven't seen what it would look like in assembly, but the description calls to mind Ada tasks.


thebill

2012 Mar 09, 07:04
0
 

Re: Guess: Ada tasks?

It's also interesting to note:
- The GNU Ada reference library (GNARL) has a function InitializeCriticalSection. Critical sections are commonly used in Ada tasks, such as to implement synchronization by monitors and semaphores.
- You can build Windows DLLs with the Ada GNAT compiler. See http://www.adacore.com/wp-content/files/auto_update/gnat-unw-docs/html/gnat_ugn_38.html.
- See later post: Object-oriented Ada does allow you to implement destructors for your objects:
http://en.wikibooks.org/wiki/Ada_Programming/Object_Orientation
- Ada is not used by many people, but is used widely in government and defense.


Wes Brown

2012 Mar 08, 06:39
3
 

It's most probably Lisp, inspired by Mosquito Lisp

Howdy, y'all.

This is most probably Lisp. The distinctive features that you mention are very characteristic of a Lisp-based language. Prototype object systems are virtually indistinguishable from functional languages which implement object systems. Reading your description makes me suspect that they were inspired by a talk that I gave about research by Scott Dunlop and me into injectable virtual machines using Lisp.

Here's a link to the video presentation that I gave in Malaysia -- this was back in 2006.
http://video.google.com/videoplay?docid=-468113072359282746

You can find the slides for my talk here:
http://packetstormsecurity.org/files/50716/DAY_2_-_Wes_Brown_-_MOSREF.pdf.html

Our methodology used byte code, but there's no reason why such techniques could not apply to compiled objects.

-Wes


M F

2012 Mar 08, 10:31
2
 

Re: It's most probably Lisp, inspired by Mosquito Lisp

I did a similar code structure analysis and we couldn't match it to any programming language...
Our final answer was that this is either a new, private programming language (a *Cyber Weaponry* framework) that they made themselves, or a custom C-like compiler/linker.

I suppose no one will ever know exactly what this is, unless you join MOSSAD or something :-)

But what Wes Brown is saying is the closest explanation I have heard so far! I think he's right, maybe it's Lisp.


Igor Soumenkov

2012 Mar 09, 01:40
0
 

Re: It's most probably Lisp, inspired by Mosquito Lisp

Thank you Wes! Could you please suggest a Lisp implementation that we should check in the first place?


clojuredev

2012 Mar 09, 04:37
0
 

Re: Re: It's most probably Lisp, inspired by Mosquito Lisp

It's probably the Ferret Lisp to C++ compiler ( http://nakkaya.com/2011/06/29/ferret-an-experimental-clojure-compiler/ )

"Ferret: An Experimental Clojure Compiler

Ferret is an experimental Lisp to C++ compiler, the idea was to compile code that is written in a very small subset of Clojure to be automatically translated to C++ so that I can program stuff in Clojure where JVM or any other Lisp dialect is not available. "


Wes Brown

2012 Mar 09, 08:17
0
 

Re: Re: Re: It's most probably Lisp, inspired by Mosquito Lisp

Probably not. Duqu and Stuxnet components date to 2007, predating this. I would also point out that as of 2006, this particular technique of a virtual machine to evade detection and reverse engineering was known.


Wes Brown

2012 Mar 09, 08:12
0
 

Re: Re: It's most probably Lisp, inspired by Mosquito Lisp

Igor,

Your first mistake is assuming that they are using an off the shelf compiler. Scott Dunlop wrote Mosquito Lisp, which is an entire virtual machine with a byte code language and a dialect of Scheme combined with Lisp in about nine months or so.

Someone who is smart and motivated, as the Duqu people were, could dedicate someone to writing a compiler in-house in the same timeframe, but targeted towards x86 object systems -- we could have done this, but we wanted to transmit byte code and be portable across multiple architectures. Different goals here. You also presume that they are using an off the shelf linker. Mosquito Lisp and Wasp Lisp append byte code to the end of the VM stub.

-Wes


tomf

2012 Mar 13, 16:06
0
 

Re: Re: Re: It's most probably Lisp, inspired by Mosquito Lisp

Wes,

I have to agree with you, especially the comment about Scheme. When I looked at the code snippets, it definitely had the Scheme feel about it, and that was the first language that came to mind. When I was back in college we used that language as well as Lisp. It has some pretty interesting capabilities; one of them was the way it handled class objects. All objects were treated as first-class objects. I wish I still had a copy of it, as it was pretty good back in the day and was somewhat easier to use than Lisp or Forth.

tom


Russell Emilio Burrows

2012 Mar 08, 11:08
0
 

modified report program generator ??

Back in 1982 we used RPG to create some test programs to switch on and off water valves with solenoids.

RPG using object oriented programming with a modified compiler can create something like this.

Never mind.


T McGuire

2012 Mar 08, 11:51
1
 

RoseRT?

The odd handling of the "this" pointer and the message-passing architecture made me think of this: Rational Software published a "real-time" modelling package in 2000/2001 called RoseRT based off technology from ObjectTime Developer that had an underlying message-passing framework built-in. You would develop the high-level design in UML and then code-gen to a language like C or C++. For C code-gen, the object-structure was flattened so you could compile w/ a standard C compiler. RoseRT was used for secure govt projects...


tortoiseDoc

2012 Mar 08, 12:36
0
 

Javascript?

Or some customized version of it. I have never seen compiled bytecode of Javascript, but given the fact that the classes seem to have modifiable interfaces, the first thing that comes to mind is Javascript.


mtn1980

2012 Mar 08, 16:05
0
 

Corman LISP

Looks like Corman Lisp to me. Encapsulated binary object code, use of MFC, dynamic objects, ambiguous calling convention.


mrlozer

2012 Mar 08, 17:14
0
 

Just a guess

Well, I am not a specialist, but I would say it could be Ruby or the Envelope programming language... It could also be Agena or Logtalk...


flakfizer

2012 Mar 08, 18:07
0
 

Google's Go language?

Have you investigated if this could be Google's relatively new Go language? It compiles native, and it has a concurrent messaging infrastructure built into the language.


Igor Soumenkov

2012 Mar 09, 00:45
0
 

Re: Google's Go language?

Go was one of the first languages to check. That's definitely not Go.


marmite

2012 Mar 08, 19:55
0
 

Could it be Eiffel Case

I used a programming language at uni which was object orientated called Eiffel Case...Just a thought


billappleton

2012 Mar 08, 20:31
0
 

maybe node.js

good for event-driven I/O


Kenez

2012 Mar 08, 21:25
0
 

Unknown compiler, but what linker

It's okay that the compiler is not known. But what linker did they use? If the slices were linked together with the MSVC++ code, then the other compiler must have generated MS-compatible object code. Are those Lisp, Vala, Erlang and Forth compilers able to generate that type of output?

I second those who guessed assembly. Very nice assembly macros can be written with a good assembler, such as MASM from MS.

If we know the linker, then we will probably get closer to the questioned language...


Kenez

2012 Mar 08, 21:37
0
 

Timeframe

Don't forget that researchers traced back the activity of this malware to August 2007. This is extraordinary in stealth...


mrlozer

2012 Mar 08, 21:38
0
 

One more guess

It can also be Euphoria...


mrlozer

2012 Mar 13, 17:47
0
 

Re: One more guess

Euphoria has this euphoria to c compiler http://www.rapideuphoria.com/e2c.htm


SCARFP

2012 Mar 08, 22:05
0
 

Assembly Language?

Check out this site; it looks like old assembly language:

http://www.swansontec.com/sprogram.html

Edited by SCARFP, 2012 Mar 08, 22:24


This comment was deleted by Derek Jecxz, 2012 Mar 12, 09:20

infernalmachine

2012 Mar 12, 03:44
0
 

Re: May I ask...

Most likely the consistency and perhaps optimisations that would only be conceivably possible if done by machine along with a lack of optimisations that could only be conceivably made by hand. Hard to tell without looking at the full code but generally such a large amount of code done by hand will have hints (Edit: Actually, 95663 bytes is quite small, but it should be big enough still to offer good hints).

Also, coding it in assembly by hand would have many significant weaknesses. It's inconceivable someone would do that because the disadvantages of using pure assembly outweigh the advantages immensely.

Even if the coder were able to create it in pure assembly, it is highly likely they would create something such as a set of macros or their own basic higher-level language, most likely inspired by features encountered in other higher-level languages. Doing everything in assembly is not a good thing. A human will usually work out something more efficient than copying and pasting the same thing hundreds of times and editing it slightly each time.

It might be a hand made language but tactically, it's worth investigating anyway. Precisely for the fact that it appears to be obscure. It could potentially give clues about the identity of the creators.

I suggest they release the DLL and commented disassembly so that it isn't such a shot in the dark.

Edited by infernalmachine, 2012 Mar 13, 02:50


nbtaekbfgt

2012 Mar 08, 23:11
0
 

realbasic

Could it be realbasic?

