
The mystery of Duqu Framework solved

Igor Soumenkov
Kaspersky Lab Expert
Posted March 19, 13:42  GMT
Tags: Duqu

The Quest for Identification

In my previous blogpost about the Duqu Framework, I described one of the biggest remaining mysteries about Duqu – the oddities of the C&C communications module which appears to have been written in a different language than the rest of the Duqu code. As technical experts, we found this question very interesting and puzzling and we wanted to share it with the community.

The feedback we received exceeded our wildest expectations. We got more than 200 comments and 60+ e-mail messages with suggestions about possible languages and frameworks that could have been used for generating the Duqu Framework code. We would like to say a big ‘Thank you!’ to everyone who participated in this quest to help us identify the mysterious code.

Let us review the most popular suggestions we got from you:

  • Variants of LISP
  • Forth
  • Erlang
  • Google Go
  • Delphi
  • OO C
  • Old compilers for C++ and other languages

Thanks to some very useful and knowledgeable comments, we can now say with a high degree of certainty that we have found the correct answer. I would like to quote the most relevant comments which helped us solve the puzzle:

igorsk
Simple Object Orientation (for C)

It seems someone over at reddit (http://www.reddit.com/r/ReverseEngineering/) hit the jackpot: the code snippets look _very_ similar to what this would produce: http://daifukkat.su/wiki/index.php/SOO
There are a few other OO frameworks for C, but they don't match as well: http://ooc-coding.sourceforge.net/ http://sooc.sourceforge.net/

Jonwil
Re: Other C/C++ compiler?

I have seen how GCC works internally and its ABI (for a number of different versions) and I can confirm that the Duqu code is definitely not generated by GCC. I don’t know how other C++ compilers work but the things I see in the ASM (like where the pointers to the functions go, the way the "this" pointer is passed etc) do not suggest C++ to me but something else entirely. (such as the aforementioned "object-oriented" frameworks for C that exist)

igorsk
Re: Other C/C++ compiler?

I’m 99% sure the machine code was generated by MSVC. It’s something you get a feel with experience, but I can point out two things that are quite characteristic of MSVC: 1) it uses esi as the first candidate for temporary storage; 2) “pop ecx” instead of “add esp, 4”.

We also received two very interesting e-mail messages. Pascal Bertrand aka bps and another author who preferred to remain anonymous suggested that the code was generated from a custom object-oriented C dialect, generally called “OO C”.

The comments were very important because they allowed us to identify the exact compiler used in the project: the Microsoft Visual Studio compiler. I spent more time experimenting with different versions of MSVC, different source code variants and different compiler options, trying to reproduce the binary code of the constructor function mentioned in the previous blogpost, and finally succeeded.


Disassembly of the original Duqu code: construction of the linked list class


Manually decompiled C code that produces the same code

The above C code, when compiled with MSVC 2008 and the options /O1 (minimize size) and /Ob1 (expand only __inline), produces opcodes identical to those in the Duqu binary. Changing the order of operations or of the if/else blocks modifies the resulting code, and the MSVC 2005 compiler produces slightly different code, too. So, we can say with a high degree of certainty that the resulting binary was compiled with MSVC 2008 using the options /O1 /Ob1, and that the input source code was pure C.
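A rough way to repeat this kind of experiment is to compile a small pure C file with the same options and read the assembly listing. The sketch below is an illustration only: the file name probe.c and the function probe_copy_length are hypothetical and are not taken from the Duqu code. The /FAs switch asks MSVC to write an assembly listing next to the object file.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical test file (probe.c), not Duqu code.
 * Build from a 32-bit MSVC command prompt, keeping an assembly listing:
 *
 *     cl /nologo /c /O1 /Ob1 /FAs probe.c
 *
 * /O1  favors small code, /Ob1 expands only functions marked inline,
 * /FAs writes probe.asm with the source lines interleaved.
 */

/* Copies a string to the heap, measures the copy, and frees it again.
 * Returns the length of the string, or -1 if the allocation fails. */
int probe_copy_length(const char *text)
{
    size_t length = strlen(text);
    char *copy = (char *)malloc(length + 1);

    if (copy == NULL)
        return -1;

    memcpy(copy, text, length + 1);   /* copy includes the terminating NUL */
    length = strlen(copy);

    /* free() takes one 4-byte argument on 32-bit x86 and uses cdecl,
     * so the caller has to remove that argument from the stack afterwards. */
    free(copy);

    return (int)length;
}
```

On a 32-bit build, the stack cleanup after the call to free is the place to look for the “pop ecx” habit igorsk describes: “pop ecx” encodes in one byte, while “add esp, 4” takes three, so a size-optimizing build would be expected to prefer it. Comparing listings from different compiler versions and option sets is then a matter of diffing the generated .asm files.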

So, what does that mean? In short, there are two very probable answers to our initial question:

  1. The code was written using a custom OO C framework, based on macros or custom preprocessor directives. This was suggested by your comments, because it is the most common way to combine object-oriented programming with C.
  2. All the code was written in OO C manually, without any extensions to the language. We can’t deny this possibility completely because, technically, it is near impossible to distinguish code written with macro directives from manually copy-pasted code.

Judging by the amount of similar-looking code in every constructor and member function, we can assume that source code preprocessing was used and that variant 1 is closer to the truth.
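To make the two variants concrete, here is a minimal sketch of what OO C can look like. Everything in it is an assumption made for illustration: the List and Node types, the BIND macro and the choice to store function pointers inside the object are hypothetical and do not reproduce the Duqu source, SOO, or any other real framework.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A singly linked list node (hypothetical layout). */
typedef struct Node {
    struct Node *next;
    int value;
} Node;

/* The "class": data members plus function pointers that act as methods. */
typedef struct List List;
struct List {
    int    (*push)(List *self, int value);
    size_t (*count)(const List *self);
    void   (*destroy)(List *self);
    Node   *head;
    size_t  length;
};

/* Variant 1: a macro hides the repetitive method wiring.
 * BIND(self, List, push) expands to ((self)->push = List_push). */
#define BIND(obj, cls, method) ((obj)->method = cls##_##method)

/* Prepends a value; returns 0 on success, -1 if allocation fails. */
static int List_push(List *self, int value)
{
    Node *node = (Node *)malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->value = value;
    node->next = self->head;
    self->head = node;
    self->length++;
    return 0;
}

static size_t List_count(const List *self)
{
    return self->length;
}

/* Frees every node and then the object itself. */
static void List_destroy(List *self)
{
    Node *node = self->head;
    while (node != NULL) {
        Node *next = node->next;
        free(node);
        node = next;
    }
    free(self);
}

/* The "constructor": allocate the object, wire up methods, set defaults.
 * Variant 2 would be the same function with the three BIND lines written
 * out by hand, e.g. self->push = List_push; the compiler sees identical
 * code after preprocessing either way. */
List *List_new(void)
{
    List *self = (List *)malloc(sizeof *self);
    if (self == NULL)
        return NULL;
    BIND(self, List, push);
    BIND(self, List, count);
    BIND(self, List, destroy);
    self->head = NULL;
    self->length = 0;
    return self;
}

/* Usage example: calls go through the object's own function pointers. */
size_t list_demo(void)
{
    List *list = List_new();
    size_t n;

    assert(list != NULL);
    list->push(list, 10);
    list->push(list, 20);
    list->push(list, 30);
    n = list->count(list);
    list->destroy(list);
    return n;
}
```

Because the macro is expanded before compilation, a constructor built with BIND and one typed out by hand compile to the same instructions, which is why the binary alone cannot tell the two variants apart. What the binary does show is the same wiring pattern repeated in every constructor.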

Now, there are several open-source “OO C” frameworks available, and some of them produce code constructions that are very similar to those in the Duqu code. The best match we found is SOO (Simple Object Orientation for C); however, it could not have been used in Duqu, because it was only published when the Trojan was already in the wild.

No matter which of these two variants is true, the implications are impressive. The Payload DLL contains 95 Kbytes of event-driven code written in OO C, a language that has no automatic memory management or safe pointers. This kind of programming is more commonly found in complex ‘civil’ software projects than in contemporary malware. Additionally, the whole event-driven architecture must have been developed as a part of the Duqu code or its OO C extension.

There is no easy explanation of why OO C was used instead of C++; however, we have seen similar cases in the past. We have spoken to some of the people who prefer such techniques, and they gave two main reasons for it:

  1. They don’t trust C++ compilers; these are usually people who started programming in the old days, when assembler was the top choice. C was a direct evolutionary step over assembler and quickly became a standard. When C++ was published, many old school programmers preferred to stay away from it because of distrust in memory allocation and other obscure language features which cause indirect execution of code (for instance, constructors).
  2. Extreme portability. Once again, in the old days (10-12 years ago) C++ was not entirely standardized and it was possible to have C++ code that would compile with MSVC but would not compile with (say) Watcom C++. If you wanted to go for extreme portability and target every existing platform out there, you’d go with C.

Both reasons appear to indicate that the code was written by a team of experienced, “old-school” developers.

Conclusions

  • The Duqu Framework consists of “C” code compiled with MSVC 2008 using the special options “/O1” and “/Ob1”
  • The code was most likely written with a custom extension to C, generally called “OO C”
  • The event-driven architecture was developed as a part of the Duqu Framework or its OO C extension
  • The C&C code could have been reused from an already existing software project and integrated into the Duqu trojan

All the conclusions above indicate a rather professional team of developers, which appears to be reusing older code written by top “old school” developers. Such techniques are normally seen in professional software and almost never in today’s malware. Once again, they indicate that Duqu, just like Stuxnet, is a “one of a kind” piece of malware which stands out like a gem from the large mass of “dumb” malicious programs we normally see.


20 comments


alifer

2012 Mar 21, 23:20
0
 

Open sourced?

"it could not have been used in Duqu, because it was only published when the Trojan was already in the wild"

So was Duqu opensourced as SOO after the Trojan was already in the wild? ;-)


Abe Froman

2012 Mar 22, 02:23
0
 

Re: Open sourced?

Right tree, wrong apple. All I'm going to say is its highly modifiable, elegant, collaborative, and insanely lean code. It's also "interchangeable."


Abe Froman

2012 Mar 21, 15:26
0
 

I figured it out.

I know what this was written in (Hint: not what you think), I know how the projects are worked on and how research is conducted, and I know that it is all very much in plain sight. What I do not know... is whether to release my findings. Aaaand not to toot my own horn, but for someone who doesn't even have a tech background, I found an absurd amount of painfully obvious correlations that led me to the answer in little to no time... most of which were passed right over in public analyses made available thus far, including your own.


rt15

2012 Mar 23, 12:38
0
 

Re: I figured it out.

How could you know in what this was written in without technical background? Sorry but I think you are making bad assumption. For example, in a previous message, you are talking about “Obfuscated x86 Executables”. For what reasons are you thinking some part of duqu are obfuscated? The code above is clearly NOT obfuscated. It is very clean assembly. No tries to lose the reader. Easy reverse engineering (At least they would have not deactivated implicit inlining).
Then, you will tell us what? That the producer is a government? Microsoft? An anti-virus company?
I think you have no proof and that you are only making deductions and interpretations like millions guys on the internet.


Abe Froman

2012 Mar 21, 05:42
0
 

"Analyzing Memory Accesses in Obfuscated x86 Executables" ...there's some of those in DuQu

Abstract

Programmers obfuscate their code to defeat manual or automated analysis. Obfuscations are often used to hide malicious behavior. In particular, malicious programs employ obfuscations of stack-based instructions, such as call and return instructions, to prevent an analyzer from determining which system functions it calls. Instead of using these instructions directly, a combination of other instructions, such as PUSH and POP, are used to achieve the same semantics. This paper presents an abstract interpretation based analysis to detect obfuscation of stack instructions. The approach combines Reps and Balakrishnan’s value set analysis (VSA) and Lakhotia and Kumar’s Abstract Stack Graph, to create an analyzer that can track stack manipulations where the stack pointer may be saved and restored in memory or registers. The analysis technique may be used to determine obfuscated calls made by a program, an important first step in detecting malicious behavior.

Article citation can be found at - https://dl.acm.org/citation.cfm?id=2144850 CFID=71449468 CFTOKEN=43117639

Research <3.


Abe Froman

2012 Mar 21, 04:58
0
 

SoBig was written in a Windows assembly language...

And that wreaked havoc on that poor train company back in 2003, having its way with all of its data integrated systems. Trains... trains... seems like there's been a lot of problems with trains digital computers as of late. Now if you were capable of making this, and wanted to possess the nuclear option for any industry... Well, I choose freight.


spikeysnack

2012 Mar 20, 16:17
0
 

mad genius?

perhaps a creation of a 'frustrated' genius (see David Stes, Hans Reiser... ) that never was adopted by above-board companies, but turns up on the dark nets.


Pieter

2012 Mar 20, 15:46
0
 

Looking too far?

Great work! I think what you found does not mean the programmers actually wrote the Duqu framework for their trojan. It might be that it was developed earlier for a totally different purpose. The programmers making the Trojan might have just reused it as fit for purpose.


Alex Marshall

2012 Mar 20, 01:32
1
 

Weird

I guess the DuQu authors had the same Obj-C inspiration as myself when I wrote SOO. Wonder how we ended up writing such similar code though. I guess there's only a couple ways to do certain things though.


Bildos

2012 Mar 20, 01:18
0
 

Time to discover next project form stuxnet / duqu team ;-)

Hi Igor,

An news about new samples or new malware from stuxnet/duqu team ?

BR
Bildos


Costin Raiu

2012 Mar 20, 17:43
0
 

Re: Time to discover next project form stuxnet / duqu team ;-)

Yes - check this out -> http://threatpost.com/en_us/blogs/newly-compiled-driver-shows-duqu-authors-still-work-032012


Khalegh

2012 Mar 19, 19:13
1
 

Representation Based Support OO

In point of my view this framework is not developed directly in known programming language like C++, Assembly, etc. I think the core of Duqu framework, is a pre-saved representation of other, for example my picture is my representation (per time/location) and this picture can talk about me. I think the Duqu developers flow this strategy, of curse, in practical and deep view, the last report show Duqu has a different behavior per victim. In this strategy there is any way or hard to drive the original programming language of Framework was developed.I hope for Igor and other Friends Analyze. Good Luck Igor! :)

Regards
Khalegh Salehi
RCISS- Tabriz University


dabenavidesd

2012 Mar 19, 19:00
0
 

Thought it was impossible to know with straight forward technique

First there isn't possibility of asking which code produced original source, but then you could narrowly think is C* but it isn't quite like that, it may be a trick that you are falling on.
Second the possibility of knowing which language isn't or which you think from it is produced makes no sense, since you can't know the language it was written originally.
So, my believe is that it isn't any C related thing, but a more useful language, if I may say so, why there was a mystery here, in the sense of compactness of the code and the target architecture, might be the best answer you can give. I know some people has measured that thing in several studies


Andreas Bogk

2012 Mar 19, 18:01
0
 

Mhh.

Just a tad bit disappointed I didn't get a mention here: I suggested OO C about three comments before igorsk.


Igor Soumenkov

2012 Mar 19, 21:39
0
 

Re: Mhh.

Sorry Andreas, didn't want to offend you. You did mention C with preprocessing and all your comments were helpful, too!


Ollie

2012 Mar 19, 17:54
0
 

If it's MSVC is the RICH header present in the binary?

If it's compiled with MSVC is the 'Rich' header present?

http://ntcore.com/files/richsign.htm
http://web17.webbpro.de/index.php?page=microsofts-rich-header
http://www.woodmann.net/forum/attachment.php?attachmentid=1050

Cheers

Ollie
----
http://www.recx.co.uk/


Igor Soumenkov

2012 Mar 19, 21:44
0
 

Re: If it's MSVC is the RICH header present in the binary?

Yes there is the 'Rich' mark in the header.


Ollie

2012 Mar 20, 11:13
1
 

Re: Re: If it's MSVC is the RICH header present in the binary?

If you go through the version mapping exercise this is another way to see the exact version of the compiler that was used from the un XORd version numbers.


juryben

2012 Mar 19, 23:11
0
 

Re: Re: If it's MSVC is the RICH header present in the binary?

Does it contain the path to the Duqu project?


Igor Soumenkov

2012 Mar 19, 23:25
0
 

Re: Re: Re: If it's MSVC is the RICH header present in the binary?

No, there are no references to project files in Duqu.
