13 Jul Anti-virus testing - to believe or not to believe Sergey Novikov
07 Jul Testing and Accountability Roel
17 Jun Fully tested Roel
01 Feb On the way to better testing Magnus
17 Oct Secunia tests Aleks
05 Feb Founding of AMTSO Roel
Join Roel Schouwenberg and me as we explore what AV tests are about today and reflect on what is important for people who use these tests to make an informed decision about buying protection for themselves and their families.
Roel describes what he believes a useful test is and also discusses AMTSO - the independent Anti-Malware Testing Standards Organization. AMTSO has created a series of documents describing testing processes; the results can be seen already in how some of the more reputable testers are changing their methodologies.
AV testing is important for everyone who is looking to purchase an AV for themselves or for their organization. Take a few minutes and learn more about it with us.
AMTSO (the Anti-Malware Testing Standards Organization) is a coalition of security professionals, including many antivirus product vendors, product testing organizations and publishers, and some interested individuals. Given the highly technical nature of its activities, it is inevitable that the organization owes some of its authority to the expertise of the security specialists within its ranks, but that doesn’t make it a vendor lobby group. As Kurt Wismer (not himself a member) points out here: “many of them are employed by vendors precisely because that's one of the primary places where one with expertise in this field would find employment.” Given some recent negative publicity aimed at AMTSO (example), we want to collectively clarify the following points on behalf of the anti-malware industry, where we come from, and indirectly on behalf of AMTSO.
We find it strange that expertise in the testing field is somehow seen as a disqualification rather than a strength, given that specialist expertise is precisely what characterizes the group.
While some distrust anything a vendor says and accept uncritically anything a tester says, others are puzzled that different tests can vary so dramatically in their evaluation of the same product. While this may sometimes be simply due to poor testing practice, there are other, deep-seated reasons, one being the high volume of malware and new attacks seen every day. Vendors work hard to close the gap between the ideal 100% detection and what is actually achievable, by developing a range of technologies, both proactive and reactive. The capabilities of products can change, while tests using broadly similar methodology can generate dramatically ‘conflicting’ results due to different approaches to the selection, classification and validation of samples and URLs, among other factors.
AMTSO aims to promote precisely the kinds of tests that clearly show up these variations, and its members were flying the flag for real world testing before AMTSO ever formally existed, believing that sound testing benefits vendors and customers as well as testers. As an industry, we are all too aware that we cannot currently offer detection of all known and unknown malware. The relatively high scores achieved in established tests by major vendors do not necessarily reflect real world performance, but real-world detection cannot be measured in terms of product comparison with no checks on selection, classification and validation of malicious samples and URLs.
Another misconception is that AMTSO members simply don’t like tests done by non-AMTSO members. This is not the case: none of the undersigned have a problem with labs that intend to provide objective, real-world testing. (Though other testers are entitled to object vehemently when one company claims to be the only one doing live, internet-connected testing, and that all other testers are doing static testing based on the WildList.)
However, charging consultancy fees for the release of any information relating to a test (even to participants) is very different from the transparency that AMTSO advocates, though we recognize that full-time testers generate revenue like any other business. When a tester claims to have shared information about methodology in advance, but subsequently fails to provide methodological and sample data, even to vendors prepared to pay the escalating consultancy fees required for such information, this suggests the tester is not prepared to expose its methodology to informed scrutiny and validation. That compromises its aspirations to be taken seriously as a testing organization in the same league as the mainstream testers committed to working with AMTSO.
No-one believes that AMTSO has all the answers and can “fix” testing all by itself, but it has compiled and generated resources that have made good testing practice far more practicable and understandable. The way for testers (and others) to improve those resources is by talking to and working with AMTSO in a spirit of co-operation: the need for transparency is not going to go away.
As you may have read, AMTSO held another meeting a couple of weeks ago. AMTSO is strongly committed to improving the overall relevance of anti-malware testing.
During our latest meeting we accepted two new papers: the first on whole product testing and the second on performance testing. As the vast majority of people in the AV space have pointed out, the tests of old never truly reflected real-life performance, and with the changes the threat landscape has seen over the past few years this has become truer than ever.
So rather than running tests which focus on individual components, the entire product should be tested for its ability to detect, or rather protect against, threats. Just think of a scenario where an email-borne threat is not detected by the file scanner, but the anti-spam component is able to flag the message carrying it as spam.
The other document discusses how to more accurately test the performance, or speed (impact), of AV solutions. One scenario where this will be useful is determining the amount of RAM a certain product occupies. Many people try to establish this by looking at the amount of (virtual) memory used by the processes belonging to the product. However, certain products may also inject some of their DLLs into other processes, unintentionally masking some of their footprint. It is therefore best practice to compare the entire system's RAM usage.
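A minimal toy model can illustrate the measurement problem. The process names and memory figures below are entirely hypothetical, and this sketch only models the accounting, not real memory measurement; the point is that summing the product's own processes misses memory its injected DLLs consume inside other processes, while a whole-system comparison captures it.

```python
# Hypothetical per-process memory figures in MB (not real measurements).
# The AV product runs as "av_service" and "av_gui", but also injects a DLL
# into "browser", where its 40 MB footprint is attributed to the host process.
processes = {
    "av_service": {"own": 120, "injected_by_av": 0},
    "av_gui":     {"own": 35,  "injected_by_av": 0},
    "browser":    {"own": 300, "injected_by_av": 40},
}

# Naive approach: sum only the memory of processes belonging to the product.
naive_footprint = sum(p["own"] for name, p in processes.items()
                      if name.startswith("av_"))

# Whole-system approach: compare total RAM usage with and without the product
# installed, which also captures memory injected into other processes.
total_with_av = sum(p["own"] + p["injected_by_av"] for p in processes.values())
total_without_av = sum(p["own"] for name, p in processes.items()
                       if not name.startswith("av_"))
whole_footprint = total_with_av - total_without_av

print(naive_footprint)   # 155 -- misses the injected DLLs
print(whole_footprint)   # 195 -- includes them
```

In practice the whole-system comparison means measuring total RAM usage on an identical system before and after installing the product, rather than attributing memory to individual processes.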
The bad news, I say jokingly, comes from one of the new documents we continued working on in Helsinki. The false positive testing document has proven to be quite the challenge and has sparked a lot of debate. The area of testing false positives on web resources, such as domains and web scripts (an interest of mine), proved particularly challenging.
It definitely looks like testers are continuing to improve their tests to more accurately reflect real life scenarios. And that’s great for two main reasons. Most importantly, it gives users better information. Secondly, it gives vendors the opportunity to spend their resources focusing on things that protect the user. So it’s great to see the progress that we’re making in AMTSO.
If you haven't had a chance to read the documents go to the AMTSO web site and have a look!
Have you ever found a false positive when uploading a file to a website like VirusTotal? Sometimes it happens that not just one scanner detects the file, but several. This leads to an absurd situation where every product which doesn't detect the file automatically looks bad to users who don't understand that these are just false positives.
Sadly you will find the same situation in a lot of AV tests, especially in static on-demand tests where sometimes hundreds of thousands of samples are scanned. Naturally, validating such a huge number of samples requires a lot of resources. That's why most testers can only verify a subset of the files they use. What about the rest? The only way for them to classify their remaining files is to use a combination of source reputation and multi-scanning. This means that, as in the VirusTotal example above, every company that doesn't detect samples that are detected by other companies will look bad, even if the samples are in fact corrupted or absolutely clean.
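The consensus part of that classification scheme can be sketched in a few lines. This is a hypothetical illustration (the function name, scanner names and threshold are mine, not any tester's actual method): a sample is labeled malicious once enough scanners detect it, which is exactly how a single false positive that other scanners copy can turn a clean file into a "missed sample" for everyone else.

```python
def consensus_label(verdicts, threshold=5):
    """Label a sample by multi-scanner consensus.

    verdicts: dict mapping scanner name -> detection name, or None if
    the scanner does not detect the sample.
    """
    detections = sum(1 for v in verdicts.values() if v is not None)
    return "malicious" if detections >= threshold else "unverified"

# A clean file with a false positive that several scanners have copied:
# 6 of 20 scanners detect it, so consensus calls it malicious, and the
# 14 products that correctly ignore it appear to have "missed" a sample.
verdicts = {f"scanner{i}": "Trojan.Generic" for i in range(6)}
verdicts.update({f"scanner{i}": None for i in range(6, 20)})
print(consensus_label(verdicts))  # malicious
```

The threshold only shifts the problem: whatever its value, a detection copied by enough vendors crosses it, and the label then feeds back into test scores.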
Since good test results are a key factor for AV companies, this has led to the rise of multi-scanner based detection. Naturally AV vendors, including us, have been scanning suspicious files with each other's scanners for years now. Obviously knowing what verdicts are produced by other AV vendors is useful. For instance, if 10 AV vendors detect a suspicious file as being a Trojan downloader, this helps you know where to start. But this is certainly different to what we're seeing now: driven by the need for good test results, the use of multi-scanner based detection has increased a lot over the last few years. Of course no one really likes this situation - in the end our task is to protect our users, not to hack test methodologies.
This is why a German computer magazine conducted an experiment, and the results of this experiment were presented at a security conference last October: they created a clean file, asked us to add a false detection for it and finally uploaded it to VirusTotal. Some months later this file was detected by more than 20 scanners on VirusTotal. After the presentation, representatives from several AV vendors at the event agreed that a solution should be found. However, multi-scanner based detection is just the symptom - the root of the problem is the test methodology itself.
By now most people have seen the Secunia test results and all the ensuing discussions. Frankly, I was a bit surprised by the vehemently negative reaction from a number of AV vendors.
And it doesn't seem to be about the 20% difference between the 'winner' and the rest. Criticism has focused on the testing methodology, which many people thought was dubious. Some of the suggestions were useful - mostly those from Andreas Marx, the well-known AV solutions tester from Germany. The general tone, though, seems to be that many AV vendors thought their results would have been a lot better if the test methodology had been different. And maybe they're right.
But I think people are too focused on looking for mistakes in the tests and/or attempting to explain their poor PoC detection rates. Sure, criticizing Secunia's testing methods is justified, but only if we're discussing testing methodology, and nothing else.
As I see it, Secunia wasn't trying to highlight the weaknesses of AV solutions - I think they were trying to make a different point...
At Kaspersky, we've taken a decision not to detect PoC vulnerabilities - it's far more sensible to focus on protecting users from the real threats and exploits that are being used by malware authors in the real world. That's what our antivirus databases are for. The point isn't so much that detecting PoCs is a pretty difficult task (although the test results clearly show that even Microsoft and Symantec, with all of their resources, didn't fare all that well) but that detecting PoCs is a dead end, and doesn't address the fundamental problem.
So what is the problem?
In case you missed it: recently more than 40 anti-malware researchers and testers got together in Bilbao, Spain, to formalize the charter of the Anti-Malware Testing Standards Organization (AMTSO). The organization's main aim is to create security software testing guidelines and standards.
Why is a body like this needed? Well, although security software has changed enormously in the last ten years, most tests used today haven't evolved at the same rate. New and better tests are needed to better assess the effectiveness of new technologies. AMTSO is a very significant move towards having tests that more accurately reflect the performance of security software in real life situations.
I was part of the initial talks about this way back during the AV Testing Workshop, and it's clear that with this new organization, we've come a long way.
Right now the group consists of AV researchers and testers. One of the goals is to include academics as well. AMTSO strives to be vendor and technology neutral and academic members will be very helpful in ensuring this position.
It'll be interesting to see what AMTSO comes up with. As a member of the pro tem standards and guidelines subcommittee I'll obviously have a say in the matter. The result may be that we end up with tests where security solutions don't score as highly as they do in current tests. But this will be no bad thing if test results reflect the genuine ability of solutions to combat today's constantly changing threats.
Read more about the organization here
A few days ago David wrote about ConsumerReports, which created around 5,500 new virus variants in order to test antivirus solutions. Like most antivirus companies, we weren't particularly impressed by this.
Recently a writer for heise.de, probably the best known German IT website, picked up on the topic, criticizing the reaction of antivirus companies: “[they] fail to notice that they sound like Mercedes dealers complaining about the 'elk test' – arguing that there are enough real accidents to analyze the safety measures of their cars.”
This comparison is specious: in the context of antivirus testing, the 'real accident' is a computer or network infected by in-the-wild malware, and the 'elk test' is controlled testing under laboratory conditions. We've got nothing against controlled testing, as long as it uses malware which exists in the same form in the wild. We're also in favour of testing solutions which have deliberately not been updated - old signatures mean that heuristics and proactive protection technologies can be fully tested.
I can’t see any benefit in using newly created variants of existing malware in tests. And the argument that these new creations won't be made publicly available is irrelevant here. At the end of the day, such tests could lead to an atmosphere of open competition, with testers attempting to trick as many antivirus solutions as possible by using ever more new and varied malware. Of course, this would all be in the name of security... but it could decrease the amount of effort virus writers have to put in, with the burden ultimately being borne by end users.
An organization called ConsumerReports published an article today that suggests it 'created 5,500 new virus variants derived from six categories of known viruses, the kind you'd most likely encounter in real life.'
This is a really unwise thing to do. There are plenty of 'real' viruses, worms and Trojans around without well-meaning organizations generating more of them, for whatever reason.
The premise on which ConsumerReports seems to have based its actions is this: "We hadn't seen any independent evaluation of antivirus software that measured how well products battle both known and new viruses, so we set out to fill that gap.” In fact, AV-comparatives publishes tests evaluating products' ability to find both known and unknown threats ... and they do this without having to create new viruses. There are also a number of other independent organizations that test the detection capabilities of antivirus products, including AV-Test GmbH, Virus Bulletin, ICSA Labs and West Coast Labs.
And they all make their results public; something that ConsumerReports seems not to have done so far.
The other day I was presented with the results of a test conducted by a local office from another antivirus company.
Basically the test comes down to end-users being asked to uninstall their current antivirus software and install the other antivirus product.
After this is done, a full system scan takes place, and the number of detected malware samples along with the name of the previously installed product is collected to gather statistics.
The competing antivirus programs are then ranked by the average number of malware samples they failed to detect compared to the program the test results belong to.
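The ranking described above can be sketched as follows. The product names and counts are hypothetical; the sketch just computes the average "missed" detections per previously installed product, and the comments flag what the methodology never checks.

```python
# Hypothetical scan records: one per user machine, listing the previously
# installed product and how many files the new product flagged after the switch.
from collections import defaultdict

scans = [
    {"previous_product": "AV-A", "flagged": 12},
    {"previous_product": "AV-A", "flagged": 2},
    {"previous_product": "AV-B", "flagged": 1},
]

totals, counts = defaultdict(int), defaultdict(int)
for s in scans:
    totals[s["previous_product"]] += s["flagged"]
    counts[s["previous_product"]] += 1

# Rank previous products by average "missed" detections, worst first.
# Note what is NOT controlled: nothing verifies that the flagged files are
# actually malicious (they may be false positives), and nothing records
# whether the previous product was up to date or even licensed.
ranking = sorted(totals, key=lambda p: totals[p] / counts[p], reverse=True)
print(ranking)  # ['AV-A', 'AV-B']
```

Seeing the computation laid out makes the flaw obvious: every number feeding the ranking is an unvalidated input.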
Anyone who has a basic understanding of computer security can see why this test is completely flawed and totally useless.
There is no verifiable set of malware samples, meaning that the other product may have identified legitimate files as being malicious.
But, even more importantly, there's no way of telling what state the previously installed antivirus software was in.
Given the way the test was performed, it's likely that most products were either outdated or pirated versions which could no longer be updated. Many more reasons why this test is completely flawed come to mind.
In short, there are no real controlled variables at all, the complete opposite of what makes a good test good.
The antivirus industry is a sensitive one, we must always take great care with what we say and what we do.
This also means that every antivirus company is responsible for the image and reputation of the industry as a whole.
This is especially true in the case of tests, and it's where the antivirus experts come in. They are the people with the skills to see which tests are good and which are not, and to advise the marketing department accordingly.
After all, we must prevent misinformation every way we can, even if the misinformation might provide a positive outcome in the end.