XSS Worm Analysis And Defense

Date: 01/10/2008

Abstract: This paper is the product of a week-long contest to write the most diminutive self-replicating XSS worm. It was a controversial contest, with no prizes awarded, and the goal was to build a worm that would fit in the fewest bytes possible. In the end, two people, Giorgio Maone and Sirdarckcat, tied the contest with a stunningly small 161-byte cross-browser-compatible XSS worm. However, the journey was as interesting as the result.

To date there has been little to no research into the methods of propagation and optimization of XSS worms. The lack of research is in part due to the relative infrequency of these worms in the wild, as well as the scarcity of sample code. Each example found had three problems: 1) it contained site-specific code, 2) it contained obfuscation for filter evasion, and 3) it contained a payload. Also, cross-browser compatibility was not always present, making it harder to diagnose exactly what the propagation code looked like. Rather than attempt to triage code that was not designed with these issues in mind, a contest was built to gather sample code that could be used for more in-depth analysis.

Browser companies have not, to date, constructed a way to display user-submitted content that protects the website and its users from malicious behavior while still allowing feature-rich user-submitted content. People sometimes say that certain social networking sites don't write secure code. That's not always the entire truth. Often the code is highly secure and could be made more secure with the flip of a switch; it's only that the business landscape requires the code to do insecure things for economic and user-satisfaction reasons. Those facts combined require that we search for alternatives, and give ourselves opportunities to build more secure websites based on our findings, having seen the output of the diminutive proof-of-concept worm code.

Assumptions: The theoretical social networking site we will be discussing is vulnerable, not because its owners are unaware that it is vulnerable, but because consumers demand rich content. It is assumed that companies want to do the right thing in terms of security, but often cannot as a result of the business rules by which they must abide. It is also assumed that the attacker has prior knowledge of the domain. The code the attacker builds must fit within a certain length-limited field (like a name field, or a title) and will be rejected if it exceeds that length (there are reasons this limitation was put in place, which will be discussed later in the paper).

Worm Problems: There are some major issues when building a self-propagating worm, which were intentionally avoided using contest rules. One of the most important and difficult rules to abide by was the issue of growth. One of the rules stated that the worm could not grow in length once it was submitted. There are three reasons this rule was critical to adhere to.

The first reason is probably the simplest and also the least likely to be a real issue. HTML input lengths, code-based data size limitations and, most importantly, database field lengths all may introduce hard size limits barely larger than the worm itself. The second, similarly unlikely, reason this could be a factor is that the worm author may want to limit the negative impact of the worm until it has reached maximum propagation, by reducing the growth in database space used and the bandwidth required for propagation. The latter also has the benefit of increasing the rate of propagation.

The last, and single most important, reason to limit growth was an example submitted by Ronald.

While Ronald's example could actually have won the contest, sans the growth rule, it had the flaw of linear growth. Let's use an illustrated example with smaller byte counts, for demonstration purposes only. First, let's assume there is a hard database field limit of 50 characters; however, instead of rejecting the content, the site simply chops off anything greater than 50. A sample page may look like so:

Site content here - 17 chars
attacker's vector here - 23 chars

More site content here - 23 chars

So after the first iteration of worm propagation, the site would chop off anything greater than 50 chars, which has the effect of chopping off the last part of the page. That's not an immediate problem, because the vector is still there. Let's look at the second page to which the worm was posted:

Site content here - 17 chars
Site content here - 17 chars (original variant's header)
attacker's vector here - 23 chars

More site - 10 chars (chopped off from the first iteration of the page)

More site content here - 23 chars

In the next iteration the last 7 bytes of the worm code would be
chopped off as the site content continues to grow, which will break the
attacker's worm. It would now look like this:

Site content here - 17 chars
Site content here - 17 chars (second variant's header)
Site content here - 17 chars (original variant's header)
attacker's vect - 16 chars (broken vector)

More site content here - 23 chars

Even if the vector worked without the missing seven bytes, on the next iteration of worm propagation it wouldn't, because the site would cut off the next 17 bytes, leaving only headers of pages from previous propagation. So in this way, a hard limit of n bytes combined with linear growth ahead of the worm code will eventually cause the worm to discontinue propagation, unless there is no other content on the page prior to the worm.

Note: The word "growth" is not really correct, as in reality the submission stays a static size (the maximum that the script allows). So although the information before the worm grows linearly until the code breaks, the actual content submitted does not grow beyond the limits of the website.
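The truncation failure above can be sketched in a few lines of code. This is a minimal simulation, not contest code: the 70-character limit, the header string and the vector string are all hypothetical stand-ins chosen so the arithmetic mirrors the walkthrough.

```javascript
// Hypothetical simulation of linear-growth truncation. Each propagation
// cycle prepends one more copy of the site's header before the stored
// content, and the site silently chops everything past LIMIT characters.
const LIMIT = 70;
const headerLine = "Site content here - ";  // 20 chars of site boilerplate
const vector = "attackers-vector-here";     // 21-char stand-in for the worm

function propagate(page) {
  // The site prepends its own header, then truncates instead of rejecting.
  return (headerLine + page).slice(0, LIMIT);
}

let page = headerLine + vector;             // first infected page, 41 chars
for (let i = 1; i <= 3; i++) {
  page = propagate(page);
  console.log("iteration " + i + ": vector intact = " + page.includes(vector));
}
```

The vector survives the first repost, but one more prepended header pushes its tail past the limit and replication stops, exactly as in the walkthrough above.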

Ultimately, though, the reason to limit length was to reduce the three things mentioned earlier - obfuscation, site-specific code and payloads. This leads us to the next issue. As mentioned before, there is a theory that making a worm small inherently makes it obfuscated, due to the coding tricks necessary to reduce the size. While this is mostly a red herring, because it is not the same form of obfuscation referred to (namely, filter-evasion obfuscation), there were examples of it that caused one of the other issues we had attempted to avoid (the site-specific coding issue).

One of the rules of the contest was that the submissions must POST to "post.php". The goal here was to require posting to a page other than the one the worm was on. Early on, oxotnick asked a good question: should we assume that the page being submitted to is in the same directory, or relative to the base directory? For the purposes of the contest, people were asked to assume it was in the same directory; however, this, together with the contest naming conventions used within the rules, caused an interesting site-specific coding optimization.

While the code is entirely valid under the rules, in reality it's not portable. If the target name had been anything but a string beginning with the word "post", like the word "test", this code would not have functioned. So while this code is interesting from an optimization perspective, it needs to be ignored for analysis.

Worm Best Practices: At one point Ronald pointed out that images may be a better universal vector for XSS worms than things like iframes or scripts. This may be a true statement, given that many sites do allow images by default. Without any real numbers to back it up, it's speculative, but possible. At the very least, the visual fingerprint is smaller without sizing. Ultimately, in another thread, DoctorDan posed an interesting question regarding whether XMLHttpRequest is a better propagation method than the other prevalent method, submission of a form.

The benefits of XMLHttpRequest are many. Firstly, it's much more silent, because it doesn't actually force the user's browser to visually change to another page. It's not just visually silent, though, as the auto-submit method can also make a clicking sound if the user's browser is set up to do so (which is often the default). Also, bwb labs had a great point regarding a looping effect of the submission method. Let's take a specific example of a site that, upon submission, automatically shows you the content you just submitted.

In doing so, it would show the victim the payload and would automatically post the content back once again, putting the user's browser into an infinite loop. While this type of setup isn't universal, it is worthy of note, and could easily lead people to be more interested in using the XMLHttpRequest method for propagation. For the purposes of the contest, submission-based worms were not forbidden, as there are many sites that don't have this setup; and even if it may spiral a user's browser out of control with submissions, that may be the attacker's intent, or it may be inconsequential to the attacker.
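The contrast between the two propagation styles can be sketched as follows. This is an illustrative skeleton, not any contestant's entry: the field name "content" and the buildBody() helper are invented for the example, while "post.php" comes from the contest rules. The XMLHttpRequest and DOM calls are standard browser APIs and do nothing outside a browser.

```javascript
// buildBody() URL-encodes the fields a worm would need to re-submit.
function buildBody(fields) {
  return Object.entries(fields)
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
}

// Silent propagation: an asynchronous POST with no navigation, no page
// change and no click sound. The victim's browser never leaves the page.
function propagateViaXHR(payload) {
  const xhr = new XMLHttpRequest();
  xhr.open("POST", "post.php", true);
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.send(buildBody({ content: payload }));
}

// Form-based propagation: builds a hidden form and auto-submits it. This
// navigates the browser, and on a site that echoes submissions back it
// can trigger the infinite resubmission loop described above.
function propagateViaForm(payload) {
  const form = document.createElement("form");
  form.method = "post";
  form.action = "post.php";
  const field = document.createElement("input");
  field.type = "hidden";
  field.name = "content";
  field.value = payload;
  form.appendChild(field);
  document.body.appendChild(form);
  form.submit();
}
```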

Worm Defense: Two interesting problems stopped a number of variants, which resulted in a new rule during the contest. While they are not considered good worm defense, they did stop a number of worm variants, requiring further work. The first is the declaration of onfocus and onload event handlers in the body tag. Also, early on, ritz found a problem with his code on pages that had a DOCTYPE assigned. So it would appear that using these would stop a certain number of worm variants. Clearly both of these issues were eventually worked around, after the new rule requiring the worm code to work despite them. Either way, these issues were worthy of mention.

Let's take a step back for a moment. The above comment, "Browser companies have not yet constructed a way to display content in a way that protects from malicious behavior, but still allows feature rich user submitted content," is an overstatement. In this case, the browser companies have provided a single useful tool, and in fact a fairly powerful one - the iframe. This is not in reference to the on-page iframing that was proposed with content restrictions; it is in reference to the normal off-domain iframe.

One of the reasons people don't use iframes is that they are concerned about the search engine value of the page they are constructing. If a company dices up its page into iframes, it will lose search engine value. The major search engines of the world haven't figured out a way to keep SEO (search engine optimization) value the same when you split your page into two different pieces (the protected content on your domain, and the potentially dangerous user-submitted content on another domain). The one exception to that rule is using cloaking, where you display both the site content and the user content on the same page only when the search engine spiders your site.

Note: Google has been hypocritical about its corporate opinion on cloaking, telling small companies it is not okay and telling enterprises it is. So Google has created an unfair marketplace, and as such the use of cloaking is potentially dangerous to your business, given Google's predisposition to blacklisting based on it. Cloaking is deemed blackhat SEO based on rules that are still, as of yet, not communicated publicly; at least not in their entirety. The use of cloaking, even to protect consumers, may get a website banned, so its use is not recommended unless an agreement is made with Google prior to implementing this technique.

To get real value out of this worm analysis, we shouldn't ignore our history lesson. The first and biggest XSS worm in history was the MySpace Samy worm. One of the site-specific things Samy wrote into his worm was a snippet that switched the browser to a different MySpace domain.

The reason this is important is that Samy was trying to overcome a basic principle in browser security - the same origin policy. Samy knew that the bulk of his code, which used XMLHttpRequest, wouldn't work unless he switched domains to the one that allowed his worm to function. This leads us to the next part of defenses. One thing that has been mentioned a lot in defeating cross-site request forgeries (CSRF) is the use of a nonce, or one-time token. Nonces can be read by XMLHttpRequest if they are on the same domain. So it would stand to reason that it is better to emit the nonce on another domain that the site knows is completely free of XSS vulnerabilities. That is because XMLHttpRequest must obey the same origin policy.

Note: Mozilla has discussed a cross-domain version of XMLHttpRequest. It is unclear whether this would open things up further to attack, but clearly any technology that allows cross-domain reading, whether intentionally or not, would break not only this technique but many other security protections built into websites.

So it would make sense to emit the nonce in a button on another domain, since that domain is not readable by the JavaScript worm. This lends itself to a different sort of attack, like the one against Google Desktop, where an attacker floats a small or even invisible iframe beneath the user's mouse so the button can still be pressed. Although this does require some user interaction, mouse clicks are so commonplace that it wouldn't be any surprise if this were used in a real worm.

However, if there is an anti-framing script that detects whether the post page is being framed and then un-frames itself before the user has an opportunity to be subverted, that could easily protect the page from being clicked on. There is, however, a snag. In Internet Explorer, iframes can be tagged with security=restricted. This turns off JavaScript inside the iframe, which would stop the frame-detection script from running.
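That frame-detection logic can be sketched like so. isFramed() is a hypothetical helper written as a pure function of a window-like object so the logic is easy to test; in a real page you would pass the browser's own window. As noted above, IE's security=restricted prevents this script from ever running inside a hostile iframe.

```javascript
// A page is framed when its window is not the topmost browsing context.
function isFramed(win) {
  return win.top !== win.self;
}

// Classic frame-busting: if the nonce-bearing page finds itself framed,
// it replaces the top-level location with its own before the user can be
// tricked into clicking through the attacker's invisible overlay.
function unframe(win) {
  if (isFramed(win)) {
    win.top.location = win.self.location;
  }
}
```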

Note: Firefox does not have an equivalent to security=restricted for off-domain iframes, so this technique will work as described for Firefox users.

Although it appears all hope may be lost, there is an additional possibility. If the button is not a static button, but rather one that is at least partially rendered using JavaScript, an attacker's use of security=restricted will not only cause the frame-detection script to fail to load, it will also cause the button to be rendered without the nonce. This is problematic for users who don't have JavaScript enabled, though, so an alternative must be provided. For those users, the button may instead point to a login page, where the user is asked to log in to post their content. This non-JavaScript alternative is only a minor inconvenience and affects approximately 0.1% of the user population (based on the number of users who surf without JavaScript enabled).
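One way to sketch the JavaScript-rendered button is below. The markup, field names and renderFallback() helper are all illustrative assumptions: the point is only that the nonce is written into the page by script, so when security=restricted (or a no-JavaScript browser) disables script, the token never appears and only a static login link does.

```javascript
// Rendered by script at page load: the nonce only reaches the DOM when
// JavaScript actually executes, which security=restricted prevents.
function renderButton(nonce) {
  return '<form method="post" action="post.php">' +
         '<input type="hidden" name="nonce" value="' + nonce + '">' +
         '<input type="submit" value="Post"></form>';
}

// Static fallback served inside <noscript> for the ~0.1% of users who
// browse without JavaScript: no nonce, just a pointer to the login page.
function renderFallback() {
  return '<a href="login.php">Log in to post your content</a>';
}
```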

Note: It is important to address the non-JavaScript version of this technique due to concerns over lawsuits, such as those initiated by the National Federation of the Blind under the Americans with Disabilities Act.

This technique, combined with some sort of state management between the two domains, should help thwart XSS worms. It must be restated that this paper is authored with the assumption that dynamic user content must be allowed for the site's business operations to continue, but that all other site code is designed to be as secure as possible.

Note: This technique does not protect against any browser bugs that allow a browser to break the same origin policy (e.g. the now-defunct mhtml bug, or DNS rebinding).

Items Not Covered: There are several items missing from this paper, including filter evasion, payloads, and command and control. Filter evasion and payloads have been covered by countless posts over the last several years in various forms and forums. Command and control is still widely debated, and no concrete observations can be made at this point beyond some of the requirements for a solid command and control structure for polymorphic XSS worms and for worms with payloads delayed until a certain density of infection.

Additionally, while there has been some talk about tracking worm activity, and at least one project attempting to help with mitigation under the assumption that any potentially dangerous content is disallowed, this paper did not discuss either of these issues, as they are far more in depth and will almost certainly require more thought and research by a larger audience.

Summary: While XSS worms are hardly a solved issue, some of the findings of the diminutive XSS worm replication contest definitely help construct the solutions outlined in this paper. This is anything but a definitive list of all ways to thwart XSS worms, but it should be a good primer on some of the findings that came from the contest. Without browser modifications, this appears to be the best software-agnostic solution to worm propagation; however, no doubt revisions of this technique and others will yield better results.
Thanks: Special thanks to everyone who helped contribute to our understanding of worm propagation over the last week (in chronological order): .mario, thornmaker, digi7al64, Gareth Heyes, Matt Presson, sirdarckcat, ritz, Alex, barbarianbob, BlahBlah, arantius, bwb labs, ma1, Spyware, Reiners, Spikeman, Ronald, Torstein, dev80, amado, shawn, hallvors, DoctorDan, oxotnick, dbloom, Kyran, tx, 4909, beNi, backStorm, badsamaritan and anyone else I may have missed. Without these people and their talent, this research would never have been possible. Also thanks to thrill and id for providing edits and feedback on this paper.

[Source: ha.ckers]