A common pattern with client-side MVC applications is to embed the data for a
base set of models in the initial page instead of making a separate AJAX request to
load them. In a Rails application, this is typically done by interpolating the
result of a call to `to_json` in the view. The Backbone.js docs
provide this example:
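In outline, the pattern looks like the following sketch. It renders the view with Ruby's standard-library ERB rather than Rails, so no automatic escaping takes place here. The `BootstrapView` class and the sample account data are hypothetical; only the template body resembles what a Rails view would contain.

```ruby
require 'erb'
require 'json'

# Hypothetical stand-in for a Rails view. Only the template text resembles
# what would live in an .html.erb file.
class BootstrapView
  TEMPLATE = <<~'HTML'
    <script>
      var accounts = new Backbone.Collection;
      accounts.reset(<%= @accounts.to_json %>);
    </script>
  HTML

  def initialize(accounts)
    @accounts = accounts
  end

  # Evaluate the template with this object's instance variables in scope.
  def render
    ERB.new(TEMPLATE).result(binding)
  end
end

puts BootstrapView.new([{ 'name' => 'Acme' }]).render
```

The rendered script contains the collection data as a JavaScript literal, so the client needs no extra request to load it.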
If you try this in a Rails 3 application, you will discover that by default,
the interpolated results of `to_json` are HTML-escaped: `&`, `>`, `<`, and `"`
are replaced with the equivalent HTML entities. Inside the script tag, this is
almost certainly not what you want. JSON strings containing `&`, `>`, and `<`
should contain those characters literally, and the `"` character delimits the
JSON strings themselves. Escaping them prevents the desired result:
a literal JavaScript value embedded in the script.
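To see why the escaped output is unusable, here is a small sketch that uses `ERB::Util.html_escape` from Ruby's standard library as a stand-in for the escaping Rails 3 applies. That substitution is an assumption (the Rails helper may differ in detail), and the sample data is made up.

```ruby
require 'erb'
require 'json'

json    = JSON.generate({ 'name' => 'Tom & Jerry <3' })
escaped = ERB::Util.html_escape(json)

puts json     # {"name":"Tom & Jerry <3"}
puts escaped  # {&quot;name&quot;:&quot;Tom &amp; Jerry &lt;3&quot;}

# The escaped text is no longer valid JSON, so parsing it fails.
parse_error =
  begin
    JSON.parse(escaped)
    nil
  rescue JSON::ParserError => e
    e
  end
puts parse_error.class  # JSON::ParserError
```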
The common reaction is to disable HTML escaping, either by prepending the call
to `to_json` with the `raw` helper, or calling `html_safe` on the result. Here’s
the same example using each of these techniques:
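The effect of these techniques can be modelled outside Rails with a deliberately simplified sketch. `SafeString`, `raw`, and `emit` below are hypothetical stand-ins, not Rails' real classes or helpers: `emit` plays the role of `<%= %>` in Rails 3, and `raw` marks a string as exempt from escaping, much as `html_safe` does.

```ruby
require 'erb'
require 'json'

# Simplified model (an assumption): a string subclass whose content is
# treated as already safe to emit.
class SafeString < String; end

# Stand-in for the `raw` helper or `html_safe`: mark a string as safe.
def raw(string)
  SafeString.new(string.to_s)
end

# Stand-in for `<%= %>` in Rails 3: escape everything not marked safe.
def emit(value)
  value.is_a?(SafeString) ? value.to_s : ERB::Util.html_escape(value.to_s)
end

json = JSON.generate([{ 'name' => 'A & B' }])
puts emit(json)       # [{&quot;name&quot;:&quot;A &amp; B&quot;}]
puts emit(raw(json))  # [{"name":"A & B"}]
```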
Do not follow this example! Used in this way, both `raw` and `html_safe` open
vectors for a cross-site scripting vulnerability, and it is unfortunate that their
use is so widespread and commonly recommended.
To understand the vulnerability, consider what happens if one of the strings
in the JSON contains the text `</script>`. This text is interpolated
into the page, and since both `raw` and `html_safe` disable HTML-escaping, it
is interpolated literally. As a consequence, and despite the fact that it appears
within a JavaScript string literal, `</script>` closes the script element,
leaving an opportunity to embed an XSS payload in the subsequent text:
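A minimal sketch of the attack, using plain string interpolation to model what `raw` or `html_safe` allow. The `render_unescaped` method and the hostile account name are hypothetical.

```ruby
require 'json'

# Models unescaped interpolation of JSON into a script element, which is the
# effect `raw` or `html_safe` has. The method name is hypothetical.
def render_unescaped(accounts)
  "<script>\n  accounts.reset(#{JSON.generate(accounts)});\n</script>\n"
end

# Hypothetical attacker-controlled attribute value.
hostile = [{ 'name' => '</script><script>alert(1)</script>' }]
page = render_unescaped(hostile)
puts page

# The first "</script>" sits inside the JSON string, ahead of the payload,
# so an HTML parser ends the original script element there.
puts page.index('</script>') < page.index('alert(1)')  # true
```

The browser's HTML parser knows nothing about JavaScript string literals, so everything after the first `</script>` is parsed as fresh markup, including the attacker's own script element.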
The simplest way to escape JSON strings that may contain the `</` sequence
is to precede the slash with a backslash. This is simple to do and ought to be built
in to Rails; unfortunately, it is not. The obvious candidate would be `json_escape`,
aliased as `j`, which one would expect to be the JSON analog of the old Rails 2 `h`
helper for HTML escaping.
However, in addition to escaping the JSON in a way that prevents XSS, `json_escape`
also removes double quote (`"`) characters. Yes, that’s right, `json_escape` is documented
to return invalid JSON. This baffling behavior is most likely a mistake in the
original implementation.
I’ve submitted a pull request to change it, which will hopefully be accepted for Rails 4.
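The quote-stripping behavior can be approximated in standalone Ruby. The sketch below is an assumption about how such a result can arise (a substitution pattern that matches `"` but a lookup table with no entry for it); it is not a copy of the Rails source, and the names `LEGACY_JSON_ESCAPE` and `legacy_json_escape` are hypothetical.

```ruby
# Approximation of the documented behaviour; names and mapping are assumptions.
LEGACY_JSON_ESCAPE = { '&' => '\u0026', '>' => '\u003E', '<' => '\u003C' }.freeze

def legacy_json_escape(string)
  # The pattern matches `"` too, but the table has no entry for it, so the
  # block returns nil and the quote is replaced with an empty string.
  string.to_s.gsub(/[&"><]/) { |char| LEGACY_JSON_ESCAPE[char] }
end

puts legacy_json_escape('{"name":"<b>"}')  # {name:\u003Cb\u003E}
```

With the quotes gone, the output is neither valid JSON nor the JavaScript literal the view was meant to embed.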
A second attempt might be to try `escape_javascript`,
but this escapes much more than necessary. It could probably be made to work, but would
require parsing JSON on the client rather than simply interpolating a literal JavaScript
value.
Finally, there’s the option of setting `ActiveSupport::JSON::Encoding.escape_html_entities_in_json`
to `true`. This works, but since the default was explicitly changed to `false`
in Rails 3, it feels like a workaround at best. If you change the default globally, be sure
that any consumers of JSON APIs provided by your application are prepared to handle
Unicode escape sequences, because it will result in `</script>` being escaped as
`\u003C/script\u003E` rather than `<\/script>`.
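Both escaped forms are valid JSON and decode to the same string, which a quick check with Ruby's standard `json` library illustrates. The constant names are hypothetical, and the literals are written by hand rather than produced by ActiveSupport.

```ruby
require 'json'

# Hand-written JSON documents in each escaping style.
UNICODE_FORM   = '["\u003C/script\u003E"]'  # HTML entities as Unicode escapes
BACKSLASH_FORM = '["<\/script>"]'           # slash preceded by a backslash

puts JSON.parse(UNICODE_FORM).first    # </script>
puts JSON.parse(BACKSLASH_FORM).first  # </script>

# Neither raw document contains the literal closing tag.
puts UNICODE_FORM.include?('</script>')    # false
puts BACKSLASH_FORM.include?('</script>')  # false
```

A consumer with a conforming JSON parser sees no difference; the caution above applies to clients that handle the raw text less carefully.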
My recommendation is to overwrite `json_escape` with a sensible definition and use
that:
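A sketch of what such a definition might look like, written against standard-library ERB so it runs without Rails. The `JsonEscapeSketch` module and `EscapedBootstrapView` class are hypothetical. This version escapes every slash, which is one simple way to cover the `</` sequence.

```ruby
require 'erb'
require 'json'

# Hypothetical helper module. In a Rails app this logic would replace the
# built-in json_escape and would also need to preserve html_safe status.
module JsonEscapeSketch
  def json_escape(string)
    # Block form so the backslash in the replacement is taken literally.
    string.to_s.gsub('/') { '\/' }
  end
  alias j json_escape
end

class EscapedBootstrapView
  include JsonEscapeSketch

  TEMPLATE = <<~'HTML'
    <script>
      var accounts = new Backbone.Collection;
      accounts.reset(<%= j @accounts.to_json %>);
    </script>
  HTML

  def initialize(accounts)
    @accounts = accounts
  end

  def render
    ERB.new(TEMPLATE).result(binding)
  end
end

hostile = [{ 'name' => '</script><script>alert(1)</script>' }]
puts EscapedBootstrapView.new(hostile).render
```

The only literal `</script>` left in the output is the template's own closing tag, and the escaped JSON still parses back to the original data. In an actual Rails 3 view the escaped result would presumably still need to be marked `html_safe`, since otherwise the quotes would be HTML-escaped again.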
This is simple, sufficient to prevent XSS from bootstrapped JSON, and will hopefully
be the built-in behavior of `json_escape`/`j` in Rails 4.