This post originally appeared in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.
The poor referer header. Misspelled and misused since its inception.
Its typical use is thus: if I click on a link on a website, the referer header tells the landing page which source page I came from.
Source URL = www.mysite.com/page1 -> Target URL = www.example.com
referer = "www.mysite.com/page1"
It’s heavily used in marketing to analyse where visitors to a website came from, and also very useful for gathering data and statistics about reading habits and web traffic.
However, it presents a potential security risk if too much information is passed on.
In RFC2616, the HTTP/1.1 specification lays out that: “Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol”. That is, if our request goes from https to http, the referer header should not be present.
However, SHOULD NOT is a recommendation rather than a hard requirement, and data can be leaked. Facebook fell foul of this a little while ago, when it turned out that in some cases the user ID from the originating page's URL was being passed in the referer header to advertisers when a user clicked on an advert.
Additionally, when traffic goes between two HTTPS sites - as is increasingly common in the move towards SSL everywhere - the RFC does NOT require that the referer header is stripped.
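As a rough illustration of the rule quoted above, the following sketch decides whether a client following only that recommendation would send the header. The function name `should_send_referer` and the example URLs are assumptions made for illustration, and real browsers apply more rules than this.

```python
from urllib.parse import urlsplit


def should_send_referer(source_url: str, target_url: str) -> bool:
    """Return False only for the HTTPS -> HTTP downgrade case.

    Hypothetical helper: it models just the RFC2616 recommendation,
    not the full behaviour of any particular browser.
    """
    source_scheme = urlsplit(source_url).scheme
    target_scheme = urlsplit(target_url).scheme
    is_downgrade = source_scheme == "https" and target_scheme == "http"
    return not is_downgrade


# HTTPS -> HTTP: the header should be dropped.
print(should_send_referer("https://www.mysite.com/page1", "http://www.example.com/"))
# HTTPS -> HTTPS: nothing in the rule requires stripping it.
print(should_send_referer("https://www.mysite.com/page1", "https://www.example.com/"))
```

Note that the second call returns True: under this rule alone, the full source URL still travels between two HTTPS sites.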
ENTER THE META-REFERRER TAG
A potential solution to these two issues, and more, looks to be the meta-referrer tag. By adding the following tag to the source web page:
<meta name="referrer" content="origin">
the referer header can be edited to allow sites to see where their traffic has come from, but without leaking potentially sensitive data.
The options for the content field are:
- no-referrer: omit the referer header from the request
- no-referrer-when-downgrade: omit the referer header when moving from HTTPS to HTTP
- origin: set the referer header to be the origin only, that is, stripping any path and parameters from the URL
- origin-when-cross-origin: send the full URL for requests within the same origin, but if the request is to a different website or protocol, set the referer header to the origin only
- unsafe-url: set the referer header to be the full originating URL regardless of target site or protocol, potentially leaking data.
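The options above can be sketched as a small function that works out what a browser would put in the referer header. This is a simplified model rather than a faithful implementation of the W3C algorithm: the names `origin_of` and `referer_for` are made up for this example, and details such as credentials in the URL are ignored.

```python
from typing import Optional
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Reduce a URL to scheme and host, dropping path and parameters."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def referer_for(policy: str, source_url: str, target_url: str) -> Optional[str]:
    """Return the referer value for a policy, or None if it is omitted."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    full_url = source_url.split("#", 1)[0]  # fragments are not sent
    downgrade = src.scheme == "https" and dst.scheme == "http"
    same_origin = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)

    if policy == "no-referrer":
        return None
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full_url
    if policy == "origin":
        return origin_of(source_url)
    if policy == "origin-when-cross-origin":
        return full_url if same_origin else origin_of(source_url)
    if policy == "unsafe-url":
        return full_url
    raise ValueError(f"unknown policy: {policy}")


source = "https://www.mysite.com/page1?user=42"  # illustrative URL
for policy in ("no-referrer", "origin", "unsafe-url"):
    print(policy, "->", referer_for(policy, source, "https://www.example.com/"))
```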
To use a practical example, suppose Facebook were to implement this tag as:
<meta name="referrer" content="origin" id="meta_referrer" />
When Mr Bobby Tables is logged into Facebook, his homepage URL would be: https://www.facebook.com/bobbytables?f=nref
When he clicks on an external link and is taken to a different site, the referer header is reduced to
referer=www.facebook.com
thus preserving his privacy. The target site registers that it has had a visitor from Facebook, but the name of the user is not passed on.
Google were the first to implement such a scheme, ostensibly to reduce latency from SSL sites, although one would suspect that being able to prove to clients that your site was the source of their traffic might be closer to the truth.
HANDLE WITH CAUTION
Whether the referer header is implemented with the new meta-referrer tag or not, it is prudent to approach it with a degree of caution.
Referer spam is still an issue - an attacker can target a website using a specific referer header, which is reported by analytics tools to the website owner. Out of curiosity about where their traffic is coming from, the owner will often follow the link back to a malicious web page.
The referer header also opens up potential for exploits and XSS attacks. It is trivially easy to manipulate headers, so relying on the header for authorisation or authentication is strongly discouraged.
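To show how little effort manipulating the header takes, here is a minimal sketch using Python's standard library. It only builds the request object and never sends it; the helper name `forged_request` and both URLs are invented for this example.

```python
from urllib.request import Request


def forged_request(target_url: str, fake_referer: str) -> Request:
    """Build (but do not send) a request that claims an arbitrary referer."""
    return Request(target_url, headers={"Referer": fake_referer})


# The client is free to claim it came from anywhere it likes.
req = forged_request(
    "https://www.example.com/members-only",  # hypothetical target
    "https://www.mysite.com/page1",          # whatever the client chooses
)
print(req.get_header("Referer"))
```

The server receiving such a request has no way to tell this value apart from one set by a genuine browser navigation.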
MISSING HEADERS
The referer header is omitted if:
- the user entered the URL in the address bar
- the user visited the site from a bookmark
- the request moved from HTTPS to HTTP
- the request moved from HTTPS to a different HTTPS URL (not required by the RFC, but some clients and configurations strip it)
- security software (antivirus, firewall etc) stripped the request
- a proxy stripped the request
- a browser plugin stripped the request
- the site was visited by a program (e.g. using curl) without setting the header
- the meta-referrer tag disallows it
- the meta-referrer tag allows it but the browser does not have meta-referrer support
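Given how many ways the header can go missing, server-side code that reads it needs a fallback. The sketch below assumes request headers arrive as a plain mapping of names to values; the function name `traffic_source` and the label `direct-or-unknown` are illustrative choices, not part of any framework.

```python
from typing import Mapping
from urllib.parse import urlsplit

UNKNOWN = "direct-or-unknown"


def traffic_source(headers: Mapping[str, str]) -> str:
    """Return the referring host, or a fallback label if it is unavailable."""
    # Header names are case-insensitive, so compare in lower case.
    referer = next(
        (value for name, value in headers.items() if name.lower() == "referer"),
        None,
    )
    if not referer:
        return UNKNOWN
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        # Malformed value: treat it the same as a missing header.
        return UNKNOWN
    return host or UNKNOWN


print(traffic_source({}))
print(traffic_source({"Referer": "https://www.facebook.com/"}))
```

The result is only suitable for statistics: a missing value says nothing about why it is missing, and a present value may have been forged.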
For websites that rely on the referer header for certain advertising campaigns, the patchy and inconsistent usage of the header can be a real problem. Proxy rules allowing access for users originating from specific sites have a high risk of not working at all, depending on the user’s browser or local setup, and are also vulnerable to abuse if the headers are manipulated.
TLDR
To sum up, the referer header was rather flakey, and is now slightly less flakey. It’s often omitted, either accidentally or deliberately, and easily faked. It can be a very useful tool in gathering data about web traffic, but it is probably best not to rely on it for anything especially important at this point.
References and further reading
- RFC2616
- Facebook: Protecting Privacy with Referrers
- W3C: Referrer Policy
- Upcoming changes in Google’s HTTP Referrer
- Wikipedia: Referer spam
- Exploiting cross-site scripting in Referer header
- Angler Exploit kit breaks Referer chain using HTTPS to HTTP redirection
- Can I Use: Referrer Policy
- Adam Barth: Referer (sic)
- Mozilla bugtracker: Bug 704320 - Implement
- Stephen Merity: Where did all the HTTP referrers go?
- Moz Blogs: The Meta Referrer Tag: An Advancement for SEO and the Internet
- Mozilla Security Blog: Tighter Control Over Your Referrers