Effective TLDs and Why We (Sometimes) Ignore Them

April 19, 2019

Post by Robert Hansen

Once upon a time there were only a few top-level domains, like “.com” and “.net”, and life was good. Then along came some friendly blokes who wanted everything to live under their own top-level domain, so that “.co.uk” would be the “.com” of the “.uk” TLD. That worked great for about 10 seconds, until a hacker realized that cookies from one “.co.uk” domain were leaking to another. That is to say, if you owned “bank.co.uk”, your cookies would leak to “hacker.co.uk” because the browser didn’t know the difference. Both were treated as subdomains of a single domain, “co.uk”, instead of as separate domains under a TLD.

That was clearly not going to work. So enter bandage #1: browsers were outfitted with a new concept called the “effective TLD”. Now “.co.uk” was truly treated like “.com”, in the sense that anything registered directly under it was treated like a domain under a TLD. Which is to say, those domains were isolated from one another from the perspective of the same-origin policy. That seemed to work great. That is, until companies got extremely sloppy with their development.
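To make bandage #1 concrete, here is a minimal sketch of the lookup a browser performs. Everything in it is an assumption for illustration: the function names are hypothetical, the suffix set is a tiny hand-picked sample rather than the real list browsers ship, and it ignores the wildcard and exception rules a real list contains.

```python
# Illustrative sample only; real browsers ship a list with thousands of entries.
EFFECTIVE_TLDS = {"com", "net", "uk", "co.uk"}


def registrable_domain(host, suffixes=EFFECTIVE_TLDS):
    """Return the longest matching suffix plus one more label, or None."""
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate to the shortest so "co.uk" beats "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the host is itself an effective TLD
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


def same_site(a, b, suffixes=EFFECTIVE_TLDS):
    """True when both hosts fall under the same registrable domain."""
    ra = registrable_domain(a, suffixes)
    rb = registrable_domain(b, suffixes)
    return ra is not None and ra == rb


# Before the bandage: only real TLDs are known, so both hosts collapse to "co.uk".
print(same_site("bank.co.uk", "hacker.co.uk", {"com", "net", "uk"}))
# After the bandage: "co.uk" is an effective TLD, so the two are isolated.
print(same_site("bank.co.uk", "hacker.co.uk"))
```

With only the real TLDs in the set, both hosts resolve to “co.uk” and look like siblings, which is the leak described above. Once “co.uk” is added, each resolves to its own domain.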

Enter companies like Google. Google owns something called “blogspot.com”. Blogspot is a blogging system and has nothing at all to do with being a top-level domain. However, it was designed sloppily to allow full HTML and JavaScript to be uploaded, all under the same domain, with effectively zero checks or balances (at least initially). So one person on “bank.blogspot.com” could have their cookies read or tampered with by “hacker.blogspot.com”. That should have been the end of the story. The story could have simply been: bad developers write bad code, we all laugh at them, they fix it, and the world keeps spinning.

But then came bandage #2. The browser manufacturers decided to let companies declare themselves a top-level domain. So “blogspot.com” can say “hey, I’m like co.uk”, which it is not. So now we have a whole new problem: there is no longer any way to visually distinguish between a top-level domain, an effective top-level domain, and an ordinary domain.

This is where it gets problematic. Let’s say someone says “tell me all of my domains”, but one of their domains is an effective TLD. Does that mean that it’s not a domain? It sure does, at least from the browser’s perspective. So when we inventory domains we have to “sort of” ignore effective TLDs. But when we crawl sites we obviously can’t ignore them, because we don’t want to accidentally break the same-origin policy. In that case we have to pay attention to them. It’s a fine line, and easy to get wrong if you are not very familiar with these nuances.
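The two behaviors can be sketched as two lookups over different suffix sets. This is a rough illustration under stated assumptions, not a description of any actual product code: the names `inventory_domain` and `crawl_boundary` are hypothetical, the suffix sets are tiny samples, and splitting them into registry-delegated and company-declared groups is just one way to model the distinction.

```python
# Illustrative samples only, not a real suffix list.
REGISTRY_SUFFIXES = {"com", "net", "uk", "co.uk"}  # delegated by registries
COMPANY_SUFFIXES = {"blogspot.com"}                # declared by a company


def registrable_domain(host, suffixes):
    """Return the longest matching suffix plus one more label, or None."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # i == 0 means the host is itself an effective TLD.
            return None if i == 0 else ".".join(labels[i - 1:])
    return None


def inventory_domain(host):
    """Asset-inventory view: ignore company-declared effective TLDs."""
    return registrable_domain(host, REGISTRY_SUFFIXES)


def crawl_boundary(host):
    """Crawler view: honor every effective TLD the browser would honor."""
    return registrable_domain(host, REGISTRY_SUFFIXES | COMPANY_SUFFIXES)


for host in ("blogspot.com", "bank.blogspot.com"):
    print(host, inventory_domain(host), crawl_boundary(host))
```

In the inventory view “blogspot.com” still counts as a domain somebody owns. In the crawl view it is a suffix with no owner of its own, and “bank.blogspot.com” is a separate site that a crawler should not lump together with “hacker.blogspot.com”.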

When people talk with wonder in their eyes about the Internet, I just look at how unbelievably broken these systems are, and how much work it is to rectify these issues for our customers. And that, dear reader, is why we sometimes ignore effective TLDs.