Effective TLDs and Why We (Sometimes) Ignore Them
April 19, 2019
Post by Robert Hansen
Once upon a time there were only a few top-level domains, like “.com” and “.net”, and life was good. Then along came some friendly blokes who wanted everything to live under their own top-level domain, so that “.co.uk” would be the “.com” of the “.uk” TLD. That worked great for about 10 seconds, until a hacker realized that cookies from one “.co.uk” domain were leaking to another. That is to say, if you owned “bank.co.uk”, your cookies would leak to “hacker.co.uk” because the browser didn’t know the difference. Both were treated as subdomains of a single domain, instead of as separate domains under a TLD.
That was clearly not going to work. So enter bandage #1 – browsers were outfitted with a new concept called an “effective TLD”. So now “.co.uk” was truly treated like “.com”, in the sense that any name registered directly under it was treated like a domain under a TLD. Which is to say they were isolated from one another, from the perspective of the same origin policy. That seemed to work great. That is, until companies got extremely sloppy with their development.
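To make the isolation concrete, here is a minimal sketch of how an effective-TLD lookup can decide whether two hosts belong to the same domain. The suffix set, the function names, and the "longest suffix plus one label" rule are assumptions for illustration only; real browsers use a much larger list and more rules than this.

```python
# Hypothetical, hand-picked suffix set. A real browser ships a far longer list.
EFFECTIVE_TLDS = {"com", "net", "uk", "co.uk"}


def registrable_domain(host, suffixes=EFFECTIVE_TLDS):
    """Return the longest matching suffix plus one more label, or None."""
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate to the shortest so "co.uk" beats "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the host is itself a suffix, e.g. "co.uk"
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


def may_share_cookies(host_a, host_b, suffixes=EFFECTIVE_TLDS):
    """True when both hosts fall under the same registrable domain."""
    a = registrable_domain(host_a, suffixes)
    b = registrable_domain(host_b, suffixes)
    return a is not None and a == b
```

With “co.uk” in the set, `may_share_cookies("bank.co.uk", "hacker.co.uk")` is false, because each host is its own registrable domain. Pass a set without “co.uk” (for example `{"com", "net", "uk"}`) and the same call becomes true, which is the leak described earlier: both hosts collapse into the single “domain” `co.uk`.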
So then came bandage #2. The browser manufacturers decided to let companies declare their own domains to be effective top-level domains. So “blogspot.com” can say “hey, I’m like co.uk”, which it is not. So now we have a whole new problem – there is no longer any way to visually distinguish between a top-level domain, an effective top-level domain, and a plain domain.
This is where it gets problematic. Let’s say someone says “tell me all of my domains” but one of their domains is an effective TLD. Does that mean that it’s not a domain? It sure does – at least from the browser’s perspective. So for that question we have to “sort of” ignore effective TLDs. But when we crawl sites we obviously can’t ignore the effective TLDs, because we don’t want to accidentally break the same origin policy. In that case we have to pay attention to them. It’s a fine line, and an easy one to get wrong if you are not very familiar with these nuances.
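The two behaviors described above can be sketched as two lookups over the same data. This is only an illustration under stated assumptions, not our actual tooling: the split into `REGISTRY_SUFFIXES` and `OPT_IN_SUFFIXES`, the sample entries, and the function names are all hypothetical.

```python
# Hypothetical suffix data: registry-style suffixes versus domains whose
# owners asked to be treated like one.
REGISTRY_SUFFIXES = {"com", "net", "uk", "co.uk"}
OPT_IN_SUFFIXES = {"blogspot.com"}


def registrable_domain(host, suffixes):
    """Return the longest matching suffix plus one more label, or None."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # i == 0 means the host is itself a suffix, so it is "not a domain".
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


def inventory_domain(host):
    """Inventory view: skip opt-in suffixes so their owner still sees a domain."""
    return registrable_domain(host, REGISTRY_SUFFIXES)


def crawl_boundary(host):
    """Crawl view: honor every suffix, the way a browser would."""
    return registrable_domain(host, REGISTRY_SUFFIXES | OPT_IN_SUFFIXES)
```

Under these assumptions, `inventory_domain("blogspot.com")` returns `"blogspot.com"`, so it shows up when its owner asks for all of their domains, while `crawl_boundary("blogspot.com")` returns `None` because the browser sees it as a suffix rather than a domain. Likewise, `crawl_boundary("alice.blogspot.com")` and `crawl_boundary("bob.blogspot.com")` come back as two different values, so a crawler using it would keep the two sites isolated.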
When people talk with wonder in their eyes about the Internet, I just look at how unbelievably broken these systems are, and how much work it is to rectify these issues for our customers. And that, dear reader, is why we sometimes ignore effective TLDs.