IP is the Wrong Way to do Attack Surface Mapping

Robert Hansen May 5, 2021

This post is the sixth in a short series we have dubbed “Attack Surface Mapping the Wrong Way,” which shows the wrong ways that people, companies, and vendors attempt attack surface mapping. Next up is IP, and why relying on it is the wrong way.

IP Only is Flawed

Many security tools scan and produce data based on IP addresses. This is a pretty onerous way to start because it relies on companies knowing the IPs of all of their assets – even the ones they typically don’t know about. It is a chicken and egg problem. Even so, many tools and vendors pass that hidden cost on to their customers to figure out. For the sake of argument, let’s say that the IP inventory is doable, fast to produce, and up to date – all of which are rarely true.

The simplest way to gather HTTP data on a wide swath of IP addresses is to connect to each one and issue a GET request. Something like this (an illustrative session rather than a real capture, included so that you can follow along):

$ telnet 54.183.35.254 80
Trying 54.183.35.254…
Connected to ec2-54-183-35-254.us-west-1.compute.amazonaws.com.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 302 Found
Content-Type: text/plain; charset=utf-8
Date: Fri, 08 Jan 2021 23:32:00 GMT
Location: https://10.10.10.10/
Server: Apache
Vary: Accept
Content-Length: 42
Connection: Close

Found. Redirecting to https://10.10.10.10/

The most important things to note above are the request we sent and the Location header that came back. When you connect to 54.183.35.254 on port 80 and send “GET / HTTP/1.0” followed by two newlines, the server answers with a redirect to https://10.10.10.10/. That address is in RFC 1918 internal address space, so in my case nothing is listening there on the default HTTPS port of 443. (Incidentally, seeing an internal address like this is also a minor information leak.) The port may be open for you, but that would be because your internal network has a host listening at that internal address – not because it is relevant to the issue at hand. The redirect is clearly not correct, and that is obvious if you know that the site is actually bitdiscovery.com’s webpage.
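A scanner can flag this kind of misleading redirect automatically. The following is a minimal Python sketch, not part of any particular tool; the function name `redirects_to_private_ip` is hypothetical. Note that the standard library's `is_private` check covers the RFC 1918 ranges plus other reserved ranges such as loopback, so it is slightly broader than RFC 1918 alone.

```python
import ipaddress
from urllib.parse import urlsplit


def redirects_to_private_ip(location: str) -> bool:
    """Return True if a Location header value points at a private IP literal.

    Hypothetical helper for illustration. `is_private` is True for the
    RFC 1918 ranges and for other reserved ranges (loopback, link-local, ...).
    """
    host = urlsplit(location).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # The host is a DNS name rather than an IP literal.
        return False


if __name__ == "__main__":
    for target in ("https://10.10.10.10/", "https://bitdiscovery.com/"):
        print(target, redirects_to_private_ip(target))
```

A redirect that fails this check is a hint that the request was missing context, such as a Host header, rather than a description of the real application.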

Let us try it again, but this time sending a Host header of “bitdiscovery.com”:

$ telnet 54.183.35.254 80
Trying 54.183.35.254…
Connected to ec2-54-183-35-254.us-west-1.compute.amazonaws.com.
Escape character is '^]'.
GET / HTTP/1.0
Host: bitdiscovery.com

HTTP/1.1 302 Found
Content-Type: text/plain; charset=utf-8
Date: Fri, 08 Jan 2021 23:35:45 GMT
Location: https://bitdiscovery.com/
Server: Apache
Vary: Accept
Content-Length: 47
Connection: Close

Found. Redirecting to https://bitdiscovery.com/

The most important things to note above are the Host header we sent and the Location header that came back. Now that we sent the correct Host header, the server redirects to the correct hostname. If our scanner is designed to follow redirects, it will then see the actual application in question – mission accomplished, but only if a Host header is sent. That may seem minor, but it makes an enormous difference in how web applications respond and in whether an attack surface map will be even somewhat realistic.

When a scanner connects to an IP address, it does not know anything about it other than what you tell it. So, if you tell it to connect to an IP address and do not send a Host header, you are telling it to skip the very request that would elicit potentially useful application logic.
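The two telnet sessions above can be approximated in code. This is a minimal sketch using only Python's standard library; the function names (`build_request`, `fetch_raw`, `location_header`) are hypothetical, and a real scanner would also need TLS, redirect following, and error handling.

```python
import socket
from typing import Optional


def build_request(path: str = "/", host: Optional[str] = None) -> bytes:
    """Build a bare HTTP/1.0 GET, optionally with a Host header."""
    lines = [f"GET {path} HTTP/1.0"]
    if host:
        lines.append(f"Host: {host}")
    # A blank line terminates the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_raw(ip: str, host: Optional[str] = None,
              port: int = 80, timeout: float = 5.0) -> bytes:
    """Connect to an IP, send the request, and return the raw response."""
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(build_request("/", host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def location_header(raw: bytes) -> Optional[str]:
    """Extract the Location header value from a raw response, if present."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return value.strip()
    return None
```

Calling `location_header(fetch_raw(ip))` and `location_header(fetch_raw(ip, host=name))` against the same address and comparing the results shows whether the server behaves differently once it knows which hostname is being asked for.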

Content delivery network (CDN) and cloud-based web application firewall (WAF) providers leverage tricks with Host headers and SSL/TLS certificates to host an enormous number of web applications from a disproportionately small amount of IP space. This is likely done for administrative ease and to reduce the cost associated with buying up large swaths of IP space.

However, even at a small scale, many tiny organizations and completely unknown applications leverage VirtualHosts to host two or more web applications on the same IP address. That means there is an enormous amount of application logic that simply is not being seen, let alone exercised, by the average IP scanner.
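To see how much an IP-only view collapses, it can help to group known hostnames by the address they resolve to. The sketch below assumes you already have (hostname, IP) pairs from DNS data; the function name and the sample records (documentation-range addresses and `.example` names) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def hostnames_by_ip(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group hostnames by the IP address they resolve to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for hostname, ip in records:
        if hostname not in grouped[ip]:
            grouped[ip].append(hostname)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample data, not real assets.
    sample = [
        ("www.shop.example", "192.0.2.10"),
        ("admin.shop.example", "192.0.2.10"),
        ("blog.shop.example", "192.0.2.10"),
        ("mail.shop.example", "192.0.2.25"),
    ]
    for ip, names in hostnames_by_ip(sample).items():
        # An IP-only scan issues one request here; a DNS-based scan issues len(names).
        print(ip, len(names), names)
```

Any address that maps to more than one hostname is a place where a scan without Host headers sees at most one of the applications.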

As a result, an enormous amount of attack surface area is hidden from an IP-based (versus DNS-based) scanner. That can include dangerous application logic, insecure cookies, links to old social profiles, out-of-date JavaScript libraries, and on and on. The danger is enormous because a massive chunk of the web is moving to this type of architecture. The more application logic and web surface you miss, the more likely it is that an attacker can exploit what you missed.

Therefore, an attack surface map needs to start with DNS and use it to make the potentially numerous requests to a single IP address that are needed to gather detailed information. Further, in the case of round-robin DNS, it is essential to make the requests to all the IP addresses that DNS may point the scanner to. That way, the scanner can identify each asset’s application logic, which may differ slightly or enormously depending on which machine it resides on.
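One way to picture the DNS-first approach is as a probe plan: every hostname is paired with every address it resolves to, and each pair becomes one request carrying that hostname in the Host header. The sketch below is an assumption about how such a plan could be built, not a description of any product; `resolve_all` and `plan_probes` are hypothetical names, and the resolver is injectable so the planning logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, List, Tuple


def resolve_all(hostname: str) -> List[str]:
    """Return every distinct address the system resolver gives for a hostname."""
    infos = socket.getaddrinfo(hostname, 80, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for info in infos:
        ip = info[4][0]  # sockaddr tuple: the address is its first element
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def plan_probes(hostnames: Iterable[str],
                resolver: Callable[[str], List[str]] = resolve_all
                ) -> List[Tuple[str, str]]:
    """Pair each hostname with every IP it resolves to.

    Each (ip, hostname) pair is one request: connect to `ip`, send
    `hostname` as the Host header. Round-robin records yield several
    pairs per hostname; shared hosting yields several per IP.
    """
    probes: List[Tuple[str, str]] = []
    for name in hostnames:
        for ip in resolver(name):
            probes.append((ip, name))
    return probes
```

A single call to the system resolver may not return every record behind a round-robin name, so a production scanner would likely query DNS more thoroughly than this sketch does.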

So, IP alone should never be the primary way to identify what application logic and services are running on the IP addresses in question. That puts another nail in the coffin for NetFlow, as we discussed in the last post. It is also a rebuke to all tools and vendors who demand that their customers know their IP space and who use only that as the entry point into the applications.

I do not mind that old-school network people still want to think in terms of “our IP space,” but that should primarily be used as a supplement to find shadow IT that may pop up in that environment without any DNS associated with it. To be clear, I do recommend that people upload their IP space if they happen to know it, to get the most and best coverage. They should do so only after they upload their domains, though, because non-contiguous IP space is heavily used in modern environments due to cloud-based SaaS. So, go ahead and utilize IP, but if you or your vendor rely on IP alone to do analysis, you are likely missing many critical assets.