The Right Way to do Attack Surface Mapping

May 17, 2021

Post by Robert Hansen

This post is the eighth and last in a short series we have dubbed “Attack Surface Mapping the Wrong Way,” showing the wrong ways that people, companies, and vendors attempt to do attack surface mapping. In this final post, I will show the right way.

The answer: start with everything

So now that we have enumerated many enormously costly or broken ways to perform attack surface mapping (which, incidentally, are unfortunately used in a wide variety of commercial products), let us talk about what does work. What works is starting with everything.

When we talk about everything, it sort of sounds like a joke, but it is not. To get the maximum resolution of information, we need to start with every IP address, every hostname, every port, every website URL, every whois record, every ASN name, etc. Everything!

Talk about an enormous undertaking – and that is why it is rarely done. It is difficult and expensive to collect the data, it is difficult to parse the data, it is difficult to correlate the data, and it is difficult to present it in a useful way. Many companies take shortcuts and attempt to do this themselves, but the part they shortcut is the “getting everything” part. They get what they think they own and attempt to pivot on it. That is no different from Fierce’s techniques, which are antiquated and have been shown to be inferior many times over.

Once the data is in the system and parsed, it needs to be correlated. That means metadata must be logged and compared across the entirety of the Internet. That is both costly and time-consuming to do on a one-off basis, but if you do it for every asset on the Internet, the resulting data can be queried extremely quickly. Further, if you provide good tools on top of the data, it becomes very easy to add and remove data that is either a false negative (missing a company you just purchased yesterday) or a false positive (still including a company you sold yesterday).

The two advantages of this setup are time and accuracy. Querying a system that is constantly performing this type of analysis across the entire Internet usually takes seconds or minutes, whereas other systems can take weeks or months – systems that run ad-hoc tests tend to be extremely slow by comparison.

As far as accuracy goes, you get wildly better data if you correlate it all ahead of time, because you get to correlate all of it rather than only the tiny slice of seed data you might start with under other asset inventory architectures. By knowing everything, it becomes much easier to whittle the list down to the things you do own.

There are many ways one can find shadow IT. Just look under someone’s desk, and you might find it that way. I have often seen security experts doing large-scale network analysis to identify MAC addresses of machines that should not be there. They might also use 802.1X network access control, which can authenticate devices with certificates, to identify assets that cannot connect because they lack them.
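For illustration only, here is a minimal sketch of that MAC-address comparison. It assumes you already have a file of MAC addresses observed on the LAN (harvested from switches or an ARP scan) and a file of MACs from your known-asset inventory; the file names and formats are hypothetical:

```python
# Hypothetical sketch: flag MAC addresses seen on the LAN that are not in the
# known-asset inventory. File names and formats are assumptions for illustration.

def load_macs(path):
    """Read one MAC address per line, normalized to lowercase with colons."""
    with open(path) as f:
        return {line.strip().lower().replace("-", ":") for line in f if line.strip()}

known = load_macs("known_assets.txt")      # MACs from your asset inventory
observed = load_macs("observed_macs.txt")  # MACs harvested from switches / ARP scans

# Anything observed but not known is a candidate piece of shadow IT.
for mac in sorted(observed - known):
    print(f"Unknown device on LAN: {mac}")
```

That works on a network you control, but it says nothing about assets that never touch your LAN at all.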

However, if you want to find assets at scale, assets that cross the boundary between your controlled LAN and the wild west of the public Internet, an Easter egg hunt under people’s desks just is not the way. Nor is searching in the places you usually look. These are the “unknown unknowns” that people talk about – if you only search where you always search, you will find the same insights and miss the same assets. Worse yet, the danger grows with time, because more and more infrastructure is moving outside the corporate LAN into newer, shinier, cloud-based SaaS.

If you start with nothing, it is not easy to build up a comprehensive list. You can do it if you already know where to search or are extremely good at hunting, but it is error-prone because it is mainly manual. It is also an enormous time sink, and you will often miss a lot because of the human factor and the technical limitations of pivoting. Marketing teams with their own budgets, or dev teams with their own infrastructure, are prime examples of why pivoting is of limited utility.

So, the better alternative for finding shadow IT is to build up an asset list of everything everywhere, then use that list to narrow in on things that might be correlated by attaching metadata to each asset and comparing it. The simplest example is looking for two domains with the same name but different top-level domains (for instance, “example.com” and “example.net”). That is only possible if you know every domain everywhere. Metadata is not uniform, though; some machines will have open ports, and others will not. Some machines will have websites, and some will not. You get the drift.
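To make that simplest example concrete, here is a rough sketch of grouping domains that share a second-level label across different TLDs. It assumes a hypothetical all_domains.txt with one registered domain per line (in practice, a full zone-file feed), and it ignores multi-part TLDs such as .co.uk, which a real system would handle with a public-suffix list:

```python
# Hypothetical sketch: group domains that share a second-level label but differ
# in TLD (e.g. example.com and example.net). "all_domains.txt" is an assumption.

from collections import defaultdict

def second_level_label(domain):
    """Return the label just left of the TLD, e.g. 'example' for 'example.com'."""
    parts = domain.lower().strip().split(".")
    return parts[-2] if len(parts) >= 2 else None

groups = defaultdict(set)
with open("all_domains.txt") as f:
    for line in f:
        domain = line.strip()
        label = second_level_label(domain)
        if label:
            groups[label].add(domain)

# Candidate correlations: the same label registered under multiple TLDs.
for label, domains in groups.items():
    if len(domains) > 1:
        print(f"{label}: {sorted(domains)}")
```

The point is not the code; it is that the loop only produces useful candidates if the input file really does contain every domain everywhere.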

But when two or more things begin to look the same, it becomes possible to link them together. However, you cannot build up a list of assets from scratch. You must start with everything and narrow it down from that comprehensive list to find things that correlate. That is part of why asset management is so challenging if you want to do it well – you cannot reliably do it yourself unless you know every piece of metadata for every asset everywhere and can build correlations based on that knowledge.

That is why an up-to-date asset inventory, based primarily on DNS and secondarily on IP/ASN/brand/etc., that can be queried in real time works so well – you cannot find shadow IT through real-time analysis alone unless there are already other linkages pointing you to that domain. You need to find and use every form of metadata possible to build up a massive data lake from which to draw those correlations.
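To make the “query in real time” point concrete, here is a toy sketch of the idea: metadata such as registrant and ASN is indexed ahead of time for every asset, so a later lookup is a dictionary read rather than a fresh scan. The records and field names below are invented purely for illustration:

```python
# Hypothetical sketch: pre-built inverted indexes over Internet-wide metadata,
# so a seed value (registrant name, ASN, brand string) resolves to candidate
# assets in one lookup. All records and field names are invented examples.

from collections import defaultdict

# Pretend these records were harvested ahead of time for every asset on the Internet.
assets = [
    {"domain": "example.com", "registrant": "Example Corp", "asn": "AS64500"},
    {"domain": "example.net", "registrant": "Example Corp", "asn": "AS64500"},
    {"domain": "unrelated.org", "registrant": "Other LLC", "asn": "AS64501"},
]

# Build the indexes once, ahead of time.
by_registrant = defaultdict(list)
by_asn = defaultdict(list)
for asset in assets:
    by_registrant[asset["registrant"]].append(asset["domain"])
    by_asn[asset["asn"]].append(asset["domain"])

# A "real-time" query is then just a lookup, taking seconds instead of weeks.
print(by_registrant["Example Corp"])  # ['example.com', 'example.net']
```

The expensive part is not the lookup; it is collecting and correlating the metadata for everything ahead of time.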

This is an expensive proposition. It is also the main reason traditional open-source OSINT (open-source intelligence) tools for application and domain discovery are rarely as thorough as a more comprehensive and costly method when discovery is performed against large enterprises. Small companies may have better luck, sure, but small companies can probably inventory their assets thoroughly in a hundred ways. When the company starts growing, or when it is not your company but a vendor, partner, or customer, you realize those tools simply are not the right answer if being thorough is important.

If you ask Equifax or Sands or Target, or a host of other large companies, you will start to see the writing on the wall. The only right way to do attack surface mapping is an approach that intelligently combines many of these techniques in a way that mirrors how the Internet really works – not how the vendors want it to work. A holistic asset map cannot be cobbled together on a shoestring, which is why it takes a company like Bit Discovery to drive down the cost on a per-customer basis. It is also why we have strived so hard to make all this technology invisible and seamless to our users, fully automated, and user-controlled. This all comes down to reducing hidden costs and improving time to value.