Back in my days at WhiteHat Security, countless customer conversations began with, “We want to [DAST] scan all of our websites.” DAST refers to Dynamic Application Security Testing. To which we’d instantly reply, “Great! Just give us the list and some test account credentials, and we’ll get scanning right away!”
Unfortunately, all too often this was the moment where the effort and maturity of their application security program stalled. Even today, very few companies have a complete list of their websites. They may have a list of the “important” websites (most of them, anyway), but that’s about it. Don’t believe me? Go ahead and ask. The list will be missing the blogs, brochure-ware, employee portals, marketing announcements, IT management interfaces, customer support systems, foreign-language and international versions, and so on, and will only include limited information about any of them.
It’s important to appreciate that just because a website isn’t classified as mission-critical, it doesn’t necessarily mean a breach of a secondary or tertiary system wouldn’t have catastrophic consequences because of what it’s connected to internally. Adversaries and bug bounty hunters certainly know this. Recall that the breach of Target’s credit card payment network back in 2013 began with the compromise of a third-party HVAC vendor’s network access.
Anyway, WhiteHat engineers would jump in to assist the customer with creating a list of their websites. The process was painfully slow, tedious, laborious, unscalable, unrepeatable, and error-prone, and data quality degraded quickly. Which is also to say, it was state-of-the-art at the time. Like everything in information security … nothing is easy.
Flash forward to today: at Bit Discovery, we’ve developed a fast, simple, and automated way to create a prioritized list of all of a company’s websites by leveraging their complete attack surface map. What follows are the steps in the process:
1) Create a Complete Attack Surface Map: Bit Discovery specializes in creating a complete attack surface map for any company, a process I’ve discussed previously. At this point, we’ll assume a full collection of domain names, fully qualified hostnames, IP-addresses and IP-ranges, and associated meta-data for everything across the attack surface. Meta-data includes, but is not limited to, HTTP headers, HTTP response codes, URLs (all 30x and JavaScript redirect URLs) and HTML body.
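To make the later steps concrete, here’s a minimal sketch (in Python) of the kind of per-asset record the map boils down to. The field names are illustrative only, not Bit Discovery’s actual schema.

```python
# A minimal sketch of the kind of inventory record this step produces.
# Field names are illustrative, not an actual product schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Asset:
    hostname: str                                            # fully qualified hostname
    ip_addresses: list[str] = field(default_factory=list)
    dns_record_types: set[str] = field(default_factory=set)  # e.g. {"A", "CNAME", "MX"}
    open_ports: list[int] = field(default_factory=list)
    http_status: Optional[int] = None                        # e.g. 200, 301, 404
    http_headers: dict[str, str] = field(default_factory=dict)
    redirect_url: Optional[str] = None                       # 30x Location or JavaScript redirect target
    html_body: Optional[str] = None

inventory: list[Asset] = []                                  # the full attack surface map
```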
2) Eliminate the Low-Hanging Fruit: Remove assets whose only DNS records are SOA, MX, NS, or PTR. Unless the same fully qualified hostname(s) also have other types of DNS entries, such as A or CNAME, they tend to make poor DAST scanner candidates. Additionally, sometimes DNS entries resolve to private (RFC-1918) IP-address space due to misconfiguration. These assets may be removed as well, since they’re not publicly accessible.
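A rough sketch of this filtering step, assuming the dnspython package for lookups and the standard library for the RFC-1918 check; any resolver library would work just as well.

```python
# Keep a hostname only if it resolves (following CNAME chains) to at least
# one public, non-RFC-1918 address. Assumes the dnspython package.
import ipaddress
import dns.exception
import dns.resolver

def publicly_resolvable(hostname: str) -> bool:
    try:
        answers = dns.resolver.resolve(hostname, "A")   # follows CNAMEs to the final A records
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer,
            dns.resolver.NoNameservers, dns.exception.Timeout):
        return False
    return any(not ipaddress.ip_address(r.to_text()).is_private for r in answers)

# Keep only hostnames worth handing to a DAST scanner.
hostnames = ["www.example.com", "mail.example.com", "ns1.example.com"]
candidates = [h for h in hostnames if publicly_resolvable(h)]
```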
3) Port Scans: Most of the time websites listen on ports 80 and 443, but not always; it’s also common for websites to listen on 8080 and 8443. Technically speaking, a website may listen on any of the 65,535 available ports. Strictly speaking, it’s not always necessary to perform a full port scan, but at least check the most common Web-related ports.
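Here’s a bare-bones illustration of checking the most common Web-related ports with nothing but the standard library; in practice a purpose-built scanner such as nmap or masscan is a better fit, and the port list and timeout below are just examples.

```python
# Check which of the common Web ports complete a TCP handshake.
import socket

COMMON_WEB_PORTS = [80, 443, 8080, 8443]

def open_web_ports(host: str, timeout: float = 2.0) -> list[int]:
    found = []
    for port in COMMON_WEB_PORTS:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:        # refused, filtered, or timed out
            continue
    return found

print(open_web_ports("www.example.com"))   # e.g. [80, 443]
```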
4) Protocol Fingerprinting: It’s easy to overlook that just because something is listening on 80 and 443, that doesn’t guarantee it’s a website speaking HTTP(S). Therefore, after port scanning an Internet-connected device, it’s helpful to use protocol fingerprinting on each listening port to determine whether it’s HTTP(S) or not. As yet another edge case, just because an Internet-connected device is listening on both ports 80 and 443, that doesn’t mean both point to the exact same website. They could be completely different. I’ll discuss this issue more later. Spoiler alert: hashing the HTML content to determine uniqueness doesn’t work well.
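A simplified sketch of what protocol fingerprinting can look like: connect to an open port, send a bare HTTP request, and see whether the reply starts with an HTTP status line. Real fingerprinting handles many more protocols and edge cases than this.

```python
# Does this port actually speak HTTP(S)? Ports 80 and 443 on the same host
# may serve entirely different content, so each should be checked on its own.
import socket
import ssl

def speaks_http(host: str, port: int, use_tls: bool, timeout: float = 3.0) -> bool:
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        if use_tls:
            ctx = ssl.create_default_context()
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE      # fingerprinting, not certificate validation
            sock = ctx.wrap_socket(sock, server_hostname=host)
        with sock:
            sock.sendall(f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
            return sock.recv(64).startswith(b"HTTP/")
    except OSError:                              # includes TLS handshake and timeout errors
        return False

print(speaks_http("www.example.com", 443, use_tls=True))
```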
5) Broken Website(s): A service may be actively listening on a Web port, and the TCP/IP 3-way handshake may succeed, but that doesn’t mean valid HTTP was returned. If no HTTP protocol headers or HTML were returned, something may be broken. And believe me when I say it, there is a lot that’s broken on the Web. The minimum viable bar for DAST scanning an asset is a valid HTTP response code (e.g., 200, 301, 302, etc.), ideally along with some HTML or at least a redirect URL. If not, remove these assets from the list, as they also make for poor DAST scanner candidates.
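Something along these lines (using the requests package, purely as an example) is enough to express that “minimum viable” test.

```python
# Keep an asset only if it returns a sane HTTP status and either some body
# content or a redirect Location.
import requests

def returns_valid_http(url: str, timeout: float = 5.0) -> bool:
    try:
        # verify=False tolerates broken certificates while triaging.
        resp = requests.get(url, timeout=timeout, allow_redirects=False, verify=False)
    except requests.RequestException:
        return False          # connection reset, timeout, malformed response, etc.
    if resp.status_code not in (200, 301, 302, 303, 307, 308):
        return False
    return bool(resp.text.strip()) or "Location" in resp.headers

candidates = [u for u in ("http://www.example.com", "http://broken.example.com")
              if returns_valid_http(u)]
```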
6) Meta-Data Indicating Importance: When HTML is returned, it should be parsed for telltale signs of importance as a way of establishing priority. Telltale signs include whether the asset is cloud-hosted, contains a CAPTCHA, has a password/login field, contains forms, sets cookies, supports security enhancements such as CSP or HSTS, has URLs with query strings, supports TLS, includes Google Analytics code, displays the company brand or copyright, and so on. Each of these signs indicates business value. The more signs of value identified, the more the asset lends itself to being a good DAST scanning candidate.
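An illustrative way to pull a few of those signals out of a response, assuming the requests and beautifulsoup4 packages; the signal names are mine, not Bit Discovery’s.

```python
# Extract a handful of "signs of importance" from a single response.
import requests
from bs4 import BeautifulSoup

def importance_signals(url: str) -> dict[str, bool]:
    resp = requests.get(url, timeout=5)
    soup = BeautifulSoup(resp.text, "html.parser")
    return {
        "has_login":     bool(soup.find("input", attrs={"type": "password"})),
        "has_forms":     bool(soup.find("form")),
        "sets_cookies":  bool(resp.cookies),
        "uses_csp":      "Content-Security-Policy" in resp.headers,
        "uses_hsts":     "Strict-Transport-Security" in resp.headers,
        "uses_tls":      resp.url.startswith("https://"),
        "has_analytics": "google-analytics.com" in resp.text or "googletagmanager.com" in resp.text,
    }

signals = importance_signals("https://www.example.com")
print(sum(signals.values()), "signs of importance:", signals)
```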
7) Meta-Data Indicating a Lack of Importance: There are also telltale signs that a website may be a less ideal candidate for commercial or even open-source DAST scanning. The website may redirect to a URL outside of the company’s core attack surface (i.e., a third party); the fully qualified hostname may contain keywords such as “test,” “stag,” “qa,” or “admin”; the site may be running CMS software such as Drupal or WordPress, where a different type of vulnerability scanner is preferable; or it may return a well-known default install landing page for Apache, Microsoft IIS, nginx, or some other commercial enterprise product (e.g., Cisco, Palo Alto, etc.).
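And a comparable sketch for the de-prioritization signals; the keyword and default-page fingerprint lists below are illustrative only.

```python
# Flag assets that are probably poor DAST candidates.
from urllib.parse import urlparse

NON_PROD_KEYWORDS = ("test", "stag", "qa", "admin")
DEFAULT_PAGE_FINGERPRINTS = (
    "Apache2 Ubuntu Default Page",
    "Welcome to nginx!",
    "IIS Windows Server",
)

def deprioritize(hostname: str, final_url: str, html: str, company_domains: set[str]) -> bool:
    host_label = hostname.lower()
    redirect_host = urlparse(final_url).hostname or ""
    return (
        any(k in host_label for k in NON_PROD_KEYWORDS)                    # test/stage/qa/admin
        or not any(redirect_host.endswith(d) for d in company_domains)     # redirects off-surface
        or any(f in html for f in DEFAULT_PAGE_FINGERPRINTS)               # default install page
        or "wp-content" in html or "Drupal" in html                        # CMS better served by other scanners
    )

print(deprioritize("qa.example.com", "https://qa.example.com/",
                   "<html>Welcome to nginx!</html>", {"example.com"}))
```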
8) Count the Redirects: A single website very often has dozens, hundreds, or more domain names and hostnames redirecting or DNS-resolving to it. Vanity hostnames and international TLDs used for localization are extremely common. It’s unnecessary to DAST scan each individual fully qualified hostname, but it is necessary to scan the eventual final destination of all this attack surface. Generally, the more CNAMEs, HTTP 30x, HTML, and JavaScript redirects that point to a single destination, the more important the asset is likely to be.
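A small sketch of collapsing many hostnames down to their final destinations and counting how much attack surface points at each one. The hostnames are made up, and note that requests only follows HTTP 30x redirects; JavaScript redirects would still need HTML parsing or a headless browser.

```python
# Group hostnames by their final landing URL and count how many point at each.
from collections import Counter
import requests

hostnames = ["example.com", "www.example.com", "example.co.uk", "promo.example.com"]

destinations = Counter()
for host in hostnames:
    try:
        resp = requests.get(f"http://{host}", timeout=5, allow_redirects=True)
        destinations[resp.url] += 1
    except requests.RequestException:
        continue

# The destinations with the most hostnames pointing at them are likely
# the most important single websites to scan.
for url, count in destinations.most_common():
    print(count, url)
```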
9) Filtering & Sorting: With all the above data in hand, it becomes much easier to group all the redirect URL destinations into a single unique website (URL). Then, with an algorithm, generate a value for each asset based upon a weighted combination of all the associated meta-data that’s been collected.
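A toy version of that scoring pass, with completely made-up weights, might look like this.

```python
# Combine the collected signals into one weighted score per website and sort.
WEIGHTS = {
    "has_login": 5, "has_forms": 3, "sets_cookies": 2, "uses_csp": 2,
    "uses_hsts": 2, "uses_tls": 1, "has_analytics": 1, "redirect_count": 1,
    "deprioritized": -10,
}

def priority_score(signals: dict) -> int:
    return sum(WEIGHTS.get(name, 0) * int(value) for name, value in signals.items())

websites = [
    {"url": "https://app.example.com/",  "has_login": True,  "uses_tls": True, "redirect_count": 12, "deprioritized": False},
    {"url": "https://blog.example.com/", "has_login": False, "uses_tls": True, "redirect_count": 1,  "deprioritized": True},
]

ranked = sorted(websites,
                key=lambda w: priority_score({k: v for k, v in w.items() if k != "url"}),
                reverse=True)
for site in ranked:
    print(site["url"])
```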
10) Screenshot Review (Bonus): As part of the meta-data extraction process for HTML, it’s incredibly useful to generate screenshots of every website along the way. It’s easy to skim the images for any website that should be moved up the list, moved down, or grouped with another website. Often it takes only a glance at a screenshot to determine whether a website is important or not.
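One way to generate those screenshots is with a headless browser; here’s a sketch assuming the Playwright package, though any browser automation tool would do.

```python
# Capture a screenshot of each candidate website for quick visual triage.
from playwright.sync_api import sync_playwright

def screenshot_sites(urls: list[str]) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for url in urls:
            try:
                page.goto(url, timeout=15000)
                # Name the file after the host so the images are easy to skim in bulk.
                page.screenshot(path=url.split("//")[1].rstrip("/").replace("/", "_") + ".png")
            except Exception:
                continue    # broken sites were already filtered, but be safe
        browser.close()

screenshot_sites(["https://www.example.com/"])
```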
And there you have it. Bit Discovery has automated what had been a long-standing and incredibly challenging problem. Plug the final list of websites into your favorite DAST scanner and let it rip! If you need a list of your websites, get in touch with us. We’ll be happy to provide an XLS export for free.
Email: attacksurfacemap@bitdiscovery.com with your name / LinkedIn profile and the company. We’ll handle the rest.