April 9, 2019

Bit Discovery Design Considerations

Post by Robert Hansen

When I started building the back-end infrastructure for Bit Discovery’s external asset inventory system, I learned a lot about the Internet that I had only a vague notion of before. One of the most important things I learned is that the Internet is… weird. Really, really weird. There are huge error rates in DNS, and there is little to no double-checking of records, so you often get totally non-compliant responses. RIRs require faxing or snail mail to get authorization. ICANN has no way to maintain continual access to records. It’s a cluster – and this is the basis for everything the Internet relies on.

However, this brings us to an interesting realization: the Internet sort of works anyway. For all of the duct tape and baling wire we see in this industry, it still manages to function, which boggles the mind. When I began constructing Bit Discovery’s back-end, I had to come up with some design principles. The first is that I can’t make assumptions about what I’m seeing; I just have to be a reporter and display what I get, no matter how wrong it seems to be, as long as it is RFC compliant. So yes, public DNS records with RFC1918 addresses? Those can’t be altered, no matter how stupid they seem.

As I started designing and writing code I came up with a number of design principles that went along those lines:

Don’t make assumptions about the data; just assume it’s correct. That doesn’t mean you can’t call out things that look incorrect or dangerous, but don’t assume you know more than whoever wrote it. RFC1918 turns out to be a good example: private addresses in public DNS can be used to learn about internal networks.

Don’t throw away data. You can choose not to display bad or erroneous data, but you shouldn’t throw it away. It’s amazing how old data can be useful in the oddest ways, not the least of which is detecting misconfigurations.

Don’t get in the customer’s way. Assume they’ll want to use the data in at least one of three ways. First, they have no idea what they want so just let them explore. Second, they know what they want and they just want it once. Third, they know what they want and they want to pull it via an API. So we have designed the system to handle all three use cases.

Try to allow the user to quickly pivot based on the information found. If they know a domain they should be able to find subdomains quickly. If they know one domain they should be able to find correlated domains quickly, and so on. Everything should just be a few mouse clicks away. So our UI guru and Head of Engineering, Lex Arquette, built a really useful interface to allow exactly that.

It has to be blazing fast – ideally so fast that people think the demo is canned. That means probing the Internet after the fact isn’t a viable option; we’ve got to capture an enormous amount of data up front, before anyone needs it. We’re talking about a massive upfront cost. It’s the “build it and they will come” approach to external Internet asset management. It also means we have to build, borrow or steal as much data as we can get our hands on, from as many places as we can find.
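
The first two principles above (report the data as given, and never throw it away) can be sketched in a few lines. This is only an illustration under my own assumptions, not Bit Discovery’s actual code: the `Observation` and `ObservationLog` names, the `flags` field, and the sample hostnames are all hypothetical. The idea is that a suspicious record gets flagged and hidden from display, but it always stays in the history.

```python
from datetime import datetime, timezone
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One record exactly as it was seen, plus any warnings attached to it."""
    hostname: str
    value: str
    seen_at: datetime
    flags: tuple = ()  # hypothetical labels, e.g. ("private-address",)


class ObservationLog:
    """Append-only store: records can be flagged and hidden, never removed."""

    def __init__(self):
        self._by_host = {}

    def record(self, hostname, value, flags=(), seen_at=None):
        """Store a new observation without altering or dropping older ones."""
        seen_at = seen_at or datetime.now(timezone.utc)
        obs = Observation(hostname, value, seen_at, tuple(flags))
        self._by_host.setdefault(hostname, []).append(obs)
        return obs

    def history(self, hostname):
        """Everything ever recorded for a hostname, in the order recorded."""
        return list(self._by_host.get(hostname, []))

    def displayable(self, hostname):
        """Only unflagged observations; flagged ones remain in history()."""
        return [o for o in self._by_host.get(hostname, []) if not o.flags]


log = ObservationLog()
log.record("bar.foo.tld", "192.0.2.10")
log.record("bar.foo.tld", "10.0.0.5", flags=["private-address"])
```

Keeping `history()` separate from `displayable()` is what lets old or odd-looking data resurface later, for example when hunting for misconfigurations.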

As we built the interface we quickly realized it was becoming more and more useful based on those design choices. Now we have an amazingly fast, thorough interface that can get most people up and running in about 2-3 minutes, with the vast majority of their inventory built automatically, without them having to do anything at all. That’s the design promise and the design choice. If you haven’t already seen it, please take a look. If you have seen it but it’s been a while, you may want to take a second look. The new features we’ve added are changing the handbook for what corporate security, M&A, and Risk and Compliance need and use.

March 29, 2019

Asset Inventory — Lexicon, Glossary of Terminology

Post by Jeremiah Grossman

‘Asset Inventory’ is starting to catch on fast in Information Security. The reason for all the interest and market growth is simple: You cannot secure what you don’t know you own.

The reality is the vast majority of organizations simply do not have an inventory of their Internet-accessible assets, such as websites, name servers, mail servers, IoT devices, etc — or even their Intranet assets for that matter (desktops, printers, servers, etc). They don’t know where those assets are, what they do, who is responsible for them, or much of anything. As any security expert would agree, the lack of an asset inventory is a huge gap for any organization and is arguably the largest and most important unsolved problem in the industry.

As we can expect from any emerging industry, there will be a smattering of new technical terminology, some with conflicting and overlapping definitions, and a lot of redefining of existing terms. Inevitably this causes a lot of confusion, which should be avoided. What’s needed is the start of a new lexicon for the asset inventory space, covering how knowledge is captured and communicated to others: a glossary of terms, if you will.

Below I’ve drafted a starting list of the most common terms and what they mean. This will be a work in progress.


Asset

A domain name, subdomain, or IP address, or any combination thereof, of a device connected to the Internet or an internal network. An asset may include, but is not limited to, web servers, name servers, IoT devices, network printers, etc.

Example: foo.tld,, x.x.x.x

Asset Inventory

A complete collection of an organization’s assets and associated metadata of each asset.

Asset Management

Asset management refers to the monitoring, configuring, and maintaining of assets.

Attack Surface

From the network perspective of an adversary, the complete asset inventory of an organization including all actively listening services (open ports) on each asset.


Discovery

Discovery refers to the act of identifying assets.

Domain Name

A domain name is a label that identifies a network domain. Domain names are used to identify Internet resources, such as computers, networks, and services, with a text label that is easier to memorize than the numerical addresses used in the Internet protocols.

Example: foo.tld is the domain name of URL


External

Refers to the accessibility of an asset that can be connected to from across the Internet.


Host

A device connected to a network that communicates with other hosts on the network.


Hostname

A unique name given to any device that is connected to a specific computer network, typically appended to a domain name, and resolves to an IP-address using the Domain Name System (DNS).

Example: ‘bar’ is the hostname of


Internal

Refers to the accessibility of an asset that cannot be connected to from across the Internet, and generally resides on an internal network (i.e. Intranet).

Orphaned Hostname

A hostname that no longer resolves to an IP-address.
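
As a rough illustration, the check below asks whether a hostname currently resolves. It is a minimal sketch under stated assumptions: `is_orphaned` is a hypothetical helper name, it relies on the system resolver via Python’s `socket.getaddrinfo`, and it only sees the present state. Deciding that a name *no longer* resolves also requires a record showing that it once did, and a transient DNS failure would look the same as a missing record here, so a real system would want to retry.

```python
import socket


def is_orphaned(hostname, resolve=socket.getaddrinfo):
    """Return True if `hostname` currently fails to resolve to any address.

    `resolve` is injectable so the check can be exercised without a network.
    """
    try:
        answers = resolve(hostname, None)
    except socket.gaierror:
        # Resolution failed (for example, the name does not exist).
        return True
    return len(answers) == 0
```

Calling `is_orphaned("bar.foo.tld")` uses the real resolver; passing a stub as `resolve` makes the function easy to test offline.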

Internet-accessible, internet-connected, internet-facing

Refers to an asset that can be connected to over the Internet. While the terms above are often used interchangeably, Internet-accessible is considered the preferred term.


Metadata

A set of data that describes and gives information about an asset. Metadata may include, but is not limited to, geolocation, operating system, open ports, service banners, TLS certificate details, etc.

Reconnaissance / Recon

The act of finding assets.

Routable / Non-Routable

Refers to whether network traffic can be routed to an IP-address over the Internet. As defined by RFC-1918, there are certain IP-address ranges to which network traffic cannot be routed over the Internet; these are referred to as ‘non-routable’ IP-addresses or ‘private’ IP-space.

Non-Routable IP-Addresses (RFC-1918): 10.0.0.0 – 10.255.255.255 (10/8 prefix), 172.16.0.0 – 172.31.255.255 (172.16/12 prefix), 192.168.0.0 – 192.168.255.255 (192.168/16 prefix)
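
To make the three prefixes concrete, here is a small sketch using Python’s standard `ipaddress` module. The helper name `is_rfc1918` is my own. The networks are listed explicitly because the module’s built-in `is_private` attribute covers more than RFC-1918 (loopback and link-local addresses, for instance), which is broader than the definition above.

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_NETWORKS = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
)


def is_rfc1918(address):
    """Return True if `address` is an IPv4 address inside an RFC 1918 block."""
    ip = ipaddress.ip_address(address)
    if ip.version != 4:
        return False  # RFC 1918 only covers IPv4
    return any(ip in net for net in RFC1918_NETWORKS)
```

A check like this is enough to flag, say, a public DNS record that points at private address space without altering the record itself.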


Subdomain

A subdomain is a domain name with a hostname appended, which is sometimes more accurately described as a fully qualified domain name (FQDN).


Top-Level Domain (TLD)

Refers to the last segment of a domain name, the part immediately following the final “dot” symbol. The most common and familiar TLDs are .com, .net, and .org.

Example: TLD is the Top-Level Domain name of the domain name

There are many other TLDs, such as and, which are technically not TLDs because they are not located at the ‘top level’ of the domain. These types of domains are referred to as effective TLDs (eTLDs) because they serve as a branching point for domain name registrars.

Virtual Host

Refers to a method for hosting multiple hostnames or domain names, with separate handling of each name, on a single server.
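
One common form of this is name-based virtual hosting, where the server picks a handler from the name in the HTTP Host header. The sketch below is illustrative only: `route_request`, the `VHOSTS` table, and the handlers are hypothetical names, and it assumes the Host header carries a hostname with an optional port (not an IPv6 literal).

```python
def route_request(host_header, vhosts, default=None):
    """Pick the handler configured for the name in the HTTP Host header."""
    # The header may carry a port ("foo.tld:8080"), and names are
    # case-insensitive, so normalize before the lookup.
    name = host_header.split(":", 1)[0].strip().lower()
    return vhosts.get(name, default)


# Two names served by the same machine, each handled separately.
VHOSTS = {
    "foo.tld": lambda: "main site",
    "bar.foo.tld": lambda: "bar application",
}

handler = route_request("BAR.foo.tld:8080", VHOSTS)
```

For asset inventory this matters because a single IP-address can front many distinct assets, so the IP alone does not tell you what is being served.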