Bit Discovery Design Considerations

Robert Hansen April 9, 2019

When I started building the back-end infrastructure for Bit Discovery’s external asset inventory system, I learned a lot about the Internet that I had only a vague notion of before. One of the most important things I learned is that the Internet is… weird. Really, really weird. There are huge error rates in DNS, and there is little to no double checking of records, so you often get totally non-compliant responses. RIRs require faxing or snail mail to get authorization. ICANN has no way to maintain continual access to records. It’s a cluster, and this is the basis for everything the Internet relies on.

However, this brings us to an interesting realization: the Internet sort of works anyway. For all of the duct tape and baling wire we see in this industry, it still manages to function, which boggles the mind. When I began constructing Bit Discovery’s back-end I had to come up with some design principles. The first is that I can’t make assumptions about what I’m seeing; I just have to be a reporter and display what I get, no matter how wrong it seems to be, as long as it is RFC compliant. So yes, public DNS records with RFC 1918 addresses? Those can’t be altered, no matter how stupid they seem.

As I started designing and writing code I came up with a number of design principles that went along those lines:

Don’t make assumptions about the data; just assume it’s correct. That doesn’t mean you can’t call out things that look incorrect or dangerous, but don’t assume I know more than whoever wrote it. RFC 1918 turns out to be a good example: private addresses in public DNS records can be used to learn about internal networks.
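
To make that concrete, here is a minimal sketch of "call it out, but don't alter it" for an A record that points at a private address. It is an illustration only, not Bit Discovery's actual code: the function name `annotate_a_record` and the dictionary layout are hypothetical, and the only facts it relies on are the three private ranges that RFC 1918 defines.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def annotate_a_record(hostname, address):
    """Return the record exactly as observed, plus a flag.

    Hypothetical helper: the observed value is never rewritten or dropped,
    it is only annotated so a reader can see that it looks unusual.
    """
    ip = ipaddress.ip_address(address)
    is_private = ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS)
    return {
        "hostname": hostname,
        "address": address,  # stored verbatim, as received
        "flags": ["rfc1918"] if is_private else [],
    }


if __name__ == "__main__":
    print(annotate_a_record("intranet.example.com", "10.1.2.3"))
    print(annotate_a_record("www.example.com", "93.184.216.34"))
```

The point of the design is that the flag is additive: the record itself stays exactly as the DNS server returned it.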

Don’t throw away data. You can choose not to display bad or erroneous data, but you shouldn’t throw it away. It’s amazing how old data can be useful in the oddest ways, not the least of which is detecting misconfigurations.
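
One way to picture this principle is an append-only log where "hidden" is a display decision rather than a deletion. The sketch below is an assumption about how such a store could look, not a description of the real system; the `Observation` and `ObservationLog` names and their fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    hostname: str
    value: str
    seen_at: str          # ISO 8601 timestamp, kept as a plain string here
    hidden: bool = False  # hidden from display, but still retained


@dataclass
class ObservationLog:
    """Hypothetical append-only store: entries are added, never removed."""
    entries: list = field(default_factory=list)

    def record(self, hostname, value, seen_at, hidden=False):
        self.entries.append(Observation(hostname, value, seen_at, hidden))

    def visible(self):
        # What a user interface would show by default.
        return [o for o in self.entries if not o.hidden]

    def history(self, hostname):
        # Everything ever seen for a host, including hidden entries.
        return [o for o in self.entries if o.hostname == hostname]


if __name__ == "__main__":
    log = ObservationLog()
    log.record("a.example.com", "192.0.2.10", "2019-01-01T00:00:00Z")
    log.record("a.example.com", "not-an-ip", "2019-02-01T00:00:00Z", hidden=True)
    print(len(log.visible()), len(log.history("a.example.com")))
```

Because the hidden entry is still in the history, it remains available later, for example when comparing old and new values to spot a misconfiguration.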

Don’t get in the customer’s way. Assume they’ll want to use the data in at least one of three ways. First, they have no idea what they want so just let them explore. Second, they know what they want and they just want it once. Third, they know what they want and they want to pull it via an API. So we have designed the system to handle all three use cases.

Try to allow the user to quickly pivot based on the information found. If they know a domain they should be able to find subdomains quickly. If they know one domain they should be able to find correlated domains quickly, and so on. Everything should just be a few mouse clicks away. So our UI guru and Head of Engineering, Lex Arquette, built a really useful interface to allow exactly that.

It has to be blazing fast, ideally so fast that people think the demo is canned. That means probing the Internet after the fact isn’t a viable option; we’ve got to capture an enormous amount of data up front, before anyone needs it. We’re talking about a massive upfront cost. It’s the “build it and they will come” approach to external Internet asset management. It also means we have to build, borrow or steal as much data as we can get our hands on, from as many places as we can find.

As we built the interface we quickly realized it was becoming more and more useful based on those design choices. Now we have an amazingly fast, thorough interface that can get most people up and running in about 2-3 minutes, with the vast majority of their inventory automatically built, without them having to do anything at all. That’s the design promise and the design choice. If you haven’t already seen it, please take a look. If you have seen it but it’s been a while, you may want to take a second look. The new features we’ve added are changing the handbook for what corporate security, M&A, and Risk and Compliance need and use.