Recon Part 3.5 – HaveIBeenPwned?

This is a quick write-up on the amazing HaveIBeenPwned Database maintained by Troy Hunt. If you haven’t seen it, check it out!

I recently discovered there is a public API to query the breach databases and decided I wanted to notify employees at my company if their account was involved in the latest breaches. For any sysadmins or blue teamers out there, I encourage you to use this on your own organization after speaking with anyone who should be informed, and to use it to train your users and emphasize that password reuse is a huge risk. Below I will present a very simple script (made far more verbose than needed to improve readability for new coders). The walk-through follows.

The Code

import sys
import time

import requests

email_file = ""
output_file = ""
verbose = ""
bquery = ""  # base URL of the haveibeenpwned breach query goes here
params = ""
compromised_accounts = ""

# Read the CLI arguments: source file, output file, optional -v.
try:
    email_file = sys.argv[1]
    output_file = sys.argv[2]
    if len(sys.argv) > 3:
        verbose = sys.argv[3]
except Exception as e:
    print(e)

# Pick the API flag: full breach details with -v, truncated otherwise.
if verbose == "-v":
    params = "?truncateResponse=False"
else:
    params = "?truncateResponse=True"

compromised_accounts = open(output_file, "w+")

print("email file is {e}".format(e=email_file))
print("output file is {e}".format(e=output_file))
if len(sys.argv) > 3:
    print("Running in verbose mode")
else:
    print("Running in truncated mode")

# Query the API once per address and log any hits.
with open(email_file, "r") as f:
    for email in f.readlines():
        email = email.rstrip()
        r = requests.get(bquery+email+params)
        if len(r.text) > 1:
            print("{e} was involved in these breaches: {b}".format(e=email, b=r.text))
            write_data = "{e} has been in the following breaches! {b} \n".format(e=email, b=r.text)
            compromised_accounts.write(write_data)
        else:
            print("{e} has not been involved in any tracked breaches.".format(e=email))
        time.sleep(1)
compromised_accounts.close()

So, what does this do? At a high level, these 3 things happen:

  • The script takes two required arguments (the source and destination files) and an optional verbose flag
  • It opens the source file, reads the lines, and queries the HaveIBeenPwned API for each address
  • It logs the output to the terminal and to an output file.

Great, so this will be very simple to go through. Regarding the inputs, the source file must exist and must contain one email address per line.
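To make the input format concrete, here is a minimal sketch of reading such a file. The helper name load_emails is hypothetical and not part of the script above, and the "contains an @" check is only a rough sanity filter, not real address validation.

```python
def load_emails(path):
    """Return the non-empty, address-like lines of a line-separated email file."""
    emails = []
    with open(path, "r") as f:
        for line in f:
            address = line.strip()  # drop the newline and stray spaces
            # Skip blank lines and anything that clearly is not an address.
            if address and "@" in address:
                emails.append(address)
    return emails
```

Filtering up front like this would avoid spending a rate-limited API request on an empty or malformed line.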

Code Breakdown

Lines 1-4: Import modules we use, easy.

Lines 6-12: Initial variables. I’m setting these here so that you can see where everything comes from and goes to; it would be more proper to set some or most of them dynamically. The important variables to note are params, bquery, and verbose. ‘bquery’ is the ‘breach query’ endpoint from the haveibeenpwned API. It returns an object if the email address is in a breach and an empty response if there is no data on that account. ‘params’ determines whether the response is verbose or truncated; it is appended to our API query. ‘verbose’ is the script input argument (-v), which sets ‘params’ to ?truncateResponse=False.
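As a rough illustration of how ‘bquery’, the email address, and ‘params’ combine into one request URL, here is a small sketch. BASE_URL is a placeholder and not the real endpoint, and build_query_url is a hypothetical helper. Percent-encoding the address is an addition the original script does not make.

```python
from urllib.parse import quote

# Placeholder only: substitute the real breach-query endpoint from the API docs.
BASE_URL = "https://example.invalid/breachedaccount/"


def build_query_url(email, verbose=False, base_url=BASE_URL):
    """Return the full query URL for one email address."""
    # Mirrors the script: verbose mode asks for the untruncated response.
    params = "?truncateResponse=False" if verbose else "?truncateResponse=True"
    # Percent-encode the address so characters such as '+' survive in the URL path.
    return base_url + quote(email.strip(), safe="") + params
```

For example, build_query_url("alice@example.com") produces the placeholder base followed by alice%40example.com?truncateResponse=True.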

Lines 14-28: Preparation. Lines 14-21 take the CLI arguments and turn them into variables. You must supply arguments ‘1’ and ‘2’; argument ‘3’ is optional. I put these in a try/except clause in order to troubleshoot any weirdness; not having at least sys.argv[1] and sys.argv[2] will result in index errors, for instance. Lines 23-26 check the ‘verbose’ variable. If it is set to -v, the script sets our params value to the verbose flag for the API; otherwise it uses the truncated flag. Line 28 opens our output file.
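If you would rather not index sys.argv by hand, the standard library’s argparse module can do the same job and prints a usage message when an argument is missing. This is only a sketch of an alternative, not what the script above does; the argument names simply mirror the script’s variables.

```python
import argparse


def parse_args(argv=None):
    """Parse the two required file arguments and the optional -v flag."""
    parser = argparse.ArgumentParser(
        description="Check a list of email addresses against a breach API."
    )
    parser.add_argument("email_file", help="file with one email address per line")
    parser.add_argument("output_file", help="file to write breached accounts to")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="request the untruncated (verbose) response")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

With this approach, leaving out a file name stops the script with a readable error instead of an IndexError.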

Lines 30-36: Print information to the CLI. This is just for convenience for the user. Maybe she put a filename without quotes and included a space; this will, hopefully, help troubleshoot that.

Lines 38-49: The stuff. This starts by opening our source file, looping through it as a list of lines, and stripping the trailing newline from each line so that only the email address remains. We send a request to the haveibeenpwned API and the response gets stored in the variable ‘r’. If there is data in r.text, we know the request found breach data, so we print the user and breach information and write the same data to the output file. If there is no data in the response, we print that the user was safe and move on. Line 48 exists because the API rate-limits requests and requires a 1-second pause between requests. Line 49 gracefully closes our file.
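The same loop can be sketched as a function that is easier to test without touching the network. Everything here is illustrative: check_accounts is a hypothetical name, and ‘fetch’ is assumed to be any callable that takes an email address and returns the response text (for instance, a thin wrapper around requests.get(...).text). The 1-second default delay follows the pause described above.

```python
import time


def check_accounts(emails, fetch, out, delay=1.0, sleep=time.sleep):
    """Query each address, log hits to `out`, and return the breached addresses."""
    breached = []
    for email in emails:
        body = fetch(email)  # response text; empty when nothing is on record
        if len(body) > 1:
            # Same message the script writes to its output file.
            out.write("{e} has been in the following breaches! {b}\n".format(e=email, b=body))
            breached.append(email)
        sleep(delay)  # pause between requests to respect the rate limit
    return breached
```

Passing the fetch and sleep functions in as arguments means a test can substitute fakes, so the logic can be checked instantly and without sending real addresses to the API.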

What’s next?

That’s it for this post – I’m writing a much larger automated recon tool to incorporate a script similar to this that I’ll present in October at Toorcon. More information to come!

Next part here!
