Collecting Sensitive Information like an Ethical Hacker

in #programming, 8 years ago

A staggering volume of credentials and other sensitive information gets dumped onto the public web every minute. Attackers often paste the results of their attacks and exploits onto the searchable web. Others, such as white hat hackers and experienced penetration testers, pass the sensitive information on to parties that can handle it appropriately. Still others collect or receive such information as soon as it becomes available.

One such party is Dump Monitor. Created in 2013 by security researcher Jordan Wright, Dump Monitor uses its Twitter account, dumpmon, to post links to pastes containing information from potential data breaches. You can read how Dump Monitor was created here.

You could simply follow dumpmon on Twitter and check the links it tweets every couple of minutes. I think that's too time-consuming...

Looking through sensitive information that is already publicly available is not wrongdoing. You did not obtain the information yourself, you did not publish it, and as long as you do not use it for malicious purposes, there is nothing wrong with accessing it.

Looking through this type of information is often categorized as open source intelligence gathering (OSINT). According to the White Paper, open source intelligence:

"is intelligence derived from public information - tailored intelligence which is based on information which can be obtained legally and ethically from public sources."

Open sources for intelligence:

  • newspapers, radio, TV, magazines, etc.
  • web-based: social networks, blogs, wikis, etc.
  • public government records.
  • geospatial information.
  • deep web.
  • and many others.

In this post I'm going to show you how to use Python to build an automated tool that reads dumpmon's tweets and downloads the linked information dumps into local text files.


Gathering Sensitive Information with Python

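What you need:

  • Python 3 (the code below uses urllib.request)
  • the tweepy library
  • Twitter API credentials: a consumer key and secret, plus an access token and secret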

Explanation of the algorithm:

  • I authenticate with my Twitter credentials (I cannot read Twitter data through the API otherwise)
  • I read dumpmon's timeline on Twitter
  • I get its 20 most recent tweets
  • I extract the link from each tweet, download it, and save it as a local file

Here's the code:

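import re
import urllib.request  # "import urllib" alone does not load the request submodule

import tweepy
from tweepy import OAuthHandler

# Placeholders: replace these with your own Twitter API credentials.
consumer_key = 'your twitter consumer key'
consumer_secret = 'consumer secret key'
access_token = 'your access token'
access_secret = 'your access secret'

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

urls = []

# Each tweet starts with the link, so keep the first run of non-whitespace characters.
for event in api.user_timeline(screen_name='dumpmon', count=20):
    m = re.match(r'(\S+)', event.text)
    if m:  # skip tweets that do not start with a token
        urls.append(m.group(0))

# Save each paste as dump-<n>.txt; skip links that fail to download.
i = 0
for url in urls:
    try:
        urllib.request.urlretrieve(url, 'dump-%d.txt' % i)
        i += 1
    except (OSError, ValueError):  # network/HTTP errors or a malformed URL
        continue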

Dump Monitor tweets all pastes and data breach dumps in a very standardized format.

The code above relies on this format: it takes the first run of non-whitespace characters at the start of each tweet, which is the link, and appends it to a list:

for event in api.user_timeline(screen_name='dumpmon', count=20):
    m = re.match(r'(\S+)', event.text)
    if m:  # skip tweets that do not start with a token
        urls.append(m.group(0))
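To see what that pattern does in isolation, here is a minimal sketch that runs it on a made-up tweet. The sample text and the helper name first_token are my own assumptions for illustration; the real dumpmon tweets may be laid out differently.

```python
import re

# Hypothetical tweet text, invented for this example; not a real dumpmon tweet.
SAMPLE_TWEET = "http://example.com/raw/abc123 Emails: 25 Keywords: 0.11 #infoleak"


def first_token(text):
    """Return the leading run of non-whitespace characters, or None if there is none."""
    m = re.match(r'\S+', text)
    return m.group(0) if m else None


print(first_token(SAMPLE_TWEET))
```

Because re.match only matches at the start of the string, a tweet that begins with whitespace (or is empty) yields None instead of raising an error, which is why the loop checks the match before appending.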

Then it accesses the links and saves them locally as text. Here's what I get after running the algorithm:

Some files contain plain-text (unhashed and unencrypted) credentials:


What you can Do - Be of Service!

Before giving you ideas, I have to say that this algorithm can be modified in numerous ways. One would be to have it look for specific keywords (your email?) in these data breach dumps and save only the files containing those keywords.
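A rough sketch of that keyword idea, assuming the dumps have already been saved as dump-*.txt files in one folder. The function names contains_keyword and matching_dumps are hypothetical, and the matching is a plain case-insensitive substring check, nothing smarter.

```python
from pathlib import Path


def contains_keyword(text, keywords):
    """Return True if any keyword occurs in text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def matching_dumps(folder, keywords):
    """Return the dump-*.txt files in folder whose contents mention any keyword."""
    hits = []
    for path in sorted(Path(folder).glob('dump-*.txt')):
        # Dumps can contain arbitrary bytes, so undecodable characters are ignored.
        text = path.read_text(encoding='utf-8', errors='ignore')
        if contains_keyword(text, keywords):
            hits.append(path)
    return hits
```

For example, matching_dumps('.', ['me@example.com']) would list the dumps in the current folder that mention that address. This variant filters after downloading; to save only matching files you could run the same check on the downloaded text before writing it to disk.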

If you decide to use this algorithm, you should do it with good intentions in mind. You could look into the dumps and try to alert victims of the information/credential leak about what happened. You could play the investigator. Heck, you could even turn this into a paid job...


To stay in touch with me, follow @cristi

#security #programming


Cristi Vlad, Self-Experimenter and Author


@crisit, good stuff. A little scary how easily accessible people's information is in the digital age. This is something that could be a job, we have real life police. Digital police for hire.

@cristi :) yeah, information is in plain sight.

I wish to know so much info like you! Great work Cristi!

hey, thanks! but relative to others, I'm nobody ;)
