
theHarvester is an open-source tool written in Python. Its objective is to gather emails, subdomains, hosts, employee names, open ports, and banners from different public sources such as search engines (Google, Bing, DuckDuckGo, …).
The tool was made to help penetration testers understand a customer's footprint on the internet in the early stages of an engagement. It is also useful for anyone who wants to know what an attacker can see about their organization.
Installation (theHarvester):
You have three options to install and use theHarvester:
1. Kali Linux

In Kali Linux, theHarvester is installed by default. Make sure you are using a recent version; you can then run it by typing:
theHarvester -h
2. Virtualenv
Download theHarvester:
git clone https://github.com/laramies/theHarvester.git
Change into the directory:
cd theHarvester
Install python3-virtualenv:
sudo apt install python3-virtualenv
Create the virtual environment:
python3 -m venv venv
Activate the virtual environment:
source venv/bin/activate
Install the dependencies:
pip3 install -r requirements.txt
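Once the requirements are installed, the tool can be started from the checkout. The sketch below assumes the repository root contains an entry script named theHarvester.py (worth checking in your clone, since the layout can change between releases) and refuses to run when no virtual environment is active. The helper names in_venv and run_from_checkout are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers, not part of theHarvester.

in_venv() {
    # "source venv/bin/activate" exports VIRTUAL_ENV; empty or unset means no venv.
    [ -n "${VIRTUAL_ENV:-}" ]
}

run_from_checkout() {
    if ! in_venv; then
        echo "activate the virtual environment first: source venv/bin/activate" >&2
        return 1
    fi
    # Assumption: the entry script is theHarvester.py in the repository root.
    python3 theHarvester.py "$@"
}

# Usage (from inside the theHarvester directory):
#   run_from_checkout -h
```

Checking VIRTUAL_ENV first avoids installing or running against the system Python by accident.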
3. Docker
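theHarvester can also be used with Docker.
First, clone the repository and build the Docker image (Docker image names must be lowercase, and the trailing dot is the build context):
git clone https://github.com/laramies/theHarvester.git
cd theHarvester
docker build -t theharvester .
Then run the container:
docker run theharvester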
How to use theHarvester:
theHarvester -h
*******************************************************************
*  _   _                                            _             *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __|  _ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* theHarvester 4.0.3                                              *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* [email protected]                                               *
*                                                                 *
*******************************************************************
usage: theHarvester [-h] -d DOMAIN [-l LIMIT] [-S START] [-g] [-p] [-s]
                    [--screenshot SCREENSHOT] [-v] [-e DNS_SERVER]
                    [-t DNS_TLD] [-r] [-n] [-c] [-f FILENAME] [-b SOURCE]

theHarvester is used to gather open source intelligence (OSINT) on a company
or domain.

optional arguments:
  -h, --help            show this help message and exit
  -d DOMAIN, --domain DOMAIN
                        Company name or domain to search.
  -l LIMIT, --limit LIMIT
                        Limit the number of search results, default=500.
  -S START, --start START
                        Start with result number X, default=0.
  -g, --google-dork     Use Google Dorks for Google search.
  -p, --proxies         Use proxies for requests, enter proxies in
                        proxies.yaml.
  -s, --shodan          Use Shodan to query discovered hosts.
  --screenshot SCREENSHOT
                        Take screenshots of resolved domains specify output
                        directory: --screenshot output_directory
  -v, --virtual-host    Verify host name via DNS resolution and search for
                        virtual hosts.
  -e DNS_SERVER, --dns-server DNS_SERVER
                        DNS server to use for lookup.
  -t DNS_TLD, --dns-tld DNS_TLD
                        Perform a DNS TLD expansion discovery, default False.
  -r, --take-over       Check for takeovers.
  -n, --dns-lookup      Enable DNS server lookup, default False.
  -c, --dns-brute       Perform a DNS brute force on the domain.
  -f FILENAME, --filename FILENAME
                        Save the results to an XML and JSON file.
  -b SOURCE, --source SOURCE
                        anubis, baidu, bing, binaryedge, bingapi,
                        bufferoverun, censys, certspotter, crtsh, dnsdumpster,
                        duckduckgo, fullhunt, github-code, google,
                        hackertarget, hunter, intelx, linkedin,
                        linkedin_links, n45ht, omnisint, otx, pentesttools,
                        projectdiscovery, qwant, rapiddns, rocketreach,
                        securityTrails, spyse, sublist3r, threatcrowd,
                        threatminer, trello, twitter, urlscan, virustotal,
                        yahoo, zoomeye
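To make the option list more concrete, here is a small sketch that assembles a typical invocation from -d, -l, -b, and -f. Only the flags come from the help text; the function name harvest_cmd and the example values are made up for illustration. It prints the command instead of running it, so you can review it first.

```shell
#!/bin/sh
# Hypothetical helper, not part of theHarvester: print a command line
# built from the flags listed in the help output.
harvest_cmd() {
    domain=${1:-}
    src=${2:-}
    limit=${3:-500}     # the help output gives 500 as the default for -l
    outfile=${4:-}      # optional base name passed to -f

    if [ -z "$domain" ] || [ -z "$src" ]; then
        echo "usage: harvest_cmd DOMAIN SOURCE [LIMIT] [OUTFILE]" >&2
        return 1
    fi

    cmd="theHarvester -d $domain -l $limit -b $src"
    if [ -n "$outfile" ]; then
        cmd="$cmd -f $outfile"
    fi
    printf '%s\n' "$cmd"
}

harvest_cmd example.com bing
# prints: theHarvester -d example.com -l 500 -b bing
harvest_cmd example.com duckduckgo 200 results
# prints: theHarvester -d example.com -l 200 -b duckduckgo -f results
```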
Let’s search for email addresses on a domain (-d ubuntu.com), limit the results to 500 (-l 500), and use Google as the source (-b google):
theHarvester -d ubuntu.com -l 500 -b google
*******************************************************************
*                                                                 *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __| '_ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* TheHarvester Ver. 3.0.0                                         *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* [email protected]                                               *
*******************************************************************
[-] Starting harvesting process for domain: ubuntu.com
[-] Searching in Google:
Searching 0 results...
Searching 100 results...
Searching 200 results...
Searching 300 results...
Searching 400 results...
Searching 500 results...
Harvesting results
[...]
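If you capture the console output of a run like this in a file, a generic text filter can pull out the unique addresses for the target domain. The sketch below uses only grep, sed, tr, and sort, and assumes nothing about theHarvester's output layout beyond the addresses appearing as plain text. The file name harvest.txt and the function name extract_emails are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list the unique e-mail addresses for a domain
# (including its subdomains) that appear anywhere in a text file.
extract_emails() {
    domain=$1
    file=$2
    # Escape the dots so "ubuntu.com" is matched literally.
    escaped=$(printf '%s' "$domain" | sed 's/\./\\./g')
    grep -Eio "[a-z0-9._%+-]+@([a-z0-9-]+\.)*$escaped" "$file" \
        | tr 'A-Z' 'a-z' \
        | sort -u
}

# Usage:
#   theHarvester -d ubuntu.com -l 500 -b google | tee harvest.txt
#   extract_emails ubuntu.com harvest.txt
```

The pattern is deliberately loose and has no check for what follows the domain, so treat the result as a list of candidates to review.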
Passive Discovery:
- anubis: Anubis-DB – https://github.com/jonluca/anubis
- baidu: Baidu search engine – www.baidu.com
- binaryedge: placeholder – www.binaryedge.io
- bing: Microsoft search engine – www.bing.com
- bingapi: Microsoft search engine, through the API (Requires an API key, see below.)
- bufferoverun: Uses data from Rapid7’s Project Sonar – www.rapid7.com/research/project-sonar/
- censys: Censys search engine, will use certificates searches to enumerate subdomains and gather emails (Requires an API key, see below.) – censys.io
- certspotter: Cert Spotter monitors Certificate Transparency logs – https://sslmate.com/certspotter/
- crtsh: Comodo Certificate search – https://crt.sh
- dnsdumpster: DNSdumpster search engine – https://dnsdumpster.com
- duckduckgo: DuckDuckGo search engine – www.duckduckgo.com
- fullhunt: The Next-Generation Attack Surface Security Platform – https://fullhunt.io
- github-code: GitHub code search engine (Requires a GitHub Personal Access Token, see below.) – www.github.com
- google: Google search engine (Optional Google dorking.) – www.google.com
- hackertarget: Online vulnerability scanners and network intelligence to help organizations – https://hackertarget.com
- hunter: Hunter search engine (Requires an API key, see below.) – www.hunter.io
- intelx: Intelx search engine (Requires an API key, see below.) – www.intelx.io
- linkedin: Google search engine, specific search for LinkedIn users – www.linkedin.com
- linkedin_links: specific search for LinkedIn users for target domain (Uses Google search.)
- n45ht: – https://n45ht.or.id
- omnisint: Project Crobat, A Centralised Searchable Open Source Project Sonar DNS Database – https://github.com/Cgboal/SonarSearch
- otx: AlienVault Open Threat Exchange – https://otx.alienvault.com
- pentesttools: Powerful Penetration Testing Tools, Easy to Use (Requires an API key, see below.) – https://pentest-tools.com/home
- projectdiscovery: We actively collect and maintain internet-wide assets data, to enhance research and analyse changes around DNS for better insights (Requires an API key, see below.) – https://chaos.projectdiscovery.io
- qwant: Qwant search engine – www.qwant.com
- rapiddns: DNS query tool which makes querying subdomains or sites of the same IP easy! – https://rapiddns.io
- rocketreach: Access real-time verified personal/professional emails, phone numbers, and social media links. – https://rocketreach.co
- securityTrails: Security Trails search engine, the world’s largest repository of historical DNS data (Requires an API key, see below.) – www.securitytrails.com
- shodan: Shodan search engine, will search for ports and banners from discovered hosts (Requires an API key, see below.) – www.shodanhq.com
- spyse: Spyse is a search engine built for a quick cyber intelligence of IT infrastructures, networks, and even the smallest parts of the internet. (Requires an API key, see below.) – spyse.com
- sublist3r: Fast subdomains enumeration tool for penetration testers – https://api.sublist3r.com/search.php?domain=example.com
- threatcrowd: Open source threat intelligence – www.threatcrowd.org
- threatminer: Data mining for threat intelligence – https://www.threatminer.org/
- trello: Search trello boards (Uses Google search.)
- twitter: Twitter accounts related to a specific domain (Uses Google search.)
- urlscan: A sandbox for the web that is a URL and website scanner – https://urlscan.io
- vhost: Bing virtual hosts search
- virustotal: virustotal.com domain search
- yahoo: Yahoo search engine
- zoomeye: China version of shodan – https://www.zoomeye.org
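Since each source is selected with -b, covering several of them can be as simple as repeating the run once per source. Here is a minimal sketch, using a few sources from the list above that are not marked as requiring an API key. The function name harvest_sources, the DRY_RUN switch, and the output file naming are all made up for this example. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Hypothetical helper: one theHarvester run per passive source.
harvest_sources() {
    if [ "$#" -lt 2 ]; then
        echo "usage: harvest_sources DOMAIN SOURCE [SOURCE...]" >&2
        return 1
    fi
    domain=$1
    shift
    for src in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show what would be executed.
            echo "theHarvester -d $domain -b $src -f ${domain}_${src}"
        else
            # -f saves the results under a per-source base name.
            theHarvester -d "$domain" -b "$src" -f "${domain}_${src}"
        fi
    done
}

harvest_sources example.com bing crtsh duckduckgo
```

Keeping one result file per source makes it easier to see which source contributed which host or address.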
Active Discovery:
- DNS brute force: dictionary brute force enumeration
- Screenshots: Take screenshots of subdomains that were found
Start using theHarvester: https://github.com/laramies/theHarvester