# HTTPZ Web Scanner

A high-performance concurrent web scanner written in Python. HTTPZ efficiently scans domains for HTTP/HTTPS services, extracting valuable information like status codes, titles, SSL certificates, and more.

## Requirements

- [Python](https://www.python.org/downloads/)
  - [aiohttp](https://pypi.org/project/aiohttp/)
  - [beautifulsoup4](https://pypi.org/project/beautifulsoup4/)
  - [cryptography](https://pypi.org/project/cryptography/)
  - [dnspython](https://pypi.org/project/dnspython/)
  - [mmh3](https://pypi.org/project/mmh3/)
  - [python-dotenv](https://pypi.org/project/python-dotenv/)

## Installation

### Via pip *(recommended)*
```bash
# Install from PyPI
pip install httpz_scanner

# The 'httpz' command will now be available in your terminal
httpz --help
```

### From source
```bash
# Clone the repository
git clone https://github.com/acidvegas/httpz
cd httpz
pip install -r requirements.txt
```

## Usage

### Command Line Interface

Basic usage:
```bash
python -m httpz_scanner domains.txt
```

Scan with all flags enabled and output to JSONL:
```bash
python -m httpz_scanner domains.txt -all -c 100 -o results.jsonl -j -p
```

Read from stdin:
```bash
cat domains.txt | python -m httpz_scanner - -all -c 100
echo "example.com" | python -m httpz_scanner - -all
```

Filter by status codes and follow redirects:
```bash
python -m httpz_scanner domains.txt -mc 200,301-399 -ec 404,500 -fr -p
```

Show specific fields with custom timeout and resolvers:
```bash
python -m httpz_scanner domains.txt -sc -ti -i -tls -to 10 -r resolvers.txt
```

Full scan with all options:
```bash
python -m httpz_scanner domains.txt -c 100 -o output.jsonl -j -all -to 10 -mc 200,301 -ec 404,500 -p -ax -r resolvers.txt
```

### Distributed Scanning
Split scanning across multiple machines using the `--shard` argument:

```bash
# Machine 1
httpz domains.txt --shard 1/3

# Machine 2
httpz domains.txt --shard 2/3

# Machine 3
httpz domains.txt --shard 3/3
```

Each machine will process a different subset of domains without overlap. For example, with 3 shards:
- Machine 1 processes lines 0,3,6,9,...
- Machine 2 processes lines 1,4,7,10,...
- Machine 3 processes lines 2,5,8,11,...

This allows efficient distribution of large scans across multiple machines.

### Python Library
```python
import asyncio
import urllib.request
from httpz_scanner import HTTPZScanner

async def scan_from_list() -> list:
    with urllib.request.urlopen('https://example.com/domains.txt') as response:
        content = response.read().decode()
        return [line.strip() for line in content.splitlines() if line.strip()][:20]

async def scan_from_url():
    with urllib.request.urlopen('https://example.com/domains.txt') as response:
        for line in response:
            if line := line.strip():
                yield line.decode().strip()

async def scan_from_file():
    with open('domains.txt', 'r') as file:
        for line in file:
            if line := line.strip():
                yield line

async def main():
    # Initialize scanner with all possible options (showing defaults)
    scanner = HTTPZScanner(
        concurrent_limit=100,   # Number of concurrent requests
        timeout=5,              # Request timeout in seconds
        follow_redirects=False, # Follow redirects (max 10)
        check_axfr=False,       # Try AXFR transfer against nameservers
        resolver_file=None,     # Path to custom DNS resolvers file
        output_file=None,       # Path to JSONL output file
        show_progress=False,    # Show progress counter
        debug_mode=False,       # Show error states and debug info
        jsonl_output=False,     # Output in JSONL format
        shard=None,             # Tuple of (shard_index, total_shards) for distributed scanning

        # Control which fields to show (all False by default unless show_fields is None)
        show_fields={
            'status_code': True,      # Show status code
            'content_type': True,     # Show content type
            'content_length': True,   # Show content length
            'title': True,            # Show page title
            'body': True,             # Show body preview
            'ip': True,               # Show IP addresses
            'favicon': True,          # Show favicon hash
            'headers': True,          # Show response headers
            'follow_redirects': True, # Show redirect chain
            'cname': True,            # Show CNAME records
            'tls': True               # Show TLS certificate info
        },

        # Filter results
        match_codes={200,301,302},  # Only show these status codes
        exclude_codes={404,500,503} # Exclude these status codes
    )

    # Example 1: Process file
    print('\nProcessing file:')
    async for result in scanner.scan(scan_from_file()):
        print(f"{result['domain']}: {result['status']}")

    # Example 2: Stream URLs
    print('\nStreaming URLs:')
    async for result in scanner.scan(scan_from_url()):
        print(f"{result['domain']}: {result['status']}")

    # Example 3: Process list
    print('\nProcessing list:')
    domains = await scan_from_list()
    async for result in scanner.scan(domains):
        print(f"{result['domain']}: {result['status']}")

if __name__ == '__main__':
    asyncio.run(main())
```

The scanner accepts various input types:
- File paths (string)
- Lists/tuples of domains
- stdin (using '-')
- Async generators that yield domains

All inputs support sharding for distributed scanning using the `shard` parameter.
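
The line distribution described under Distributed Scanning (shard 1 of 3 takes lines 0,3,6,..., shard 2 takes lines 1,4,7,..., and so on) can be illustrated with a small round-robin filter. This is a minimal sketch and not HTTPZ's own code: the helper name `take_shard` is hypothetical, and it assumes a 1-based shard index matching the `1/3` notation, with each line assigned by its zero-based position modulo the shard count.

```python
from typing import Iterable, Iterator, List


def take_shard(lines: Iterable[str], shard_index: int, total_shards: int) -> Iterator[str]:
    """Yield only the lines that belong to one shard (hypothetical helper).

    shard_index is 1-based, mirroring the '--shard 1/3' notation.
    """
    if not 1 <= shard_index <= total_shards:
        raise ValueError('shard_index must be between 1 and total_shards')

    for line_number, line in enumerate(lines):
        # Line N goes to the shard whose zero-based index equals N mod total_shards
        if line_number % total_shards == shard_index - 1:
            yield line


# Ten placeholder domains split across three shards
domains: List[str] = [f'host{n}.example' for n in range(10)]
shards = [list(take_shard(domains, index, 3)) for index in (1, 2, 3)]

for index, subset in enumerate(shards, start=1):
    print(f'shard {index}/3: {subset}')
```

Because every line index falls into exactly one residue class, the shards never overlap and together cover the whole input, which is the property the section relies on.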

## Arguments

| Argument      | Long Form        | Description                                                 |
|---------------|------------------|-------------------------------------------------------------|
| `file`        |                  | File containing domains *(one per line)*, use `-` for stdin |
| `-d`          | `--debug`        | Show error states and debug information                     |
| `-c N`        | `--concurrent N` | Number of concurrent checks *(default: 100)*                |
| `-o FILE`     | `--output FILE`  | Output file path *(JSONL format)*                           |
| `-j`          | `--jsonl`        | Output JSON Lines format to console                         |
| `-all`        | `--all-flags`    | Enable all output flags                                     |
| `-sh`         | `--shard N/T`    | Process shard N of T total shards *(e.g., 1/3)*             |

### Output Field Flags

| Flag   | Long Form            | Description                      |
|--------|----------------------|----------------------------------|
| `-sc`  | `--status-code`      | Show status code                 |
| `-ct`  | `--content-type`     | Show content type                |
| `-ti`  | `--title`            | Show page title                  |
| `-b`   | `--body`             | Show body preview                |
| `-i`   | `--ip`               | Show IP addresses                |
| `-f`   | `--favicon`          | Show favicon hash                |
| `-hr`  | `--headers`          | Show response headers            |
| `-cl`  | `--content-length`   | Show content length              |
| `-fr`  | `--follow-redirects` | Follow redirects *(max 10)*      |
| `-cn`  | `--cname`            | Show CNAME records               |
| `-tls` | `--tls-info`         | Show TLS certificate information |

### Other Options

| Option      | Long Form               | Description                                         |
|-------------|-------------------------|-----------------------------------------------------|
| `-to N`     | `--timeout N`           | Request timeout in seconds *(default: 5)*           |
| `-mc CODES` | `--match-codes CODES`   | Only show specific status codes *(comma-separated)* |
| `-ec CODES` | `--exclude-codes CODES` | Exclude specific status codes *(comma-separated)*   |
| `-p`        | `--progress`            | Show progress counter                               |
| `-ax`       | `--axfr`                | Try AXFR transfer against nameservers               |
| `-r FILE`   | `--resolvers FILE`      | File containing DNS resolvers *(one per line)*      |
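
The `-mc` and `-ec` options take comma-separated codes, and the usage examples also show a range form (`-mc 200,301-399`). The sketch below shows one way such a string could be expanded into the kind of integer set that `match_codes` and `exclude_codes` receive in the library example. It is an illustration under assumptions, not HTTPZ's actual parser: `parse_status_codes` and `is_shown` are hypothetical names, ranges are assumed to be inclusive, and exclusion is assumed to take precedence over matching.

```python
from typing import Optional, Set


def parse_status_codes(spec: str) -> Set[int]:
    """Expand a spec such as '200,301-399' into a set of ints (hypothetical helper)."""
    codes: Set[int] = set()
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            # Assumed inclusive range, e.g. '301-399' covers 301 through 399
            start, end = (int(value) for value in part.split('-', 1))
            if start > end:
                raise ValueError(f'invalid range: {part}')
            codes.update(range(start, end + 1))
        else:
            codes.add(int(part))
    return codes


def is_shown(status: int, match_codes: Optional[Set[int]], exclude_codes: Optional[Set[int]]) -> bool:
    """Apply match and exclude filters to one status code (assumed semantics)."""
    if match_codes and status not in match_codes:
        return False
    if exclude_codes and status in exclude_codes:
        return False
    return True


match_codes = parse_status_codes('200,301-399')
exclude_codes = parse_status_codes('404,500')

for status in (200, 302, 404, 503):
    print(status, is_shown(status, match_codes, exclude_codes))
```

Keeping the parsed codes in a set makes each membership check constant time, which matters little for a handful of codes but keeps the filter cheap when a wide range is expanded.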