
How to get web page content and cookies of .onion sites in Python
November 5, 2021
A program that retrieves data from the Tor network, such as a parser built with cURL, a PHP script, or a Python script, often needs to work with cookies.
The article “Web site parsing in command line” has an example of working with cookies in cURL. But how do you get the content of a web page (its HTML code) and the cookies of a Tor network site, that is, a site whose name ends in .onion?
For the parser to work with the Tor network, specify the local Tor service as the proxy for network access: “localhost” (127.0.0.1) as the IP address, plus the port number on which Tor listens.
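Before sending requests, it can help to confirm that the local Tor service is actually listening. The following is a minimal sketch using only the standard library; the helper name `tor_port_open` is hypothetical, and port 9050 is assumed because it is the port used in this article's example.

```python
import socket


def tor_port_open(host="127.0.0.1", port=9050, timeout=3.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Tor SOCKS port reachable:", tor_port_open())
```

This only shows that some service accepts connections on the port; it does not prove that the service is Tor.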
Ordinary DNS servers cannot resolve .onion hostnames; these names have to be resolved inside the Tor network.
In a Python script, use the socks5h protocol to access .onion sites. The trailing “h” means the hostname is passed to the proxy, so Tor resolves it remotely. With plain socks5 the name is resolved locally, which fails for .onion addresses.
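The difference between the two schemes comes down to the prefix of the proxy URL. Here is a small sketch of a helper that builds the proxy mapping in the format the requests library accepts. The function name `tor_proxies` and its parameters are hypothetical, and port 9150 is shown only because Tor Browser's bundled Tor commonly listens there.

```python
def tor_proxies(host="127.0.0.1", port=9050, remote_dns=True):
    """Build a requests-style proxies mapping for a local Tor SOCKS port."""
    # socks5h: the proxy (Tor) resolves hostnames; socks5: the client resolves them.
    scheme = "socks5h" if remote_dns else "socks5"
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


print(tor_proxies())           # standalone Tor service, as in this article
print(tor_proxies(port=9150))  # assumption: Tor Browser's bundled Tor
```

The resulting dictionary can be passed as the `proxies` argument of `session.get()`. SOCKS support in requests depends on an extra package, typically installed with `pip install requests[socks]`.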
The following code prints the HTML of the .onion page at http://hacking5xcj4mtc63mfjqbshn3c5oa2ns7xgpiyrg2fenl2jd4lgooad.onion and the cookies the site sets:
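import requests

# Send both HTTP and HTTPS traffic through the local Tor SOCKS proxy.
# socks5h makes Tor resolve hostnames, which .onion addresses require.
proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050'
}

# A Session stores the cookies the site sends.
session = requests.Session()
data = session.get("http://hacking5xcj4mtc63mfjqbshn3c5oa2ns7xgpiyrg2fenl2jd4lgooad.onion", proxies=proxies).text

print(data)             # HTML code of the page
print(session.cookies)  # cookies set by the site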
The test site is a simple PHP script that sends HTML code and a cookie.
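For local experiments, a comparable test endpoint can be sketched with Python's standard library. This is a hypothetical stand-in, not the PHP script behind the site above: the HTML body, the default port 8080, and the names `CookieTestHandler` and `make_server` are assumptions. The cookie name and value are taken from the output shown in this article.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import quote

COOKIE_NAME = "HackWare-cookie"
COOKIE_VALUE = "For testing purpose only"


class CookieTestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body><h1>Test page</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # Percent-encode the value so that spaces become %20.
        self.send_header("Set-Cookie", f"{COOKIE_NAME}={quote(COOKIE_VALUE)}")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(host="127.0.0.1", port=8080):
    """Create (but do not start) the test server."""
    return HTTPServer((host, port), CookieTestHandler)


# To run it: make_server().serve_forever()
```

Making such a server reachable under a .onion name requires a separate onion service configuration in Tor, which this sketch does not cover.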
When the code above runs, it prints the HTML of the page followed by the cookies.
The line
print(session.cookies)
outputs:
<RequestsCookieJar[<Cookie HackWare-cookie=For%20testing%20purpose%20only for hacking5xcj4mtc63mfjqbshn3c5oa2ns7xgpiyrg2fenl2jd4lgooad.onion/>]>
That is, the format is:
<RequestsCookieJar[<Cookie NAME=VALUE for SITE.onion/>]>
If print(session.cookies) is changed to
print(session.cookies.get_dict())
then the format will be like this:
{'HackWare-cookie': 'For%20testing%20purpose%20only'}
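The dictionary form keeps only names and values. When the domain and path are also needed, one option is to iterate over the cookie jar, which yields cookie objects with `name`, `value`, `domain`, and `path` attributes. Here is a minimal sketch; the helper name `cookie_fields` is hypothetical.

```python
def cookie_fields(jar):
    """Return name, value, domain and path for every cookie in a cookie jar."""
    return [
        {"name": c.name, "value": c.value, "domain": c.domain, "path": c.path}
        for c in jar
    ]


# With the session from the code above: cookie_fields(session.cookies)
```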
Sites can also encrypt cookies. More precisely, cookies are always sent in the “NAME=VALUE” format, but the VALUE can be encrypted so that only the site knows what to do with it. In general, the user does not need to think about this: whatever cookies were received are the ones the browser (or the script) sends back to the site.
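The value in this article's example is not encrypted, only percent-encoded (%20 stands for a space). If a readable value is needed, it can be decoded with the standard library, as in this short sketch, which assumes the dictionary printed by get_dict() above:

```python
from urllib.parse import unquote

# The dictionary returned by session.cookies.get_dict() in the example above.
cookies = {"HackWare-cookie": "For%20testing%20purpose%20only"}

# Decode percent-encoded sequences such as %20 in every value.
decoded = {name: unquote(value) for name, value in cookies.items()}
print(decoded)  # {'HackWare-cookie': 'For testing purpose only'}
```

When sending cookies back to the site, use the original, still encoded value.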