How to get web page content and cookies of .onion sites in Python

A program that retrieves data from the Tor network, such as a parser built on cURL, a PHP script, or a Python script, often needs to work with cookies.

The article “Web site parsing in command line” has an example of working with cookies in cURL. But how do you get the content of a web page (its HTML code) and the cookies from a Tor network site, that is, a site whose name ends in .onion?

For the parser to work with the Tor network, it must use the local Tor service as a proxy: specify “localhost” as the IP address together with the port number on which the Tor service listens.

To work with .onion sites, hostnames must be resolved by Tor itself, because ordinary DNS servers know nothing about .onion names.

In a Python script, this is done by using the socks5h protocol for the proxy. With socks5h, the hostname is passed to the proxy and resolved remotely (by Tor), whereas with plain socks5 the hostname is resolved locally, which fails for .onion names.
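The difference between the two schemes comes down to the URL prefix in the proxy settings. Below is a minimal sketch, assuming the local Tor service listens on 127.0.0.1:9050 (Tor's default SocksPort; Tor Browser typically uses 9150). The helper name `tor_proxies` is hypothetical, and requests needs its optional SOCKS support (`pip install requests[socks]`) to accept such proxy URLs.

```python
def tor_proxies(host="127.0.0.1", port=9050, remote_dns=True):
    """Build a proxies mapping that points requests at a local Tor SOCKS port.

    host and port are assumptions: 9050 is Tor's default SocksPort.
    """
    # socks5h: the hostname is handed to the proxy, so Tor resolves it.
    # socks5:  the hostname is resolved locally first, which fails for .onion.
    scheme = "socks5h" if remote_dns else "socks5"
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


print(tor_proxies())
# → {'http': 'socks5h://127.0.0.1:9050', 'https': 'socks5h://127.0.0.1:9050'}
```

The returned dictionary can be passed as the `proxies` argument of `session.get()`.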

The following code prints the HTML of an .onion page (URL http://hacking5xcj4mtc63mfjqbshn3c5oa2ns7xgpiyrg2fenl2jd4lgooad.onion) and the cookies the site sets:

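import requests

# Route both HTTP and HTTPS requests through the local Tor SOCKS proxy.
# Append the host and port of the local Tor service after "socks5h://".
proxies = {
	'http': 'socks5h://',
	'https': 'socks5h://'
}

# A session keeps the cookies received from the site between requests.
session = requests.Session()

data = session.get("http://hacking5xcj4mtc63mfjqbshn3c5oa2ns7xgpiyrg2fenl2jd4lgooad.onion", proxies=proxies).text

# Print the HTML of the page, then the cookies the site has set.
print(data)
print(session.cookies)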

The test site is a simple PHP script that sends HTML code and a cookie.

When the Python code above is run against it, the output contains the HTML followed by the cookies:




<RequestsCookieJar[<Cookie HackWare-cookie=For%20testing%20purpose%20only for hacking5xcj4mtc63mfjqbshn3c5oa2ns7xgpiyrg2fenl2jd4lgooad.onion/>]>

That is, the format is:

<RequestsCookieJar[<Cookie NAME=VALUE for SITE.onion/>]>
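The individual fields of that format can also be read one by one, since iterating over a cookie jar yields cookie objects with `name`, `value`, `domain` and `path` attributes. The sketch below uses a stand-in list instead of a real `session.cookies` so that it runs without a Tor connection; the function name `cookie_rows` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def cookie_rows(jar):
    """Return (name, value, domain, path) tuples for every cookie in the jar."""
    return [(c.name, c.value, c.domain, c.path) for c in jar]


# Stand-in for session.cookies, so the sketch runs without a network connection.
fake_jar = [
    SimpleNamespace(
        name="HackWare-cookie",
        value="For%20testing%20purpose%20only",
        domain="SITE.onion",
        path="/",
    )
]

for name, value, domain, path in cookie_rows(fake_jar):
    print(f"{name}={value} for {domain}{path}")
# → HackWare-cookie=For%20testing%20purpose%20only for SITE.onion/
```

In a real script, `cookie_rows(session.cookies)` would be called after the request has completed.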

If print(session.cookies) is changed to

print(session.cookies.get_dict())

then the format will be like this:

{'HackWare-cookie': 'For%20testing%20purpose%20only'}
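The value in this output is percent-encoded (`%20` stands for a space). If a readable value is needed, it can be decoded with the standard library; this is a small sketch that works on a copy of the dictionary shown above rather than on a live session.

```python
from urllib.parse import unquote

# The dictionary as printed above.
cookies = {'HackWare-cookie': 'For%20testing%20purpose%20only'}

# Decode percent-escapes in every value; names are left untouched.
decoded = {name: unquote(value) for name, value in cookies.items()}
print(decoded)
# → {'HackWare-cookie': 'For testing purpose only'}
```

The decoded form is only for display; the original encoded value is what should be sent back to the site.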

Sites can also encrypt cookies. Cookies are always sent in the “NAME=VALUE” format, but the VALUE may be encrypted so that only the site knows how to interpret it. In general, the user does not need to think about this: whatever cookies were received are simply sent back by the browser unchanged.
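To illustrate the “NAME=VALUE” format, here is a rough sketch of how stored cookies are joined into the Cookie request header, with the values passed through untouched. A requests session does this automatically on later requests; the function `cookie_header` and the second cookie (`session=abc123`) are hypothetical and exist only for the example.

```python
def cookie_header(cookies):
    """Join cookies into the value of a Cookie request header.

    Values are treated as opaque strings and are not decoded or modified.
    """
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


stored = {
    "HackWare-cookie": "For%20testing%20purpose%20only",
    "session": "abc123",  # hypothetical second cookie, for illustration only
}
print(cookie_header(stored))
# → HackWare-cookie=For%20testing%20purpose%20only; session=abc123
```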
