How to use proxies with the Python Requests module

Sending HTTP requests in Python is not necessarily easy. The standard library offers urllib (and urllib2 in Python 2) for HTTP requests, and there are third-party tools like Requests. Many developers use Requests because it is high-level and designed to make sending HTTP requests extremely easy.

But choosing the tool that best suits your needs is only one part of the job. In the web scraping world, there are many obstacles to overcome, and one of the biggest is getting your scraper blocked. To reduce that risk, you need to use proxies. In this article, I’m going to show you how to use proxies with the Requests module so your scraper is less likely to get banned.

Requests and proxies

Basic usage

import requests

proxies = {
    "http": "http://10.10.10.10:8000",
    "https": "http://10.10.10.10:8000",
}
r = requests.get("http://toscrape.com", proxies=proxies)

The proxies dictionary must follow this scheme: each key is a protocol and each value is the URL of the proxy to use for that protocol. Defining only the proxy address and port is not enough; you also need to specify the protocol. You can use the same proxy for multiple protocols. If you need authentication, use this syntax for your proxy:

http://user:pass@10.10.10.10:8000
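
Credentials that contain reserved characters such as @ or : have to be percent-encoded before they go into the proxy URL, otherwise the URL becomes ambiguous. Here is a minimal sketch of that step. The helper name build_proxy_url is hypothetical (it is not part of Requests), and the credentials are made up:

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Hypothetical helper: build a proxy URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/" and "@"
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Made-up credentials, used only for illustration
proxy_url = build_proxy_url("user", "p@ss:word", "10.10.10.10", 8000)
proxies = {"http": proxy_url, "https": proxy_url}
print(proxy_url)
```

The resulting proxies dictionary can be passed to requests.get in the same way as in the basic example.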

Environment variables

export HTTP_PROXY="http://10.10.10.10:8000" 
export HTTPS_PROXY="http://10.10.10.10:1212"

Requests reads these variables by default, so this way you don’t need to define any proxies in your code. Just make the request and it will go through the proxy.
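
To check which proxies would be picked up from the environment, one option is requests.utils.get_environ_proxies. The sketch below sets the variables from inside Python purely for illustration; normally they would be exported in the shell as shown above. The helper name environment_proxies is hypothetical:

```python
import os

import requests


def environment_proxies(url):
    """Hypothetical helper: return the proxies found in the environment for url."""
    return requests.utils.get_environ_proxies(url)


# Set here only for illustration; normally these are exported in the shell.
os.environ["HTTP_PROXY"] = "http://10.10.10.10:8000"
os.environ["HTTPS_PROXY"] = "http://10.10.10.10:1212"

print(environment_proxies("http://toscrape.com"))
```

Keep in mind that lowercase variants (http_proxy, https_proxy) and a no_proxy variable may also be set on your machine and can change what this prints.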

Proxy with session

import requests

s = requests.Session()
s.proxies = {
    "http": "http://10.10.10.10:8000",
    "https": "http://10.10.10.10:8000",
}
r = s.get("http://toscrape.com")

IP rotating

To rotate IPs, we first need a pool of IP addresses. We can use free proxies found on the internet, or we can use a commercial solution. Be aware that if your product or service relies on scraped data, a free proxy solution will probably not be enough for your needs. If a high success rate and data quality are important to you, you should choose a paid proxy solution like Zyte Smart Proxy Manager (formerly Crawlera).

IP rotation with Requests

ip_addresses = ["85.237.57.198:44959", "116.0.2.94:43379", "186.86.247.169:39168", "185.132.179.112:1080", "190.61.44.86:9991"]

With a pool of proxies like the one above, we can randomly pick a proxy for each request. If the proxy works properly, we can access the given site. If there’s a connection error, we might want to delete this proxy from the list and retry the same URL with another proxy.

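import random
import requests

url = "http://toscrape.com"
proxy_address = random.choice(ip_addresses)
# Requests expects a dictionary that maps each protocol to a proxy URL
proxies = {"http": "http://" + proxy_address, "https": "http://" + proxy_address}
try:
    r = requests.get(url, proxies=proxies, timeout=10)
except requests.exceptions.RequestException:
    # implement here what to do when there's a connection error
    # for example: remove the used proxy from the pool and retry the request using another one
    ip_addresses.remove(proxy_address)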

There are multiple ways to handle connection errors, and the right one depends on why the request failed. Sometimes the proxy you are trying to use is simply banned. In this case, there’s not much you can do other than remove it from the pool and retry with another proxy. Other times the proxy isn’t banned, and you just have to wait a little before using it again.
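
Both cases can be sketched as a small retry loop. This is only one way to do it, under stated assumptions: ProxyPool and fetch_with_rotation are hypothetical names (not part of Requests), status codes 403 and 429 are assumed to signal a temporary block, and the 60-second cooldown is an arbitrary value.

```python
import random
import time

import requests


class ProxyPool:
    """Hypothetical helper that tracks proxies and when each may be used again."""

    def __init__(self, addresses, cooldown_seconds=60.0):
        # Map each "host:port" address to the earliest time it may be used.
        self._ready_at = {address: float("-inf") for address in addresses}
        self._cooldown_seconds = cooldown_seconds

    def pick(self):
        """Return a random proxy that is not resting, or None if there is none."""
        now = time.monotonic()
        ready = [a for a, t in self._ready_at.items() if t <= now]
        return random.choice(ready) if ready else None

    def remove(self, address):
        """Drop a proxy that looks dead or banned."""
        self._ready_at.pop(address, None)

    def rest(self, address):
        """Keep a proxy but skip it until the cooldown has passed."""
        if address in self._ready_at:
            self._ready_at[address] = time.monotonic() + self._cooldown_seconds

    def __len__(self):
        return len(self._ready_at)


def fetch_with_rotation(url, pool, max_attempts=5, get=requests.get):
    """Try up to max_attempts proxies; return a response, or None if all fail."""
    for _ in range(max_attempts):
        address = pool.pick()
        if address is None:
            break  # every remaining proxy is resting, or the pool is empty
        proxies = {"http": f"http://{address}", "https": f"http://{address}"}
        try:
            response = get(url, proxies=proxies, timeout=10)
        except requests.exceptions.RequestException:
            pool.remove(address)  # connection-level failure: drop the proxy
            continue
        if response.status_code in (403, 429):
            pool.rest(address)  # assumed to be a temporary block: retry it later
            continue
        return response
    return None
```

The get parameter exists only so the function can be exercised with a stand-in for requests.get. Calling fetch_with_rotation(url, ProxyPool(ip_addresses)) uses the real one.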

Implementing your own smart proxy solution which finds the best way to deal with errors is very hard to do. That’s why you should consider using a managed solution, like Zyte Smart Proxy Manager, to avoid all the unnecessary pains with proxies.

Using Zyte Smart Proxy Manager (formerly Crawlera) with Requests

import requests

url = "http://httpbin.org/ip"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = ":"
proxies = {
    "https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
    "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)
}
r = requests.get(url, proxies=proxies, verify=False)

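What does this piece of code do? It sends an HTTP request through Zyte Smart Proxy Manager to httpbin.org/ip, which echoes back the IP address the request arrived from. When you use Zyte Smart Proxy Manager, you don’t need to deal with proxy rotation manually. Everything is taken care of internally.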

If you find that managing proxies on your own is too complex and you’re looking for an easy solution, give Zyte Smart Proxy Manager (formerly Crawlera) a try.
