Python requests with Proxy tutorial

Hi everyone! Ever wondered why your Python scripts using requests sometimes run into blocks, errors, or just don’t work as intended? That’s because many websites have protections in place to limit automated requests, detect suspicious activity, or enforce region-specific restrictions.


Why do you need a proxy?

Without a proxy, your script might be flagged, blocked, or unable to access certain content altogether.

But don’t worry—proxies are here to help! Let’s explore how to use a proxy on your Python requests.

We'll learn how to:
- use a proxy on Python requests
- use a proxy with user authentication
- use rotating proxies
- use a proxy with a session

How to use Python Requests with a proxy

Here’s a step-by-step tutorial on how to use Python’s requests library with a proxy:

Step 1: Install the requests Library
Make sure you have the requests library installed. If not, install it using pip:

pip install requests


Step 2: Choose Your Proxy
You’ll need a proxy server. You can:

1. Find a Free Proxy: Some providers offer limited free proxies; searching for "free proxy" will turn up public lists.

Free proxies are much more likely to be slow, unreliable, or already blocked than paid ones, so they aren't recommended for serious use.

2. Use a Paid Proxy: Paid proxies are more reliable, faster, and more secure.

Step 3: Define Your Proxy

Proxies can be set for HTTP and HTTPS requests separately, and you aren't required to define both. In Python, this is done using a dictionary: each key is the scheme of the target URL, and each value is the proxy's address. Note that most proxies are reached over plain HTTP even when they tunnel HTTPS traffic, so both values typically start with http://:

proxies = {
    "http": "http://your-proxy-ip:port",
    "https": "http://your-proxy-ip:port",
}

Replace your-proxy-ip:port with the actual IP address and port of your proxy server. For example:

proxies = {
    "http": "http://123.45.67.89:8080",
    "https": "http://123.45.67.89:8080",
}
That proxy address is just an example; don't try to use it.

Official Requests documentation on Proxy: https://requests.readthedocs.io/en/latest/user/advanced/#proxies
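Per the official documentation linked above, requests also supports SOCKS proxies once you install the optional SOCKS dependency. Here's a minimal sketch; the address and credentials below are placeholders:

```python
# Requires the optional SOCKS dependency:
#   pip install "requests[socks]"

# Placeholder address and credentials; replace with your SOCKS proxy.
proxies = {
    "http": "socks5://user:pass@proxy-ip:1080",
    "https": "socks5://user:pass@proxy-ip:1080",
}
# Use the "socks5h" scheme instead if you want DNS resolution
# to happen on the proxy rather than on your machine.
```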

Step 4: Test the Proxy

Use the proxy with the requests.get method to test if it works:

import requests

proxies = {
    "http": "http://123.45.67.89:8080",
    "https": "http://123.45.67.89:8080",
}

try:
    response = requests.get("https://httpbin.org/ip", proxies=proxies)
    print(response.json())  # This should return the proxy's IP
except requests.exceptions.ProxyError as e:
    print("Proxy error:", e)

If successful, the output will show the proxy IP address instead of your local IP.

Example output:

{'origin': '123.45.67.89'}

Step 5: Use Proxies with Authentication (Optional)
Some proxies require a username and password. In that case, include the credentials in the proxy URL:

proxies = {
    "https": "http://username:password@proxy-ip:port",
}

Example:

proxies = {
    "https": "http://user123:pass123@123.45.67.89:8080",
}
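If the username or password contains characters that aren't URL-safe (such as @ or :), percent-encode them before embedding them in the proxy URL. A small sketch using the standard library; the credentials below are made up:

```python
from urllib.parse import quote

# Hypothetical credentials containing characters that would break
# the URL if left as-is; percent-encode them first.
username = quote("user@example", safe="")
password = quote("p@ss:word!", safe="")

proxies = {
    "https": f"http://{username}:{password}@123.45.67.89:8080",
}
print(proxies["https"])
# http://user%40example:p%40ss%3Aword%21@123.45.67.89:8080
```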

Step 6: Rotate Proxies (Optional)
Reusing the same proxy for every request risks it being flagged and eventually blocked by the target website. To avoid this, rotate proxies by randomly selecting one from a list:

import requests
import random

proxy_list = [
    "http://123.45.67.89:8080",
    "http://98.76.54.32:8080",
]

proxies = {
    "https": random.choice(proxy_list),
}

response = requests.get("https://httpbin.org/ip", proxies=proxies)
print(response.json())
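Picking one proxy at random doesn't help if that particular proxy happens to be down. A common refinement is to try the proxies in random order until one succeeds. This is a sketch, with placeholder addresses and a hypothetical fetch_with_rotation helper:

```python
import random
import requests

proxy_list = [
    "http://123.45.67.89:8080",  # placeholder proxies
    "http://98.76.54.32:8080",
]

def fetch_with_rotation(url, proxy_list, timeout=5):
    """Try each proxy in random order until one succeeds."""
    for proxy in random.sample(proxy_list, len(proxy_list)):
        proxies = {"http": proxy, "https": proxy}
        try:
            return requests.get(url, proxies=proxies, timeout=timeout)
        except requests.exceptions.RequestException:
            continue  # this proxy failed; move on to the next one
    raise RuntimeError("All proxies failed")
```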

Step 7: Handle Proxy Errors Gracefully

Proxies might fail or be blocked. As with any program, log your requests properly so you can debug failures later.

Handle exceptions to avoid crashes:

try:
    response = requests.get("https://example.com", proxies=proxies, timeout=5)
    print(response.status_code)
except requests.exceptions.ProxyError:
    print("Failed to connect to the proxy.")
except requests.exceptions.ConnectTimeout:
    print("Proxy timed out.")

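For transient failures, you can also let requests retry automatically by mounting urllib3's Retry helper on a session. A sketch; the proxy address is a placeholder:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient failures (including 429/5xx responses) a few
# times with exponential backoff before giving up.
retry = Retry(total=3, backoff_factor=1,
              status_forcelist=[429, 500, 502, 503, 504])

session = requests.Session()
session.mount("http://", HTTPAdapter(max_retries=retry))
session.mount("https://", HTTPAdapter(max_retries=retry))
session.proxies.update({"https": "http://123.45.67.89:8080"})  # placeholder
```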

Step 8: Use a proxy with a session

You can enhance your web scraping by combining proxies with a requests.Session object. A session reuses connections for better performance and keeps state such as cookies and settings consistent across multiple requests.

import requests

proxies = {
  'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)

session.get('http://example.org')
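requests also honors the standard HTTP_PROXY and HTTPS_PROXY environment variables (Session.trust_env defaults to True), so a proxy can be configured without touching your code. A sketch with a placeholder address:

```python
import os
import requests

# Placeholder proxy address; set these in your shell instead if
# you prefer to keep credentials out of the script entirely.
os.environ["HTTP_PROXY"] = "http://123.45.67.89:8080"
os.environ["HTTPS_PROXY"] = "http://123.45.67.89:8080"

session = requests.Session()
# trust_env is True by default, so the variables above now apply
# to every request made through this session.
print(session.trust_env)
```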

Step 9: Use a Proxy with Advanced Requests

You can also use proxies with POST, PUT, or other HTTP methods:

data = {"key": "value"}
response = requests.post("https://example.com/api", data=data, proxies=proxies)
print(response.text)

Possible errors when not using a proxy

When performing HTTP requests in Python using the requests library without a proxy, you might encounter several types of errors depending on the specific scenario. Here are some common ones:

1. Blocked by the Server

  • If the server you're trying to access has security mechanisms (like rate limiting, geographic restrictions, or IP bans), you might get responses like:
    • 403 Forbidden: The server is rejecting your request.
    • 429 Too Many Requests: You are sending too many requests in a short period.

2. Connection Errors

  • If the server restricts access based on IP or geographical location:
    • requests.exceptions.ConnectionError: Indicates that the connection could not be established.

3. Timeouts

  • If the server is slow to respond, or your IP is throttled:
    • requests.exceptions.Timeout: Raised when a request exceeds the specified timeout duration.

4. DNS Resolution Issues

  • If your local DNS can't resolve the server's domain:
    • requests.exceptions.ConnectionError with a message about DNS failure.

5. Captchas or Bot Detection

  • Some websites employ CAPTCHAs or bot-detection systems, leading to responses with:
    • HTML for a CAPTCHA challenge (usually requires analysis of the response content to detect).

6. IP-Based Rate Limiting

  • Without proxies, all requests originate from your local IP address. High traffic might lead to:
    • Temporary or permanent IP bans.
    • Slower response times from the server.

7. SSL Verification Errors

  • If the server requires specific SSL configurations:
    • requests.exceptions.SSLError: Indicates SSL handshake failures.

Example Error Handling Code

Here's how you can handle common errors when using the requests library:

import requests

try:
    response = requests.get("https://example.com", timeout=5)
    response.raise_for_status()  # Raise HTTPError for bad responses (4xx, 5xx)
except requests.exceptions.HTTPError as e:
    print(f"HTTP error occurred: {e}")
except requests.exceptions.ConnectionError as e:
    print(f"Connection error occurred: {e}")
except requests.exceptions.Timeout as e:
    print(f"Timeout occurred: {e}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

More error exceptions on requests: https://requests.readthedocs.io/en/latest/_modules/requests/exceptions/

On top of using proxies, you might need these other methods:

  • Headers/User-Agent Spoofing: To mimic a real browser.
  • Captcha Solvers: If the server uses CAPTCHA.
  • Session Management: To maintain cookies and sessions across requests.
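For example, combining a browser-style User-Agent header with a proxy looks like this. Both values below are placeholders, and the User-Agent string is just an example:

```python
import requests

# An example browser-like User-Agent string; sites vary in what
# they check, so this is illustrative rather than guaranteed.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}
proxies = {"https": "http://123.45.67.89:8080"}  # placeholder proxy

# response = requests.get("https://example.com",
#                         headers=headers, proxies=proxies)
```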

Scraping Google with a proxy

Big companies like Google are much stricter about protecting their sites from being scraped. Even with a paid proxy, you may face multiple layers of challenges before you can start scraping data from Google.

(Image: sample error returned when scraping Google.)

Luckily, SerpApi offers a simple API for developers to scrape Google and other search engines like Bing, DuckDuckGo, and Yahoo. With our APIs, you don't need to worry about buying proxies, solving CAPTCHAs, or the other headaches that come with scraping.

Register for free at serpapi.com.

Here is a quick way to scrape Google using SerpApi:

from serpapi import GoogleSearch

params = {
  "api_key": "YOUR_SERPAPI_API_KEY",
  "engine": "google",
  "q": "Coffee",
}

search = GoogleSearch(params)
results = search.get_dict()

Another benefit of using SerpApi is that you can control the geolocation your search appears to come from.
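Extending the earlier SerpApi snippet, the "location" parameter controls where the search appears to originate; "Austin, Texas" below is just an example value:

```python
# Same params dict as before, plus a "location" value.
params = {
    "api_key": "YOUR_SERPAPI_API_KEY",
    "engine": "google",
    "q": "Coffee",
    "location": "Austin, Texas",  # example location
}

# search = GoogleSearch(params)   # as in the snippet above
# results = search.get_dict()
```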

FAQ

How do I get a proxy?
You can buy proxies from proxy providers or pick them from free proxy lists, but paid ones are far more reliable.

When should you use a proxy?
Use a proxy to hide your IP, bypass geo-restrictions, or avoid getting blocked while scraping.

What is proxy rotation?
Proxy rotation automatically switches proxies for each request to avoid detection and bans.

What is a residential proxy?
A residential proxy uses real IP addresses from real devices, making it harder to detect.