HTTP in Python
To create an HTTP connection we use the httplib module (renamed http.client in Python 3), which provides some simple yet effective methods to create a connection and send requests to an HTTP server. The urllib module can also be used and is easier, but it only performs GET requests, or POST when data is supplied (explained in a future post).
Creating a connection
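To create a connection, the HTTPConnection class does the job. The constructor only stores the host; the socket is opened by connect() (or implicitly by the first request), so that is the call to guard. For example:
import httplib
import socket

http = httplib.HTTPConnection('almightybuserror.com')
try:
    http.connect()  # opens the socket; the constructor alone does not
except socket.error:
    print 'Could not connect...'
    raise SystemExit(1)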
Sending a request
To send a request, a connection object is needed (we’ll use http from the previous example):
http.request("GET", "/")
Getting a response
To obtain a response, the getresponse() method is used. It returns an HTTPResponse object.
response = http.getresponse()
if response.status == httplib.OK:
    # Write in binary mode so the body is saved exactly as received.
    with open('example.html', 'wb') as out:
        out.write(response.read())
else:
    print 'Error: %s %s' % (response.status, response.reason)
http.close()
For a more complete example, check downloader.py on GitHub.
HTTP Proxy: Java Implementation
Continuing from “Concept Introduction” we will now see how a possible implementation is done. To start implementing a proxy we must first create a simple server application which accepts a connection from a client to be redirected.
Creating Server
We start by creating a ServerSocket (for more information go here).
doIt( new ServerSocket( serverPort ) );
Now we need to implement the method doIt(), which simply loops forever, accepting one request after another.
public static void doIt( ServerSocket server )
{
    for(;;)
    {
        try
        {
            Socket request = server.accept();
            new RequestHandler( request ).start();
        }
        catch( IOException e )
        {
            e.printStackTrace();
        }
    }
}
Creating a RequestHandler
The object RequestHandler is the one that actually does all the work for one request. Because of that we thread it (more information here) so that the proxy can handle multiple requests at once. The utility functions used here are located at the end of the post.
The code to convert the request to HTTP/1.0 is as follows:
InputStream srcIs = source.getInputStream();
String req = readLine(srcIs);            // Request line: method, URL, version
Scanner head = new Scanner(req);
String request = head.next();            // Method, e.g. GET
String[] link = parseUrl(head.next());   // [protocol, host, port, path]
if(!link[0].equals("http"))
    throw new IOException("Incompatible Protocol");
request += String.format(" %s HTTP/1.0\r\n", link[3]);
// Copy the headers, skipping the banned ones. parseHttpHeader returns
// null on the empty line that ends the header block.
String[] currentProperty;
while((currentProperty = parseHttpHeader( readLine(srcIs) )) != null)
{
    if(isBanned(currentProperty))
        continue;
    request += String.format("%s: %s\r\n", currentProperty[0],
                             currentProperty[1]);
}
request += "\r\n"; // Add last CRLF
The function isBanned is the filter for properties from HTTP/1.1:
private boolean isBanned(String[] property)
{
    // Header names are case-insensitive, so compare without regard to case.
    return property[0].equalsIgnoreCase("Connection") ||
           property[0].equalsIgnoreCase("Keep-Alive") ||
           property[0].equalsIgnoreCase("Proxy-Connection");
}
Now that the request is ready, all that remains is to send it and then dump the whole response from the server to the client.
InetAddress host = InetAddress.getByName(link[1]);
Socket destination = new Socket(host, Integer.parseInt(link[2]) );
OutputStream destOs = destination.getOutputStream();
InputStream destIs = destination.getInputStream();
destOs.write(request.getBytes());
dumpStream(destIs, source.getOutputStream());
destination.close();
source.close();
Note: source is a socket that is passed in the RequestHandler constructor.
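To show where the fragments above could live, here is a minimal skeleton of a threaded handler. Only the class name RequestHandler and the source socket come from this post; the handle() method and the error handling are assumptions made for illustration, not the original implementation.

```java
import java.io.IOException;
import java.net.Socket;

// Hypothetical skeleton: only the class name and the "source" socket
// are taken from the post; everything else is an assumption.
class RequestHandler extends Thread {
    private final Socket source;

    RequestHandler(Socket source) {
        this.source = source;
    }

    @Override
    public void run() {
        try {
            handle();
        } catch (IOException e) {
            // One failed request should not take the whole proxy down.
            e.printStackTrace();
        } finally {
            try {
                source.close(); // Closing an already closed socket is harmless.
            } catch (IOException ignored) {
                // Nothing useful can be done if close() itself fails.
            }
        }
    }

    // Placeholder (assumed name): the request conversion and forwarding
    // code shown in this section would go here.
    protected void handle() throws IOException {
    }
}
```

Because each handler runs in its own thread, a slow server only delays the request that is waiting on it; the accept loop in doIt() keeps taking new connections.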
Utility Functions
- The parseHttpHeader function parses the properties that a normal HTTP request has, so we can easily remove the properties that are exclusive to HTTP/1.1. It returns null for a line without a colon, such as the empty line that ends the headers.
public static String[] parseHttpHeader( String header )
{
    String[] result = new String[2];
    int pos0 = header.indexOf( ':' );
    if( pos0 == -1 )
        return null;
    result[0] = header.substring( 0, pos0 ).trim();
    result[1] = header.substring( pos0 + 1 ).trim();
    return result;
}
- The readLine function keeps reading from an InputStream until it reaches a “\r\n” (or the end of the stream).
public static String readLine( InputStream is ) throws IOException
{
    StringBuilder sb = new StringBuilder();
    int c;
    while( (c = is.read() ) >= 0 ) {
        if( c == '\r' ) continue;
        if( c == '\n' ) break;
        sb.append( (char)c );
    }
    return sb.toString();
}
- The parseUrl function parses a URL; for example, “http://almightybuserror.com:5000/stuff/here/” will be transformed to: [“http”, “almightybuserror.com”, “5000”, “/stuff/here/”]. When no port is given, 80 is used.
public static String[] parseUrl(String url)
{
    String[] result = new String[4];
    int i = url.indexOf(':');
    result[0] = url.substring(0, i);       // Protocol
    String rest = url.substring(i + 3);    // Skip "://"
    int j = rest.indexOf('/');             // Start of the path
    if (j < 0)
        j = rest.length();                 // No path given, e.g. "http://host"
    i = rest.lastIndexOf(':', j);          // Port separator, only if before the path
    result[1] = (i > 0 ? rest.substring(0, i) : rest.substring(0, j)); // Host
    result[2] = (i > 0 ? rest.substring(i + 1, j) : "80");             // Port
    result[3] = (j < rest.length() ? rest.substring(j) : "/");         // Request
    return result;
}
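As an alternative to parsing by hand, the standard java.net.URI class can do the splitting. The sketch below is not the code this proxy uses; the class name UrlParts is made up for the example, and it returns the same four-element layout as parseUrl so it could be swapped in.

```java
import java.net.URI;

// Hypothetical helper: same result layout as parseUrl,
// [protocol, host, port, request], but built on java.net.URI.
final class UrlParts {
    static String[] parseUrl(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort();              // -1 when the URL has no port
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";                        // "http://host" means the root
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();   // Keep the query string
        }
        return new String[] {
            uri.getScheme(),
            uri.getHost(),
            String.valueOf(port == -1 ? 80 : port),
            path
        };
    }
}
```

URI.create throws IllegalArgumentException on a malformed URL, so a proxy using this would want to catch that and answer with an error instead of letting the handler thread die.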
- The dumpStream function is also used; you can find it here.
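Since dumpStream is only linked and not shown, here is a minimal sketch of what such a helper might look like, assuming it simply copies bytes from one stream to the other until the source reaches end of stream. The class name StreamUtil, the buffer size and the returned byte count are assumptions; the linked version may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for the linked dumpStream helper.
final class StreamUtil {
    // Copies everything from "in" to "out" and returns the number of bytes copied.
    static long dumpStream(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() returns -1 once the other side has closed the connection,
        // which is how an HTTP/1.0 server marks the end of its response.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

Copying in blocks rather than byte by byte keeps the number of read and write calls low, which matters for large responses such as images.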
Thank you for reading. Comments are welcome!
Note: Credits for the functions readLine and parseHttpHeader go to ASC Department.
HTTP Proxy: Concept introduction
An HTTP proxy is a program that acts as a seamless server (since the user does not notice its existence under normal circumstances) and forwards HTTP (Hypertext Transfer Protocol) traffic.
A problem for people who are creating a proxy for learning purposes is that most browsers nowadays use HTTP/1.1, which is far more complex than its HTTP/1.0 counterpart implementation-wise. For this very reason we will convert every HTTP/1.1 request to HTTP/1.0.
For a browser using HTTP/1.0 to fetch a webpage, it first needs to establish a connection with the webserver and use a “GET” command on the index (or another page if specified). When it finishes receiving and processing that page, the browser identifies every resource the page uses (be it images, scripts or Cascading Style Sheets). For each resource it needs, the browser creates a connection and sends a request. The webserver closes the connection after it finishes sending the resource.
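The exchange just described can be sketched in a few lines of Java: open a connection, send one GET, and read until the server closes the connection. This is only an illustration; the class name Http10Client and its fetch method are made up for the example, and the Host header is an addition that HTTP/1.0 does not require but many servers expect.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical one-shot HTTP/1.0 client: one connection per resource.
final class Http10Client {
    // Returns the raw response (status line, headers and body) as text.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            String request = "GET " + path + " HTTP/1.0\r\n"
                    + "Host: " + host + "\r\n"
                    + "\r\n"; // The empty line ends the request
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.ISO_8859_1));
            out.flush();

            // With HTTP/1.0 the end of the response is the end of the stream.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            return response.toString(StandardCharsets.ISO_8859_1.name());
        }
    }
}
```

A call such as Http10Client.fetch("almightybuserror.com", 80, "/") would fetch the index page; a browser would repeat the same steps, with a new connection each time, for every image, script and style sheet the page refers to.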
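The proxy will work as follows: it accepts the browser's request, converts it to HTTP/1.0, forwards it to the webserver, and relays the server's response back to the browser.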

A browser using HTTP/1.1, however, uses pipelining and persistent connections: after the webpage has been requested and received, the server does not close the connection and can keep receiving requests from the browser for other resources it hosts, until the client closes the connection (browsers signal this with the Connection property set to keep-alive).
To convert the request headers to HTTP/1.0, the proxy must read and modify them, removing some fields from the HTTP/1.1 request.
For our purposes we will remove “Connection” and the optional “Keep-Alive” and “Proxy-Connection” properties, since they relate to the persistent connections of HTTP/1.1, and change the protocol version in the request line. Every line of an HTTP request or response ends with the two special characters “\r\n” (carriage return and line feed).
Example:
GET http://almightybuserror.com/ HTTP/1.1
Connection: keep-alive
And this last request gets converted and sent to the server “almightybuserror.com” as:
GET / HTTP/1.0
Note: There must be an extra “\r\n” (an empty line) at the end of the request, since it is the only way the server knows that the request has ended.
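The conversion in the example can be sketched as a small function that works on the request text. This is an illustration only; the class name RequestDowngrader and the method toHttp10 are made up, and it assumes the whole request head (request line plus properties) is already in one string.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper that rewrites an HTTP/1.1 proxy request as HTTP/1.0.
final class RequestDowngrader {
    // Properties removed during conversion, in lower case.
    private static final List<String> BANNED =
            Arrays.asList("connection", "keep-alive", "proxy-connection");

    static String toHttp10(String head) {
        String[] lines = head.split("\r\n");
        String[] parts = lines[0].split(" ");  // Method, URL, version
        URI uri = URI.create(parts[1]);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        StringBuilder out = new StringBuilder();
        out.append(parts[0]).append(' ').append(path).append(" HTTP/1.0\r\n");
        for (int k = 1; k < lines.length; k++) {
            int colon = lines[k].indexOf(':');
            if (colon < 0) {
                continue;                      // Not a "name: value" line
            }
            String name = lines[k].substring(0, colon).trim();
            if (BANNED.contains(name.toLowerCase(Locale.ROOT))) {
                continue;                      // Drop the banned property
            }
            String value = lines[k].substring(colon + 1).trim();
            out.append(name).append(": ").append(value).append("\r\n");
        }
        out.append("\r\n");                    // The empty line ends the request
        return out.toString();
    }
}
```

Given the request from the example, followed by its empty line, toHttp10 produces the converted request shown above, again followed by an empty line; any other property, such as Host, is passed through unchanged.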
Thank you for reading. Comments are welcome.