Packet Sniffing with Python
As part of my efforts to write my own packet sniffer, I wanted to take a look at handling raw packet data. I am hoping to eventually evolve this script into a tool to actually modify packet data before it is sent out. Fortunately, the Internet is vast and I found some great tutorials and example scripts to help me on my way. This post documents the steps I used to write my own basic packet sniffer in Python 2.7.
In its simplest form, socket programming connects two nodes on a network so that they can communicate with each other; each endpoint of that connection is a socket. The two endpoints play different roles: client and server. The server listens on an IP address and port. The client sends requests to the server, prompting the server to respond.
Sockets can be implemented on any of a number of different channels. Of primary interest to me are UDP and TCP so my focus will be on these protocols.
To create a socket you first import the socket module and then define your socket:
import socket
s = socket.socket(<socket_family>, <socket_type>)
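For the two protocols mentioned above, the family/type pair is typically AF_INET with SOCK_STREAM (TCP) or SOCK_DGRAM (UDP). A minimal sketch, where the variable names are only illustrative:

```python
import socket

# AF_INET selects IPv4; SOCK_STREAM gives a TCP socket, SOCK_DGRAM a UDP socket
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Record the socket types before closing so they can be inspected afterwards
tcp_type = tcp_sock.type
udp_type = udp_sock.type

print(tcp_type == socket.SOCK_STREAM)
print(udp_type == socket.SOCK_DGRAM)

tcp_sock.close()
udp_sock.close()
```

Neither socket is bound or connected here, so this runs without any special privileges.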
You can follow a tutorial found here to set up a basic server/client model with socket. When you run your server it sets up a listener on your localhost on whatever port you set it to (I set mine to 1337). I then ran lsof to show that my server socket is listening on TCP port 1337:
When I run my client socket, it connects to the server. Once connected, the server sends the client a status message which the client prints out:
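The exchange described above can be sketched in a single script. This is not the linked tutorial's code: the function names, the status message, and the use of a background thread are assumptions made so the example is self-contained, and it binds to port 0 (letting the OS pick a free port) rather than 1337.

```python
import socket
import threading

HOST = "127.0.0.1"  # localhost only


def serve_once(server_sock, message):
    """Accept one client, send it a status message, then close the connection."""
    conn, _addr = server_sock.accept()
    try:
        conn.sendall(message)
    finally:
        conn.close()


def run_demo(message=b"Thank you for connecting"):
    # Server side: bind to port 0 so the OS assigns an unused port
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind((HOST, 0))
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(server_sock, message))
    worker.start()

    # Client side: connect, then read until the server closes the connection
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((HOST, port))
    chunks = []
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    client.close()

    worker.join()
    server_sock.close()
    return b"".join(chunks)


received = run_demo()
print(received)
```

In a real setup the server and client would be separate scripts, with the server bound to a fixed port so the client knows where to connect.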
IP Packet Breakdown
You need to understand the innards of an IP packet in order to be able to successfully process it. If you recall the OSI Model, data sent over the network contains several layers of encapsulation. My Python script will be handling data being sent over layer 3 with IPv4 (Internet Protocol). An IP packet of this flavor can be broken into higher protocols/layers as shown:
Here TCP is layer 4 (it could be swapped with UDP, another layer 4 protocol). The application layer is layer 7 in the OSI model.
IPv4 handles addressing as well as the fragmentation and reassembly of packets (error reporting is left to its companion protocol, ICMP). The IP packet header is shown below:
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |Version|  IHL  |Type of Service|          Total Length         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |         Identification        |Flags|      Fragment Offset    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  Time to Live |    Protocol   |         Header Checksum       |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                       Source Address                          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                    Destination Address                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                    Options                    |    Padding    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields of particular interest to me here are the source and destination addresses. For a detailed explanation of the IP header and each of its fields, check out RFC 791.
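As a rough illustration of pulling those two addresses out of a header, the sketch below builds a hypothetical 20-byte IPv4 header by hand (the addresses and field values are made up) and unpacks it with the same struct format string used in the final script. It is written so that it also runs on Python 3.

```python
import socket
import struct

IP_HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20 bytes, network byte order

# Hypothetical sample header, built by hand for illustration only
sample_ip_header = struct.pack(
    IP_HEADER_FORMAT,
    (4 << 4) | 5,                     # version 4, IHL 5 (5 * 4 = 20 bytes)
    0,                                # type of service
    40,                               # total length
    54321,                            # identification
    0,                                # flags + fragment offset
    64,                               # time to live
    socket.IPPROTO_TCP,               # protocol (6 = TCP)
    0,                                # header checksum (left as 0 here)
    socket.inet_aton("192.168.1.10"), # source address
    socket.inet_aton("192.168.1.20"), # destination address
)


def ip_addresses(header):
    """Return (source, destination) as dotted-quad strings."""
    fields = struct.unpack(IP_HEADER_FORMAT, header[:20])
    return socket.inet_ntoa(fields[8]), socket.inet_ntoa(fields[9])


src, dst = ip_addresses(sample_ip_header)
print(src)
print(dst)
```

The addresses are the last two of the ten unpacked fields, which is why indexes 8 and 9 are used.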
While the IP header contains useful information, I can dig further into the layers to get more. The next header in the packet is TCP, a connection-oriented layer 4 protocol. Its header is shown below:
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |          Source Port          |       Destination Port        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                        Sequence Number                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                    Acknowledgment Number                      |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  Data |           |U|A|P|R|S|F|                               |
 | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
 |       |           |G|K|H|T|N|N|                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |           Checksum            |         Urgent Pointer        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                    Options                    |    Padding    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                             data                              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
For a detailed look at the TCP header and its fields, check out RFC 793.
I am primarily interested in the ports in this header. I could dig even deeper into the packet to pull out more information. However, there are a lot of different protocols/applications that can be run on TCP. Handling all these variations can get pretty complicated. Frankly, there are already packet sniffers out there that handle processing these layers fairly well. So for this script I really only care about IP and TCP.
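The ports sit in the first four bytes of the TCP header. A minimal sketch of reading them, assuming a hypothetical hand-built header (the port numbers and other values are made up) and a helper name, tcp_ports, that is not part of any library:

```python
import struct

TCP_HEADER_FORMAT = "!HHLLBBHHH"  # 20 bytes, network byte order

# Hypothetical sample header, built by hand for illustration only
sample_tcp_header = struct.pack(
    TCP_HEADER_FORMAT,
    51234,    # source port
    80,       # destination port
    1000,     # sequence number
    0,        # acknowledgment number
    5 << 4,   # data offset = 5 words in the high 4 bits, reserved bits = 0
    0x02,     # flags byte: SYN set
    8192,     # window
    0,        # checksum (left as 0 here)
    0,        # urgent pointer
)


def tcp_ports(header):
    """Return (source_port, dest_port, header_length_in_bytes)."""
    fields = struct.unpack(TCP_HEADER_FORMAT, header[:20])
    # The data offset counts 32-bit words, so multiply by 4 to get bytes
    header_length = (fields[4] >> 4) * 4
    return fields[0], fields[1], header_length


source_port, dest_port, tcp_header_length = tcp_ports(sample_tcp_header)
print(source_port)
print(dest_port)
print(tcp_header_length)
```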
I was lucky to find a great tutorial for processing IP packets here. I will walk through some of Silver Moon’s examples since they helped me wrap my head around what I was doing. I encourage you to check out Silver Moon’s tutorial, as it covers additional socket configurations as well as Ethernet frames and UDP and ICMP headers, which I haven’t included in this post. My final script uses Silver Moon’s processing algorithm (to unpack the raw bytes into header values).
Capturing Raw Network Data
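Below is a basic script that just captures the raw data received by your host.
import socket

# Create INET, RAW socket (raw sockets normally require root privileges)
s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)

# Print Captured Packets
# The argument to recvfrom is the buffer size in bytes, not a port;
# 65535 is the largest possible IPv4 packet
while True:
    print s.recvfrom(65535)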
As you can see I currently have an apache webserver running on port 80:
I can start my sniffer and then send a connection request to my apache webserver via netcat:
My packet sniffer then outputs the traffic it receives from the socket:
You can see that at the moment this is just raw bytes, most of them printed as hex escapes. We need to process the data to get any meaning out of it.
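When eyeballing captured bytes, it can help to print them as plain hex digits. A small sketch, assuming a hypothetical four-byte sample rather than real captured traffic:

```python
import binascii

# Hypothetical first four bytes of an IPv4 header
sample = b"\x45\x00\x00\x28"

# hexlify returns two hex digits per byte
hex_text = binascii.hexlify(sample).decode("ascii")
print(hex_text)
```

In this sample the first hex digit (4) is the IP version and the second (5) is the header length in 32-bit words.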
Processing the Data
Now that you have captured the raw data, you can start parsing the fields of interest out of each header. The first two fields of the packet are the IP version (4 bits, so the first hex digit) and the IHL (IP header length), which is another 4 bits (the next hex digit). The IHL counts 32-bit words, so multiplying it by 4 gives the header length in bytes. Once you have the IP header length, you can jump to the next header (TCP) and parse the desired values there in the same way. Once you have the lengths of both headers (from the field values), you can grab “everything else”, which per the TCP header diagram is the “data” field.
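The offset arithmetic described above can be sketched on its own, without a live socket. The packet below is a hypothetical stand-in (mostly zero bytes, with only the length fields filled in), and split_packet is an illustrative helper name:

```python
import struct


def split_packet(packet):
    """Split raw IPv4 + TCP bytes into (version, ip_len, tcp_len, data)."""
    # First byte: version in the high 4 bits, IHL in the low 4 bits
    first_byte = struct.unpack("!B", packet[0:1])[0]
    version = first_byte >> 4
    ip_len = (first_byte & 0xF) * 4  # IHL counts 32-bit words

    # Byte 12 of the TCP header holds the data offset in its high 4 bits
    doff_byte = struct.unpack("!B", packet[ip_len + 12:ip_len + 13])[0]
    tcp_len = (doff_byte >> 4) * 4   # data offset also counts 32-bit words

    # Everything after both headers is the payload
    data = packet[ip_len + tcp_len:]
    return version, ip_len, tcp_len, data


# Hypothetical packet: 20-byte IP header, 20-byte TCP header, short payload
ip_header = b"\x45" + b"\x00" * 19
tcp_header = b"\x00" * 12 + b"\x50" + b"\x00" * 7
sample_packet = ip_header + tcp_header + b"GET / HTTP/1.1\r\n"

version, ip_len, tcp_len, data = split_packet(sample_packet)
print(version)
print(ip_len)
print(tcp_len)
print(data)
```

Reading the lengths from the packet rather than assuming 20 bytes matters whenever either header carries options.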
Now when I run my packet sniffer and establish a connection with my apache server I get much more interesting output:
The code for the packet sniffer script:
import socket, sys
from struct import *

# create an INET raw socket (raw sockets normally require root privileges)
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
except socket.error, msg:
    print 'Socket could not be created. Error Code: ' + str(msg[0]) + ' Message ' + msg[1]
    sys.exit()

# receive a packet
while True:
    # the argument is the buffer size in bytes (65535 = largest IPv4 packet)
    packet = s.recvfrom(65535)

    #packet string from tuple
    packet = packet[0]

    #take first 20 characters for the IP header
    ip_header = packet[0:20]

    #now unpack them
    iph = unpack('!BBHHHBBH4s4s', ip_header)

    #first byte holds the version (high 4 bits) and the IHL (low 4 bits)
    version_ihl = iph[0]
    version = version_ihl >> 4
    ihl = version_ihl & 0xF

    #IHL counts 32-bit words, so multiply by 4 for bytes
    iph_length = ihl * 4

    ttl = iph[5]
    protocol = iph[6]
    s_addr = socket.inet_ntoa(iph[8])
    d_addr = socket.inet_ntoa(iph[9])

    print '-----------------------PACKET-----------------------'
    print "IP Header Info | TCP Header Info"
    ip_info = ['Version: ' + str(version),
               'IP Header Length: ' + str(ihl),
               'TTL: ' + str(ttl),
               'Protocol: ' + str(protocol),
               'Source Address: ' + str(s_addr),
               'Dest Address: ' + str(d_addr)]

    #the TCP header starts right after the IP header
    tcp_header = packet[iph_length:iph_length+20]

    #now unpack them
    tcph = unpack('!HHLLBBHHH', tcp_header)

    source_port = tcph[0]
    dest_port = tcph[1]
    sequence = tcph[2]
    acknowledgement = tcph[3]
    doff_reserved = tcph[4]
    tcph_length = doff_reserved >> 4

    tcp_info = ['Source Port: ' + str(source_port),
                'Destination Port: ' + str(dest_port),
                'Sequence: ' + str(sequence),
                'Acknowledgement: ' + str(acknowledgement),
                'TCP Header Length: ' + str(tcph_length)]

    h_size = iph_length + tcph_length * 4
    data_size = len(packet) - h_size

    #get data from the packet
    data = packet[h_size:]

    #print ip and tcp header info side by side
    for i in range(0,len(ip_info)):
        tcp_data = ""
        if i < len(tcp_info):
            tcp_data = tcp_info[i]
        print ip_info[i] + " "*(28-len(ip_info[i])) + "| " + tcp_data

    print 'Data: ' + data
    print '-----------------------PACKET-----------------------'
You can also find it on github.
Throughout my 10 year career I have worked as a web developer, systems administrator, software engineer, security analyst and now cybersecurity engineer. I currently develop software applications to automate security vulnerability and compliance scanning and reporting for a multinational financial institution.