Frédéric Guihéry, 11/07/2015 01:33 PM


Tutorial: discover Netzob features

As a reminder, Netzob is an open source tool for reverse engineering, traffic generation and fuzzing of communication protocols. It infers the message format and state machine of a protocol through passive and active processes. The resulting model can then be used to simulate realistic and controllable traffic as well as to fuzz a target implementation.

Through this tutorial, we will present the main features of Netzob regarding the inference of message formats and grammar of a simple toy protocol, and some basic fuzzing of the implementation at the end. The described features cover the following capabilities:

  • Import of a PCAP file
  • Message format inference
    • Partitioning of messages following a specific delimiter
    • Grouping of messages following a specific key field
    • Partitioning of a subset of each message following a sequence alignment
    • Search for relationships in each group of messages
    • Modification of the message format to apply found relationships
  • Grammar inference
    • Generation of an automaton with one main state according to a captured sequence of messages
    • Generation of an automaton with a sequence of states according to a captured sequence of messages
    • Generation of a Prefix Tree Acceptor (PTA) automaton according to a captured sequence of messages
  • Traffic generation and fuzzing
    • Generation of messages following the inferred message format of each group while visiting the inferred automata
    • Fuzzing of an implementation by generating altered message formats

Install Netzob and download the tutorial resources

First, retrieve the source code of Netzob, install its dependencies and compile the underlying libraries.
If required, more details on the installation process are provided in the README file.

$ git clone https://dev.netzob.org/git/netzob
$ cd ./netzob/
$ sudo apt-get install python python-dev python-impacket python-setuptools build-essential python-numpy
$ python setup.py build
$ python setup.py develop --user

Then, you can retrieve the source code of the toy protocol implementation used in this tutorial, as well as some PCAP files of sequences of messages.

In what follows, we go through the different steps that can be followed to reverse this toy protocol.
Before diving into Netzob features, you can have a look at its documentation, especially the description of the API.

Import messages from a PCAP file

The first step in most Protocol Reverse Engineering (PRE) processes is to collect and import communication samples. In this tutorial, samples take the form of PCAP files.
Reading packets from a PCAP file is done through the PCAPImporter.readFile() static function. This function can optionally take more parameters to specify a BPF filter, the import layer or the number of packets to capture, as shown in the documentation:

def readFile(filePath, bpfFilter="", importLayer=5, nbPackets=0):
     """Read all messages from the specified PCAP file. A BPF filter
     can be set to limit the captured packets. The layer of import
     can also be specified:
      - When layer={1, 2}, it means we want to capture a raw layer (such as Ethernet).
      - If layer=3, we capture at the network level (such as IP).
      - If layer=4, we capture at the transport layer (such as TCP or UDP).
      - If layer=5, we capture at the applicative layer (such as the TCP or UDP payload).
     Finally, the number of packets to capture can be specified.

    :param filePath: the pcap path
    :type filePath: :class:`str`
    :param bpfFilter: a string representing a BPF filter.
    :type bpfFilter: :class:`str`
    :param importLayer: an integer representing the protocol layer to start importing.
    :type importLayer: :class:`int`
    :param nbPackets: the number of packets to import
    :type nbPackets: :class:`int`
    :return: a list of captured messages
    :rtype: a list of :class:`netzob.Common.Models.Vocabulary.Messages.AbstractMessage`

This function can be used to extract the messages from the PCAPs we collected while stimulating our toy protocol implementation.
For example, the following code creates a symbol (i.e. a group of messages) based on the messages extracted out of the PCAPs.

from netzob.all import *

messages_session1 = PCAPImporter.readFile("target_src_v1_session1.pcap").values()
messages_session2 = PCAPImporter.readFile("target_src_v1_session2.pcap").values()
messages = messages_session1 + messages_session2

symbol = Symbol(messages = messages)
print symbol

Field                                                
-----------------------------------------------------
'CMDidentify#\x07\x00\x00\x00Roberto'                
'RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00'       
'CMDinfo#\x00\x00\x00\x00'                           
'RESinfo#\x00\x00\x00\x00\x04\x00\x00\x00info'       
'CMDstats#\x00\x00\x00\x00'                          
'RESstats#\x00\x00\x00\x00\x05\x00\x00\x00stats'     
'CMDauthentify#\n\x00\x00\x00aStrongPwd'             
'RESauthentify#\x00\x00\x00\x00\x00\x00\x00\x00'     
'CMDencrypt#\x06\x00\x00\x00abcdef'                  
"RESencrypt#\x00\x00\x00\x00\x06\x00\x00\x00$ !&'$"  
"CMDdecrypt#\x06\x00\x00\x00$ !&'$"                  
'RESdecrypt#\x00\x00\x00\x00\x06\x00\x00\x00abcdef'  
'CMDbye#\x00\x00\x00\x00'                            
'RESbye#\x00\x00\x00\x00\x00\x00\x00\x00'            
'CMDidentify#\x04\x00\x00\x00fred'                   
'RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00'       
'CMDinfo#\x00\x00\x00\x00'                           
'RESinfo#\x00\x00\x00\x00\x04\x00\x00\x00info'       
'CMDstats#\x00\x00\x00\x00'                          
'RESstats#\x00\x00\x00\x00\x05\x00\x00\x00stats'     
'CMDauthentify#\t\x00\x00\x00myPasswd!'              
'RESauthentify#\x00\x00\x00\x00\x00\x00\x00\x00'     
'CMDencrypt#\n\x00\x00\x00123456test'                
"RESencrypt#\x00\x00\x00\x00\n\x00\x00\x00spqvwt6'16" 
"CMDdecrypt#\n\x00\x00\x00spqvwt6'16"                
'RESdecrypt#\x00\x00\x00\x00\n\x00\x00\x00123456test'
'CMDbye#\x00\x00\x00\x00'                            
'RESbye#\x00\x00\x00\x00\x00\x00\x00\x00'            
-----------------------------------------------------

Group messages in a symbol and partition the format with a delimiter

A quick review of the displayed messages shows that the character '#' looks interesting, as it appears in the middle of each message. Thus, a first step in our inference process is to split each message according to the delimiter '#'.
As stated in the documentation, the function splitDelimiter() plays this role:

def splitDelimiter(field, delimiter):
    """Split a field (or symbol) with a specific delimiter. The
    delimiter can be passed either as an ASCII, a Raw, an
    HexaString, or any objects that inherit from AbstractType.

    :param field : the field to consider when spliting
    :type: :class:`netzob.Common.Models.Vocabulary.AbstractField.AbstractField`
    :param delimiter : the delimiter used to split messages of the field
    :type: :class:`netzob.Common.Models.Types.AbstractType.AbstractType`

So let's use the delimiter '#' with the function splitDelimiter(). We can then display the resulting field structure with the _str_debug() method.

Format.splitDelimiter(symbol, ASCII("#"))

print "[+] Symbol structure:" 
print symbol._str_debug()

[+] Symbol structure:
Symbol
|--  Field-0
     |--   Alt
           |--   Data (Raw='CMDidentify' ((0, 88)))
           |--   Data (Raw='RESidentify' ((0, 88)))
           |--   Data (Raw='CMDinfo' ((0, 56)))
           |--   Data (Raw='RESinfo' ((0, 56)))
           |--   Data (Raw='CMDstats' ((0, 64)))
           |--   Data (Raw='RESstats' ((0, 64)))
           |--   Data (Raw='CMDauthentify' ((0, 104)))
           |--   Data (Raw='RESauthentify' ((0, 104)))
           |--   Data (Raw='CMDencrypt' ((0, 80)))
           |--   Data (Raw='RESencrypt' ((0, 80)))
           |--   Data (Raw='CMDdecrypt' ((0, 80)))
           |--   Data (Raw='RESdecrypt' ((0, 80)))
           |--   Data (Raw='CMDbye' ((0, 48)))
           |--   Data (Raw='RESbye' ((0, 48)))
|--  Field-sep-23
     |--   Alt
           |--   Data (ASCII=# ((0, 8)))
           |--   Data (Raw=None ((0, 0)))
|--  Field-2
     |--   Alt
           |--   Data (Raw='\x07\x00\x00\x00Roberto' ((0, 88)))
           |--   Data (Raw='\x00\x00\x00\x00\x00\x00\x00\x00' ((0, 64)))
           |--   Data (Raw='\x00\x00\x00\x00' ((0, 32)))
           |--   Data (Raw='\x00\x00\x00\x00\x04\x00\x00\x00info' ((0, 96)))
           |--   Data (Raw='\x00\x00\x00\x00\x05\x00\x00\x00stats' ((0, 104)))
           |--   Data (Raw='\n\x00\x00\x00aStrongPwd' ((0, 112)))
           |--   Data (Raw='\x06\x00\x00\x00abcdef' ((0, 80)))
           |--   Data (Raw="\x00\x00\x00\x00\x06\x00\x00\x00$ !&'$" ((0, 112)))
           |--   Data (Raw="\x06\x00\x00\x00$ !&'$" ((0, 80)))
           |--   Data (Raw='\x00\x00\x00\x00\x06\x00\x00\x00abcdef' ((0, 112)))
           |--   Data (Raw='\x04\x00\x00\x00fred' ((0, 64)))
           |--   Data (Raw='\t\x00\x00\x00myPasswd!' ((0, 104)))
           |--   Data (Raw='\n\x00\x00\x00123456test' ((0, 112)))
           |--   Data (Raw="\x00\x00\x00\x00\n\x00\x00\x00spqvwt6'16" ((0, 144)))
           |--   Data (Raw="\n\x00\x00\x00spqvwt6'16" ((0, 112)))
           |--   Data (Raw='\x00\x00\x00\x00\n\x00\x00\x00123456test' ((0, 144)))

The partitioned messages now look like this:

print "[+] Partitionned messages:" 
print symbol
'CMDidentify'   | '#' | '\x07\x00\x00\x00Roberto'                 
'RESidentify'   | '#' | '\x00\x00\x00\x00\x00\x00\x00\x00'        
'CMDinfo'       | '#' | '\x00\x00\x00\x00'                        
'RESinfo'       | '#' | '\x00\x00\x00\x00\x04\x00\x00\x00info'    
'CMDstats'      | '#' | '\x00\x00\x00\x00'                        
'RESstats'      | '#' | '\x00\x00\x00\x00\x05\x00\x00\x00stats'   
'CMDauthentify' | '#' | '\n\x00\x00\x00aStrongPwd'                
'RESauthentify' | '#' | '\x00\x00\x00\x00\x00\x00\x00\x00'        
'CMDencrypt'    | '#' | '\x06\x00\x00\x00abcdef'                  
'RESencrypt'    | '#' | "\x00\x00\x00\x00\x06\x00\x00\x00$ !&'$"  
'CMDdecrypt'    | '#' | "\x06\x00\x00\x00$ !&'$"                  
'RESdecrypt'    | '#' | '\x00\x00\x00\x00\x06\x00\x00\x00abcdef'  
'CMDbye'        | '#' | '\x00\x00\x00\x00'                        
'RESbye'        | '#' | '\x00\x00\x00\x00\x00\x00\x00\x00'        
'CMDidentify'   | '#' | '\x04\x00\x00\x00fred'                    
(...)    
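
To make the effect of the split concrete, here is a minimal standalone sketch in plain Python 3 that does not use Netzob. The helper split_on_delimiter() is hypothetical and only cuts on the first occurrence of the delimiter, which is enough for the samples above; splitDelimiter() may behave differently when the delimiter occurs more than once.

```python
# Standalone sketch (no Netzob): cut each message around the first '#'.
samples = [
    b"CMDidentify#\x07\x00\x00\x00Roberto",
    b"RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00",
    b"CMDinfo#\x00\x00\x00\x00",
]


def split_on_delimiter(message, delimiter=b"#"):
    """Return (before, delimiter, after); the last two are empty if the delimiter is absent."""
    before, sep, after = message.partition(delimiter)
    return before, sep, after


rows = [split_on_delimiter(m) for m in samples]
for before, sep, after in rows:
    print(before, sep, after)
```

The three values of each row correspond to Field-0, Field-sep-23 and Field-2 in the structure printed above.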

Cluster according to a key field

The first field seems interesting, as it contains some kind of command ('CMDencrypt', 'CMDidentify', etc.). Let's thus cluster the symbol according to the first field, i.e. group the messages that have the same value in that field. We use the function clusterByKeyField(), which has the following description:

def clusterByKeyField(field, keyField):
    """Create and return new symbols according to a specific key
    field.

    :param field: the field we want to split in new symbols
    :type field: :class:`netzob.Common.Models.Vocabulary.AbstractField.AbstractField`
    :param keyField: the field used as a key during the splitting operation
    :type field: :class:`netzob.Common.Models.Vocabulary.AbstractField.AbstractField`
    :raise Exception if something bad happens

Here, we use the function clusterByKeyField to generate a list of symbols from the captured messages:

symbols = Format.clusterByKeyField(symbol, symbol.fields[0])

print "[+] Number of symbols after clustering: {0}".format(len(symbols))
print "[+] Symbol list:" 
for keyFieldName, s in symbols.items():
    print "  * {0}".format(keyFieldName)

The clustering algorithm produces 14 different symbols, where each symbol has a unique value in the first field.

[+] Number of symbols after clustering: 14
[+] Symbol list:
  * RESdecrypt
  * RESbye
  * RESidentify
  * CMDbye
  * RESencrypt
  * CMDidentify
  * RESstats
  * CMDencrypt
  * RESauthentify
  * CMDdecrypt
  * CMDinfo
  * CMDauthentify
  * RESinfo
  * CMDstats
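
The grouping idea itself can be sketched without Netzob. The snippet below is a standalone Python 3 illustration under the assumption that the key is simply the bytes before the first '#'; cluster_by_key() is a hypothetical helper, not part of the Netzob API.

```python
from collections import OrderedDict

# A few of the captured messages, taken from the listing above.
messages = [
    b"CMDidentify#\x07\x00\x00\x00Roberto",
    b"RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00",
    b"CMDinfo#\x00\x00\x00\x00",
    b"CMDidentify#\x04\x00\x00\x00fred",
]


def cluster_by_key(msgs, delimiter=b"#"):
    """Group messages by the bytes that precede the first delimiter."""
    clusters = OrderedDict()
    for msg in msgs:
        key = msg.partition(delimiter)[0].decode("ascii")
        clusters.setdefault(key, []).append(msg)
    return clusters


clusters = cluster_by_key(messages)
for key, members in clusters.items():
    print("  * {0}: {1} message(s)".format(key, len(members)))
```

Each key plays the role of one symbol name in the list above, and each list of members corresponds to the messages attached to that symbol.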

Partition the third field of each symbol with a sequence alignment

As the last field seems to have a dynamic size, let's see what a sequence alignment (i.e. a means to align static and dynamic sub-fields) would produce. We use the function splitAligned(), which has the following documentation:

def splitAligned(field, useSemantic=True, doInternalSlick=False):
    """Split the specified field according to the variations of message bytes.
    Relies on a sequence alignment algorithm.
    (...)

In the following excerpt, we want to align the last field of each symbol through a sequence alignment algorithm:

for symbol in symbols.values():
    Format.splitAligned(symbol.fields[2], doInternalSlick=True)
    print "[+] Partitionned messages:" 
    print symbol

For the symbol 'CMDencrypt', the sequence alignment of the last field produces the following format, where we can observe a static field of '\x00\x00\x00' surrounded by two variable fields. The last field seems to be the buffer we want to encrypt, as the key field name suggests (i.e. 'CMDencrypt').

(...)
[+] Partitionned messages:
'CMDencrypt' | '#' | '\n'   | '\x00\x00\x00' | '123456test'
'CMDencrypt' | '#' | '\x06' | '\x00\x00\x00' | 'abcdef'   
(...)
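
The distinction between static and dynamic sub-fields can be illustrated with a much cruder technique than sequence alignment: a byte-by-byte column comparison of left-aligned values. The standalone Python 3 sketch below is not the algorithm used by splitAligned(); it only happens to give the same cut on these two samples because their static bytes sit at the same offsets. Both helper names are hypothetical.

```python
# Third field of the two CMDencrypt messages, taken from the output above.
fields = [b"\n\x00\x00\x00123456test", b"\x06\x00\x00\x00abcdef"]


def static_columns(values):
    """For each offset shared by all values, tell whether every value has the same byte."""
    width = min(len(v) for v in values)
    return [len({v[i] for v in values}) == 1 for i in range(width)]


def split_runs(value, mask):
    """Cut a value wherever the static/dynamic flag changes."""
    chunks, start = [], 0
    for i in range(1, len(mask)):
        if mask[i] != mask[i - 1]:
            chunks.append(value[start:i])
            start = i
    # Remaining bytes, including any tail longer than the mask, join the last chunk.
    chunks.append(value[start:])
    return chunks


mask = static_columns(fields)
for value in fields:
    print(split_runs(value, mask))
```

A real sequence alignment also copes with static parts that are shifted from one message to another, which this column comparison cannot do.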

Find field relations in each symbol

Let's now look for relationships in those messages. The Netzob API provides the function RelationFinder.findOnSymbol(), which identifies potential relationships between message fields that belong to the same symbol, as described in the documentation:

def findOnSymbol(symbol):
    """Find exact relations between fields in the provided
    symbol/field.

    :param symbol: the symbol in which we are looking for relations
    :type symbol: :class:`netzob.Common.Models.Vocabulary.AbstractField.AbstractField`
    """ 

The following example shows how to use the relationships finder on our unknown protocol and how to handle the results:

for symbol in symbols.values():
    rels = RelationFinder.findOnSymbol(symbol)

    print "[+] Relations found: " 
    for rel in rels:
        print "  " + rel["relation_type"] + ", between '" + rel["x_attribute"] + "' of:" 
        print "    " + str('-'.join([f.name for f in rel["x_fields"]]))
        p = [v.getValues()[:] for v in rel["x_fields"]]
        print "    " + str(p)
        print "  " + "and '" + rel["y_attribute"] + "' of:" 
        print "    " + str('-'.join([f.name for f in rel["y_fields"]]))
        p = [v.getValues()[:] for v in rel["y_fields"]]
        print "    " + str(p)

In the symbol 'CMDencrypt', we have found a relationship between the content of a field (the third one) and the length of another field (the last one, which presumably contains the buffer we want to encrypt).

(...)
[+] Relations found: 
  SizeRelation, between 'value' of:
    Field
    [['\n', '\x06']]
  and 'size' of:
    Field
    [['123456test', 'abcdef']]
(...)
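
The relation reported above can be double-checked by hand. The standalone Python 3 sketch below assumes that the size byte and the three zero bytes that follow it form one little-endian unsigned 32-bit integer; the samples are consistent with that reading, but the tutorial does not state it. size_matches() is a hypothetical helper.

```python
import struct

# (four bytes before the buffer, buffer) taken from the CMDencrypt samples above.
samples = [
    (b"\n\x00\x00\x00", b"123456test"),
    (b"\x06\x00\x00\x00", b"abcdef"),
]


def size_matches(size_bytes, buf):
    """True if size_bytes, read as a little-endian unsigned 32-bit integer, equals len(buf)."""
    (declared,) = struct.unpack("<I", size_bytes)
    return declared == len(buf)


results = [size_matches(size_bytes, buf) for size_bytes, buf in samples]
print(results)
```

Here '\n' is byte value 10, which is the length of '123456test', and '\x06' is the length of 'abcdef'.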

Find relations and apply them in the symbol structure

We then modify the message format to apply the relationship we have just found, by creating a Size field whose value depends on the content of a targeted field. We also specify a factor of 1/8, which says that the value of the size field should be one eighth of the size of the buffer field: field sizes are expressed in bits by default, whereas the size field counts bytes.

for symbol in symbols.values():
    rels = RelationFinder.findOnSymbol(symbol)

    # Apply only the first relationship found, if any
    if len(rels) > 0:
        rel = rels[0]
        rel["x_fields"][0].domain = Size(rel["y_fields"], factor=1/8.0)

    print "[+] Symbol structure:" 
    print symbol._str_debug()

The 'CMDencrypt' symbol structure now looks like this:

(...)
[+] Symbol structure:
Symbol_CMDencrypt
|--  Field-0
     |--   Data (ASCII=CMDencrypt ((0, 80)))
|--  Field-sep-23
     |--   Data (ASCII=# ((0, 8)))
|--  Field-2
     |--   Data (Raw=None ((0, None)))
|--  |--  Field
          |--   Size(['Field']) - Type:Raw=None ((8, 8))
|--  |--  Field
          |--   Data (Raw='\x00\x00\x00' ((0, 24)))
|--  |--  Field
          |--   Data (Raw=None ((0, 80)))
(...)
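
The arithmetic behind the 1/8 factor can be sketched in standalone Python 3. Both helpers are hypothetical and only mimic the field layout shown in the structure above (key, separator, one-byte size, three zero bytes, buffer); they do not use the Netzob API.

```python
def size_field_value(buf, factor=1 / 8.0):
    """Buffer size in bits, scaled by the factor (1/8 converts bits to bytes)."""
    size_in_bits = len(buf) * 8
    return int(size_in_bits * factor)


def build_cmdencrypt(buf):
    """Assemble the fields in the order shown in the symbol structure."""
    # The size field is a single byte here, so buf must be shorter than 256 bytes.
    size = bytes([size_field_value(buf)])
    return b"CMDencrypt" + b"#" + size + b"\x00\x00\x00" + buf


message = build_cmdencrypt(b"123456test")
print(message)
```

For the 10-byte buffer '123456test', the buffer field is 80 bits long and 80 / 8 = 10, which is the '\n' byte seen in the captured message.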

That's all for the message format inference. Let's now look at the state machine of this toy protocol.

Generate a chained states automaton

The first part of the tutorial focused on the reverse engineering of the message formats of the protocol. We now turn to the state machine, i.e. the grammar that specifies the authorized sequences of messages/symbols. In this tutorial, we generate three kinds of automata by learning from the observed sequences of messages. In Netzob, a sequence of messages is represented by a Session object. When working with symbols (which are an abstraction of a group of similar messages), a sequence of abstracted messages is represented by an abstract session, and this is the object used to infer state machines.

Based on the symbols we have learned, we will generate a basic automaton that illustrates the sequence of commands and responses extracted from a PCAP file. Each message sent creates a new transition to a new state, hence the name chained states automaton.

# Create a session of messages
session = Session(messages_session1)

# Abstract this session according to the inferred symbols
abstractSession = session.abstract(symbols.values())

# Generate an automaton according to the observed sequence of messages/symbols
automata = Automata.generateChainedStatesAutomata(abstractSession, symbols.values())

# Print the dot representation of the automata
dotcode = automata.generateDotCode()
print dotcode

The obtained automaton is finally converted into Dot code in order to render a graphical version of it.

digraph G {
"Start state" [shape=doubleoctagon, style=filled, fillcolor=white, URL="f8d33b83-d6b0-4180-832c-7cce9d6b3fea"];
"State 1" [shape=ellipse, style=filled, fillcolor=white, URL="a332ed56-e2d8-4c8c-9ec2-99c5f942e9a3"];
"State 2" [shape=ellipse, style=filled, fillcolor=white, URL="8f45bd4e-fe03-4a26-bf9a-1adec60f597d"];
"State 3" [shape=ellipse, style=filled, fillcolor=white, URL="01999e79-de00-467d-987a-e9411d57be99"];
"State 4" [shape=ellipse, style=filled, fillcolor=white, URL="9b20ed29-77e5-43c1-bb8b-cf3a84674941"];
"State 5" [shape=ellipse, style=filled, fillcolor=white, URL="52ec3815-656b-421b-bb1f-c4f7746be534"];
"State 6" [shape=ellipse, style=filled, fillcolor=white, URL="1cbbd123-32d5-4cd8-bd01-4fd3bcd8ae38"];
"State 7" [shape=ellipse, style=filled, fillcolor=white, URL="8a8ab662-db23-4206-ba35-28396ee31115"];
"State 8" [shape=ellipse, style=filled, fillcolor=white, URL="ee9e0d5d-bb4e-4d2e-8c97-1553afa1cc68"];
"End state" [shape=ellipse, style=filled, fillcolor=white, URL="3874e4e9-af5d-428e-92b8-e1fda38b6ef9"];
"Start state" -> "State 1" [fontsize=5, label="OpenChannelTransition", URL="4beecca4-0d48-4ca9-8d83-ffd8766b64c7"];
"State 1" -> "State 2" [fontsize=5, label="Transition (Symbol_CMDidentify;{Symbol_RESidentify})", URL="c4e5451c-6a53-41f3-9748-7179774eb7de"];
"State 2" -> "State 3" [fontsize=5, label="Transition (Symbol_CMDinfo;{Symbol_RESinfo})", URL="c4e5451c-6a53-41f3-9748-7179774eb7de"];
"State 3" -> "State 4" [fontsize=5, label="Transition (Symbol_CMDstats;{Symbol_RESstats})", URL="c4e5451c-6a53-41f3-9748-7179774eb7de"];
"State 4" -> "State 5" [fontsize=5, label="Transition (Symbol_CMDauthentify;{Symbol_RESauthentify})", URL="c4e5451c-6a53-41f3-9748-7179774eb7de"];
"State 5" -> "State 6" [fontsize=5, label="Transition (Symbol_CMDencrypt;{Symbol_RESencrypt})", URL="c4e5451c-6a53-41f3-9748-7179774eb7de"];
"State 6" -> "State 7" [fontsize=5, label="Transition (Symbol_CMDdecrypt;{Symbol_RESdecrypt})", URL="c4e5451c-6a53-41f3-9748-7179774eb7de"];
"State 7" -> "State 8" [fontsize=5, label="Transition (Symbol_CMDbye;{Symbol_RESbye})", URL="c4e5451c-6a53-41f3-9748-7179774eb7de"];
"State 8" -> "End state" [fontsize=5, label="CloseChannelTransition", URL="c6ac87b7-5de1-401a-8b75-5d2a73d81264"];
}
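
The chaining principle can be sketched in standalone Python 3, independently of Netzob. The sketch assumes that the session strictly alternates commands and responses, and it leaves out the open and close channel transitions; chained_transitions() and to_dot() are hypothetical helpers.

```python
# A shortened abstract session: symbol names in the order they were observed.
session = ["CMDidentify", "RESidentify", "CMDinfo", "RESinfo", "CMDbye", "RESbye"]


def chained_transitions(symbol_names):
    """Pair each command with the response that follows it, one new state per pair."""
    pairs = zip(symbol_names[0::2], symbol_names[1::2])
    transitions = []
    for index, (cmd, res) in enumerate(pairs, start=1):
        src = "State {0}".format(index)
        dst = "State {0}".format(index + 1)
        transitions.append((src, dst, cmd, res))
    return transitions


def to_dot(transitions):
    """Render the transitions as a minimal DOT digraph."""
    lines = ["digraph G {"]
    for src, dst, cmd, res in transitions:
        lines.append('"{0}" -> "{1}" [label="{2};{{{3}}}"];'.format(src, dst, cmd, res))
    lines.append("}")
    return "\n".join(lines)


transitions = chained_transitions(session)
print(to_dot(transitions))
```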

Generate a one state automaton

This time, instead of converting a PCAP into a sequence of states, one for each message observed, we generate a single state that accepts any of the observed sent messages to trigger a new transition. In response to each sent message (for example 'CMDencrypt'), we expect a specific response (for example 'RESencrypt').

# Create a session of messages
session = Session(messages_session1)

# Abstract this session according to the inferred symbols
abstractSession = session.abstract(symbols.values())

# Generate an automaton according to the observed sequence of messages/symbols
automata = Automata.generateOneStateAutomata(abstractSession, symbols.values())

# Print the dot representation of the automata
dotcode = automata.generateDotCode()
print dotcode

The obtained automaton is finally converted into Dot code in order to render a graphical version of it.

digraph G {
"Start state" [shape=doubleoctagon, style=filled, fillcolor=white, URL="0659071e-1849-4616-a11a-e98edfe86e24"];
"Main state" [shape=ellipse, style=filled, fillcolor=white, URL="424e0a69-da0b-4030-816a-8368e30a00a9"];
"End state" [shape=ellipse, style=filled, fillcolor=white, URL="9de3d54b-f0eb-45f8-809a-86a60d22812f"];
"Start state" -> "Main state" [fontsize=5, label="OpenChannelTransition", URL="3818118b-97db-474f-b9c3-f38c04152a74"];
"Main state" -> "Main state" [fontsize=5, label="Transition (Symbol_CMDidentify;{Symbol_RESidentify})", URL="f6000e04-10a8-41de-a1a0-29021440684a"];
"Main state" -> "Main state" [fontsize=5, label="Transition (Symbol_CMDinfo;{Symbol_RESinfo})", URL="f6000e04-10a8-41de-a1a0-29021440684a"];
"Main state" -> "Main state" [fontsize=5, label="Transition (Symbol_CMDstats;{Symbol_RESstats})", URL="f6000e04-10a8-41de-a1a0-29021440684a"];
"Main state" -> "Main state" [fontsize=5, label="Transition (Symbol_CMDauthentify;{Symbol_RESauthentify})", URL="f6000e04-10a8-41de-a1a0-29021440684a"];
"Main state" -> "Main state" [fontsize=5, label="Transition (Symbol_CMDencrypt;{Symbol_RESencrypt})", URL="f6000e04-10a8-41de-a1a0-29021440684a"];
"Main state" -> "Main state" [fontsize=5, label="Transition (Symbol_CMDdecrypt;{Symbol_RESdecrypt})", URL="f6000e04-10a8-41de-a1a0-29021440684a"];
"Main state" -> "Main state" [fontsize=5, label="Transition (Symbol_CMDbye;{Symbol_RESbye})", URL="f6000e04-10a8-41de-a1a0-29021440684a"];
"Main state" -> "End state" [fontsize=5, label="CloseChannelTransition", URL="75a4cc3a-72a4-42a3-af2c-aa3939f899aa"];
}
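
As a rough standalone Python 3 illustration of the single-state idea, the sketch below turns every distinct command/response pair into a self-loop on one state. It assumes strictly alternating commands and responses; the deduplication of repeated pairs is a choice of this sketch, and one_state_transitions() is a hypothetical helper.

```python
def one_state_transitions(symbol_names, state="Main state"):
    """Collect each distinct (command, response) pair once, as a self-loop on a single state."""
    seen = []
    for pair in zip(symbol_names[0::2], symbol_names[1::2]):
        if pair not in seen:
            seen.append(pair)
    return [(state, state, cmd, res) for cmd, res in seen]


# CMDinfo is observed twice but yields a single self-loop.
session = ["CMDinfo", "RESinfo", "CMDstats", "RESstats", "CMDinfo", "RESinfo"]
loops = one_state_transitions(session)
for src, dst, cmd, res in loops:
    print("{0} -> {1}: {2};{{{3}}}".format(src, dst, cmd, res))
```

Such an automaton accepts the commands in any order, which makes it more permissive than the chained version.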

Generate a PTA-based automaton

Finally, we convert multiple sequences of messages taken from different PCAP files into a single automaton in which identical leading paths (common prefixes) are merged. The underlying merging strategy builds a Prefix Tree Acceptor (PTA).

# Create sessions of messages
messages_session1 = PCAPImporter.readFile("target_src_v1_session1.pcap").values()
messages_session3 = PCAPImporter.readFile("target_src_v1_session3.pcap").values()

session1 = Session(messages_session1)
session3 = Session(messages_session3)

# Abstract these sessions according to the inferred symbols
abstractSession1 = session1.abstract(symbols.values())
abstractSession3 = session3.abstract(symbols.values())

# Generate an automaton according to the observed sequence of messages/symbols
automata = Automata.generatePTAAutomata([abstractSession1, abstractSession3], symbols.values())

# Print the dot representation of the automata
dotcode = automata.generateDotCode()
print dotcode

The obtained automaton is finally converted into Dot code in order to render a graphical version of it.

digraph G {
"Start state" [shape=doubleoctagon, style=filled, fillcolor=white, URL="e46d8a67-2a96-479a-9234-c1b38c75b847"];
"State 0" [shape=ellipse, style=filled, fillcolor=white, URL="0cd8a2c9-4410-45a0-9950-6456546f49dc"];
"State 1" [shape=ellipse, style=filled, fillcolor=white, URL="bbc10d50-f197-40f6-a674-5f80790ef954"];
"State 2" [shape=ellipse, style=filled, fillcolor=white, URL="739801b7-9e0d-4fba-a4f5-cf130e6b7fbf"];
"State 3" [shape=ellipse, style=filled, fillcolor=white, URL="c2075b80-16b9-4bd7-b290-6eb333f94e43"];
"State 4" [shape=ellipse, style=filled, fillcolor=white, URL="715ede75-d81e-46ea-a7c1-f537e5dba892"];
"State 9" [shape=ellipse, style=filled, fillcolor=white, URL="ad5873af-c26a-482f-94d9-0cf47c69376b"];
"State 10" [shape=ellipse, style=filled, fillcolor=white, URL="01859f7d-6b43-45af-8c17-9decb10dea9b"];
"End state 11" [shape=ellipse, style=filled, fillcolor=white, URL="7f4bd693-a35f-479b-8e86-128dc46c71cf"];
"State 5" [shape=ellipse, style=filled, fillcolor=white, URL="ee9da65c-b072-4344-bf71-2d67a3b73880"];
"State 6" [shape=ellipse, style=filled, fillcolor=white, URL="902e76e4-6a9a-45a2-95ba-ae9484f1084f"];
"State 7" [shape=ellipse, style=filled, fillcolor=white, URL="f7e9b27a-6879-4b4f-bb51-00530f07addf"];
"End state 8" [shape=ellipse, style=filled, fillcolor=white, URL="fe710eed-287f-4abf-93bf-6878e487d8a9"];
"Start state" -> "State 0" [fontsize=5, label="OpenChannelTransition", URL="5d6139d0-9b1c-49b2-b19d-91ae8c56f299"];
"State 0" -> "State 1" [fontsize=5, label="Transition (Symbol_CMDidentify;{Symbol_RESidentify})", URL="a1d2d03d-8c58-4c83-afa1-c40433fbd833"];
"State 1" -> "State 2" [fontsize=5, label="Transition (Symbol_CMDinfo;{Symbol_RESinfo})", URL="a1d2d03d-8c58-4c83-afa1-c40433fbd833"];
"State 2" -> "State 3" [fontsize=5, label="Transition (Symbol_CMDstats;{Symbol_RESstats})", URL="a1d2d03d-8c58-4c83-afa1-c40433fbd833"];
"State 3" -> "State 4" [fontsize=5, label="Transition (Symbol_CMDauthentify;{Symbol_RESauthentify})", URL="a1d2d03d-8c58-4c83-afa1-c40433fbd833"];
"State 4" -> "State 5" [fontsize=5, label="Transition (Symbol_CMDencrypt;{Symbol_RESencrypt})", URL="a1d2d03d-8c58-4c83-afa1-c40433fbd833"];
"State 4" -> "State 9" [fontsize=5, label="Transition (Symbol_CMDdecrypt;{Symbol_RESdecrypt})", URL="a1d2d03d-8c58-4c83-afa1-c40433fbd833"];
"State 9" -> "State 10" [fontsize=5, label="Transition (Symbol_CMDbye;{Symbol_RESbye})", URL="a1d2d03d-8c58-4c83-afa1-c40433fbd833"];
"State 10" -> "End state 11" [fontsize=5, label="CloseChannelTransition", URL="f7ddbccf-93b6-4496-a153-5b2306d95dac"];
"State 5" -> "State 6" [fontsize=5, label="Transition (Symbol_CMDdecrypt;{Symbol_RESdecrypt})", URL="a1d2d03d-8c58-4c83-afa1-c40433fbd833"];
"State 6" -> "State 7" [fontsize=5, label="Transition (Symbol_CMDbye;{Symbol_RESbye})", URL="a1d2d03d-8c58-4c83-afa1-c40433fbd833"];
"State 7" -> "End state 8" [fontsize=5, label="CloseChannelTransition", URL="f7ddbccf-93b6-4496-a153-5b2306d95dac"];
}
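
The prefix merging can be reproduced with a small standalone Python 3 sketch. Each step name stands for one command/response pair, the two step lists are read off the transitions in the DOT output above, and build_pta() is a hypothetical helper whose state numbering differs from Netzob's.

```python
def build_pta(sessions):
    """Build a prefix tree: sessions that start with the same steps share those states."""
    tree = {0: {}}  # state id -> {step: next state id}
    for steps in sessions:
        state = 0
        for step in steps:
            if step not in tree[state]:
                new_state = len(tree)
                tree[state][step] = new_state
                tree[new_state] = {}
            state = tree[state][step]
    return tree


session1 = ["identify", "info", "stats", "authentify", "encrypt", "decrypt", "bye"]
session3 = ["identify", "info", "stats", "authentify", "decrypt", "bye"]

pta = build_pta([session1, session3])
for state in sorted(pta):
    print(state, pta[state])
```

Both sessions share the first four steps, so the tree only forks after 'authentify', just as "State 4" has two outgoing transitions in the DOT output. The identical 'decrypt' and 'bye' suffixes are not merged, since a PTA only merges prefixes.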

Generate messages according to the inferred model

We now have a pretty good knowledge of the message format and grammar of the targeted protocol. Let's play with this model by trying to communicate with a real server implementation.

First, let's start the server so that we can talk to it.

$ cd src_v1/
$ ./server

Ready to read incomming messages

(...)

Then, we create a UDP client that will communicate with the server (on 127.0.0.1:4242) by exchanging messages generated from the inferred symbols. In Netzob, an actor is a high-level representation of a participant in a communication with a remote peer. This actor is able to send and receive data that respects the state machine (the Automata) as well as the message formats (the Symbols) of a previously learned protocol. An abstraction layer converts symbols into concrete messages and received concrete messages into symbols: this component ensures the specialization of sent symbols and the abstraction of received messages.

# Create a UDP client instance
channelOut = UDPClient(remoteIP="127.0.0.1", remotePort=4242)
abstractionLayerOut = AbstractionLayer(channelOut, symbols.values())
abstractionLayerOut.openChannel()

# Visit the automata for n iterations
state = automata.initialState
for n in xrange(8):
    state = state.executeAsInitiator(abstractionLayerOut)

We go through 8 iterations in the automaton.

1454: [INFO] AbstractionLayer:openChannel: Going to open the communication channel...
1454: [INFO] AbstractionLayer:openChannel: Communication channel opened.
1454: [INFO] State:executeAsInitiator: Next transition: Open.
1454: [INFO] AbstractionLayer:openChannel: Going to open the communication channel...
1454: [INFO] AbstractionLayer:openChannel: Communication channel opened.
1454: [INFO] State:executeAsInitiator: Transition 'Open' leads to state: State 1.
1455: [INFO] State:executeAsInitiator: Next transition: Transition.
1455: [INFO] AbstractionLayer:writeSymbol: Going to specialize symbol: 'Symbol_CMDidentify' (id=dbea29b9-7e9f-4c2b-be14-625f675569f3).
1455: [INFO] AbstractionLayer:writeSymbol: Data generated from symbol 'Symbol_CMDidentify': 'CMDidentify#\x03\x00\x00\x00\xfc{\xdb'.
1456: [INFO] AbstractionLayer:writeSymbol: Going to write to communication channel...
1456: [INFO] AbstractionLayer:writeSymbol: Writing to commnunication channel donne..
1456: [INFO] AbstractionLayer:readSymbol: Going to read from communication channel...
1456: [INFO] AbstractionLayer:readSymbol: Received data: ''RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00''
1457: [INFO] AbstractionLayer:readSymbol: Received symbol on communication channel: 'Symbol_RESidentify'
1457: [INFO] Transition:executeAsInitiator: Possible output symbol: 'Symbol_RESidentify' (id=49c24e1c-3751-412e-9f6a-f006a7de7492).
1457: [INFO] State:executeAsInitiator: Transition 'Transition' leads to state: State 2.
1457: [INFO] State:executeAsInitiator: Next transition: Transition.
1457: [INFO] AbstractionLayer:writeSymbol: Going to specialize symbol: 'Symbol_CMDinfo' (id=5eb47a57-eccf-4d06-8231-0b1ae87f96a7).
1458: [INFO] AbstractionLayer:writeSymbol: Data generated from symbol 'Symbol_CMDinfo': 'CMDinfo#\x00\x00\x00\x00'.
1458: [INFO] AbstractionLayer:writeSymbol: Going to write to communication channel...
1458: [INFO] AbstractionLayer:writeSymbol: Writing to commnunication channel donne..
1458: [INFO] AbstractionLayer:readSymbol: Going to read from communication channel...
1458: [INFO] AbstractionLayer:readSymbol: Received data: ''RESinfo#\x00\x00\x00\x00\x04\x00\x00\x00info''
1462: [INFO] AbstractionLayer:readSymbol: Received symbol on communication channel: 'Symbol_RESinfo'
1462: [INFO] Transition:executeAsInitiator: Possible output symbol: 'Symbol_RESinfo' (id=b41502e3-21ea-4cb9-9c1e-dc171f715685).
1462: [INFO] State:executeAsInitiator: Transition 'Transition' leads to state: State 3.
1462: [INFO] State:executeAsInitiator: Next transition: Transition.
(...)
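
The send, receive and abstract cycle visible in this log can be mimicked with a standalone Python 3 sketch that needs neither Netzob nor a network. Everything here is hypothetical: fake_server() is a stand-in that answers any command with the matching 'RES' key and eight zero bytes, and abstract() recognizes a symbol only from the key before '#'.

```python
def abstract(data):
    """Map received bytes to a symbol name using the key field before '#'."""
    return "Symbol_" + data.partition(b"#")[0].decode("ascii")


def fake_server(request):
    """Stand-in for the real server: answer CMDxxx with RESxxx and eight zero bytes."""
    command = request.partition(b"#")[0]
    return b"RES" + command[3:] + b"#" + b"\x00" * 8


# (concrete message to send, response symbol expected by the transition)
transitions = [
    (b"CMDidentify#\x04\x00\x00\x00fred", "Symbol_RESidentify"),
    (b"CMDbye#\x00\x00\x00\x00", "Symbol_RESbye"),
]

visited = []
for request, expected in transitions:
    received = abstract(fake_server(request))
    print("sent {0}, received {1}".format(abstract(request), received))
    visited.append(received == expected)
```

A transition is considered valid when the abstracted response matches the output symbol it expects, which is what the "Possible output symbol" lines of the log report.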

On the server side, we can see that the received messages are well formed, as the server is able to parse them and send correct responses.

$ ./server 

Ready to read incomming messages
-> Read: CMDidentify#.
   Command: CMDidentify
   Arg size: 2
   Arg content: ..
<- Send: 
   Return value: 0
   Size of data buffer: 0
   Data buffer: 
    "" 

-> Read: CMDinfo#
   Command: CMDinfo
   Arg size: 0
<- Send: 
   Return value: 0
   Size of data buffer: 4
   Data buffer: 
   DATA: 69 6e 66 6f                                         "info" 

-> Read: CMDstats#
   Command: CMDstats
   Arg size: 0
<- Send: 
   Return value: 0
   Size of data buffer: 5
   Data buffer: 
   DATA: 73 74 61 74 73                                      "stats" 

-> Read: CMDauthentify#.
   Command: CMDauthentify
   Arg size: 6
   Arg content: ......
<- Send: 
   Return value: 0
   Size of data buffer: 0
   Data buffer: 
    "" 

-> Read: CMDencrypt#.
   Command: CMDencrypt
   Arg size: 2
   Arg content: ..
<- Send: 
(...)

Do some fuzzing on a specific symbol

Finally, we deliberately alter the message format of the 'CMDencrypt' symbol in order to try some fuzzing. The modification extends the size range of the buffer field (i.e. the one that receives the data to encrypt).

def send_and_receive_symbol(symbol):
    data = symbol.specialize()
    print "[+] Sending: {0}".format(repr(data))
    channelOut.write(data)
    data = channelOut.read()
    print "[+] Receiving: {0}".format(repr(data))

# Update symbol definition to allow a broader payload size
symbols["CMDencrypt"].fields[2].fields[2].domain = Raw(nbBytes=(10, 120))

for i in range(10):
    send_and_receive_symbol(symbols["CMDencrypt"])

We can see that Netzob now only sends CMDencrypt messages, with a potentially long last field:

[+] Sending: 'CMDencrypt#6\x00\x00\x00&\xe0*\xb3\xa8A(\x0b\xd2yA\xb5\xb8\rw\x0fGi\xee\xb3\xd6\xb0<\xfc\xc0\xa7m\xbd\xbc\xde2~\xceE\xe5\xda@\xd4\xed\xed\xf2\xb4\xe7\t\xfbC\xbf\x05\xc6\xce\xfb\x83\xf2\x00'
(...)
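
The shape of these fuzzed messages can be reproduced with a standalone Python 3 sketch. It assumes that both bounds of the 10 to 120 byte range are inclusive and that the size field stays a single byte followed by three zero bytes, as in the message above; fuzzed_cmdencrypt() is a hypothetical helper and does not use the Netzob API.

```python
import random


def fuzzed_cmdencrypt(rng, min_len=10, max_len=120):
    """Build a CMDencrypt message with a random buffer of min_len to max_len bytes."""
    length = rng.randint(min_len, max_len)
    buf = bytes(rng.randrange(256) for _ in range(length))
    # The size byte still matches the buffer length, as in the inferred format.
    return b"CMDencrypt#" + bytes([length]) + b"\x00\x00\x00" + buf


rng = random.Random(0)  # fixed seed so that a crashing input can be regenerated
messages = [fuzzed_cmdencrypt(rng) for _ in range(10)]
for message in messages:
    print("[+] Sending: {0}".format(repr(message)))
```

Seeding the generator is a design choice of this sketch: it makes a run reproducible, which helps when a particular message triggers a fault on the server.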

On the server side, we quickly get a segmentation fault, due to a bug in the parsing of the last field.

$ gdb ./server
(gdb) run
Starting program: /home/fgy/travaux/netzob/git/netzob-resources/experimentations/tutorial_target/src_v1/server 

Ready to read incomming messages
(...)
-> Read: CMDencrypt#6
   Command: CMDencrypt
   Arg size: 54
   Arg content: &?*??A(
wGi???<???m???2~?E??@????????    ?C??

Program received signal SIGSEGV, Segmentation fault.
0x08048bc0 in api_encrypt (in=0x45ce7e32 <Address 0x45ce7e32 out of bounds>, len=3561020133, out=0xb4f2eded <Address 0xb4f2eded out of bounds>) at amo_api.c:80
80          tmpData[i] = (in[i] ^ key) % 0xff;

That's all for this introductory tutorial. The entire source code of the script used to infer and play with the protocol is attached to this page (inference_target_src_v1.py).

We invite you to read the API documentation or talk with us on IRC (#netzob on Freenode) if you have any questions.

automata_target_v1_chained.svg View (11 KB) Frédéric Guihéry, 05/12/2015 12:25 AM

automata_target_v1_onestate.svg View (8.98 KB) Frédéric Guihéry, 05/12/2015 12:25 AM

automata_target_v1_pta.svg View (14.5 KB) Frédéric Guihéry, 05/12/2015 10:53 PM

target_src_v1_session1.pcap (1.09 KB) Frédéric Guihéry, 05/12/2015 10:54 PM

target_src_v1_session2.pcap (1.1 KB) Frédéric Guihéry, 05/12/2015 10:54 PM

target_src_v1_session3.pcap (952 Bytes) Frédéric Guihéry, 05/12/2015 10:54 PM

tutorial_netzob_v1.tar.gz (4.21 KB) Frédéric Guihéry, 07/19/2015 11:10 PM

target_src_v1_session3.pcap (952 Bytes) Frédéric Guihéry, 07/19/2015 11:10 PM

target_src_v1_session2.pcap (1.1 KB) Frédéric Guihéry, 07/19/2015 11:10 PM

target_src_v1_session1.pcap (1.09 KB) Frédéric Guihéry, 07/19/2015 11:10 PM

inference_target_src_v1.py View (4.73 KB) Frédéric Guihéry, 07/19/2015 11:14 PM
