Thursday, April 17, 2014

We now look at the optimization techniques with the Berkeley Packet filter in TcpDump. Optimizing the code generation was required because users could specify compound queries. In such queries the generated code was redundant and highly inefficient. As a comparison, if we had taken a simple query such as tcp src port 100, then the resulting code would have been a linear decision tree with stteps evaluating down from IP, frag, TCP, sport, 100, true with each progress down only on true.
IF we wanted the same for both directions, we would have two such trees  with an OR in between  and evaluating one below the other. If the query had been both in same directions such as with tcp port 100 and 200, then we would have both trees one evaluating to the other.  and this combination evaluating one below the other. In this case, we would have had a significant code bloat. This is highly redundant and inefficient.  This is where the optimization techniques are used which have their roots in compiler design. One such technique is called the dominator technique. What this does, it eliminates common subexpressions.  So if the exit from one is the same as entry to another, we can replace that sub-expression with a third. While this traditional technique does optimize the code, it does not address the branching because data could flow to this sub-expression from both . If we look at the edge relationships instead of node relationships, we can do even more optimization. When we assume a particular sub-expression has already been evaluated by the expression above, then we can bypass that sub-expression and directly move to tthe outcome of the sub-expression. This creates new opportunities and when we repeat the cycle for all sub-expressions, at each stage, we could eliminated redundancies. An interesting observation here is that this exercise for optimizing code could also help us detect unreachable code and simplify the graph. Now we take the example above and remove the redundancy of opcode nodes and instead replace the edged to move directly to the outcome of the sub-expressions above. In this case we had the linear decision tree of IP, frag, TCP, common and we remove three sets of those copies and adding edges directly from them to the outcome. We also add edges from src port 100 to dest port 100 and dest port 200 as well as an edge from dest port 100 to src port 200 and completing the branches from the remaining nodes to the outcomes. We only have two outcomes true or false in the leaf level and all the nodes will be connected to it via edges. This covers the optimization in the Berkeley packet filter.


No comments:

Post a Comment