
Source-Level Trace Replay

**Figure 1.4:** Overview of Source-level Trace Replay.
[Figure image: fig/pkt-cvec-tmix2.eps]

Our abstract source-level modeling of TCP connections provides a solid foundation for generating traffic mixes in simulators and network testbeds. We propose to generate traffic using source-level trace replay, as illustrated in Figure 1.4. Given a packet header trace $\mathcal{T}_h$ collected from some Internet link, we first use our data acquisition algorithm to analyze the trace and describe its content as a collection of connection vectors $\mathcal{T}_c=\{(T_i, C_i)\}$, where $T_i$ is the relative start time of the $i$-th TCP connection, and $C_i$ is the sequential or concurrent connection vector corresponding to this connection. The basic approach for generating traffic according to $\mathcal{T}_c$ is to replay every connection vector $C_i$. Each connection vector $C_i$ is replayed by starting a TCP connection precisely at $C_i$'s relative start time $T_i$, and transmitting the measured sequence of ADUs ($a_j$ and $b_j$) separated in time by the measured inter-ADU quiet times ($ta_j$ and $tb_j$). In this dissertation, we evaluate a specific implementation of this approach for FreeBSD network testbeds, where traffic is generated using a tool we developed called tmix.
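The replay rule described above can be sketched as a simple event schedule. This is a minimal illustration, not the tmix implementation: the names `Epoch`, `ConnectionVector`, and `replay_schedule` are hypothetical, only sequential connection vectors are covered, and transmission and propagation delays are assumed to be zero, so the times shown are the earliest instants at which each ADU could be handed to the socket.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Epoch:
    """One request/response exchange of a sequential connection vector."""
    a: int      # size in bytes of ADU a_j (initiator to acceptor)
    ta: float   # quiet time ta_j after a_j, in seconds
    b: int      # size in bytes of ADU b_j (acceptor to initiator)
    tb: float   # quiet time tb_j after b_j, in seconds


@dataclass
class ConnectionVector:
    epochs: List[Epoch]


def replay_schedule(trace: List[Tuple[float, ConnectionVector]]):
    """Return (time, connection index, direction, bytes) events for T_c.

    Assumption: ADUs are delivered instantly, so only start times and
    quiet times advance the clock of each connection.
    """
    events = []
    for i, (start_time, vector) in enumerate(trace):
        t = start_time                      # connection i starts at T_i
        for epoch in vector.epochs:
            if epoch.a > 0:
                events.append((t, i, "a", epoch.a))
            t += epoch.ta                   # acceptor stays quiet for ta_j
            if epoch.b > 0:
                events.append((t, i, "b", epoch.b))
            t += epoch.tb                   # initiator stays quiet for tb_j
    events.sort()                           # interleave all connections by time
    return events


if __name__ == "__main__":
    trace = [
        (0.0, ConnectionVector([Epoch(100, 0.5, 2000, 1.0),
                                Epoch(50, 0.25, 300, 0.0)])),
        (0.75, ConnectionVector([Epoch(10, 0.0, 20, 0.0)])),
    ]
    for event in replay_schedule(trace):
        print(event)
```

In an actual replay the clock of each connection also advances while ADUs are in flight, which is why the network-level parameters discussed later matter.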

The goal of the direct source-level trace replay of $\mathcal{T}_c$ is to reproduce the source-level characteristics of the traffic in the original link while generating the traffic in a closed-loop fashion. Closed-loop traffic generation implies the need to simulate the behavior of applications, using regular network stacks to actually translate source-level behavior into network traffic. In particular, our experiments use an implementation which relies on the standard socket interface to reproduce the data exchanges in each connection vector. Generating traffic in this manner is closed-loop in the sense that it preserves the feedback mechanism in TCP, which adapts its behavior to changes in network conditions, such as loss and receiver saturation. In contrast, packet-level trace replay, the direct reproduction of $\mathcal{T}_h$, is an open-loop traffic generation method in the sense that TCP control algorithms are not used during the generation, and hence the traffic does not adapt to network conditions.
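To make the closed-loop idea concrete, the following sketch replays one sequential connection vector through ordinary TCP sockets on the loopback interface. It is an illustration under stated assumptions rather than the tool used in the experiments: the function names are hypothetical, each epoch is a plain tuple `(a, ta, b, tb)`, and the payload is filler bytes. Because the data passes through the operating system's TCP stack, flow control and congestion control remain active during the replay.

```python
import socket
import threading
import time


def _recv_exact(sock, n):
    """Read exactly n bytes from sock; return the number of bytes read."""
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(65536, remaining))
        if not chunk:
            raise ConnectionError("peer closed before the ADU was complete")
        remaining -= len(chunk)
    return n


def replay_initiator(sock, epochs):
    """Send each a_j, wait for b_j, then stay quiet for tb_j."""
    received = 0
    for a, _ta, b, tb in epochs:
        sock.sendall(b"\0" * a)
        received += _recv_exact(sock, b)
        time.sleep(tb)
    return received


def replay_acceptor(sock, epochs):
    """Wait for each a_j, stay quiet for ta_j, then send b_j."""
    for a, ta, b, _tb in epochs:
        _recv_exact(sock, a)
        time.sleep(ta)
        sock.sendall(b"\0" * b)


def replay_over_loopback(epochs):
    """Replay one connection vector on 127.0.0.1; return (seconds, bytes)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))           # port 0: let the OS pick a port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _addr = server.accept()
        with conn:
            replay_acceptor(conn, epochs)

    acceptor = threading.Thread(target=serve)
    acceptor.start()
    start = time.monotonic()
    with socket.create_connection(("127.0.0.1", port)) as client:
        received = replay_initiator(client, epochs)
    acceptor.join()
    server.close()
    return time.monotonic() - start, received


if __name__ == "__main__":
    elapsed, received = replay_over_loopback([(100, 0.05, 2000, 0.0),
                                              (50, 0.05, 300, 0.0)])
    print(f"received {received} bytes in {elapsed:.3f} s")
```

An open-loop replay, by contrast, would inject the recorded packets at their recorded times without any of these blocking `recv` and `sendall` calls, so nothing in the generator would react to loss or delay.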

The evaluation of our methodology consists of comparing the original trace $\mathcal{T}_h$ and the synthetic trace $\mathcal{T}_h^\prime$ obtained from the source-level trace replay. Validating our traffic generation method consists of transforming $\mathcal{T}_h^\prime$ into a set of connection vectors $\mathcal{T}_c^\prime$, using the same method used to transform $\mathcal{T}_h$ into $\mathcal{T}_c$. We then compare the resulting set of connection vectors $\mathcal{T}_c^\prime$ with the original $\mathcal{T}_c$. In principle, they should be identical, since $\mathcal{T}_c$ represents the invariant source-level characteristics of $\mathcal{T}_h$. There are, however, some differences that are explained by the nature of the model and our measurement methods.
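One way to picture the comparison of $\mathcal{T}_c$ and $\mathcal{T}_c^\prime$ is the sketch below. It is only an assumption about how such a check could be organized, not the validation procedure used in this dissertation: the function name is hypothetical, each trace is a list of `(T_i, epochs)` pairs with epochs given as `(a, ta, b, tb)` tuples, and connections are assumed to be already matched by position in the two lists.

```python
def compare_connection_vectors(tc, tc_prime):
    """Summarize differences between an original and a replayed trace.

    Assumption: tc[i] and tc_prime[i] describe the same connection.
    """
    if len(tc) != len(tc_prime):
        raise ValueError("traces contain different numbers of connections")

    identical_sizes = 0
    max_start_error = 0.0
    max_quiet_error = 0.0
    for (t, epochs), (t2, epochs2) in zip(tc, tc_prime):
        max_start_error = max(max_start_error, abs(t - t2))
        sizes = [(a, b) for a, _ta, b, _tb in epochs]
        sizes2 = [(a, b) for a, _ta, b, _tb in epochs2]
        if sizes == sizes2:
            identical_sizes += 1
            # Quiet times are only comparable when the epochs line up.
            for (_a, ta, _b, tb), (_a2, ta2, _b2, tb2) in zip(epochs, epochs2):
                max_quiet_error = max(max_quiet_error,
                                      abs(ta - ta2), abs(tb - tb2))
    return {
        "connections": len(tc),
        "identical_adu_sizes": identical_sizes,
        "max_start_time_error": max_start_error,
        "max_quiet_time_error": max_quiet_error,
    }


if __name__ == "__main__":
    original = [(0.0, [(100, 0.5, 2000, 1.0)]), (1.0, [(10, 0.0, 20, 0.0)])]
    replayed = [(0.0, [(100, 0.75, 2000, 1.0)]), (1.25, [(10, 0.0, 25, 0.0)])]
    print(compare_connection_vectors(original, replayed))
```

ADU sizes are the quantities expected to match exactly, whereas start times and quiet times are measured from packet timestamps and can shift slightly in the replay.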

The direct comparison of $\mathcal{T}_h$ and $\mathcal{T}_h^\prime$ also provides a way to study the accuracy of our approach in terms of how well traffic is described by the a-b-t model. This is, however, a subtle exercise. The actual replay of $\mathcal{T}_c$, which creates $\mathcal{T}_h^\prime$, necessarily requires the selection of a set of network-level parameters, such as round-trip times and TCP receiver window sizes, for each TCP connection in the source-level trace replay. The exact set of generated TCP segments and their arrival times is a direct function of these parameters. As a consequence, if we conduct a source-level trace replay using arbitrary network-level parameters, we obtain a $\mathcal{T}_h^\prime$ with little resemblance to the original $\mathcal{T}_h$. The replayed a-b-t connection vectors may be a perfect description of the source behavior driving the original connections, but the generated packet-level trace $\mathcal{T}_h^\prime$ would still be very different from the original $\mathcal{T}_h$. To address this difficulty, our replay incorporates network-level parameters individually derived from each connection in $\mathcal{T}_h$. To this end, we have incorporated methods for measuring three important network-level parameters (round-trip time, TCP receiver window size and loss rate) into our analysis and generation procedure. While this set of parameters is by no means complete, it does include the main parameters that affect the average throughput of a TCP connection found in a trace. This enables us to generate traffic in a closed-loop manner that approximates measured traces very closely.
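A rough calculation illustrates why these three parameters matter for average throughput. The sketch below is not part of the methodology described here. It combines two standard textbook bounds as an assumption: the receiver-window limit of one window per round-trip time, and the well-known square-root approximation for loss-limited steady-state TCP throughput (Mathis et al.) with constant $\sqrt{3/2}$. The function name and the default segment size of 1460 bytes are hypothetical choices.

```python
import math


def approx_tcp_throughput(rtt_s, rwnd_bytes, loss_rate, mss_bytes=1460):
    """Approximate average throughput of a long TCP transfer in bytes/s.

    Assumptions: steady state, no delayed ACKs, random independent loss.
    """
    # At most one receiver window can be in flight per round-trip time.
    window_limit = rwnd_bytes / rtt_s
    if loss_rate <= 0:
        return window_limit
    # Square-root approximation: (MSS / RTT) * sqrt(3 / (2 p)).
    loss_limit = (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)
    return min(window_limit, loss_limit)


if __name__ == "__main__":
    for rtt, rwnd, loss in [(0.1, 65535, 0.0),
                            (0.1, 65535, 0.01),
                            (0.2, 65535, 0.01)]:
        rate = approx_tcp_throughput(rtt, rwnd, loss)
        print(f"rtt={rtt}s rwnd={rwnd}B loss={loss}: {rate:.0f} bytes/s")
```

Under these assumptions, doubling the round-trip time halves the achievable throughput, which suggests why a replay with arbitrary round-trip times would yield packet arrivals that differ markedly from the original trace.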

Incorporating network-level properties is important, but it is critical to understand the main shortcoming of this approach. The goal of our work is not to make the generated traffic $\mathcal{T}_h^\prime$ identical to the original traffic $\mathcal{T}_h$, which could be accomplished with a simple packet-level replay. As mentioned before, packet-level replays generate traffic that does not adapt to changes in network conditions, resulting in open-loop traffic. Our goal is to develop a closed-loop traffic generation method based on a detailed characterization of source behavior. Traffic generated in a closed-loop manner can adapt to different network conditions, and varying network conditions are intrinsic to the evaluation of different network mechanisms. Our comparison of $\mathcal{T}_h$ and $\mathcal{T}_h^\prime$ is only a means to understand the quality of the traffic generation method, where quality is considered to be higher the more closely the original trace is approximated. If enough parameters of the original traffic are accurately measured and incorporated into the traffic generation experiment, we expect to observe a great similarity between $\mathcal{T}_h$ and $\mathcal{T}_h^\prime$. Conversely, if we are missing some important parameters, we expect to observe substantial differences between the traces.

By construction, traffic generated using source-level trace replay can never be identical to the original traffic. The statistical properties of original packet header traces are the result of multiplexing a large number of connections onto a single link, and these connections traverse a large number of different paths with a variety of network conditions. It is simply not possible to fully characterize this environment and reproduce it in a laboratory testbed or in a simulation. This is both because of the limitations of passive inference from packet headers, and because of the stochastic nature of network traffic. Source-level trace replay can never incorporate every factor that shaped $\mathcal{T}_h$, and therefore differences between $\mathcal{T}_h$ and $\mathcal{T}_h^\prime$ are unavoidable. Still, finding a close match between an original trace and its replay, even if they are not identical, constitutes strong evidence of the accuracy of the a-b-t model and the data acquisition and generation methods we have developed. It also demonstrates the feasibility of generating realistic network traffic in a closed-loop manner that resembles a rich traffic mix.


Doctoral Dissertation: Generation and Validation of Empirically-Derived TCP Application Workloads
© 2006 Félix Hernández-Campos