As long as the network setup of a simulation or testbed experiment remains unchanged,
the source-level trace replay of a connection vector trace always
results in traffic that is similar to the original trace.
Every replay contains the same number of TCP connections behaving
according to the same connection vector specification and starting at the same times.
Only tiny variations are introduced on the end-systems by changes in clock
synchronization, operating system scheduling and interrupt handling,
and at switches and routers by the stochastic nature of packet multiplexing.
Source-level trace replay therefore has two desirable properties:
In this dissertation, we propose two flexible methods for introducing variability
in traffic generation experiments. In both cases, the set of connection vectors in
the original trace is randomly resampled, resulting in a new set that preserves the
aggregate source-level characteristics of the original traffic.
In our first method, Poisson Resampling, we construct a new connection vector
trace by randomly resampling connections from the original trace and assigning them
exponentially distributed inter-arrival times. As a result, connections in the new trace
arrive according to a Poisson process. In the second method, Block Resampling,
we resample blocks (groups) of connections rather than individual connections.
This method results in a more realistic connection arrival process, which matches the
substantial burstiness observed in real traces. In more technical terms,
Block Resampling preserves the moderate long-range dependence found in real
connection arrival processes, while Poisson Resampling results in a short-range dependent
connection arrival process.
This difference is demonstrated in our experimental evaluation of the two methods.
In addition, the evaluation shows that the choice of block duration involves
a trade-off: shorter blocks increase the number of distinct resamplings,
but long-range dependence disappears when blocks are too short.
Our analysis demonstrates that block durations between 1 and 5 minutes offer
the best compromise.
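
The two resampling methods described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the `Connection` record, the function names, and the parameters are hypothetical, a connection vector is reduced to a start time and a byte count, and both methods are assumed to sample with replacement.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Connection:
    """Hypothetical stand-in for a connection vector."""
    start: float      # start time, in seconds from the beginning of the trace
    total_bytes: int  # total application data carried by the connection


def poisson_resample(trace: List[Connection], n: int, rate: float,
                     rng: random.Random) -> List[Connection]:
    """Draw n connections with replacement and give them exponentially
    distributed inter-arrival times with mean 1/rate seconds."""
    out, t = [], 0.0
    for _ in range(n):
        t += rng.expovariate(rate)
        out.append(replace(rng.choice(trace), start=t))
    return out


def block_resample(trace: List[Connection], duration: float, block_len: float,
                   rng: random.Random) -> List[Connection]:
    """Rebuild a trace slot by slot from randomly chosen blocks, keeping the
    relative start times of the connections inside each block."""
    n_blocks = int(duration // block_len)
    blocks: List[List[Connection]] = [[] for _ in range(n_blocks)]
    for c in trace:
        i = int(c.start // block_len)
        if i < n_blocks:  # ignore connections past the last whole block
            blocks[i].append(c)
    out = []
    for slot in range(n_blocks):
        src = rng.randrange(n_blocks)      # block chosen with replacement
        shift = (slot - src) * block_len   # move the block into this slot
        out.extend(replace(c, start=c.start + shift) for c in blocks[src])
    return out
```

Because `block_resample` moves whole blocks, the arrival pattern inside each block is copied unchanged from the original trace, whereas `poisson_resample` discards the original arrival times entirely.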
Researchers often need to conduct a set of experiments with a range of different
traffic loads. When using a traditional source-level model, e.g., a model of web traffic,
researchers have to first conduct a preliminary experimental study to determine how
the parameters of the model, e.g., the number of user equivalents, affect the generated
load [CJOS00,LAJS03,KcLH$^+$02].
This step is usually known as calibrating the traffic generator.
Our resampling methods eliminate this common need for calibrating traffic generators,
since the resampling process can be controlled to match a specific target load
(i.e., the generated load is known a priori).
In the case of Poisson Resampling, this is accomplished by changing the mean arrival
rate of connections. In the case of Block Resampling, offered load is manipulated using
block thinning (i.e., subsampling) and block thickening (i.e., combining blocks).
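
As a rough illustration of the two block operations, the sketch below thins a block by keeping each connection with a fixed probability and thickens by superimposing several randomly chosen blocks. The record layout, the function names, and the independent per-connection thinning rule are assumptions made for this example, not details taken from the text.

```python
import random
from typing import List, Tuple

# Hypothetical record: (start offset within the block in seconds, total bytes)
Conn = Tuple[float, int]


def thin_block(block: List[Conn], keep_prob: float,
               rng: random.Random) -> List[Conn]:
    """Subsample a block: keep each connection independently with
    probability keep_prob (assumed thinning rule)."""
    return [c for c in block if rng.random() < keep_prob]


def thicken_block(blocks: List[List[Conn]], k: int,
                  rng: random.Random) -> List[Conn]:
    """Combine k blocks, chosen at random with replacement, into a single
    block by superimposing their connections."""
    merged = [c for _ in range(k) for c in rng.choice(blocks)]
    return sorted(merged)  # order by start offset within the block
```

Thinning lowers the offered load of a block and thickening raises it, so the two operations together cover scaling factors below and above one.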
Our work reveals that load scaling cannot be based simply on controlling the number of
connections. Such an approach frequently results in offered loads that are far from the
target, because the number of connections in a resample is not strongly correlated
with the offered load represented by these connections.
We address this difficulty by developing byte-driven versions of Poisson
Resampling and Block Resampling, which scale load using a running count of the
total data in the resampled trace. Unlike the number of connections,
the total amount of data in the resampled trace is strongly correlated with the traffic load
it offers. Our experiments confirm that byte-driven resampling is
highly accurate, eliminating the common need for calibrating traffic generators.
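
The byte-driven idea can be sketched for the Poisson case as follows. This is an illustration under stated assumptions rather than the dissertation's algorithm: the function name and parameters are hypothetical, a connection is reduced to its total byte count, the target load is converted to a byte budget as load times duration divided by 8, and start times are drawn uniformly over the duration (which is how the arrival times of a Poisson process are distributed once the number of arrivals is fixed).

```python
import random
from typing import List, Tuple


def byte_driven_poisson_resample(sizes: List[int], target_bps: float,
                                 duration: float,
                                 rng: random.Random) -> List[Tuple[float, int]]:
    """Resample connection sizes until a running byte count reaches the
    budget implied by the target load, then assign Poisson arrival times.

    sizes      -- total bytes of each connection in the original trace
    target_bps -- target offered load in bits per second
    duration   -- length of the resampled trace in seconds
    """
    if not sizes or min(sizes) <= 0:
        raise ValueError("sizes must be non-empty and strictly positive")
    budget = target_bps * duration / 8.0  # bytes needed to meet the target
    picked, running = [], 0
    while running < budget:               # running count of resampled data
        s = rng.choice(sizes)
        picked.append(s)
        running += s
    starts = sorted(rng.uniform(0.0, duration) for _ in picked)
    return list(zip(starts, picked))
```

The loop stops on the amount of data rather than on a connection count, so the resampled total can overshoot the budget by at most the size of the last connection drawn.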
Doctoral Dissertation: Generation and Validation of Empirically-Derived TCP Application Workloads
© 2006 Félix Hernández-Campos