The first technique we consider for introducing variability in the traffic generation process is Poisson Resampling. The starting point of every method presented in this chapter is a connection vector trace $\mathcal{T}_c = \{(C_1, T_1), (C_2, T_2), \ldots, (C_n, T_n)\}$, where each $C_i$ is an augmented connection vector (an a-b-t connection vector plus some network-level parameters), and $T_i$ is its relative start time. The basic version of our Poisson Resampling technique consists of deriving a new trace $\mathcal{T}'_c$ by randomly choosing connection vectors from $\mathcal{T}_c$ without replacement, and assigning them start times according to an exponential distribution. We define the duration $d$ of $\mathcal{T}_c$ as $\max_i T_i$, the length of the interval in which connections are started. Given a target duration $d'$ for $\mathcal{T}'_c$, the Poisson Resampling algorithm iteratively adds a new pair $(C'_j, T'_j)$ to $\mathcal{T}'_c$ until $T'_j > d'$. Each $C'_j$ is equal to some randomly selected $C_i$, and each start time is $T'_j = T'_{j-1} + \delta_j$, where the inter-arrival times $\delta_j$ are drawn independently from an exponential distribution.
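In outline, the basic algorithm can be sketched as follows. This is a minimal illustration with hypothetical names, assuming the trace is simply a list of (connection vector, start time) pairs:

```python
import random

# Sketch of basic Poisson Resampling: connection vectors are drawn at random
# without replacement from the original trace, and assigned start times with
# exponentially distributed inter-arrivals until the target duration is reached.
# `mean_interarrival` and `d_prime` are in the same time unit (e.g., seconds).
def poisson_resample(trace, mean_interarrival, d_prime, seed=None):
    rng = random.Random(seed)
    pool = list(trace)
    rng.shuffle(pool)              # sample connection vectors without replacement
    resampled = []
    t = 0.0
    while pool:
        t += rng.expovariate(1.0 / mean_interarrival)  # exponential gap
        if t > d_prime:                                # stop past target duration
            break
        conn, _ = pool.pop()       # next randomly chosen connection vector
        resampled.append((conn, t))
    return resampled
```

Note that the only tunable parameter here is the mean inter-arrival time, which is exactly the limitation discussed in the rest of this section.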
[Figure 7.1: Bodies of the connection inter-arrival distributions for UNC 1 PM and UNC 1 AM, each compared with an exponential distribution of the same mean.]
[Figure 7.2: Tails of the connection inter-arrival distributions for UNC 1 PM and UNC 1 AM, compared with the fitted exponentials.]
The resampling technique described above has the advantage of its simplicity. Furthermore, it is statistically appealing, since the exponential distribution naturally arises from the combination of independent events. The use of connection inter-arrivals sampled independently from an exponential distribution is intuitively consistent with the view of traffic as a superposition of a large number of independent connections that transmit data through the same network link. Our empirical data, presented in Figures 7.1 to 7.4, confirm the applicability of this inter-arrival model. Figure 7.1 shows two pairs of CDFs comparing real connection arrivals and their exponential fits. The first pair (white symbols) corresponds to the distribution of connection inter-arrivals for UNC 1 PM (squares), and an exponential distribution with the same mean (triangles). The second pair (black symbols) shows the distribution of connection inter-arrivals for UNC 1 AM and an exponential distribution with the same mean. Both fits are excellent, so exponentially distributed connection inter-arrivals are clearly a good starting point for a trace resampling technique. The tails of the empirical distributions, shown in Figure 7.2, are also consistent with the fitted exponentials. Their slope is slightly lower, which could motivate a fit with a more general distribution like the Weibull. However, a small improvement in the fit would require an increase in the complexity of the model, since the one-parameter exponential model would have to be replaced by the two-parameter Weibull model. This additional effort would produce only a limited gain given that the exponential fit is excellent for 99.9% of the distribution.
Figures 7.3 and 7.4 consider another two traces, Abilene-I and Leipzig-II. The bodies are again very closely approximated, but the tails are heavier for the original data. Note that this effect is more pronounced as the traces get longer. The duration of the UNC traces is one hour, the duration of Abilene-I is 2 hours, and the duration of Leipzig-II is 2 hours and 45 minutes. This could suggest that the worse fit is due to non-stationarity in the connection arrival process, which becomes more likely for longer traces. Further analysis is needed to confirm this hypothesis or find an alternative explanation. We must note that these results are in sharp contrast with those in Feldmann [Fel00], where the empirical inter-arrival distributions were significantly different from the bodies of fitted exponential distributions. The reason for this difference is unclear at this point.
[Figure 7.3: Bodies of the connection inter-arrival distributions for Abilene-I and Leipzig-II, compared with exponential fits.]
[Figure 7.4: Tails of the connection inter-arrival distributions for Abilene-I and Leipzig-II, compared with the fitted exponentials.]
The main problem with the basic Poisson Resampling technique is the lack of control over the load offered by the replay of $\mathcal{T}'_c$. As we will demonstrate, the number of connections in $\mathcal{T}'_c$ is only loosely correlated with the offered load. As a consequence, it becomes difficult to predict the load that a Poisson resampled trace generates, even for resamplings of the same duration and mean inter-arrival rate. This would force researchers to create many resampled traces until they hit upon a resampling with the intended offered load. We studied the wide range of offered loads that result from basic Poisson Resampling by performing a large number of resamplings of the same connection vector trace $\mathcal{T}_c$.
As discussed in the introduction of this chapter, the average load created by $\mathcal{T}_c$ is equal to the total number of bytes $s$ in $\mathcal{T}_c$ divided by its duration $d$. Given that TCP headers and retransmitted segments usually represent only a small fraction of $s$, the total size of the ADUs in $\mathcal{T}_c$ divided by $d$ is also a close approximation of the average load. We use this approximation to examine the average loads offered by a large number of Poisson resamplings, considering the offered load of a resampling $\mathcal{T}'_c$ equal to the total size $s'$ of its ADUs divided by its duration $d'$. It is also important to note that the traces we consider are bidirectional, and they do not necessarily create the same average load in both directions. The analysis in the rest of this section will focus on only one direction of the trace, the target direction, whose average load is given by the total size of the ADUs flowing in that direction divided by the duration of the trace. More formally, the total size of the ADUs in the target direction is equal to

$$\sum_{i \in I} \sum_j a_{ij} \; + \; \sum_{i \in A} \sum_j b_{ij},$$

where $I$ is the set of connections whose a-type ADUs (initiator-side data units) flow in the target direction, $A$ is the set of connections whose b-type ADUs (acceptor-side data units) flow in the target direction, and $a_{ij}$ and $b_{ij}$ are the ADU sizes of connection $i$.
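This per-direction computation can be sketched as follows, assuming an illustrative record layout for connection vectors (the field names are hypothetical, not from the original):

```python
# Approximate the offered load in the target direction from ADU sizes:
# sum a-type ADUs for connections whose initiator sends toward the target
# direction, b-type ADUs for the rest, then divide by the trace duration.
# Each connection is a dict with 'a_sizes', 'b_sizes' (bytes) and a flag
# 'initiated_toward_target'; these names are illustrative assumptions.
def offered_load_mbps(connections, duration_seconds):
    total_bytes = 0
    for c in connections:
        if c["initiated_toward_target"]:
            total_bytes += sum(c["a_sizes"])   # initiator ADUs flow toward target
        else:
            total_bytes += sum(c["b_sizes"])   # acceptor ADUs flow toward target
    return total_bytes * 8 / duration_seconds / 1e6  # bytes -> megabits per second
```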
Figure 7.5 shows a scatterplot of the results of 1,000 resamplings of UNC 1 PM. The duration of the resamplings and their mean rate of connection inter-arrival were equal to those of UNC 1 PM. For each resampling, the total number of connections is shown on the x-axis, while the resulting offered load is shown on the y-axis. This plot demonstrates that basic Poisson Resampling results in traces with very little variability in the number of connection vectors, between 1,409,727 and 1,417,664 (the standard deviation was equal to 1,191.71). In contrast, the range of offered loads is very wide, between 143.55 and 183.44 Mbps, centered around the offered load of $\mathcal{T}_c$, 161.89 Mbps. The distribution of offered loads and its spread is further illustrated by the histogram in Figure 7.6.
[Figure 7.5: Scatterplot of number of connection vectors vs. offered load for 1,000 basic Poisson resamplings of UNC 1 PM.]
[Figure 7.6: Histogram of the offered loads of the 1,000 basic Poisson resamplings of UNC 1 PM.]
The wide range of offered loads that can result from Poisson Resampling is due to the heavy-tailed nature of the distribution of the total number of bytes contributed by each connection vector. The tail of this distribution for the UNC 1 PM trace is shown in Figure 7.7. The values in the plot correspond to the target direction, i.e., $\sum_j a_{ij}$ for each connection whose a-type ADUs flow in the target direction and $\sum_j b_{ij}$ for each connection whose b-type ADUs flow in the target direction. The tail shows non-negligible probabilities for very large sizes, and a linear decay of the probability over six orders of magnitude in the log-log CCDF. As a consequence, the contribution to the offered load made by each connection is highly variable, and thus the number of connections in a trace is a poor predictor of its offered load. This makes basic Poisson Resampling inadequate for controlling load. Its only parameter is the mean inter-arrival rate of connections. This rate controls the mean number of connections in the resampling, but not the total size of these connections, which varies greatly due to the heavy-tailed connection sizes.
Figure 7.8 further illustrates this point using six sets of 1,000 trace resamplings, each set with a different target offered load. The plot shows a cross marking the mean of the load achieved by each of the sets of 1,000 experiments. The variability in the offered loads is illustrated using error bars for each set of experiments. The lower and upper endpoints of the error bars correspond to the 5th and 95th percentiles of the loads offered by each set of trace resamplings. Each set of trace resamplings had a fixed mean inter-arrival time $\mu'$. Under the assumption that the mean offered load is proportional to the number of connections $n'$, we would expect the load to be inversely proportional to the mean inter-arrival time, since $n' = d'/\mu'$. Therefore, if the resamplings have the same duration $d'$, we would expect to achieve an offered load of

$$l' = \frac{\mu}{\mu'} \, l,$$

where $l$ and $\mu$ are the offered load and the mean connection inter-arrival time of the original trace.
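A worked numeric instance of this idealized inverse relationship between offered load and mean inter-arrival time (the numbers are illustrative, not measurements from the trace):

```python
# Idealized scaling relation: offered load is inversely proportional to the
# mean connection inter-arrival time, so halving the inter-arrival time
# should double the load.  All numeric values here are illustrative.
def expected_load_mbps(l_orig, mu_orig, mu_new):
    return l_orig * mu_orig / mu_new

# Halving a hypothetical 2.5 ms mean inter-arrival time doubles the load.
doubled = expected_load_mbps(161.89, 0.0025, 0.00125)
```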
[Figure 7.7: Log-log CCDF of the total bytes contributed to the target direction by each connection vector in UNC 1 PM.]
[Figure 7.8: Means and 5th/95th percentiles of the offered loads of six sets of 1,000 basic Poisson resamplings with different target loads.]
As the previous section demonstrated, the number of connection vectors in a trace is a poor predictor of the mean offered load achieved during the replay of a resampled trace $\mathcal{T}'_c$. Therefore, controlling the number of connections in a resampling does not provide a good way of achieving a target offered load, and an alternative method is needed. In the idealized model of offered load in the previous section, the offered load was said to be exactly equal to $s'/d'$. If so, we need to control the total number of bytes $s'$ in the resampled trace to precisely match a target offered load. In Byte-driven Poisson Resampling, the mean inter-arrival rate of $\mathcal{T}'_c$ is not computed a priori using Equation 7.2. Instead, the target load $l'$ is used to compute a target size $s' = l' d'$ for the payloads in the scaled direction. Byte-driven Poisson Resampling has two steps:
1. Randomly choose connection vectors from $\mathcal{T}_c$, adding them to $\mathcal{T}'_c$ until the total size of their ADUs in the target direction reaches the target size $s'$.
2. Assign start times to the chosen connection vectors using exponentially distributed inter-arrivals whose mean is $d'$ divided by the number of chosen connections, so that the resampling spans the target duration $d'$.
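The two-step procedure can be sketched as follows, assuming a hypothetical per-connection field `target_dir_bytes` holding each connection's ADU payload in the scaled direction:

```python
import random

# Sketch of Byte-driven Poisson Resampling.  Step 1: resample connection
# vectors until their payload in the scaled direction reaches the target size
# s' = l' * d' (target load times target duration).  Step 2: derive the mean
# inter-arrival time from the resulting connection count and assign
# exponentially distributed start times.  Sampling is done with replacement
# here for simplicity; field names are illustrative assumptions.
def byte_driven_resample(trace, target_load_bps, d_prime, seed=None):
    rng = random.Random(seed)
    s_target = target_load_bps / 8 * d_prime       # target payload in bytes
    chosen, total = [], 0
    while total < s_target:                        # step 1: sample by bytes
        conn = rng.choice(trace)
        chosen.append(conn)
        total += conn["target_dir_bytes"]
    mu = d_prime / len(chosen)                     # step 2: mean inter-arrival
    t, resampled = 0.0, []
    for conn in chosen:
        t += rng.expovariate(1.0 / mu)
        resampled.append((conn, t))
    return resampled
```

Because the stopping condition is a byte total rather than a connection count, the total payload of every resampling lands just above $s'$, which is what makes the achieved loads so tightly clustered.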
Figure 7.11 summarizes the results of 4 sets of 1,000 byte-driven Poisson resamplings. The plot uses the same type of visualization found in Figure 7.8. The error bars, barely visible in this case, illustrate the accurate load scaling that can be achieved with Byte-driven Poisson Resampling.
[Figure 7.11: Means and 5th/95th percentiles of the offered loads of four sets of 1,000 byte-driven Poisson resamplings.]
[Figure 7.12: Target vs. achieved offered loads in testbed replays of byte-driven Poisson resamplings of UNC 1 PM.]
The previous analysis demonstrated that accurate load scaling requires control of the total number of bytes in $\mathcal{T}'_c$ rather than of the total number of connections. This demonstration was based on the computation of the offered load using Equation 7.1. It is important to verify that the actual load generated during a testbed replay of $\mathcal{T}'_c$ is similar to the computed load. To show that this is indeed the case, we replayed a number of resampled traces with four different target loads. Each resampled trace was constructed using Byte-driven Poisson Resampling, with a duration of 1 hour. To eliminate startup and shutdown effects, we only considered the middle 40 minutes for computing the achieved load. Figure 7.12 summarizes the results of the experiments for the resamplings of the UNC 1 PM trace. Each point corresponds to a separate replay experiment, showing the target load on the x-axis and the achieved load on the y-axis. We ran three experiments for each target load, and the results show a good approximation of the intended scaling line. Several experiments achieved loads a few megabits per second above the target. In general, we expected the experiments to have slightly higher achieved loads, since the scaling method focuses on the offered payload, ignoring packetization overhead (i.e., the extra load from bytes in the packet headers). A more precise tuning of the offered load would take packetization into account, perhaps by using a fixed constant to decrease the target payload, or by studying the total size (including headers) of each connection, as replayed in the same testbed environment. In any case, and given the good results above, it seems reasonable to ignore these further refinements.
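For instance, the fixed-constant correction could look like the following sketch, where the 40-byte TCP/IP header and 1460-byte MSS are illustrative assumptions rather than measurements from the testbed:

```python
# Hypothetical adjustment for packetization overhead: shrink the target
# payload so that payload plus estimated TCP/IP header bytes hits the
# intended load on the wire.  The 40-byte header and 1460-byte MSS are
# common defaults used purely as illustrative assumptions.
def adjusted_target_payload(target_bytes, mss=1460, header_bytes=40):
    overhead_factor = (mss + header_bytes) / mss   # wire bytes per payload byte
    return target_bytes / overhead_factor

payload = adjusted_target_payload(1_000_000)       # aim for 1 MB on the wire
```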
The results in Figure 7.12 provide examples of scaling down the load of a trace, since the original load of UNC 1 PM was 177.36 Mbps, and 9 of the 12 experiments had target offered loads below this value. Scaling down the load of a trace using byte-driven resampling simply requires choosing a target load $l'$ below the original load $l$, which in turn means that the target size $s'$ of the resampling will be below the original total size $s$.
These results confirm the close approximation of the target loads in the testbed experiments, where offered load is measured from real TCP segments (rather than computed using Equation 7.1). The plot shows, for example, that the three resamplings with target load 177.36 Mbps achieved loads of 176.72, 178.23 and 182.45 Mbps, respectively. The impact of TCP headers, retransmissions, and the slightly underestimated duration mentioned in the previous section is therefore very small.
The results in Figure 7.13 provide an example of
scaling up the load of a trace, since they correspond to byte-driven Poisson
resamplings of UNC 1 AM, which had an original load of 91.65 Mbps. The resulting
loads also approximate the intended targets very closely.
Doctoral Dissertation: Generation and Validation of Empirically-Derived TCP Application Workloads
© 2006 Félix Hernández-Campos