In this section, we consider the source-level trace replay of the three packet header
traces: Leipzig-II, UNC 1 PM, and Abilene-I.
The first goal is to study how well the replay experiments preserve the source-level input, which is
the collection of connection vectors extracted from the original trace.
In principle, the characterization of source-level behavior using the a-b-t model captures
characteristics of each connection that are invariant to network conditions, so the
analysis of the generated trace should result in a collection of connection vectors
identical to the original one. In practice, several limitations make the two sets of
connection vectors differ. We discuss the possible causes in this section, and present
a statistical comparison of the original and replayed sets of connection vectors.
The second goal of this section is to study the impact of introducing packet losses in the generation process. For this purpose, we conducted two source-level trace replays of each original trace. The lossless replay reproduced the a-b-t connection vector of each original connection, and gave each connection its measured round-trip time and TCP receiver window sizes. The lossy replay additionally applied its measured loss rate to each replayed connection. Differences between the lossless and lossy replays tell us about the robustness of both our source-level characterization and traffic generation tools in the presence of losses. These losses are completely absent from our experiments unless they are artificially introduced using usernet, as in the lossy replay.
The plots in Figure 5.4 compare the distributions of a-type
ADU sizes for the original set of connection vectors in Leipzig-II, and
for the sets of connection vectors extracted from its lossless and lossy
replays. In each plot, the three distributions marked with white symbols correspond to sequential connection
vectors, and the ones marked with black symbols to concurrent connection vectors. The left plot shows
the bodies of the distributions, using CDFs in log-linear axes.
The right plot shows the tails of the distributions, using CCDFs
in log-log axes. In general, there is an excellent agreement between the original distributions
and those from the source-level replays.
The bodies of the distributions from sequential connections lie on top of each
other, even if per-connection loss rates are used during the experiments.
As discussed in Section 3.4, our ADU measurement algorithm can sometimes be
inaccurate when one of the last segments of a TCP window is lost before reaching the monitor.
In this case, the loss is recovered after a timeout, which can create a quiet
time between consecutive segments that is long enough to unnecessarily split an ADU.
As a result, a single a-type data unit in the original connection vector appears as
two smaller samples in the replayed one, whose sizes sum to the size of the original
data unit.
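The splitting effect can be illustrated with a minimal sketch of a quiet-time-threshold ADU measurement. This is a simplification of the algorithm in Section 3.4; the threshold value and function names are illustrative, not the dissertation's actual parameters:

```python
def measure_adus(segments, threshold):
    """Group (timestamp, payload_bytes) data segments into ADUs.

    Simplified rule: a new ADU starts whenever the gap between
    consecutive data segments exceeds `threshold` seconds.
    """
    adus = []
    current = 0
    last_ts = None
    for ts, size in segments:
        if last_ts is not None and ts - last_ts > threshold:
            adus.append(current)
            current = 0
        current += size
        last_ts = ts
    if current:
        adus.append(current)
    return adus

# One 4380-byte ADU sent as three segments; a timeout-like pause
# before the last segment splits it into two measured ADUs.
segs = [(0.00, 1460), (0.01, 1460), (1.20, 1460)]
print(measure_adus(segs, threshold=1.0))       # -> [2920, 1460]
print(sum(measure_adus(segs, threshold=1.0)))  # -> 4380
```

Note that the two measured ADUs still sum to the original size, which is why the effect shows up as extra small samples rather than lost bytes.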
The validation of the data acquisition methods in Section 3.4 demonstrated
that ADU splitting due to TCP timeouts is possible, although its impact was small
even when large data units and aggressive loss rates were used.
The comparison of the Leipzig-II lossless and lossy replays, which represent much more realistic traffic,
shows that ADU splitting due to TCP timeouts has very little impact in practice, at least for the
relatively light distribution of loss rates in Leipzig-II.
We can hardly observe any difference between the bodies of the
distributions when losses are added to the replay. The two bodies from the replay
are also very similar to the body of the original distribution.
The same is true for the tails, which do not show any significant difference.
This analysis demonstrates that tmix can accurately reproduce the sizes of
a-type data units in sequential connections, even when ADUs are large and
when experiments are lossy.
There is also a very good match between the distributions for concurrent connection
vectors. In some regions, we notice somewhat thicker lines that come from small offsets
of the curves. The tails of the
distribution for concurrent connections are also very
similar, although the one from the lossy replay is slightly heavier for values below
5 MB, and slightly lighter for values above that. This could be explained by the
inaccuracy discussed above, or by trace boundaries. In the latter case,
losses reduce throughput, making the replay of lossy
connections slower than the replay of lossless ones. This means that
some a-type ADUs may not have
time to complete their transmission before the end of the experiment.
Figure 5.5 compares the distributions of b-type ADU sizes
for the connection vectors extracted from the original Leipzig-II
trace and from its lossless and lossy source-level replays.
For sequential connection vectors, both the bodies and the tails are identical.
For concurrent connection vectors, the distributions show slightly different bodies,
but identical tails. The differences cannot be explained by ADU splitting due to TCP timeouts:
if that were the cause, we would see a difference between the distributions
from the lossless replay and those from the lossy replay, which is not the case.
The source of the difference is an inherent problem with the replay of concurrent
connections: the misclassification of some replayed concurrent connections as sequential.
While tmix always replays a concurrent connection vector in the right way
(i.e., decoupling the two directions), the actual set of segments observed at the
monitor may simply not have any pair of data segments that satisfy the concurrency
test given in Section 3.3.3.
In other words, the segments of a replayed concurrent connection may exhibit a
fortuitous sequential ordering.
As a consequence, the data analysis algorithm classifies as sequential some connections
from the replay that were concurrent in the original trace. The sizes of the b-type ADUs
in these misclassified connections are then absent from the
distribution for replayed
concurrent connections.
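A minimal sketch of this kind of concurrency test follows. It is a simplification of the test in Section 3.3.3; the interval-overlap rule and all names are our own illustrative choices:

```python
def looks_concurrent(dir_a, dir_b):
    """Simplified concurrency check: the connection looks concurrent
    if some data segment in one direction was in flight (sent but
    not yet acknowledged) while a data segment in the other
    direction was also in flight.

    dir_a, dir_b: lists of (send_time, ack_time) per data segment.
    """
    for sa, ka in dir_a:
        for sb, kb in dir_b:
            if sa < kb and sb < ka:  # in-flight intervals overlap
                return True
    return False

# A replayed concurrent connection whose segments, by timing luck,
# serialize: each segment is acknowledged before the other side sends.
print(looks_concurrent([(0.0, 0.1)], [(0.2, 0.3)]))  # -> False (misclassified)

# Overlapping in-flight periods are detected as concurrent.
print(looks_concurrent([(0.0, 0.3)], [(0.1, 0.4)]))  # -> True
```

The first call shows how a genuinely concurrent connection can fail the test at the monitor and be classified as sequential.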
The small difference in the plot between the original and replayed distributions
demonstrates that the number of misclassifications is relatively small, so the majority
of the concurrent connections still exhibit concurrent behavior in the replays.
It is important to note that the probability of a misclassification decreases as the
sizes of the ADUs increase, since the larger number of data segments makes
finding a concurrent pair more likely. Therefore, misclassifications become
less significant for the tails of the distributions, since the connections
whose samples are in the tail necessarily have at least one large ADU (the one we see in the tail),
and are therefore less likely to be misclassified.
There is no appreciable difference between the tails of the distributions
from concurrent connections, in agreement with our observation regarding the lower likelihood
of misclassification for connections with large ADUs. Misclassified connections are described using the sequential
a-b-t model, so they result in additional samples for the distributions that
characterize sequential connection vectors. These extra samples have a much smaller
effect on the CDFs, since the number of samples from sequential connections is far
larger anyway.
Figure 5.6 considers the distribution of the number of
epochs extracted from the original and from the generated packet header traces.
The distributions from the replays are very similar to the original one.
The small difference comes again from the small number of misclassified
concurrent connections that were considered sequential.
Misclassified connections add extra samples that slightly distort the
distributions from the replays.
There is a somewhat bigger difference in the far tail, for connection vectors between
1250 and 1500 epochs. This difference could be explained by
misclassification and by trace boundaries (connections replayed more slowly than
in the original that do not replay all of their epochs). We observe no difference
between lossless and lossy replays in this part of the tail.
The next pair of plots, shown in Figure 5.7, examines the
distribution of the quiet times on the acceptor side of TCP connections,
i.e., between the receipt of an a-type ADU and the sending of the b-type ADU
that follows it.
The plot of the bodies shows a very good match between the original
distribution and the ones measured from the replays of sequential connections.
The slightly heavier distributions from the replays are due to a small simplification
we made regarding the replay of quiet times. Tmix replays the exact
quiet times specified in each connection vector. However, as discussed in
Section 3.3.1, when these quiet times are extracted from
a packet header trace, the measured quiet time is the sum of two components: the
actual quiet time at the end host, and the delay between the monitor and
the endpoint.
When tmix replays a quiet time, it remains quiet for the exact duration of this sum.
Given that the replay in the testbed uses usernet to
reproduce the measured round-trip time of each connection,
there is also a delay between the tmix end hosts and the monitor,
so the analysis of the generated packet header trace yields quiet times that
include the monitor-to-endpoint delay twice. It would have been possible to
eliminate this inaccuracy by subtracting the monitor-to-endpoint delay
from the originally measured quiet times. This delay is
equal to half of the one-side transit time, although delayed acknowledgments and queuing
can affect individual samples. We did not try to incorporate a correction for this
quiet time overestimation problem in our experiments.
Besides measurement difficulties, the extra delay becomes less significant
for larger quiet times, for which the monitor-to-endpoint delay is far smaller
than the quiet time itself. Larger quiet times
are also far more significant, since they are the ones that
can substantially increase the duration of TCP connections.
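A small numeric sketch of this accounting (the variable names and values are ours, not the dissertation's notation):

```python
# Illustrative accounting of the quiet time overestimation.
t = 2.000   # true application-level quiet time at the end host (seconds)
d = 0.020   # delay between the monitor and the endpoint (seconds)

measured_original = t + d            # what analysis of the original trace records
replayed_quiet = measured_original   # tmix stays quiet for the measured value
measured_replay = replayed_quiet + d # the testbed monitor adds the delay again

bias = measured_replay - measured_original  # constant absolute overestimation
print(bias)                      # equal to d, regardless of t
print(bias / measured_original)  # relative error shrinks as t grows
```

The absolute bias is constant, so it matters for the shortest quiet times and becomes negligible in the tail, matching the behavior described above.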
There is also a good agreement in the tails of the distributions, although the
distributions from the replays are slightly heavier than the original distributions.
This is not explained by the previous
overestimation of quiet times due to the location of the monitor, because the magnitude of
the quiet times in the tail is far larger than the magnitude of the extra delay.
The source of this small mismatch is the misclassification of some concurrent connections.
This is true for both the differences between the tails from sequential connection vectors
and between the tails from concurrent connection vectors.
It may seem counter-intuitive that the misclassifications make both types of tails heavier,
instead of making one type of tail heavier and the other one lighter.
The explanation is that misclassifications move samples from concurrent connections to sequential
connections, and these moved samples affect both tails at the same time.
The distribution of quiet times on the initiator side of TCP connections,
i.e., between the receipt of a b-type ADU and the sending of the a-type ADU of the
next epoch, is compared for original and replayed traces in
Figure 5.8.
The bodies of the distributions show the
same kind of mismatch that we discussed for the
acceptor-side quiet times.
For values below a few seconds, the distribution from the replay of sequential
connections appears heavier than the original distribution.
This is due to the overestimation of quiet times, which becomes less significant as the
quiet time becomes larger. We can also observe that the difference for the shortest
quiet times is larger on the initiator side than on the acceptor side.
The reason is not completely clear,
but it is probably related to the absence of initiator-side samples from the large subset of
connection vectors with only one epoch.
The
distribution from the replay of concurrent connections
appears lighter than the original for values above one second. This is due to concurrent
connection misclassification. The much larger number of samples in the distributions
for sequential connections makes the impact of the misclassification very small.
Besides the replay of the source-level characteristics of the connections in Leipzig-II,
our experiments also involved replaying the network-level parameters measured for
each connection.
The left plot in Figure 5.9 compares the distributions
of round-trip times extracted from the original and the generated packet header traces.
The reproduction was very accurate for sequential connection vectors, and the three
distributions lie exactly on top of each other.
In contrast, the distributions for the replayed concurrent connections show a
strange jump in probability at 100 milliseconds. The reason for this anomaly, which changed
the shape of the rest of the distribution, is unclear.
The right plot of Figure 5.9 compares the distributions of receiver window sizes. Note that the probability was computed over the total number of data bytes transferred, to give a better sense of the amount of data associated with each receiver window size. There is an excellent match between the distribution obtained from the Leipzig-II trace and those from its two source-level replays.
The final comparison examines the distributions of loss rates. The left plot of Figure 5.10 shows the distribution of the measured loss rates for the original Leipzig-II trace and its replays. There is a reasonable match between the original and the lossy replays, especially for sequential connections. This is a good result given that usernet creates losses by generating random numbers in an independent manner. The small difference is probably explained by a sample size problem in short connections with non-zero loss rates, as discussed in Section 4.1.3, and by concurrent connection misclassification.
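The sample-size effect in short connections can be illustrated with a small simulation of independent per-packet drops. The function and parameters below are illustrative, not part of usernet:

```python
import random

def measured_rate(n_packets, target, rng):
    """Drop each packet independently with probability `target`
    and return the empirically observed loss rate."""
    drops = sum(1 for _ in range(n_packets) if rng.random() < target)
    return drops / n_packets

rng = random.Random(1)
target = 0.02

# Short connections: per-connection measured rates are very coarse
# (only 0%, 10%, 20%, ... are possible for a 10-packet connection).
short = [measured_rate(10, target, rng) for _ in range(1000)]

# Long connections: measured rates cluster tightly around the target.
long_ = [measured_rate(10_000, target, rng) for _ in range(20)]

print(min(short), max(short))  # wide spread, many exact zeros
print(min(long_), max(long_))  # narrow spread around 0.02
```

This is why the per-connection distributions match less well for short connections with non-zero loss rates: the empirical rate of a few packets is a very noisy estimate of the intended one.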
Note that we measured some non-zero loss rates in the lossless experiment, in which no artificial losses were introduced. This suggests some problem with the experimental environment, perhaps some network interfaces that were duplicating segments. Such duplicates confuse the loss rate measurement algorithm, which considers each retransmission a loss event. If duplication is behind our observations, the impact on the experiments would be minimal: true loss slows down TCP, but duplication does not.
The right plot of Figure 5.10 shows the distributions of loss rates per byte, rather than per connection as in the left plot. The CDFs show the probability that each byte had of being carried in a connection with at most the given loss rate. For example, the CDFs for the original sequential connections show that 80% of the bytes were carried in connections with a loss rate of 1% or less. The CDFs in the right plot are easier to read than those in the left plot, since they are far smoother. They are also more significant, since they give more weight to the connections that carry more bytes, which are the ones with a larger impact on the load of the network. There is a good match between the loss rate distributions for the original trace and the lossy replay. Both the distribution from the replayed sequential connections and the one from the replayed concurrent connections are slightly heavier than those from the original trace.
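A per-byte CDF point of this kind can be sketched as follows. The connection data is hypothetical, not taken from the traces:

```python
def per_byte_cdf_point(conns, max_rate):
    """Fraction of all bytes carried in connections whose loss rate
    is at most `max_rate`. conns: list of (bytes, loss_rate)."""
    total = sum(b for b, _ in conns)
    covered = sum(b for b, r in conns if r <= max_rate)
    return covered / total

# Hypothetical connections: (bytes transferred, measured loss rate).
conns = [(8_000_000, 0.00), (1_000_000, 0.01), (1_000_000, 0.05)]
print(per_byte_cdf_point(conns, 0.01))  # -> 0.9: 90% of bytes at <= 1% loss
```

Weighting by bytes rather than counting connections is what makes these CDFs smoother and more representative of the traffic load.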
In general, we always observe heavier loss rates in the replays than in the original data. The explanation is the dropping of pure acknowledgment packets, which was discussed in Section 4.1.3. The analysis of the original trace considers only the loss rate of data segments, and not the combined loss rate of data and acknowledgment segments. However, the artificial dropping mechanism in usernet that is used to create per-flow losses is applied to all of the packets in a connection. This means that both data segments and acknowledgment segments are dropped according to the original loss rates of data segments. The dropping of acknowledgment segments can increase the loss rate of data segments in the replay, because missing acknowledgments can trigger unnecessary retransmissions. Every retransmission is considered a loss event, so loss rates increase in the replays, which makes the measured distributions of (data segment) loss rates heavier for the replays than for the original trace. It is certainly possible to modify usernet to apply the dropping rate to data segments only, but our experiments did not incorporate this refinement. Moreover, it is somewhat unrealistic to use a biased dropping mechanism, so it would be better to refine the data acquisition algorithm to consider both data and pure acknowledgment losses. Measuring pure acknowledgment loss rates is, however, far more difficult than measuring data segment loss rates. Endpoints may acknowledge every data segment or every other data segment, and they do so using cumulative acknowledgment numbers rather than the individual sequence numbers used for data segments. It is therefore more difficult to determine when an acknowledgment does not arrive as expected.
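A minimal sketch of why retransmissions, and also network-duplicated segments, register as loss events. This is a crude detector built on repeated sequence numbers, not the actual measurement algorithm:

```python
def count_loss_events(seq_numbers):
    """Count data segments whose starting sequence number was
    already seen: a crude retransmission detector. A segment
    duplicated by the network (not actually lost) is counted
    as a loss event too, which inflates measured loss rates."""
    seen = set()
    events = 0
    for seq in seq_numbers:
        if seq in seen:
            events += 1
        else:
            seen.add(seq)
    return events

# One genuine retransmission of segment 1460 plus one network
# duplicate of segment 0: both count as loss events.
print(count_loss_events([0, 1460, 1460, 2920, 0]))  # -> 2
```

This is why unnecessary retransmissions triggered by dropped acknowledgments, and any segment duplication in the testbed, both show up as extra measured loss.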
The second trace considered in our validation of the source-level trace replay approach
is the UNC 1 PM trace.
This trace is shorter than Leipzig-II (1 hour vs. 2 hours and 45 minutes)
but it has much higher throughput, which results in a substantially larger number of
samples in the distributions that we will examine in this section.
Figure 5.11 compares the distributions of a-type ADU sizes extracted
from the UNC 1 PM trace and from its lossless and lossy replays.
The bodies of the
distributions from sequential connections reveal no difference
between original and generated traces. The tail of the
distribution from the
lossy replay is slightly lighter than the one from the original trace and the one
from the lossless replay. This difference can be attributed to trace boundaries.
Losses make the replay of some connections slower, which can easily result in some
connections that do not have time to finish during the replay experiment. This effect is
more important for the largest data units, those in the tail of the distribution,
since they are the ones that require a substantial amount of time to complete their
transmission even without losses.
Concurrent connections show a slightly worse match. This is due to the misclassification
problem described in the previous section. As pointed out before, misclassifications are
more likely to occur in concurrent connections with small ADUs. These connections have
a small number of packets, making the observation of concurrent pairs less
likely. As a result, the bodies of the distributions from the replays are slightly heavier,
since some fraction of the small ADUs disappeared from the distribution for
concurrent connections.
In contrast, misclassifications had no visible impact on the
distribution
for sequential connections. This is because the number of a-type ADUs in
sequential connection vectors is much larger than the number of samples from misclassified
connections. The tails of the
distributions for concurrent connections show a good agreement.
The distributions of b-type ADU sizes from the original UNC 1 PM trace and its replays are
even closer, as Figure 5.12 shows.
We can barely see any differences in the bodies of the distributions from concurrent
connections, and no difference for those from sequential connections.
The tails are also very similar, and the slight differences can be explained
using the same arguments put forward in the discussion of the a-type ADU sizes
(i.e., trace boundaries and misclassifications).
Figure 5.13 shows an excellent match between the number of epochs in sequential connection vectors measured from the UNC 1 PM traces, and those measured from the replays. The bodies of the distributions are identical, and the tails show only a very minor difference. We therefore observe a better agreement between original and replay for UNC 1 PM than for Leipzig-II (see Figure 5.6).
The plots in Figure 5.14 study the distributions of inter-ADU quiet times on the acceptor side.
The bodies for sequential connections show an
excellent match between the inter-ADU quiet time measured from the original
UNC 1 PM trace, and those measured from the generated traces.
The bodies for concurrent connections are also very similar.
The small difference for the smallest values requires further investigation.
We should not see these samples here because our only method for detecting
inter-ADU quiet times in concurrent connections is to identify periods of
inactivity above 500 milliseconds. We do not observe such a difference for Leipzig-II and
Abilene-I.
The tails of the distributions are very similar for sequential and concurrent
connections. As was also the case in the Leipzig-II data shown in Figure
5.7, we observe slightly heavier tails from the replays, which can
be explained by misclassifications.
Figure 5.15 shows the bodies and the tails of the distributions of quiet times
on the initiator side.
Data from sequential connections shows an excellent match for
values above 1 second, and even the far tail is very closely approximated.
For values below 1 second, we observe that the replays have heavier
distributions. This is explained by the quiet time overestimation problem
discussed in the analysis of the Leipzig-II results.
Concurrent connections also show an excellent match between original and
generated traces. The artifact in the smallest inter-ADU quiet times that
was observed on the acceptor side for concurrent connections
is also present in the initiator-side distributions.
The next four plots study how closely the replays of UNC 1 PM approximated the network-level parameters observed in the original trace. The left plot of Figure 5.16 shows the distributions of round-trip times. For sequential connections, there was no difference between the round-trip times obtained from the original trace and those obtained from its replays. For concurrent connections, there is only a very small difference, which we can attribute to concurrent connection misclassifications. The large masses of probability at 100 milliseconds observed in the Leipzig-II replays are not present in the UNC 1 PM replays.
Regarding the distribution of TCP receiver window sizes, the plot on the right in Figure 5.16 shows a good match between the original data and the one obtained from the analysis of the generated packet header traces. The tiny difference can again be explained by concurrent connection misclassifications, but it is clear that the replayed traffic accurately captured the use of TCP receiver window sizes.
Figure 5.17 studies the distributions of loss rates obtained from the original and replayed traffic. As indicated in the analysis of the Leipzig-II replays, matching loss rates is difficult given the use of independent packet dropping in usernet. Consequently, we can consider the approximation of the loss rates shown in the left plot of the figure reasonable, especially in the case of sequential connections, for which many more samples were available. In contrast to these per-connection loss rates, the right plot of the figure shows a substantially closer approximation when loss rates per byte are considered. Note also that the difference between the distributions of loss rates for sequential and concurrent connections is far smaller in the case of probabilities per byte.
We conclude the validation of our source-level trace replay method by comparing the original
Abilene-I trace and its lossless and lossy replays. This is the trace with the highest average throughput.
Figure 5.18 shows that the distributions of a-type ADU sizes measured from the replayed
traces are very similar to those measured from the original trace.
Given the completely different
distributions for sequential and concurrent connections, we would expect any
substantial number of misclassified concurrent connections to result in replay
distributions that significantly diverge from the original ones.
The excellent approximation in this figure, and for the distributions
examined next, suggests that the number of misclassifications was
very small.
We also observe a very good match for the tails, where the only
difference is found for the largest values. In some cases, the replay is slower than the
original trace, so some of the largest ADUs may not have had enough time to complete.
Adding losses to the replay experiment did not introduce any noticeable difference
in the measured distributions, which confirms the robustness of the data acquisition
and generation methods to the challenge of lossy environments.
The two plots in Figure 5.19 show that the original distribution of b-type ADU sizes is almost identical to the ones obtained from the lossless and lossy replays. This is true both for the bodies studied in the plot on the left, and for the tails studied in the plot on the right. It is quite difficult to find any region where the distributions differ. It is also clear that the addition of losses to the replay did not modify the sizes of the ADUs in the experiment.
Figure 5.20 shows that the bodies and the tails of the distributions of the numbers of epochs are closely approximated in the source-level replays. There is only a very slight difference in the far tail of the distributions. This could be attributed to a few connections that were replayed more slowly than in the original trace, so they did not have time to complete all of their epochs. Another possibility is that a small number of concurrent connections with a large number of epochs were misclassified. The probabilities in the tail are so small that even a few samples can create a visible difference.
The quality of the replay of quiet times between ADUs is studied in the next two figures.
Figure 5.21 shows that the distributions of acceptor-side quiet times are accurately
approximated in the replays. This is true both for sequential and concurrent connections.
We only observe a small difference in the far tail, where the
replays show slightly heavier values for quiet times above 1000 seconds.
As in earlier cases, both experiment boundaries and concurrent
connection misclassification can explain the difference.
Figure 5.22 examines the distribution of quiet times
on the initiator side. As shown on the left, there is an excellent match between the
bodies of the distributions from the original trace and those from the replays.
The only difference is found in the distributions from sequential connections
for quiet times below 1 second. The quiet times measured from the replays became
increasingly heavier than those from the original trace as their magnitude decreased.
This finding is consistent with inaccuracies due to the overestimation of
quiet times, since the delay between the monitor and the end host has a larger relative
impact on the measured quiet time as the magnitude of the application-level quiet time decreases.
The tails of the distributions reveal an excellent
approximation. It is also important to note that the distributions for concurrent
connections do not show the unexpected values below 500 milliseconds that were observed
for UNC 1 PM.
The analysis in Figure 5.23 reveals an excellent match between the original and replayed distributions of round-trip times. The replay of concurrent connections exhibits the same artifact at 100 milliseconds encountered in the replays of the Leipzig-II trace, but its magnitude is far smaller. The distributions of receiver window sizes show very close approximations, with only a small divergence for concurrent connections, which can easily be explained by a small number of misclassifications.
The distribution of loss rates in the lossy replay is very close to the original distribution, as shown in Figure 5.24. The CDFs on the left plot show cumulative probabilities computed per connection, and they reveal a remarkably good match between the original and the lossy replay, both for concurrent and sequential connections. This is significantly better than in the cases of Leipzig-II and UNC 1 PM, which were studied in Figures 5.10 and 5.17. The better match is mostly explained by two characteristics of the original data. First, Abilene-I has the largest fraction of lossy connections, more than double that in Leipzig-II. This means a wider y-axis range, which reduces the distance between the distributions in the plot. Second, the heavier distribution of connection sizes in the Abilene-I trace means a larger number of packets per connection, which makes independent drops approximate the intended loss rates more accurately. The right plot shows a good match when the distributions of per-byte loss rates are considered.
Doctoral Dissertation: Generation and Validation of Empirically-Derived TCP Application Workloads
© 2006 Félix Hernández-Campos