The a-b-t model provides an intuitive way of describing source behavior in an application-neutral manner that is relevant for the performance of TCP. However, this would be of little use without a method for measuring real network traffic and casting TCP connections into the a-b-t model. We have developed an efficient algorithm that can convert an arbitrary trace of TCP/IP protocol headers into a set of connection vectors. The algorithm makes use of the wealth of information that segment headers provide to extract an accurate description of the abstract source-level behavior of the applications driving each TCP connection in the trace. It should be noted that this algorithm is a first solution to a complex inference problem in which we are trying to understand application behavior from the segment headers of a measured TCP connection without examining payloads, and hence without any knowledge of the identity of the application driving the connection. This implies ``reversing'' the effects of TCP and the network mechanisms that determine how ADUs are converted into the observed segments that carry the ADU. The presented algorithm is by no means the only one possible, or the most sophisticated one. However, we believe it is sufficiently accurate for our purpose, and we provide substantial experimental evidence in this and later chapters to support this claim.
The starting point of the algorithm is a trace of TCP segment headers,
, measured on some network link.
Our technique applies to TCP connections for which
both directions are measured (known as a bidirectional trace),
but we will also comment on the problem of extracting a-b-t connection
vectors from a trace with only one measured direction
(a unidirectional trace). While most public traces are bidirectional
(e.g., those in the NLANR repository [nlaa]), unidirectional traces are sometimes
collected when resources (e.g., disk space) are limited.
Furthermore, routing asymmetries often
result in connections that only traverse the measured link in one direction.
We will use Figure 3.10 to describe the basic technique for measuring
ADU sizes and quiet time durations.
The figure shows a set of TCP segments representing the
exchange of data illustrated in the a-b-t diagram of Figure 3.1.
After connection establishment (first three segments), a data segment is
sent from the connection initiator to the connection acceptor. This data
segment carries ADU , and its size is given by the difference between the
end sequence number and the beginning sequence number assigned to the data (bytes 1 to 341).
In response to this data segment, the
other endpoint first sends a pure acknowledgment segment
(with cumulative acknowledgment number 342), followed by two data segments
(with the same acknowledgment numbers).
This change in the directionality of the data transmission makes it
possible to establish a boundary between the first data unit
,
which was transported using a single segment and had a size of 341 bytes,
and the second data unit
, which was transported using two segments
and had a size of 2,555 bytes.
The trace of TCP segments must include a timestamp for
each segment that reports the time at which the segment was observed at the
monitoring device.
Timestamps provide a way of estimating the duration of quiet times
between ADUs.
The duration of
is given by the difference between
the timestamp of the 4th segment (the last and only segment of
),
and the timestamp of the 6th segment (the first segment of
).
The duration of
is given by
the difference between the timestamp of the last data segment of
(7th segment
in the connection) and the timestamp of the first FIN segment (8th segment in the connection).
Note that the location of the monitoring point
between the two endpoints affects the measured duration
of and
(but not the measured sizes of
and
).
Measuring the duration of
from the monitoring point 1 shown in
Figure 3.10 results in an estimated time
that is larger than the estimated time
measured at monitoring point 2.
Inferring application-layer quiet time durations is always complicated
by this kind of measurement variability (among other causes), so short quiet times (with durations
up to a few hundred milliseconds) should not be taken into account. Fortunately,
the larger the quiet time duration, the less significant the measurement
variability becomes, and the more important the effect of the quiet time is on the
lifetime of the TCP connection. We can therefore choose to assign a value of zero to any
measured quiet time whose duration is below some threshold, e.g., 1 second, or simply use the
measurement disregarding the minor impact of its inaccuracy.
If all connections were as ``well-behaved'' as the one illustrated in Figure
3.10, it would be trivial to create an algorithm to extract
connection vectors from segment header traces. This could be done by simply examining the
segments of each connection and counting the bytes sent between data
directionality changes. In practice, segment reordering, loss,
retransmission, duplication, and concurrency make the analysis much more complicated.
Figure 3.11 shows a second set of segment exchanges
that carry the same a-b-t connection vector of Figure
3.1.
The first data segment of the ADU sent from the connection
acceptor, the 6th segment, is lost somewhere in the network,
forcing this endpoint to retransmit this segment some time later as the 9th segment.
Depending on the location of the monitor (before or after the point of loss),
the collected segment header trace may or may not
include the 6th segment.
If this segment is present in the trace (like in the trace collected at
monitoring point 2), the analysis program must detect
that the 9th segment is a retransmission and ignore it. This ensures we compute
the correct size of , i.e., 2,555 bytes rather than 4,015 bytes.
If the lost segment is not present in the trace (like in the trace collected at monitoring
point 1), the analysis must detect the reordering of segments using their
sequence numbers and still output a size for
of 2,555 bytes.
Measuring the duration of
is more difficult in this case,
since the monitor never saw the 6th segment.
The best estimation is the time
between the
segment with sequence number 341 and the segment with sequence number 2555. Note that if the
6th segment is seen (as for a trace collected at monitoring point 2),
the best estimate is the time
between 5th and 6th segments.
A data acquisition algorithm capable of handling these two cases cannot simply
rely on counts and data directionality changes,
but must keep track of the start
of the current ADU, the highest sequence number seen so far, and the timestamp
of the last data segment. In our analysis, rather than trying to handle every possible case of loss
and retransmission, we rely on a basic property of TCP to conveniently reorder segments and still
obtain the same ADU sizes and inter-ADU quiet time durations.
This makes our analysis simpler and more robust.
A fundamental invariant that underlies our previous ADU analyses is that every byte of application data in a TCP connection receives a sequence number, which is unique for its direction3.7. This property also means that data segments transmitted in the same direction can always be logically ordered by sequence number, and this order is independent of both the time at which segments are observed and any reordering present in the trace. The logical order of data segments is a very intuitive notion. If segments 6 and 7 in Figure 3.10 carried an HTML page, segment 6 carried the first 1,460 characters of this page, while segment 7 carried the remaining 1,095. Segment 6 logically preceded segment 7. When the same page is transmitted in Figure 3.11, the first half of the HTML is in segment 6 (which was lost) and again in segment 9. Both segments 6 and 9 (which were identical) logically precede segment 7, which carried the second half of the HTML page.
The notion of logical order of data segments can also be applied to
segments flowing in opposite directions of a sequential TCP connection.
Each new data segment in a sequential connection
must acknowledge the final sequence number of the last in-order ADU received
in the opposite direction. If this is not the case, then the new data is not
sent in response to the previous ADU, and the connection is not sequential
(i.e., two ADUs are being sent simultaneously in opposite directions).
In the previous
examples in Figures 3.10 and 3.11,
we can see that both data segments comprising acknowledge
the final sequence number of
. Intuitively, no data belonging to
can be sent by the server until
is completely received
and processed. The data in
logically precede the
data in
, and therefore the segment carrying
logically
precedes the segments carrying
.
Given the sequence and acknowledgment numbers of two data segments,
flowing in the same or in opposite directions, we can always say whether
the two segments carried the same data, or one of them logically preceded
the other.
Connections that fit into the sequential a-b-t model are said to preserve a total order of data segments with respect to the logical flow of data:
For any pair of data segmentsIn the example in Figure 3.11, the data in segment 9 logically precedes the data in segment 7 (e.g., segment 9 carries the first 1460 bytes of a web page, and segment 7 carries the rest of the bytes). We know this because the sequence numbers of the bytes in segment 9 are below the sequence numbers of the bytes in segment 7. The first monitoring point observes segment 7 before segment 9, so temporal order of these two segments did not match their logical data order. A total order also exists between segments that flow in opposite directions. In the example in Figure 3.11, the data in segment 4 logically precede the data carried in the rest of the data segments in the connection. Timestamps and segment reordering play no role in the total order that exists in any sequential connection.and
, such that
is not a retransmission of
or vice versa, either the data in
logically precedes the data in
, or the data in
logically precedes the data in
.
Logical data order is not present in data-concurrent connections, such as the one shown in Figure 3.8. For example, the segment that carried the last b-type ADU (the 438 don't send ADU) may have been sent roughly at the same time as another segment carrying some of the new data of the data unit sent in the opposite direction (such as a CHECK ADU). Each segment would use new sequence numbers for its new data, and it would acknowledge the data received so far by the endpoint. Since the endpoints have not yet seen the segment sent from the opposite endpoint, the two segments cannot acknowledge each other. Therefore, there is no causality between the segments, and no segment can be said to precede the other. This observation provides a way of detecting data concurrency purely from the analysis of TCP segment headers. The idea is that a TCP connection that violates the total order of data segments described above can be said to be concurrent with certainty. This happens whenever a pair of data segments, sent in opposite directions, do not acknowledge each other, and therefore cannot be ordered according the logical data order.
Formally, a connection is considered to be concurrent when
there exists at least one pair of data segments and
that either flow in opposite directions and satisfy
We consider first the case where and
flow in opposite directions,
assuming without loss of generality that
is sent from initiator to
acceptor and
from acceptor to initiator.
If they are part of a sequential connection, either
is sent after
reaches the initiator, in which case
acknowledges
so
, or
is sent after
reaches the acceptor in which case
. Otherwise, a pair of data segments that do not
acknowledge each other exists, and the connection exhibits data concurrency.
In the case of segments and
flowing in the same direction,
we assume without loss of generality that
.
The only way in which
can be less than
is when
is a retransmission sent after
, and
at least one data segment
with new data sent from the opposite
direction arrives between the sending of
and the sending of
.
The arrival of
increases the cumulative acknowledgment number in
with respect to
, which means that
.
In addition,
cannot acknowledge
, or
would not be retransmitted. This implies that
the connection is not sequential, since the opposite side sent new data in
without waiting for the new data in
.
Thus, only data-concurrent connections have a pair of segments that can simultaneously satisfy inequalities (3.1) and (3.2) or inequalities (3.3) and (3.4). These inequalities provide a formal test of data concurrency, which we will use to distinguish sequential and concurrent connections in our data acquisition algorithm. Data-concurrent connections exhibit a partial order of data segments, since segments flowing in the same direction can always be ordered according to sequence numbers, but not all pairs of segments flowing in opposite directions can be ordered in this manner.
Situations in which all of the segments in a concurrent data exchange are actually sent sequentially are not detected by the previous test. This can happen purely by chance, when applications send very little data or send it so slowly that concurrent data sent in the reverse direction is always acknowledged by each new data segment. Note also that the test detects concurrent exchanges of data and not concurrent exchanges of segments in which a data segment and an acknowledgment segment are sent concurrently. In the latter case, the logical order of data inside the connection is never broken because there is no data concurrency. Similarly, the simultaneous connection termination mechanism in TCP in which two FIN segments are sent concurrently is usually not associated with data concurrency. In the most common case, none of the FIN segments or only one of them carries data, so the data concurrency definition is not applicable. It is however possible to observe a simultaneous connection termination where both FIN segments carry data, which is considered concurrency if these segments satisfy inequalities (3.1) and (3.2).
We have developed an efficient data analysis algorithm that can determine whether a connection is sequential or concurrent, and can measure ADU sizes and quiet time durations in the presence of arbitrary reordering, duplication, and loss. Rather than trying to analyze every possible case of reordering, duplication/retransmission, and loss, we rely on the logical data order property, which does not depend on the original order and timestamps.
Given the set of segment headers of a TCP connection sorted by timestamp, the algorithm performs two passes:
The first step of the algorithm creates a doubly-linked list, ordered_segments in which each list node represents a data segment using the following four fields:
Insertion of a node into the list starts backward from the tail of the ordered_segments looking for an insertion point that would satisfy the first invariant. If the connection is still being considered sequential, the insertion point must also satisfy the second invariant. This implies comparing the sequence numbers of the new segment with those of the last segment in the ordered_segments. The comparison can result in the following cases:
The second step is to walk through the linked list and produce an a-b-t
connection vector. This can be accomplished in time using ordered_segments.
For concurrent connections, the analysis goes through the
list keeping separate data for each direction of the connection. When a long
enough quiet time is found (or the connection is closed),
the algorithm outputs the size of the ADU.
For sequential connections, the analysis looks
for changes in directionality and outputs the amount of data in between
the change as the size of the ADU.
Sufficiently long quiet times also mark ADU boundaries,
indicating an epoch without one of the ADUs.
Reordering makes the computation of quiet times more complex than it seems. As shown in Figure 3.11, if the monitor does not see the first instance of the retransmitted segment, the quiet times should be computed based on the segments with sequence numbers 341 and 2555. This implies adding two more fields to the list nodes:
Our trace processing algorithm makes two assumptions. First, it assumes we can isolate the segments of individual connections. Second, it assumes that no wraparound of sequence numbers occurs (otherwise, logical data order would not be preserved). These two assumptions can be satisfied by preprocessing the trace of segment headers. Isolating the segments of individual TCP connections was accomplished by sorting packet header traces on five keys: source IP address, source port number, destination IP address, destination port number, and timestamp. The first four keys can separate segments from different TCP connections as long as no source port number is reused. When a client establishes more than one connection to the same server (and service), these connections share IP addresses and destination port numbers, but not source port numbers. This is true unless the client is using so many connections that it reuses a previous source port number at some point. Finding such source port number reuses is relatively common in our long traces, which are at least one hour long. Since segment traces are sorted by timestamp, it is possible to look for pure SYN segments and use them to separate TCP connections that reuse source port numbers. However, SYN segments can suffer from retransmissions, just like any other segment, so the processing must keep track of the sequence number of the last SYN segment observed. Depending on the operating system of the connection initiator, this sequence number is either incremented or randomly set for each new connection. In either case, the probability of two connections sharing SYN sequence numbers is practically zero.
Segment sorting according to the previous 5 keys requires
time (we use the Unix sort utility for our work). It is also possible
to process the data without an initial sorting step by keeping state in memory
for each active connection. On the one hand, this can potentially eliminate
the costly
step, making the entire processing run in linear time. On the
other hand, it complicates the implementation, and increases the memory requirements
substantially3.8.
Detecting the existence of distinct connections with identical source and destination IP addresses
and port numbers requires
time, simply by keeping track of SYN sequence numbers
as discussed above. In our implementation,
this detection is done at the same time as segments are inserted
into ordered_segments data structure, saving one pass.
TCP sequence numbers are 32-bit integers, and the initial sequence
number of a TCP connection can take any value between 0 and .
This means that wraparounds are possible, and relatively frequent.
One way to handle sequence number wraparound is by keeping track
of the initial sequence number and performing a modular subtraction. However,
if the SYN segment of a connection is not observed (and therefore the
initial sequence number is unknown), using modular arithmetic will fail whenever the
connection suffers from reordering of the first observed segments. In this case
the subtraction would start in the wrong place, i.e., from the sequence number of the
first segment seen, which is not the lowest sequence number due to the reordering.
One solution is to use backtracking, which complicates the processing of traces.
A related problem is that representing sequence numbers as 32-bit integers
is not sufficient for connections that carry more than bytes of data (4 GB).
The simplest way to address this measurement problem is to encode sequence
numbers using more than 32 bits in the ordered_segments data
structure. In our implementation we use 64 bits to represent sequence numbers,
and rely on the following algorithm3.9to accurately convert 32 bit sequence numbers
to 64-bit integers even in the presence of wraparounds.
The algorithm makes use of a wraparound counter and a pair of flags
for each direction of the connection.
The obvious idea is to increment the counter each time a transition from a high sequence
number to a low sequence number is seen.
However, due to reordering, the counter could be incorrectly incremented more than once.
For example, we could observe four segments with sequence numbers
, and
. Wraparound processing should convert them into
, and
. However, if the wraparound
counter is incremented every time a transition from a high sequence number
to a low sequence number is seen,
the counter would be incremented once for the segment with the sequence number
and again for the segment with sequence number
.
In this case, the wraparound processing would result in four segments
with sequence numbers
, and
.
The second increment of the counter would be incorrect.
The solution is to use a flag that is set
after a ``low'' sequence number is seen, so the counter is incremented only
once after each ``crossing'' of .
This opens up the question of when to unset this flag so that the next true
crossing increments the counter.
This can be solved by keeping track of the crossing of the middle sequence
number. In our implementation, we use two flags, low_seqno
and high_seqno, which are set independently. If the next segment
has a sequence number in the first quarter of
(i.e., in the range
between 0 and
), the flag low_seqno is set to true.
If the next segment has a sequence number in the fourth quarter
of
(i.e., in the range between
and
),
the other flaghigh_seqno is set to true. These flags are unset, and the
counter incremented, when both flags are true and the next segment is not in
the first or the fourth quarter of
.
Sequence numbers in the first quarter are incremented
to
times the counter plus 1.
The rest are incremented by
plus the counter.
This handles the pathological reordering case in which the sequence
number of the first segment in a connection is very close to zero, and the next segment
is very close to
. In this case the low sequence number would be incremented
by
.
This algorithm requires no backtracking, and runs in
time.
In our implementation, the sequence number conversion algorithm
has been integrated into the same pass as the insertion step of the ADU
analysis.
Our data acquisition techniques have been implemented in the analysis program tcp2cvec. The program also handles a number of other difficulties that arise when processing real traces, such as TCP implementations that behave in non-standard ways. In addition, it also implements the analysis of network-level parameters described in the next chapter.
Doctoral Dissertation: Generation and Validation of Empirically-Derived TCP Application Workloads
© 2006 Félix Hernández-Campos