Home > Miscellaneous > COMP 258 Project

COMP 258 Project

An efficient algorithm for calculating red and blue line segment intersections

Proposal Slides
Presentation Slides

Final Report

Motivation

Many GIS datasets consist of many layers of data. For example, a logging company might maintain a database which has one layer for where existing stands of trees are, another layer which indicates environmentally sensitive areas, a third layer with projected logging schedules, a fourth with lake and stream buffers, and so forth. These datasets can get exceedingly large, and thus algorithms for manipulating the data must be efficient. In addition, users may wish to perform a series of operations on the data, necessitating algorithms that do not change the properties of the data, such as floating point precision.

One class of operations that are frequently performed on layers of polygons is boolean operations. The user may wish to union or intersect layers of polygons or subtract one polygon layer from another. To perform these operations, we must know which line segments intersect. There are optimal algorithms which calculate line segment intersections and run in O(nlogn+k) time.


Figure 1: Taking the intersection of the red and blue polygon layers.

Unfortunately, calculating intersections increases the precision necessary to avoid problems. Calculating whether or not two segments intersects is degree 2, and calculating the x-order of intersections is degree 5. When the precision necessary to correctly calculate segment intersections increases beyond the capacity of the floating point numbers used as storage, values are rounded in some manner, introducing possible errors. It is possible to use exact evaluation and special data structures which can store large numbers, but the blow-up of data is also problematical.

In calculating intersections for the purpose of boolean operations, we are able to focus on a special case of the segment intersection problem. We assume the input is given as two sets of curves, one red an the other blue, and that there are no red-red or blue-blue crossings.

Chan's algorithm for calculating red-blue segment intersection, as modified in [1], has an optimal O(nlogn+k) running time using at most degree 2 operations. In this algorithm, line segments are treated as flexible pseudo-segments, and have at most one crossing per pair of segments.

Chan's algorithm operates by first computing a trapezoidation of the blue segments and the red endpoints. (This is a decomposition of the plane constructed by extending vertical lines from every red and blue endpoint to the first blue segment.) A sweep then passes over the trapezoids, examining the direction from which red segments enter each trapezoid.

Using degree 2 calculations, it is impossible to determine the precise location at which red segments enter a trapezoid. Instead, the intersections are pushed to the right where there are witnesses to the intersection. A witness to an intersection is the leftmost endpoint that certifies that the order of line segments from top to bottom has changed from the initial order. Although we don't get the exact intersection location, it is possible to determine the order of intersections along the sweep line.


Figure 2: Blue line segment is a witness to the intersection between the red line segment and the dashed blue pseudo-segment. The green line represents the sweep line.

Goal

The goal of my project is to develop a simplified algorithm for calculating red-blue segment intersection as outlined by Jack Snoeyink [5], which is similar to Chan's algorithm. The implementation should be exact and robust, and handle all of the degeneracies, while retaining the O(nlogn+k) running time and maximum degree 2 calculations.


Figure 3: Types of degeneracies.

Although the main ideas for the algorithm have already been worked out by Jack and myself, there are still issues that need to be resolved in order to maintain the O(nlogn+k) running time.

Novel Aspects

The algorithm I will be writing is a new. Although it is similar to Chan's algorithm, it has some fairly significant differences. For example, it does not require that the plane be decomposed by a trapezoidation of blue segments. Like Chan's algorithm, it uses degree 2 computations, and runs in O(nlogn+k) time.

Handling Degeneracies

Palazzi and Snoeyink [4] outline the following method for handling degeneracies:

If a blue end point b1 lies on a red segment r1r2, then use the second endpoint, b2, to decide whether b1 is above or below r1r2. If b2 is not on the line r1r2, then make b1 on the opposite side of r1r2. Thus, if b2 is above r1r2, treat b1 as below r1r2, and vice versa.

If b2 is also on the line through r1r2, then the two segments are colinear. If the two segments intersect in only one endpoint, then arbitrarily treat one endpoint as above r1r2, and the other as below r1r2. If there are an infinite number of intersection points, then this is reported to the user.

Palazzi and Snoeyink [4] also discuss a few other degeneracies in their paper which will not apply to the algorithm I am working on. In addition, my algorithm treats red and blue segments in the same way, which means that the previous descriptions also apply when a red endpoint lies on a blue segment.

The ultimate purpose of computing the line segment intersections is to calculate the arrangement. Because of this, we are allowed to modify the data somewhat to reduce the number of degenerate cases as long as the computations are exact. In the algorithm, we will allow a segment to be broken when an endpoint falls exactly on it. Because the point where the segment is broken is an endpoint of an existing segment, we do not increase the precision necessary to store the endpoints. This reduces the number of degenerate cases from three to two. The degeneracies we must now handle occur when:

Reducing the number of types of degeneracies allows us to reduce the amount of special case code necessary for the algorithm.

Algorithm

The algorithm passes a sweep line from left to right, maintaining the invariance that all intersections with witnesses to the left of the sweep line have been reported. A witness is an endpoint of a line segment. Thus, every time the sweep line encounters an endpoint, it must check to see if an intersection is witnessed and update the data structures.

Data Structures

The algorithm requires data structures that keep track of upcoming start points, upcoming end points, as well as line segments currently intersecting the sweep line.

We can keep track of the upcoming start points using a sorted array or linked list. Running time for the sort is O(nlgn), and is part of the preprocessing: if a set of line segments is being used multiple times, it makes sense to store them in sorted order.

Using a sorted array for the upcoming start points does have one drawback: any broken segments must be kept track of separately, so as not to have a negative effect on the running time.

Upcoming endpoints can be stored in a heap or priority queue, which can be maintained in O(nlgn) time overall.

When the sweep line encounters a start point, the algorithm adds the corresponding segment to the list of currently swept segments, and adds the endpoint of the segment to the queue of endpoints. Any intersections that were witnessed by the start point must be recorded.

When the sweep line encounters an end point, the algorithm checks to see if any intersections were witnessed, then removes the segment from the list of currently swept segments and the queue of endpoints.

To achieve the desired running time, we must be able to maintain the list of current segments in O(nlogn) running time, and calculate intersections O(k) time.

If we sort the current segments sorted along the sweep line, we notice that the red and blue segments form clumps. We can collect these clumps of single coloured segments and store each group as a bundle. The red and blue bundles are linked together using "purple pointers".


Figure 4: Left: Local view of a sweep line. Right: Data structure for storing the red and blue segments that are currently being swept.

We embed the red bundles in a tree structure for faster access to the correct red bundle. Blue bundles can then be accessed using the purple pointers. By only embedding one colour of bundles in a tree, we can save some pointer overhead. Inside the bundles the segments are also be embedded in trees for faster access.

By only embedding red bundles in a tree we introduce some asymmetry, but not as much as in Chan's algorithm.

Witnessing Intersections and Maintaining the List of Currently Swept Segments

There are four locations in which a new segment x can be inserted in the list of currently swept segments:
  1. inside a red bundle
  2. above a red bundle, below a blue bundle
  3. above a red bundle, inside a blue bundle, and
  4. above both a red and blue bundle.
These cases are illustrated in Figure 5. The purple segment is the red or blue segment being inserted. The dashed blue lines indicate potential intersections which would be witnessed by the inserted segment.


Figure 5: Non-degenerate insertion cases.

When we insert segment x, we must also check for witnessed intersections. Intersection occur when the order of segments changes along the sweep line. Since there are no red-red or blue-blue intersections, the only change that can occur is the interleaving of red and blue segments. After finding x's location with respect to the red segments, we must check the neighboring blue bundles to see if any segments get pulled to above or below x. Because the segments are stored as bundles, it only takes constant time to find the blue segments closest to x.

When a blue bundle gets shifted to the other side of a red bundle, we must record intersections. We can determine the order of the intersections along each line segment by using the order of the segments in the bundles, combined with the knowledge of which bundle is moving down and which is moving up. Each intersection can be recorded in constant time, and thus recording all k intersections will take O(k) time.

If an entire blue bundle is shifted to the opposite side of x, we must merge the two red bundles which would then be adjacent to each other. If only part of a blue bundle is shifted, we must split the blue bundle, leaving part where it was and shifting the other part. If the bundle on the other side of x is also blue, the moved bundle must be merged with it.

If an entire blue bundle is shifted, we must also check the next blue bundle to see if it has any witnessed intersections as well. Once we encounter a blue segment that doesn't shift, we can stop.

When there are no degeneracies, we will only have to shift blue segments from one direction, because shifting blue segments from both directions would cause a blue-blue intersection.

Note that when a segment is inserted, there are at most two split operations.

The operations performed when a segment is removed from the list of currently swept segments is similar. These cases are shown in Figure 6. The curved dashed blue lines indicate potential intersections which would be witnessed by the terminating segment. The straight dashed lines indicate potential intersections caused by a difference in start and end position along the sweep line.


Figure 6: Non-degenerate removal cases.

We can see that in non-degenerate removal cases, there are at most two split operations.

Degeneracies cause additional cases to consider. With an endpoint degeneracy (Figure 7), not only do we have to consider the cases depicted in Figures 5 and 6, we have to make sure all of the segments that share an endpoint have the degenerate intersection recorded.


Figure 7: Degenerate case where multiple segments share an endpoint.

To make sure that we record all of the intersections in the proper order, we need to deal with all of the relevant segments simultaneously. If we dealt with them separately, intersections might get missed or recorded in the wrong order. As we can see in Figure 7, if we wish to record the intersections in counterclockwise order, it is helpful if when there is a tie in the priority queues storing the upcoming start and end points we use counterclockwise order to break ties.

The other type of degeneracy is when two segments of opposite colour have both endpoints the same. This degeneracy will appear as a degenerate endpoint intersection at both the start and end points. When we are computing the endpoint degeneracies, we should also check for the identical segment degeneracy, and only record it once. We can ensure that segment degeneracies are only recorded once by choosing to record this type of degeneracy only at the start point (or equivalently, the end point).

Analysis of the Algorithm

A single segment causes only O(1) splits, and thus there are O(n) splits total in the worst case. n single segment bundles can only be merged O(n) times, and each additional split allows an additional merge. Thus there are a total of O(n) splits and merges possible. As long as each split and merge amortizes to have a O(logn) running time, our algorithm will be able to compute all intersections in O(nlogn+k) running time.

To make sure all tree operations have a O(logn) running time, we will use splay trees [2,3]. Splay trees are a type of balanced binary search tree. Unlike other types of balanced trees, splay trees do not use any explicit rules to enforce a balanced condition or store balancing information in the nodes. Instead, after each find/insertion/deletion operation, a splaying step is performed. Splaying a node brings moves it up to the root. Amortized analysis shows that n operations run in O(nlogn) time, and they have the added benefit that frequently accessed nodes are close to the root.

Splay trees are suitable for this algorithm because it is possible to merge and split them in O(logn) time. The invariant in the regular amortized analysis is:

Before and after a splaying, each node v of T has r(v) cyber-dollars in its account.[2]
where r(v) is the rank of v. If there are n(v) nodes in the subtree rooted at v, then the rank of v is defined as r(v) = log(n(v)). As long as merge and split operations maintain this invariant with a payment of at most O(logn) "cyber-dollars" per operation, the analysis will hold.

To split a tree at node x, we first call Find(x), then remove the appropriate subtree of x. (See Figure 8.)


Figure 8: Splitting a splay tree after the element x.

We know that Find(x) runs in logarithmic time. Removing a subtree of the root takes constant time, and leaves the root with an overpayment of cyber-dollars in its account. Thus the split operation will certainly not take any longer than O(logn) amortized time.

To merge two trees, we first splay the largest item of the left tree to the top, then set the right tree as its subtree. (See Figure 9.)


Figure 9: Merging two splay trees.

The Find(x) operation takes O(logn) time, and we can pay O(logn) cyber-dollars into x's account to make sure it remains balanced. Thus the merge operation will run in O(logn) amortized time.

Since all tree operations will run in O(logn) amortized time, the entire algorithm will run in O(nlogn+k) time.

Large Datasets

Any large dataset that consists of layers of polygons will suffice to demonstrate the final algorithm.

Conclusions and Future Work

The algorithm described runs in O(nlogn+k) running time and is conceptually simpler and more symmetric than Chan's algorithm, but I have not coded Chan's algorithm to compare actual coding complexity. Although this algorithm is easy to understand and describe, the actual coding involves a lot of pointers and it is easy to make mistakes which slow down the coding process. My future work (for Friday) is to finish the actual implementation of the described algorithm (probably without degeneracy handling code).

Related Links

Line Segment Intersections: Boolean Algorithms:

References

[1] Jean-Daniel Boissonnat and Jack Snoeyink, "Efficient algorithms for line and curve segment intersection using restricted predicates".

[2] Michael T. Goodrich and Roberto Tamassia, Data Structures and Algorithms in Java, John Wiley and Sons, Inc., New York, 1998.

[3] Jeffrey H. Kingston, Algorithms and Data Structures: Design, Correctness, Analysis, Second Edition, Addison Wesley Longman Ltd., Harlow, 1998.

[4] Larry Palazzi and Jack Snoeyink, "Counting and reporting Red/Blue Segment Intersections", CVGIP: Graphical Models and Image Processing, Vol. 56, No. 4, July, pp. 304-310, 1994.

[5] Jack Snoeyink, private communication, August 19, 1999.