KD-Box: Line-segment-based KD-tree for Interactive Exploration of Large-scale Time-Series DataYue Zhao, Jian Zhang, Chi-Wing Fu, Mingliang Xu, Dominik Moritz, Yunhai Wang
Time-series data—usually presented in the form of lines—plays an important role in many domains such as finance,meteorology, health, and urban informatics. Yet, little has been done to support interactive exploration of large-scale time-series data,which requires a clutter-free visual representation with low-latency interactions. In this paper, we contribute a novel line-segment-based KD-tree method to enable interactive analysis of many time series. Our method enables not only fast queries over time series in selected regions of interest but also a line splatting method for efficient computation of the density field and selection of representative lines. Further, we develop KD-Box, an interactive system that provides rich interactions, e.g.,timebox,attribute filtering, and coordinated multiple views. We demonstrate the effectiveness of KD-Box in supporting efficient line query and density field computation through a quantitative comparison and show its usefulness for interactive visual analysis on several real-world datasets.
Figure 1: Overview of our tree construction: (a) a set of input curves with point samples (white) and split points (orange); (b) the KD-tree in TVS space built for the input curves with the newly-added split points (yellow), the thickness of split planes’ borders (blue) indicates the level, where split planes of thicker borders are in the upper (higher) levels of the KD-tree; and (c) the curve segment in each grid cell volume is fit by a straight-line segment.
Figure 2: Illustrating tree-traversal query and boundary-based filtering. (a)Decompose the space covered by time series to with the tree shown in (b); to denote the bounding volumes (grid cells) associated with the tree’s leaf nodes. The blue box denotes a timebox query with a pair of ghost ranges (red) on top and bottom. (b) Traversing the tree can efficiently find all leaf nodes and lines (,, and ) that intersect the timebox. Next, boundary-based filtering employs the two ghost ranges to filter the lines: we discard , since it intersects the bottom ghost range,meaning that some of its points in[,]must be out of the timebox.
Figure 3: Illustrations of the incremental query update for three different cases of modifying the query widget of a range query, from Ro to Rn. (a)Case 1: when enlarging the query widget, we insert the lines in range Rn−Ro. (b) Case 2: when reducing the query widget, we remove the lines in range Ro−Rn but retain those that pass through the green ghost range. (c) Case 3: when moving the query widget, we need to insert the lines in Rn−Ri following Case 1 and then remove the lines in Ro−Ri following Case 2, where Ri=Ro∩Rn. For all three cases, we need to remove lines that cross the red ghost ranges in the end.
Figure 4: Querying representative lines on a synthetic dataset with many time series. (a) An overplotting line graph consists of lines coming from three groups whose representative lines are highlighted; (b) a density visualization reduces the visual clutter with three representative lines aligned with the overall trends in the density field; and (c) we use the timebox widget (in blue) to query a set of lines and then find three representative ones. In (b& c), the line indices (one to three) indicate a decreasing density order of the lines.
Figure 5: Boxplots on the line recall (a), line precision (b), query time (c,d,e),and tree-building time (f) for the RNL search (r=0.02), timebox query, and angular query on all datasets using the four query methods.
This work is supported by the grants of the NSFC (61772315, 61861136012), the Open Project Program of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No.VRLAB2020C08), and the CAS grant (GJHZ1862).