Deep Spatial-Temporal Embedding for Vehicle Trajectory Validation and Refinement

1Rutgers, The State University of New Jersey
2University of Tennessee at Chattanooga

Abstract

High-angle cameras are commonly used for trajectory data collection in transportation research. However, without refinement and validation, trajectory data obtained through video processing software may be unreliable, inaccurate, or incomplete. This paper focuses on a critical issue in the field of trajectory data acquisition and analysis: the research community still lacks a reliable and fully vetted trajectory dataset. The current practice for validating video-based detection results is not scalable, relying mainly on brute-force effort to watch the video replay and repair incorrect bounding boxes frame by frame. To improve on this practice, the Deep Spatial-Temporal Embedding (DSTE) model is proposed for trajectory instance segmentation on Spatial-Temporal Maps (STMaps) using a contrastive learning framework. Parity constraints at both the pixel and instance levels guide the deep neural network to learn embedding spaces that can be built on different backbone networks. The reconstructed trajectory dataset is thoroughly validated against manually processed ground truth, and the error-prone NGSIM data is refined into a reliable resource for transportation research, as assessed by car-following behaviors, lane-change frequency, consistency, and jerk value measurements. In essence, STMap represents a significant advancement in the field of vehicle trajectory validation, promising both the meticulousness of direct validation and the scalability necessary for handling extensive traffic data.

Trajectory Validation Methods

  • Indirect Methods of Trajectory Validation:

    Indirect methods involve algorithmic checks against physical laws to identify and correct implausible trajectories. These methods detect anomalies by comparing the observed data against expected behaviors derived from traffic models. For instance, trajectories that suggest abrupt stops or accelerations that exceed typical vehicle capabilities are flagged as suspect. Once false trajectories are identified, they are often corrected using regression-to-the-mean models, such as car-following models, which reflect average driver behavior, or through smoothing techniques like low-pass filters that remove high-frequency noise from the data.

    The primary advantage of indirect methods is efficiency. They can process large volumes of data quickly without human intervention. The downside is that while these methods are good at ensuring overall data plausibility, they might not capture the nuances of individual driving behaviors, potentially leading to a loss of detail in the data.
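    As a rough illustration of the indirect approach described above, the following minimal sketch flags samples whose finite-difference acceleration falls outside a plausible range and then smooths the positions with a moving average, which acts as a simple low-pass filter. The function names, the thresholds (4 m/s² acceleration, 8 m/s² deceleration), and the window size are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def flag_implausible(positions, dt, max_accel=4.0, max_decel=8.0):
    """Return a boolean mask marking samples with implausible acceleration.

    positions: 1-D longitudinal positions in meters, sampled every dt seconds.
    max_accel / max_decel: assumed plausibility limits in m/s^2 (hypothetical).
    """
    positions = np.asarray(positions, dtype=float)
    speed = np.gradient(positions, dt)   # first derivative
    accel = np.gradient(speed, dt)       # second derivative
    return (accel > max_accel) | (accel < -max_decel)


def moving_average(values, window=5):
    """Smooth a 1-D series with a centered moving average (a basic low-pass filter)."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the output keeps the input length")
    values = np.asarray(values, dtype=float)
    pad = window // 2
    padded = np.pad(values, pad, mode="edge")  # repeat edge values at both ends
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


# Synthetic example: constant 10 m/s at 10 Hz with one corrupted sample.
t = np.arange(50) * 0.1
clean = 10.0 * t
noisy = clean.copy()
noisy[25] += 5.0                          # simulated detection error

suspect = flag_implausible(noisy, dt=0.1)
smoothed = moving_average(noisy, window=5)
```

    Note that the smoothing step spreads the error over neighboring samples rather than removing it, which is one way the loss of detail mentioned above can arise.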

  • Direct Methods of Trajectory Validation:

    Direct methods, on the other hand, rely on manual inspection of trajectory tracklets and their associated bounding boxes in each frame of video data. This process ensures that each trajectory is faithful to what was actually observed, preserving unique driving behaviors and characteristics that indirect methods might overlook.

    The fidelity of data obtained through direct methods is typically higher because human validators can discern complex scenarios that automatic methods might misinterpret. However, this comes at the cost of being labor-intensive, which can significantly limit the amount of data that can be processed. Direct methods are often reserved for smaller datasets where the highest possible accuracy is required, or they are used to create ground truth data for the development and calibration of indirect methods.

  • Spatial Temporal Map in Trajectory Validation:

    Spatial Temporal Map (STMap) is an innovative approach that enhances the direct method of vehicle trajectory validation, enabling it to be applied to large-scale datasets while maintaining high data fidelity. STMap creates a visual representation where time is one of the dimensions, alongside the spatial dimensions of the trajectory. By doing so, it allows for the rapid manual inspection of trajectories over time, as irregularities or anomalies can be more easily spotted in this format.

    The strength of STMap lies in its ability to condense complex trajectory information into a more manageable and interpretable form. This transformation simplifies the validation process and reduces the time required for manual checks. Validators can quickly verify long sequences of movement by looking at a single STMap, rather than inspecting each frame individually.

    This approach provides a scalable solution to the labor-intensive nature of direct methods, offering a balance between the thoroughness of manual validation and the need for efficiency in processing extensive datasets.
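
To make the idea concrete, here is a minimal sketch of how an STMap could be assembled from video frames, assuming a straight scanline of pixel coordinates along a lane. The helper names (`straight_scanline`, `build_stmap`) and the axis convention (space along rows, time along columns) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def straight_scanline(start, end, num_points):
    """Return (num_points, 2) integer (row, col) pixel coordinates from start to end."""
    rows = np.linspace(start[0], end[0], num_points)
    cols = np.linspace(start[1], end[1], num_points)
    return np.stack([np.rint(rows), np.rint(cols)], axis=1).astype(int)


def build_stmap(frames, scanline):
    """Stack the scanline pixels of every frame into a space-time image.

    frames: iterable of H x W (or H x W x C) arrays, one per time step.
    scanline: (N, 2) array of (row, col) pixel coordinates.
    Returns an array of shape (N, T) or (N, T, C): axis 0 is space, axis 1 is time.
    """
    rows, cols = scanline[:, 0], scanline[:, 1]
    columns = [np.asarray(frame)[rows, cols] for frame in frames]
    return np.stack(columns, axis=1)


# Synthetic video: one bright pixel moving 2 px per frame along row 5.
num_frames, height, width = 10, 10, 30
frames = np.zeros((num_frames, height, width), dtype=np.uint8)
for t in range(num_frames):
    frames[t, 5, 2 * t] = 255

scanline = straight_scanline(start=(5, 0), end=(5, width - 1), num_points=width)
stmap = build_stmap(frames, scanline)
```

In this toy example the moving pixel shows up as a single diagonal strand in `stmap`, whose slope corresponds to the object's speed. A gap or kink in such a strand is the kind of irregularity a validator can spot at a glance.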

Key Innovations and Contributions

  • Integral Solution for Detection and Tracking: Conversion of the detection and tracking task into an instance segmentation task using STMap, significantly reducing labor in adjusting detection and tracking errors frame by frame.
  • Deep Spatial-Temporal Embedding (DSTE) Model: A contrastive learning-based model that surpasses current segmentation baselines by integrating local and global correlations. It facilitates proposal-free instance segmentation for more precise instance differentiation.
  • Enhanced Scanline Approach to NGSIM Datasets: Improvement of NGSIM datasets' quality and reliability, crucial for traffic flow studies and the development of accurate traffic models and simulations.
  • Comprehensive Evaluation of Trajectory Data: Introduction of a detailed statistical quality assessment for video-based trajectory datasets, covering multiple aspects such as Position & Speed Accuracy and Internal Consistency. The authors present this as the most thorough evaluation to date for this type of dataset.
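
Two of the quality measures named above, jerk values and internal consistency, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the 15 m/s³ jerk limit, and the trapezoidal re-integration of speed are illustrative choices and not the exact metrics defined in the paper.

```python
import numpy as np


def jerk(positions, dt):
    """Third finite-difference derivative of position (m/s^3)."""
    speed = np.gradient(np.asarray(positions, dtype=float), dt)
    accel = np.gradient(speed, dt)
    return np.gradient(accel, dt)


def excessive_jerk_share(positions, dt, limit=15.0):
    """Fraction of samples whose absolute jerk exceeds an assumed limit."""
    return float(np.mean(np.abs(jerk(positions, dt)) > limit))


def internal_consistency_rmse(positions, speeds, dt):
    """RMSE between reported positions and positions re-integrated from reported speeds.

    Speeds are integrated with the trapezoidal rule, starting at positions[0].
    A value near zero means positions and speeds tell the same story.
    """
    positions = np.asarray(positions, dtype=float)
    speeds = np.asarray(speeds, dtype=float)
    steps = 0.5 * (speeds[1:] + speeds[:-1]) * dt
    integrated = positions[0] + np.concatenate(([0.0], np.cumsum(steps)))
    return float(np.sqrt(np.mean((integrated - positions) ** 2)))


# Synthetic trajectory: 10 m/s initial speed, 0.5 m/s^2 constant acceleration, 10 Hz.
dt = 0.1
t = np.arange(100) * dt
speeds = 10.0 + 0.5 * t
positions = 10.0 * t + 0.25 * t ** 2

consistent_rmse = internal_consistency_rmse(positions, speeds, dt)
biased_rmse = internal_consistency_rmse(positions, speeds + 1.0, dt)  # speeds off by 1 m/s
smooth_share = excessive_jerk_share(positions, dt)

corrupted = positions.copy()
corrupted[50] += 5.0                      # simulated tracking error
corrupted_share = excessive_jerk_share(corrupted, dt)
```

On the synthetic data, the consistent trajectory yields a near-zero RMSE and no excessive jerk, while the biased speeds and the corrupted position each raise the corresponding measure.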

Pipeline

[Figure: pipeline overview]

Instance Segmentation Compared to SOTA Baselines

Deep Spatial-Temporal Embedding (DSTE) instance segmentation output vs. baseline models. (1) STMap with clean background; (2) STMap with dense trajectories and shadows; (3) STMap with shadows and static noise. Our model is compared with SOTA proposal-based and proposal-free instance segmentation models. The instance outputs are based on imperfect semantic foreground masks in which trajectory strands are often stitched together.

More results

We show more trajectory reconstruction results below, including instance-level output and statistical evaluation.

Scanline and Spatial Temporal Map With NGSIM Video

Instance Segmentation Output

More Trajectory Evaluation Results

BibTeX

@article{zhang2024DSTE,
      title   ={Deep Spatial-Temporal Embedding for Vehicle Trajectory Validation and Refinement},
      author  ={Zhang, T. Terry and Jin, Peter J. and Piccoli, Benedetto and Sartipi, Mina},
      journal ={Computer-Aided Civil and Infrastructure Engineering},
      year    ={2024}
}