A visual analytics application for exploring NYC taxi trip data using Qt5 and OpenGL.
✅ Complete: Qt5 migration finished! All features working including OpenStreetMap tile-based geographic visualization.
Required:
- CMake 3.10 or higher
- Qt 5 (5.15+ recommended)
- Qt5 modules: Core, Gui, Widgets, OpenGL, Network, PrintSupport
- OpenGL/GLEW 2.2+
- Boost 1.42+ (components: iostreams, filesystem, timer)
macOS Installation (Homebrew):
brew install cmake qt@5 glew boostLinux Installation (Ubuntu/Debian):
sudo apt-get install cmake qt5-default libqt5opengl5-dev libglew-dev libboost-all-devBuild everything (recommended):
mkdir build
cd build
# macOS - set Qt5 path
export CMAKE_PREFIX_PATH="/opt/homebrew/opt/qt@5"
# Configure and build both TaxiVis and preprocessing tools
cmake ..
make -j4This builds:
src/TaxiVis/TaxiVis- Main visualization applicationsrc/preprocess/csv2Binary- CSV convertersrc/preprocess/build_kdtrip- KD-tree indexersrc/preprocess/multiCsv2Binary- Batch CSV convertersrc/preprocess/newFormatCsv2Binary- Alternative CSV convertersrc/preprocess/sampling- Data sampling toolsrc/preprocess/testQuery- Query testing utilitysrc/preprocess/unif96_to_bin- Legacy format converter
Qt5 not found: Set CMAKE_PREFIX_PATH to your Qt5 installation:
export CMAKE_PREFIX_PATH="/path/to/qt5"
# macOS Homebrew: /opt/homebrew/opt/qt@5
# Linux: /usr/lib/x86_64-linux-gnu/qt5GLEW not found: Ensure GLEW is installed via your package manager.
A sample dataset of 10,000 trips from January 2013 is included in data/sample_merged_1.kdtrip.
The data directory is configured in CMakeLists.txt (line 11):
add_definitions(-DDATA_DIR=\"${CMAKE_CURRENT_SOURCE_DIR}/../../data/\")From the build directory:
./src/TaxiVis/TaxiVis- Geographic Map View - OpenStreetMap tile-based visualization with pan/zoom
- Three-tier caching: memory → disk → network
- Persistent tile cache at
~/Library/Caches/TaxiVis/tiles(macOS) or~/.cache/TaxiVis/tiles(Linux) - Keyboard controls: Arrow keys to pan, +/- to zoom
- Temporal Series Plots - Visualize trip metrics over time (fare, distance, duration, etc.)
- Histograms - Distribution analysis of trip attributes
- Scatter Plots - Correlation analysis between variables
- Selection Graphs - Define spatial/temporal query regions
- Color Scales - Multiple color schemes for data visualization
- Data Export - Query and export trip subsets
- Animated Trip Paths (TripAnimation) - Disabled on macOS due to geometry shader compatibility issues
- Alternative: Use TripLocation layer to visualize pickup/dropoff points
- File → Open Dialog - Not yet implemented; change dataset in querymanager.cpp and rebuild
TaxiVis requires taxi trip data in a specialized binary format (.kdtrip). The preprocessing pipeline converts raw CSV data into this indexed format for efficient querying.
The preprocessing workflow:
- Merge - Combine trip and fare CSV files
- CSV → Binary - Convert merged CSV to binary Trip format
- Binary → KdTrip - Build KD-tree spatial index for fast queries
Recommended: Use the Julia-based pipeline for 10-15x faster processing (requires Julia 1.6+).
All preprocessing tools are in src/preprocess/ (Julia) and build/src/preprocess/ (C++).
Requirements:
- Julia 1.6+ installed
- Raw NYC taxi CSV files (trip_data and trip_fare)
Automated Pipeline:
# Process your data in one command
./process_taxi_data.shThe script automatically:
- Merges trip and fare data (multithreaded)
- Converts to binary format (multithreaded)
- Builds KD-tree spatial index
- Displays timing statistics
Performance: Processes ~15 million trips (4GB CSV) in ~2-5 minutes on 16 cores.
Manual Julia Commands:
# Step 1: Merge trip and fare CSVs
julia -t 16 src/preprocess/merge.jl data/trip_data.csv data/trip_fare.csv data/merged.csv
# Step 2: Convert to binary
julia -t 16 src/preprocess/csv2Binary_mt.jl data/merged.csv data/merged.trip
# Step 3: Build KD-tree index
./build/src/preprocess/build_kdtrip data/merged.trip data/merged.kdtripConverts a single CSV file to binary Trip format.
CSV Format Expected:
pickup_time,dropoff_time,pickup_long,pickup_lat,dropoff_long,dropoff_lat,id_taxi,distance,fare_amount,surcharge,mta_tax,tip_amount,tolls_amount,payment_type,passengers,field1,field2,field3,field4
Usage:
./build/src/preprocess/csv2Binary input.csv output.tripExample:
./build/src/preprocess/csv2Binary data/trips_2013-01.csv data/trips_2013-01.tripProcesses multiple CSV files listed in an index file. Useful for batch processing large datasets split across files.
Usage:
./build/src/preprocess/multiCsv2Binary file_list.txt output.tripfile_list.txt format:
/path/to/trips_2013-01-01.csv
/path/to/trips_2013-01-02.csv
/path/to/trips_2013-01-03.csv
Converts NYC taxi data in the newer format (post-2015) where column order differs.
Usage:
./build/src/preprocess/newFormatCsv2Binary input.csv output.tripBuilds a KD-tree spatial index from binary Trip data. This is the final step - produces the .kdtrip file that TaxiVis loads.
Usage:
./build/src/preprocess/build_kdtrip input.trip output.kdtripExample:
./build/src/preprocess/build_kdtrip data/trips_2013-01.trip data/trips_2013-01.kdtripOutput: Creates a KD-tree index optimized for 7-dimensional queries:
- pickup_time
- dropoff_time
- pickup_longitude
- pickup_latitude
- dropoff_longitude
- dropoff_latitude
- taxi_id
Creates a spatially-filtered sample from a .kdtrip file. Filters trips by census tract geometry and time range.
Usage:
./build/src/preprocess/sampling input.kdtrip skip_factor output.csvExample (10% sample):
./build/src/preprocess/sampling data/trips_2013.kdtrip 10 data/sample_10pct.csvNote: Requires data/census_tracts_geom.txt geometry file.
Tests KdTrip query functionality on a .kdtrip file. Useful for verifying index correctness.
Usage:
./build/src/preprocess/testQuery input.kdtripConverts legacy 96-byte uniform binary format to the current Trip format. Only needed for old archived datasets.
Usage:
./build/src/preprocess/unif96_to_bin input.bin output.tripPython script to merge separate trip and fare CSV files into the unified format.
Usage:
python src/preprocess/merge.py trips.csv fares.csv merged.csvOption 1: Update default dataset (edit querymanager.cpp:10):
std::string fname = string(DATA_DIR)+"your_data.kdtrip";Then rebuild TaxiVis.
Option 2: File → Open (not yet implemented in current version)
The application will display:
- Number of trips loaded
- Time range of the dataset
- Data statistics on startup
Binary Trip Structure (48 bytes):
uint32_t pickup_time- Unix timestamp (seconds since epoch)uint32_t dropoff_time- Unix timestampfloat pickup_long- Pickup longitudefloat pickup_lat- Pickup latitudefloat dropoff_long- Dropoff longitudefloat dropoff_lat- Dropoff latitudeuint16_t id_taxi- Taxi/medallion IDuint16_t distance- Distance in 0.01 miles (divide by 100)uint16_t fare_amount- Fare in cents (divide by 100)uint16_t surcharge- Surcharge in centsuint16_t mta_tax- MTA tax in centsuint16_t tip_amount- Tip in centsuint16_t tolls_amount- Tolls in centsuint8_t payment_type- Payment method codeuint8_t passengers- Number of passengersuint16_t field1, field2, field3, field4- Custom/region fields
See CLAUDE.md for detailed architecture documentation.
Key Components:
- KdTrip - Spatial indexing and query engine
- QMapTileWidget - Custom OpenStreetMap tile-based map widget with caching
- SelectionGraph - Graph-based spatial selection system
- Rendering Layers - OpenGL visualization (GridMap, HeatMap, TripAnimation, etc.)
- QCustomPlot - Plotting library for temporal/statistical views
Qt5 Migration Status:
- ✅ All Qt4 → Qt5 API migrations complete
- ✅ OpenGL rendering updated (QGL → QOpenGL)
- ✅ Boost filesystem compatibility fixed
- ✅ Map visualization restored with OpenStreetMap tile-based widget
- ✅ QtWebKit dependency removed (replaced with QNetworkAccessManager)
Build with:
- Qt 5.15.17
- Boost 1.89.0
- GLEW 2.2.0
- CMake 4.1+
Tested on:
- macOS 26.0.1 (Apple Silicon)
- Original data source: http://www.andresmh.com/nyctaxitrips/
- Sample dataset: 10,000 trips from NYC Yellow Taxi, January 2013
See original project documentation for license information.