This challenge will be evaluated on a multitude of freely available datasets, including the following demo dataset. Each dataset consists of a set of files, each holding the data for one trajectory. The files are in a space-separated value format, and the first line should be ignored (it usually holds column names). The first two columns are always the X and Y coordinates. A sample file begins like this:
x y k tid
-13632575.57 4543280.813 0 0
-13632504.33 4543237.173 1 0
-13632406.37 4543172.416 2 0
-13632322.88 4543121.738 3 0
-13632167.03 4542909.172 4 0
-13632119.16 4542835.971 5 0
-13632148.11 4542831.748 6 0
-13632285.03 4542813.448 7 0
-13632308.41 4542810.632 8 0
For this challenge, we ignore the curvature of the Earth and map projections, so you should expect X and Y values of any magnitude and treat them as plain Euclidean coordinates. You should also ignore the contents of the first line of each file, because it might look different during evaluation.
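As a minimal sketch of the file format described above, the following Python reads one trajectory file, skips the first line without interpreting it, and keeps only the first two columns as plain Euclidean X and Y values. The names parse_trajectory and load_trajectory are hypothetical and not part of the challenge.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_trajectory(text: str) -> List[Point]:
    """Parse the contents of one trajectory file into (x, y) points.

    The first line is skipped without being interpreted, and only the
    first two space-separated columns of each remaining line are used.
    """
    points: List[Point] = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 2:
            # Assumption: blank or short lines carry no point and are skipped.
            continue
        points.append((float(fields[0]), float(fields[1])))
    return points


def load_trajectory(path: str) -> List[Point]:
    """Read a trajectory file from disk and parse it."""
    with open(path) as handle:
        return parse_trajectory(handle.read())
```

Any extra columns (such as k and tid in the sample) are simply ignored, so the same parser should work if the header or trailing columns differ during evaluation.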
A dataset is specified as a single file with one filename on each line. In this way, we can create multiple datasets of varying sizes from the same set of trajectory files. Such a file for the sample dataset begins like this:
files/file-012487.dat
files/file-018771.dat
files/file-015204.dat
files/file-006556.dat
files/file-014161.dat
files/file-019910.dat
files/file-008207.dat
files/file-007962.dat
files/file-001864.dat
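A dataset file of this shape could be read along the following lines. This is only a sketch: read_dataset_listing and load_dataset are hypothetical names, and the trajectory loader is passed in as a parameter rather than assumed. Each file name is kept exactly as written so that it can serve as a key later.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


def read_dataset_listing(text: str) -> List[str]:
    """Return the file names in a dataset file, one per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_dataset(
    listing_path: str,
    load_trajectory: Callable[[str], List[Point]],
) -> Dict[str, List[Point]]:
    """Load every trajectory named in the dataset file.

    The result maps each file name, exactly as it appears in the
    listing, to the trajectory returned by the supplied loader.
    """
    with open(listing_path) as handle:
        names = read_dataset_listing(handle.read())
    return {name: load_trajectory(name) for name in names}
```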
Note that the file names are used as unique keys in the output file format below, so you have to keep them in memory. If you want to create new datasets, it is best to use find, sort and head on Linux systems, like this:
> find files -type f | sort -R | head -n 500 > dataset.txt
which generates a random list of 500 files in the directory files. Using find instead of ls makes each output line contain the path to the file, including the directory name. For Geolife, for example, you could use
> find . -name "*.plt"
to generate a similar file.
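On systems without these tools, a rough Python equivalent of the random selection could look like the sketch below. The names pick_random and sample_directory are hypothetical, and the optional seed parameter is an addition for reproducibility that the shell command does not have.

```python
import random
from pathlib import Path
from typing import Iterable, List, Optional


def pick_random(names: Iterable[str], count: int,
                seed: Optional[int] = None) -> List[str]:
    """Pick up to `count` distinct names at random."""
    pool = sorted(names)  # sort first so a fixed seed gives a fixed result
    rng = random.Random(seed)
    return rng.sample(pool, min(count, len(pool)))


def sample_directory(directory: str, count: int,
                     seed: Optional[int] = None) -> List[str]:
    """Collect regular files below `directory` and pick some at random."""
    files = [p.as_posix() for p in Path(directory).rglob("*") if p.is_file()]
    return pick_random(files, count, seed)


if __name__ == "__main__":
    # Writes one file name per line, like the shell pipeline does.
    with open("dataset.txt", "w") as out:
        for name in sample_directory("files", 500):
            out.write(name + "\n")
```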
The problem itself is specified as a file queries.txt containing one query per line. The first line will be numbered zero. Such a problem file looks similar to
files/file-015204.dat 1000
files/file-006556.dat 123.4
files/file-014161.dat 726.1
Note that the files named in the queries might not be part of dataset.txt, so a query trajectory might or might not already have been loaded along with the dataset.
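Reading queries.txt could be sketched as follows. The function name parse_queries and the tuple layout are hypothetical. The sketch assumes each line holds a file name followed by one numeric value, as in the example, and numbers the lines from zero.

```python
from typing import List, Tuple


def parse_queries(text: str) -> List[Tuple[int, str, float]]:
    """Parse a query file into (line_number, file_name, value) tuples.

    Line numbers start at zero. Assumption: a blank line still consumes
    a line number but produces no query.
    """
    queries: List[Tuple[int, str, float]] = []
    for number, line in enumerate(text.splitlines()):
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace: name first, number last.
        name, value = line.rsplit(None, 1)
        queries.append((number, name, float(value)))
    return queries
```

Because a query file might not be listed in dataset.txt, the query trajectory may have to be loaded separately when it is not already in memory.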
For each line, your submission should generate a file result-XXXX.txt, where XXXX is the line number. A result file has the same format as a dataset file, similar to
meaning that these two trajectories fulfilled the given query range.
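Writing the result files could be sketched like this. The helper names are hypothetical, and zero-padding the line number to four digits is an assumption read off the XXXX placeholder, so check it against the evaluation setup before relying on it.

```python
import os
from typing import Iterable


def result_filename(line_number: int, width: int = 4) -> str:
    """Build the result file name for a query line.

    Assumption: the number is zero-padded to `width` digits.
    """
    return f"result-{line_number:0{width}d}.txt"


def format_result(matches: Iterable[str]) -> str:
    """Format matching file names one per line, like a dataset file."""
    return "".join(f"{name}\n" for name in matches)


def write_result(line_number: int, matches: Iterable[str],
                 directory: str = ".") -> str:
    """Write the result file for one query and return its path."""
    path = os.path.join(directory, result_filename(line_number))
    with open(path, "w") as handle:
        handle.write(format_result(matches))
    return path
```

The names written here should be the same strings that appear in the dataset file, since they act as the unique keys of the trajectories.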
The following sources of trajectory information can be used for benchmarking your solution with varying sizes for datasets and individual trajectories.
- T-Drive Dataset (note that this needs some preprocessing to convert the file format and to extract reasonably sized trajectories.)
- Roma Taxi Traces (note that this needs some preprocessing to convert the file format and to extract reasonably sized trajectories.)
- Character Trajectories (note that this needs some preprocessing to convert the file format and to extract reasonably sized trajectories.)