

Johannes Erwerle d4d621eea7 added script for random vs greedy comp		2022-10-04 07:50:58 +02:00
benchmark_data	added benchmark data sets	2022-08-19 08:41:31 +02:00
grids	fixed the broken FMI file parser	2022-09-13 20:41:07 +02:00
landmarks	renamed handpicked landmarks	2022-09-15 19:47:01 +02:00
routes	fixed the broken FMI file parser	2022-09-13 20:41:07 +02:00
src	cleaned up the code	2022-09-16 17:14:28 +02:00
static	added working ALT	2022-09-15 09:57:46 +02:00
templates	added working ALT	2022-09-15 09:57:46 +02:00
test_graphs	added small test graphs	2022-08-19 08:42:02 +02:00
utils	added script for random vs greedy comp	2022-10-04 07:50:58 +02:00
.gitignore	initial commit	2022-08-05 10:03:57 +02:00
Cargo.lock	Fixed broken Cargo.toml	2022-09-16 17:21:34 +02:00
Cargo.toml	Fixed broken Cargo.toml	2022-09-16 17:21:34 +02:00
README.md	added benchmark results to README	2022-09-16 17:14:03 +02:00


FaPra Algorithms on OSM Data

This repository contains implementations for the Fachpraktikum Algorithms on OSM Data. The code is written in Rust. A stable Rust compiler and Cargo (the Rust build tool and package manager) are required.

Building

Simply run cargo build --release to build release versions of all binaries.

Tasks

Task 1

There is no real implementation work for Task 1, next!

Task 2 + Task 3 + Task 4

Reading the data from an OSM PBF file and converting it into a graph is done in src/bin/generate_grid.rs.

The implementation of the spherical point in polygon test is done in src/polygon.rs with the function Polygon::contains.

There is one polygon in the graph for which no valid outside point can be found. I did not have the time to investigate this further.

Extracting Coastlines from the PBF file

The code uses the osmpbfreader crate. Unfortunately, this crate needs about 10 GB of memory to extract all coastlines from the PBF file.

Point in Polygon

The test by Bevis and Chatelain is implemented. Instead of a reference point inside the polygon, a point outside of it is used. This makes it easy to specify several "well-known" outside points somewhere in the ocean, which are therefore outside of every polygon.
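The following is a simplified crossing-parity sketch of the idea, not the repository's Polygon::contains and not the exact formulation from Bevis and Chatelain's paper. It makes these assumptions:

- Polygons are given as (lat, lon) vertices in degrees and are implicitly closed.
- Every polygon edge is the minor great-circle arc between its endpoints.
- Degenerate cases are ignored. These include the test arc passing exactly through a vertex and a test point exactly antipodal to the reference point.

The test walks the great-circle arc from the test point to a known outside point and counts edge crossings. An odd count means the point is inside. All names are hypothetical.

```rust
/// A point on the unit sphere, stored as a 3D Cartesian vector.
type Vec3 = [f64; 3];

/// Converts (lat, lon) in degrees to a unit vector.
fn to_unit(lat_deg: f64, lon_deg: f64) -> Vec3 {
    let (lat, lon) = (lat_deg.to_radians(), lon_deg.to_radians());
    [lat.cos() * lon.cos(), lat.cos() * lon.sin(), lat.sin()]
}

fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn dot(a: Vec3, b: Vec3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// For `p` on the great circle through `a` and `b`: is `p` strictly inside the minor arc a-b?
fn on_minor_arc(a: Vec3, b: Vec3, p: Vec3) -> bool {
    let n = cross(a, b);
    dot(cross(a, p), n) > 0.0 && dot(cross(p, b), n) > 0.0
}

/// Do the minor arcs a-b and c-d cross? Touching or collinear cases count as "no".
fn arcs_cross(a: Vec3, b: Vec3, c: Vec3, d: Vec3) -> bool {
    // The two great circles meet at +x and -x, where x is orthogonal to both plane normals.
    let x = cross(cross(a, b), cross(c, d));
    if dot(x, x) < 1e-24 {
        return false; // same great circle: ignored in this sketch
    }
    let neg_x = [-x[0], -x[1], -x[2]];
    [x, neg_x]
        .iter()
        .any(|&p| on_minor_arc(a, b, p) && on_minor_arc(c, d, p))
}

/// Inside test: odd number of edge crossings on the arc to a known outside point.
fn contains(polygon: &[(f64, f64)], point: (f64, f64), outside: (f64, f64)) -> bool {
    let p = to_unit(point.0, point.1);
    let q = to_unit(outside.0, outside.1);
    let n = polygon.len();
    let crossings = (0..n)
        .filter(|&i| {
            let (a, b) = (polygon[i], polygon[(i + 1) % n]);
            arcs_cross(p, q, to_unit(a.0, a.1), to_unit(b.0, b.1))
        })
        .count();
    crossings % 2 == 1
}
```

Working with 3D unit vectors avoids special cases at the poles and at the 180° meridian.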

Grid Graph

The Grid Graph is implemented in gridgraph.rs.

A regular grid graph can be generated with the generate_regular_grid function. Graphs can be imported from and exported to FMI files with the from_fmi_file and write_fmi_file functions.
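The exact FMI layout used by from_fmi_file isn't described here, so this sketch assumes a simplified variant:

- optional # comment lines,
- the node count, then the edge count,
- one "id lat lon" line per node,
- one "source target cost" line per edge.

The field layout, types, and function names are hypothetical.

```rust
use std::str::{FromStr, SplitWhitespace};

#[derive(Debug, PartialEq)]
struct Node { lat: f64, lon: f64 }

#[derive(Debug, PartialEq)]
struct Edge { src: usize, tgt: usize, cost: u64 }

/// Returns the next non-comment line or an error naming what was expected.
fn next_line<'a>(lines: &mut impl Iterator<Item = &'a str>, what: &str) -> Result<&'a str, String> {
    lines.next().ok_or_else(|| format!("missing {what}"))
}

/// Parses the next whitespace-separated field of a line as `T`.
fn field<T: FromStr>(parts: &mut SplitWhitespace<'_>, what: &str) -> Result<T, String> {
    let raw = parts.next().ok_or_else(|| format!("missing {what}"))?;
    raw.parse().map_err(|_| format!("invalid {what}: {raw}"))
}

/// Parses the assumed simplified FMI-like format into node and edge lists.
fn parse_fmi(input: &str) -> Result<(Vec<Node>, Vec<Edge>), String> {
    // Skip blank lines and '#' comment lines.
    let mut lines = input
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'));
    let n: usize = next_line(&mut lines, "node count")?.parse().map_err(|_| "invalid node count")?;
    let m: usize = next_line(&mut lines, "edge count")?.parse().map_err(|_| "invalid edge count")?;

    let mut nodes = Vec::with_capacity(n);
    for _ in 0..n {
        let mut parts = next_line(&mut lines, "node line")?.split_whitespace();
        let _id: usize = field(&mut parts, "node id")?; // assumed to equal the line index
        nodes.push(Node { lat: field(&mut parts, "lat")?, lon: field(&mut parts, "lon")? });
    }

    let mut edges = Vec::with_capacity(m);
    for _ in 0..m {
        let mut parts = next_line(&mut lines, "edge line")?.split_whitespace();
        let (src, tgt): (usize, usize) = (field(&mut parts, "source")?, field(&mut parts, "target")?);
        if src >= n || tgt >= n {
            return Err(format!("edge {src}->{tgt} references an unknown node"));
        }
        edges.push(Edge { src, tgt, cost: field(&mut parts, "cost")? });
    }
    Ok((nodes, edges))
}
```

A writer for the same layout would simply print the two counts followed by the node and edge lines in the same order.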

Task 5

Dijkstra Benchmarks

Dijkstra's algorithm is implemented in gridgraph.rs as GridGraph::shortest_path. It uses a heap as its priority queue. For details on how to run the benchmarks, see the "Running the benchmarks" section at the end.
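This is a minimal sketch of heap-based Dijkstra that also counts heap pops, the metric the benchmark reports. It assumes a plain adjacency list (graph[u] holds (v, cost) pairs) instead of the repository's GridGraph type. The function name is hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Returns (shortest distance if reachable, number of heap pops).
fn dijkstra(graph: &[Vec<(usize, u64)>], source: usize, target: usize) -> (Option<u64>, usize) {
    let mut dist = vec![u64::MAX; graph.len()];
    // BinaryHeap is a max-heap; Reverse turns it into a min-heap on (distance, node).
    let mut heap = BinaryHeap::new();
    let mut pops = 0;
    dist[source] = 0;
    heap.push(Reverse((0u64, source)));
    while let Some(Reverse((d, u))) = heap.pop() {
        pops += 1;
        if u == target {
            return (Some(d), pops); // the first pop of the target is final
        }
        if d > dist[u] {
            continue; // stale entry: a shorter path to u was found later
        }
        for &(v, w) in &graph[u] {
            let nd = d + w;
            if nd < dist[v] {
                dist[v] = nd;
                heap.push(Reverse((nd, v)));
            }
        }
    }
    (None, pops)
}
```

Instead of a decrease-key operation, this variant pushes duplicate entries and skips stale ones when they are popped. Stale pops are counted too, so pop counts from this sketch are not directly comparable to the numbers in the results below.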

Task 6

The UI is a web UI based on Leaflet. To start it, run task6 with an .fmi file as the graph and a set of landmarks (see Task 7 for details).

The start and end nodes can be placed at arbitrary positions. An algorithm then searches for the closest grid node to each and computes the route between them.
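The snapping algorithm isn't specified above. The simplest version, sketched here, is a linear scan using the haversine distance. This is an assumption: a spatial index would be much faster for large grids. Names and the Earth-radius constant are illustrative.

```rust
/// Mean Earth radius in metres (a common approximation).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in metres between two (lat, lon) points given in degrees.
fn haversine_m(a: (f64, f64), b: (f64, f64)) -> f64 {
    let (lat1, lat2) = (a.0.to_radians(), b.0.to_radians());
    let dlat = lat2 - lat1;
    let dlon = (b.1 - a.1).to_radians();
    let h = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
    // Clamp to 1.0 so rounding error can never make asin return NaN.
    2.0 * EARTH_RADIUS_M * h.sqrt().min(1.0).asin()
}

/// Index of the grid node closest to `query`, or None for an empty grid.
fn nearest_node(nodes: &[(f64, f64)], query: (f64, f64)) -> Option<usize> {
    nodes
        .iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| haversine_m(**a, query).total_cmp(&haversine_m(**b, query)))
        .map(|(i, _)| i)
}
```

Because haversine works on the sphere, a click at 179° longitude correctly snaps to a node at -179° instead of one at 170°.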

The web server listens on http://localhost:8000.

Currently there is a display bug when routes cross the 180° meridian. Instead of continuing across it, the line is drawn around the globe in the "wrong" direction. This is only a display issue; the route itself is correct.
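The bug comes from how Leaflet draws polylines: each segment joins consecutive vertices using their raw longitudes. A step from 179° to -179° is therefore drawn as a 358° segment across the map rather than a 2° one.

One common remedy, which the repository does not currently implement, is to "unwrap" the longitudes before sending the route to the map. Each step is shifted by ±360° so that no jump exceeds 180°. Leaflet accepts longitudes outside ±180° and draws them in the adjacent world copy. The function below is a hypothetical sketch.

```rust
/// Makes a longitude sequence continuous: every step is wrapped into [-180, 180],
/// so crossing the antimeridian continues past ±180° instead of jumping back.
fn unwrap_longitudes(lons: &[f64]) -> Vec<f64> {
    let mut out: Vec<f64> = Vec::with_capacity(lons.len());
    for (i, &lon) in lons.iter().enumerate() {
        if i == 0 {
            out.push(lon);
            continue;
        }
        // Step measured on the raw input, then wrapped to the short way around.
        let mut delta = lon - lons[i - 1];
        if delta > 180.0 {
            delta -= 360.0;
        } else if delta < -180.0 {
            delta += 360.0;
        }
        let prev = out[i - 1];
        out.push(prev + delta);
    }
    out
}
```

This assumes consecutive route points are less than 180° apart, which always holds for neighbouring grid nodes.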

Task 7

I implemented ALT as described in [1]. Additionally, A* is available with a simple, unoptimized haversine distance as the heuristic.

A* is implemented in src/astar.rs and the heuristics for ALT are implemented in src/alt.rs.
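ALT derives its A* heuristic from precomputed landmark distances and the triangle inequality. This sketch assumes symmetric distances, i.e. every edge exists in both directions with the same cost.

Under that assumption, |d(L,t) - d(L,v)| ≤ d(v,t) holds for every landmark L. The heuristic takes the maximum of this bound over the landmarks. The representation (one Vec of distances per landmark) and the function name are hypothetical, not the code in src/alt.rs.

```rust
/// Lower bound on d(v, t) from landmark distance arrays (landmarks[i][x] = d(L_i, x)).
/// Assumes a symmetric graph and that every node is reachable from every landmark.
fn alt_lower_bound(landmarks: &[Vec<u64>], v: usize, t: usize) -> u64 {
    landmarks
        .iter()
        .map(|dist| dist[v].abs_diff(dist[t]))
        .max()
        .unwrap_or(0) // no landmarks: fall back to the trivial bound (plain Dijkstra)
}
```

On a path graph 0-1-2-3-4, a landmark at node 0 gives the exact bound 4 for the query 0 to 4. A landmark at the middle node 2 gives 0. This is why landmarks placed "behind" the source or target tend to help most.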

Landmarks for ALT

Currently, three landmark generation methods are available:

random selection
greedy, distance-maximizing selection
manual selection (from a GeoJSON file)

These can be generated with the gen_landmarks_random, gen_landmarks_greedy and gen_landmarks_geojson binaries. The random and greedy methods take the number of landmarks to select as a parameter.
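This sketch assumes "greedy, distance-maximizing selection" means farthest-point selection: start from one node, then repeatedly add the node whose distance to its closest already-chosen landmark is largest.

The distance function is passed in as a parameter. In the real program it would presumably be one full Dijkstra run per landmark. The function name and signature are hypothetical.

```rust
/// Farthest-point landmark selection.
/// `dist_from(l)` must return the shortest-path distance from node `l` to every node.
fn greedy_landmarks(k: usize, first: usize, dist_from: impl Fn(usize) -> Vec<u64>) -> Vec<usize> {
    let mut selected = vec![first];
    // min_dist[v] = distance from v to its closest already-selected landmark.
    let mut min_dist = dist_from(first);
    while selected.len() < k {
        // Pick the first node with the largest distance to all landmarks so far.
        let mut best = 0;
        for v in 0..min_dist.len() {
            if min_dist[v] > min_dist[best] {
                best = v;
            }
        }
        if min_dist[best] == 0 {
            break; // every node is already a landmark
        }
        selected.push(best);
        for (m, d) in min_dist.iter_mut().zip(dist_from(best)) {
            *m = (*m).min(d);
        }
    }
    selected
}
```

Unreachable nodes, if represented as u64::MAX, would be picked first. A real implementation would need to exclude them.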

A handy wrapper is the Python script utils/generate_landmarks.py, which generates both greedy and random landmark sets with 4, 8, 16, 32 and 64 landmarks.

Running the benchmarks

First, a set of queries is needed. These can be generated with generate_benchmark_targets --graph <graph> > targets.json. By default this generates 1000 random, distinct source/destination pairs; the --amount parameter adjusts the number of pairs generated.
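This sketch shows one way to produce such query pairs. It reads "distinct" as "source ≠ destination and no pair repeated"; that reading is an assumption.

To stay self-contained it uses a tiny xorshift generator instead of an external crate such as rand. The names are hypothetical, and the JSON output step is omitted.

```rust
use std::collections::HashSet;

/// Minimal xorshift64 PRNG (state must be non-zero); fine for picking benchmark queries.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Generates `amount` distinct (source, target) pairs with source != target.
fn generate_queries(num_nodes: usize, amount: usize, seed: u64) -> Vec<(usize, usize)> {
    assert!(
        num_nodes >= 2 && amount <= num_nodes * (num_nodes - 1),
        "not enough distinct pairs"
    );
    let mut rng = XorShift(seed.max(1));
    let mut seen = HashSet::new();
    let mut queries = Vec::with_capacity(amount);
    while queries.len() < amount {
        let s = (rng.next() % num_nodes as u64) as usize;
        let t = (rng.next() % num_nodes as u64) as usize;
        // Reject self-loops and pairs that were already drawn.
        if s != t && seen.insert((s, t)) {
            queries.push((s, t));
        }
    }
    queries
}
```

A fixed seed keeps the query set reproducible, so every algorithm variant can be benchmarked on the same pairs.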

The actual benchmark is the benchmark binary. It needs a set of queries and a graph.

If the --dijkstra option is given, it runs Dijkstra's algorithm. If the --astar option is given, it runs A* with the haversine distance. If --landmarks <landmarkfile> is given, it runs ALT with that landmark file. Setting --alt_best_size <number> selects how many of the landmarks are used to answer each query.
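The criterion behind --alt_best_size isn't stated above. One plausible choice, assumed here, ranks the landmarks by the lower bound each one gives for d(s, t) and keeps the k best for the query.

The sketch assumes symmetric distances and one distance array per landmark. The function name is hypothetical.

```rust
use std::cmp::Reverse;

/// Indices of the `k` landmarks with the largest lower bound |d(L,t) - d(L,s)|.
/// `landmarks[i][x]` is the distance from landmark i to node x.
fn best_landmarks(landmarks: &[Vec<u64>], s: usize, t: usize, k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..landmarks.len()).collect();
    // Stable sort, largest bound first; ties keep their original order.
    idx.sort_by_key(|&i| Reverse(landmarks[i][s].abs_diff(landmarks[i][t])));
    idx.truncate(k);
    idx
}
```

Using only a few landmarks per query keeps each heuristic evaluation cheap, since it is computed for every node touched by the search.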

The benchmark prints out how many nodes were popped from the heap for each run and the average time per route.

utils/run_benchmarks.py is a wrapper script that runs the benchmarks for a big set of parameters.

utils/plot_results.py generates several plots of the results.

Results

These are some quick tests; further results will be presented later. Everything was run on a ThinkPad X260 laptop with an Intel i7-6600U CPU @ 2.60GHz. Each test used the same 1000 queries. Rust v1.57.0 was used for all tests.

The ALT variants were used with the 4 best landmarks. Further tests on the performance with more landmarks will be presented later. The 44 handpicked landmarks were spread around the extremities of the continents and into "dead ends" such as the Mediterranean and the Gulf of Mexico, with the goal of providing landmarks that lie "behind" the source or target node.

All benchmarks were run on the provided benchmark graph.

Raw data:

```
# name: (avg. heap pops per query, avg. time per query in seconds)
{'astar': (155019.451, 0.044386497025),
 'dijkstra': (423046.796, 0.058129875474999995),
 'greedy_32': (42514.751, 0.013299024275000002),
 'greedy_64': (35820.461, 0.011887869759),
 'handpicked_44': (70868.721, 0.01821366828),
 'random_32': (58830.082, 0.016845884717),
 'random_64': (51952.261, 0.015234422699)}
```

Interpretation

Dijkstra needs ~58 ms per route, while the best variant, greedy_64 (greedy selection with 64 landmarks), needs only ~12 ms, which is ~5 times faster. The greedy variants also perform better than their random counterparts with the same number of landmarks (e.g. 13.3 ms vs. 16.8 ms per route with 32 landmarks). While the 44 handpicked landmarks outperformed A* and Dijkstra, they were beaten even by the random and greedy selections with only 32 landmarks.
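The "~5 times faster" figure can be checked directly against the raw data. This small sketch computes, for any variant, how many times fewer heap pops it needs than Dijkstra and how many times less time it takes. The numbers are copied from the raw data; the names are illustrative.

```rust
/// (name, avg. heap pops per query, avg. seconds per query), copied from the raw data above.
static RESULTS: [(&str, f64, f64); 7] = [
    ("astar", 155019.451, 0.044386497025),
    ("dijkstra", 423046.796, 0.058129875474999995),
    ("greedy_32", 42514.751, 0.013299024275000002),
    ("greedy_64", 35820.461, 0.011887869759),
    ("handpicked_44", 70868.721, 0.01821366828),
    ("random_32", 58830.082, 0.016845884717),
    ("random_64", 51952.261, 0.015234422699),
];

/// Returns (Dijkstra pops / variant pops, Dijkstra time / variant time).
fn ratio_vs_dijkstra(name: &str) -> Option<(f64, f64)> {
    let find = |n: &str| RESULTS.iter().find(|r| r.0 == n).copied();
    let (_, base_pops, base_time) = find("dijkstra")?;
    let (_, pops, time) = find(name)?;
    Some((base_pops / pops, base_time / time))
}
```

For greedy_64 this gives about 11.8 times fewer heap pops and a speedup of about 4.9 in time.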

Memory Consumption

Each landmark is essentially an array of the costs to every node. Since the distances are currently stored as 64-bit integers, each landmark needs 8 bytes per node in the graph. With a graph of about 700k nodes, this amounts to ~5.5 MB of memory per landmark, so 64 landmarks need ~350 MB.

Using 32-bit integers instead would halve the memory requirements.
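The arithmetic behind these estimates is simply nodes × landmarks × bytes per distance. This sketch evaluates it for 700,000 nodes, an assumption taken from the rounded node count above. The function name is illustrative.

```rust
/// Bytes needed to store one distance per node for every landmark.
fn landmark_memory_bytes(num_nodes: usize, num_landmarks: usize, bytes_per_distance: usize) -> usize {
    num_nodes * num_landmarks * bytes_per_distance
}
```

With std::mem::size_of::<u64>() (8 bytes), one landmark takes 5,600,000 bytes and 64 landmarks take 358,400,000 bytes (about 342 MiB). With size_of::<u32>() (4 bytes), 64 landmarks take 179,200,000 bytes.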

References

[1] A. Goldberg and C. Harrelson, "Computing the Shortest Path: A* Search Meets Graph Theory", Microsoft Research, Technical Report MSR-TR-2004-24, 2004.