 ## Data Preparation for Satellite Machine Learning

-The tool downloads [OpenStreetMap QA Tile](https://osmlab.github.io/osm-qa-tiles/) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in Machine Learning training.
+The tool downloads [OpenStreetMap QA Tile](https://osmlab.github.io/osm-qa-tiles/) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in machine learning training.

 _satellite imagery from [Mapbox](https://www.mapbox.com/) and [Digital Globe](https://www.digitalglobe.com/)_
@@ -13,159 +13,14 @@ _satellite imagery from [Mapbox](https://www.mapbox.com/) and [Digital Globe](ht
 ## Installation

 ```bash
-pip install label_maker
+pip install label-maker
 ```

 Note that running this library requires `tippecanoe` as a "peer-dependency"; that command should be available from your command line before running it.

-## Configuration
-
-Before running any commands, it is necessary to create a `config.json` file to specify inputs to the data preparation process:
-- `country`: The [OSM QA Tile](https://osmlab.github.io/osm-qa-tiles/) extract to download. The value should be a country string matching a value found in `label_maker/countries.txt`.
-- `bounding_box`: The bounding box to create images from. This should be given in the form: `[xmin, ymin, xmax, ymax]` as longitude and latitude values between `[-180, 180]` and `[-90, 90]` respectively. Values should use the WGS84 datum, with longitude and latitude units of decimal degrees.
-- `zoom`: The [zoom level](http://wiki.openstreetmap.org/wiki/Zoom_levels) at which to create images. This functions as a rough proxy for resolution. Values should be given as integers.
-- `classes`: An array of classes for machine learning training. Each class is defined as an object with two required properties:
-  - `name`: The class name
-  - `filter`: A [Mapbox GL Filter](https://www.mapbox.com/mapbox-gl-js/style-spec#other-filter) to define any vector features matching this class. Filters are applied with the standalone [featureFilter](https://github.com/mapbox/mapbox-gl-js/tree/master/src/style-spec/feature_filter) from Mapbox GL JS.
-  - `buffer`: The number of pixels to buffer the geometry by. This is an optional parameter to buffer the label for `object-detection` and `segmentation` tasks. Accepts any number (positive or negative). It uses [Shapely `object.buffer`](https://shapely.readthedocs.io/en/latest/manual.html#object.buffer) to calculate the final geometry. You can verify that your buffer options create the desired labels by inspecting the files created in `data/labels/` after running the `labels` command.
-- `imagery`: One of:
-  - A template string for a tiled imagery service. Note that you will generally need an API key to obtain images and there may be associated costs. The above example requires a [Mapbox access token](https://www.mapbox.com/help/how-access-tokens-work/).
-  - A GeoTIFF file location. Works with both local and remote files. Ex: `'http://oin-hotosm.s3.amazonaws.com/593ede5ee407d70011386139/0/3041615b-2bdb-40c5-b834-36f580baca29.tif'`
-- `background_ratio`: For single-class classification problems, we need to download images with no matching class. We will download `background_ratio` times the number of images matching the one class.
-- `ml_type`: One of `"classification"`, `"object-detection"`, or `"segmentation"`. For the final label numpy arrays (`y_train` and `y_test`), we will produce a different label depending upon the `ml_type`.
-  - `"classification"`: An array of the same length as `classes`. Each array value will be either `1` or `0` based on whether it matches the class at the same index.
-  - `"object-detection"`: An array of bounding boxes of the form `[xmin, ymin, width, height, class_index]`. In this case, the values are not latitude and longitude values but pixel values measured from the upper left-hand corner. Each feature is tested against each class, so if a feature matches two or more classes, it will have the corresponding number of bounding boxes created.
-  - `"segmentation"`: An array of shape `(256, 256)` with values matching the class_index label at that position. The classes are applied sequentially according to `config.json`, so later classes will be written over earlier class labels.
-- `imagery_offset`: An optional list of integers representing the number of pixels to offset imagery. For example, `[15, -5]` will move the images 15 pixels right and 5 pixels up relative to the requested tile bounds.
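As a rough illustration of the options listed above, the following Python sketch builds a single-class configuration and runs a few basic sanity checks on it. Every value (the country name, bounding box, class filter, and imagery URL template) is a hypothetical placeholder, and `check_config` is an illustrative helper rather than part of `label-maker`.

```python
import json

# Hypothetical example values; replace them with your own area and imagery source.
config = {
    "country": "togo",  # assumed to be listed in label_maker/countries.txt
    "bounding_box": [1.09, 6.12, 1.14, 6.17],  # [xmin, ymin, xmax, ymax] in degrees
    "zoom": 17,
    "classes": [
        {"name": "Buildings", "filter": ["has", "building"]}
    ],
    # Placeholder template for a tiled imagery service, not a real endpoint.
    "imagery": "https://example.com/tiles/{z}/{x}/{y}.jpg?access_token=ACCESS_TOKEN",
    "background_ratio": 1,
    "ml_type": "classification",
}


def check_config(cfg):
    """Illustrative checks only; this is not label-maker's own validation."""
    xmin, ymin, xmax, ymax = cfg["bounding_box"]
    if not (-180 <= xmin < xmax <= 180):
        raise ValueError("longitudes must lie in [-180, 180] with xmin < xmax")
    if not (-90 <= ymin < ymax <= 90):
        raise ValueError("latitudes must lie in [-90, 90] with ymin < ymax")
    if not isinstance(cfg["zoom"], int):
        raise ValueError("zoom must be an integer")
    if cfg["ml_type"] not in ("classification", "object-detection", "segmentation"):
        raise ValueError("unknown ml_type")
    for cls in cfg["classes"]:
        if "name" not in cls or "filter" not in cls:
            raise ValueError("each class needs a name and a filter")
    return True


check_config(config)
config_text = json.dumps(config, indent=2)  # text to save as config.json
```

Saving `config_text` to `config.json` would give a file in the shape the list above describes; whether the tool accepts these particular values is not something this sketch verifies.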
-
-## Command Line Use
-
-`label-maker` is most easily used as a command line tool. There are five commands documented below. All commands accept two flags:
-- `-d` or `--dest`: _string_ directory for storing output files. (default: `'data'`)
-- `-c` or `--config`: _string_ location of config.json file. (default: `'config.json'`)
-Retiles the OSM data to the desired zoom level, creates label data (`labels.npz`), calculates class statistics, and creates visual label files (either GeoJSON or PNG files depending upon `ml_type`). Requires the OSM QA tiles from the previous step. Accepts an additional flag:
-- `-s` or `--sparse`: _boolean_ if this flag is present, only save labels for up to `n` background tiles, where `n` is equal to `background_ratio` times the number of tiles with a class label.
-
-```bash
-$ label-maker labels
-Determining labels for each tile
----
-Residential: 638 tiles
-Total tiles: 1189
-Write out labels to data/labels.npz
-```
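To make the three `ml_type` label formats from the configuration section more concrete, here is a small NumPy sketch of what one tile's label might look like in each case. The class names and pixel coordinates are made up, and the numbering of class indices from 1 (with 0 left for background) is an assumption, not something the text above states.

```python
import numpy as np

class_names = ["Residential", "Roads"]  # hypothetical classes, in config order

# classification: one 0/1 entry per class, aligned with `classes`
classification = np.array([1, 0])  # this tile matches "Residential" only

# object-detection: rows of [xmin, ymin, width, height, class_index],
# in pixels measured from the upper left-hand corner of the tile
object_detection = np.array([
    [10, 20, 50, 40, 1],
    [100, 120, 30, 30, 2],
])

# segmentation: a (256, 256) array of class indices; classes are written in
# config order, so a later class overwrites an earlier one where they overlap
segmentation = np.zeros((256, 256), dtype=np.uint8)
segmentation[50:150, 50:150] = 1    # first class
segmentation[100:200, 100:200] = 2  # second class, drawn afterwards

print(segmentation[120, 120])  # pixel covered by both classes -> 2
```

The last line shows the overwrite rule: a pixel inside both regions ends up carrying the index of the class that appears later in the configuration.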
-
-### Preview
-
-Downloads example satellite images for each class. Requires the `labels.npz` file from the previous step. Accepts an additional flag:
-- `-n` or `--number`: _integer_ number of example images to create per class. (default: `5`)
-
-```bash
-$ label-maker preview -n 10
-Writing example images to data/examples
-Downloading 10 tiles for class Residential
-```
-
-### Images
-
-Downloads all imagery tiles needed for training. Requires the `labels.npz` file from the `labels` step.
-
-```bash
-$ label-maker images
-Downloading 1189 tiles to data/tiles
-```
-
-### Package
-
-Bundles the satellite images and labels to create a final `data.npz` file. Requires the `labels.npz` file from the `labels` step and downloaded image tiles from the `images` step.
-
-```bash
-$ label-maker package
-Saving packaged file to data/data.npz
-```
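Because each step depends on the output of an earlier one, the commands shown above can be chained from a script. The sketch below only assembles the argument lists, using the `--dest`, `--config`, and `--number` flags described in this section. The helper names are hypothetical, placing the shared flags after the subcommand is an assumption, and the OSM QA tiles are assumed to have been fetched already.

```python
import subprocess

# The four steps shown above, in dependency order.
STEPS = ["labels", "preview", "images", "package"]


def build_commands(dest="data", config="config.json", preview_count=5):
    """Return one argument list per label-maker step, in the documented order."""
    commands = []
    for step in STEPS:
        cmd = ["label-maker", step, "--dest", dest, "--config", config]
        if step == "preview":
            # Only the preview step takes the extra --number flag.
            cmd += ["--number", str(preview_count)]
        commands.append(cmd)
    return commands


def run_pipeline(**kwargs):
    """Run each step in turn; check=True stops at the first failing command."""
    for cmd in build_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(dest="data")` would execute the steps one after another, provided `label-maker` is available on the command line.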
-
-## Using the Packaged Data
-
-Once you have a packaged `data.npz` file, you can use [`numpy.load`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html) to load it. As an example, here is how you can supply the created data to a [Keras](https://keras.io) Model:
-
-```python
-# the data, shuffled and split between train and test sets