Example n° 8: Extract binary mask from vector shapefile

The target use case for this example is:

  • Lossy images representing regions of a gridded area for an interesting region with no masks available. Some images may contain data outside of the region of interest.
  • An ancillary shapefile contains a record for each image, providing the region of interest (marked in yellow in the following image).
  • We want to use the shapefile to embed a binary mask in the images.
../../_images/rgb_mask.png

For this example we will use the same data of example 7.

Note

On a Windows machine you can set-up a shell with all GDAL Utilities, running directly the file OSGeo4W.bat under the %TRAINING_ROOT% folder.

  1. Navigate to the workshop directory $TRAINING_ROOT/data/user_data/gdal_processing_data (on Windows %TRAINING_ROOT%\data\user_data\gdal_processing_data) and find the rgb_data directory.

First, let’s set an environment variable to enable usage of internal masks:

export GDAL_TIFF_INTERNAL_MASK=YES (linux)

  or

SET GDAL_TIFF_INTERNAL_MASK=YES (windows)

In this example we are going to do the following processing steps:

  1. The shapefile geometries is rasterized by filtering on each element of the grid.
  2. The rasterized mask get attached to the original image.

This time we are going to set up a small python script to extract information from the rasters, and use it in the GDAL command invocations.

Let’s see the processing being used on single GeoTIFF first. The command is included in a getmask.py file. It’s minimal, skipping parameters validation and additional controls for the sake of simplicity:

from osgeo import gdal
import os
import subprocess
import sys

shapefile = 'rgb_vectormask.shp'
shape = 'rgb_vectormask'

# Extracting the filename from the arguments
file = str(sys.argv[1])

outfile = file + '.msk.tif'

# files are named like rgb_r1c2.tif, rgb_r2c2.tif, rgb_r4c4.tif...
# so let's extract the rxcy part identifying the position in the grid
grid_id = file[4:8]
src_raster = gdal.Open(file)

# Get the width and height of the raster as well as the bbox
x_size = src_raster.RasterXSize
y_size = src_raster.RasterYSize
bbox_dict = gdal.Info(src_raster, format="json")["cornerCoordinates"]
ur = bbox_dict["upperRight"]
ll = bbox_dict["lowerLeft"]

# This command creates an output tif having same size and bbox of the underlying image
# setting pixel value to 1 on the geometry matching the grid id.
gdal_rasterize_str = "{0} -at -burn 1 -where grid_id='{1}' -l {2} -a_srs EPSG:26913 -ts {3} {4} -te {5} {6} {7} {8} -CO TILED=YES -CO COMPRESS=DEFLATE -CO PHOTOMETRIC=MINISBLACK -CO NBITS=1 -OT BYTE -OF GTIFF {9} {10}"
gdal_rasterize_process = gdal_rasterize_str.format(
   "gdal_rasterize",
    grid_id,
    shape,
    x_size,
    y_size,
    ll[0],
    ll[1],
    ur[0],
    ur[1],
    shapefile,
    outfile,
)
subprocess.run(gdal_rasterize_process, shell=True)

The command invocation can be integrated in a similar command used in example 7 to produce a GeoTIFF with internal mask.

Windows (you can put this on a vectormask.bat file in the directory containing the files):

for %%F in (rgb_????.tif) do  (

  gdalbuildvrt -b 1 %%~nF_b1.vrt %%F
  gdalbuildvrt -b 2 %%~nF_b2.vrt %%F
  gdalbuildvrt -b 3 %%~nF_b3.vrt %%F

  python getmask.py %%F

  echo Recompose the 3 bands together with the mask:
  gdalbuildvrt -separate %%~nF_final.vrt %%~nF_b1.vrt %%~nF_b2.vrt %%~nF_b3.vrt %%F.msk.tif

  echo Write the result
  gdal_translate -CO "COMPRESS=JPEG" -CO "PHOTOMETRIC=YCBCR" -CO "JPEG_QUALITY=85" -CO "TILED=YES" -b 1 -b 2 -b 3 -mask 4 %%~nF_final.vrt masked_%%~nF.tif
)

Finally, let’s add the overviews:

Windows (you can put this on a maskedaddo.bat file in the directory containing the files))

for %%F in (masked*.tif) do  (
  gdaladdo -r average %%F 2 4 8 16 32
)

Linux:

#!/bin/bash
export GDAL_TIFF_INTERNAL_MASK=YES

FILES="rgb_????.tif"
echo Computing binary mask
for F in $FILES
do
  echo Processing $f
  A=${F%%.*}_b1.vrt
  B=${F%%.*}_b2.vrt
  C=${F%%.*}_b3.vrt
  gdalbuildvrt -b 1 $A $F
  gdalbuildvrt -b 2 $B $F
  gdalbuildvrt -b 3 $C $F
  python getmask.py $F
  gdalbuildvrt -separate ${F%%.*}_final.vrt $A $B $C ${F%%.*}.msk.tif
done

echo Writing composed files
for F in $FILES
do
  gdal_translate -CO "COMPRESS=JPEG" -CO "PHOTOMETRIC=YCBCR" -CO "JPEG_QUALITY=85" \
  -CO "TILED=YES" -b 1 -b 2 -b 3 -mask 4 ${F%%.*}_final.vrt masked_${F%%.*}.tif
done

echo Adding overviews
FILESOV="masked*.tif"
for F in $FILESOV
do
  gdaladdo -r average $F 2 4 8 16 32
done

Once configured in GeoServer with transparent footprint, the result will look as follows:

../../_images/rgb_masked.png

This kind of processing can be used in similar scenarios:

  • A set of georeferenced cartographic images (including legends and notes) composing a region/district, plus a shapefile with footprints for each sheet of the map

    ../../_images/cartographic_mosaic.png
  • A set of lossy compressed datasets with shapefile footprint. Compression artifacts at the edges will ruine transparent masking based on pixel values.

    ../../_images/lossy_image.png