Nazr-CNN: A deep learning pipeline for damage assessment in UAV Images

Dear MicroMappers,

Today we will describe Nazr-CNN, a deep learning (aka multilayer neural network) pipeline that can be used to automatically detect and categorise damaged infrastructure in UAV images taken in the aftermath of a natural disaster. This is work in progress, but preliminary results are promising.

More specifically, we obtained around 3,000 images from the island of Vanuatu after it was struck by Cyclone Pam in 2015. We followed the Expert-Machine-Crowd (EMC) framework described here to carry out the detection and categorisation task:

  1. A group of digital jedis and SBTF volunteers first tagged the images using MicroMappers. The process involved (a) identifying houses in an image and drawing a bounding box around each one, and (b) giving each identified house a label indicating the severity of damage: little or no damage, medium damage, or severely damaged. Examples of the tagging process are shown in the images below, where the different polygon colours represent the levels of damage (one possible data representation for these tags is sketched just after this list). As can be seen, tagging is quite laborious and does not scale, so in “time poor” scenarios some form of automation is required.
  2. The tagged images were used to train a machine learning classifier. For this we used deep learning in a way that we think is somewhat novel. While deep learning techniques have gained tremendous popularity in the last decade, most of the published research literature focuses on object detection in very large data sets. In fact, two aspects of deep learning have attracted so much publicity, both in the popular press and in the research literature:
    (i) Deep learning is one of the few methods known to keep improving as it is given more data; most other machine learning techniques tend to saturate after a certain amount of data;
    (ii) Deep learning has shown the potential to carry out “automatic feature engineering.” If true, this would have profound implications and would essentially solve the Frame Problem of Artificial Intelligence.
  3. New and untagged images were then passed through the trained classifier, which ideally outputs the location of every house in the image and the level of damage it has incurred (if any).
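
To make the output of the tagging step concrete, here is a minimal, hypothetical sketch of how one tagged image could be represented in Python. The class names, field names, and example values are illustrative assumptions, not the actual MicroMappers data format:

```python
# Hypothetical representation of one tagged image from step 1: each house is
# a bounding box plus a damage label. Names and fields are illustrative only.
from dataclasses import dataclass
from typing import List, Tuple

DAMAGE_LEVELS = ("little_or_no_damage", "medium_damage", "severe_damage")

@dataclass
class HouseTag:
    bbox: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    damage: str                      # one of DAMAGE_LEVELS

@dataclass
class TaggedImage:
    image_path: str
    tags: List[HouseTag]

example = TaggedImage(
    image_path="vanuatu_0001.jpg",
    tags=[HouseTag(bbox=(120, 80, 260, 190), damage="severe_damage")],
)
```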

Pixel-Level Segmentation

However, our problem was not amenable to a straightforward application of deep learning. Firstly, our images were quite complex. For example, as is clear from the image above, a single image can contain several houses with varying levels of damage. This is very different from object detection, where the task is often to decide whether an image contains an object of a certain class (e.g., “cat” or “dog”). Our objective was not only to determine whether an image contains a house, but also where each house is located. In other words, our task is closer to segmentation, where the success of deep learning has been a mixed bag. This is also reflected in our results: our biggest source of errors is the segmentation of raw pixels into “houses” and “background.”
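
To illustrate what this segmentation step has to produce, the sketch below turns a predicted binary house/background mask into per-house pixel groups using connected components. This is generic post-processing under our own assumptions, not the internals of any particular segmentation network:

```python
# Sketch: extract per-house pixel groups from a binary segmentation mask.
# `mask` is assumed to be a 2-D array with 1 = "house" and 0 = "background".
import numpy as np
from scipy import ndimage

def houses_from_mask(mask, min_pixels=50):
    labeled, num = ndimage.label(mask)      # label connected components
    houses = []
    for i in range(1, num + 1):
        ys, xs = np.nonzero(labeled == i)
        if len(ys) >= min_pixels:           # drop tiny spurious regions
            houses.append((ys, xs))         # pixel coordinates of one house
    return houses

# Toy example: two separate "houses" in a small mask.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[1:4, 1:4] = 1
mask[6:9, 6:9] = 1
print(len(houses_from_mask(mask, min_pixels=4)))  # -> 2
```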

The fact that the images were taken from a UAV makes it very difficult to determine the level of damage suffered by individual houses. Ideally, we would have trained a deep learning model to output “mild”, “medium”, “severe” and “background.” Despite extensive experimentation, and perhaps because of the paucity of data, we could not get a deep learning system to distinguish reliably between the different levels of damage.
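
For concreteness, the kind of four-class, per-pixel model we attempted looks roughly like the hedged PyTorch sketch below. The architecture, optimiser, and hyperparameters are illustrative assumptions rather than a record of what we actually ran:

```python
# Sketch of a four-class per-pixel model of the kind we could not get to work
# reliably; architecture and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 4  # mild, medium, severe, background

model = deeplabv3_resnet50(weights=None, num_classes=NUM_CLASSES)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, masks):
    """One step on (N,3,H,W) images and (N,H,W) per-pixel class masks."""
    optimizer.zero_grad()
    logits = model(images)["out"]        # (N, NUM_CLASSES, H, W)
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```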

Damage Assessment Based on Texture

A key insight for distinguishing between levels of damage was the realisation (somewhat obvious in hindsight) that we should use some form of texture. Notice that this runs contrary to the promise of deep learning as an “automatic feature generator”: ideally, the deep learning algorithm should have discovered some form of “texture” feature on its own in order to separate the four categories.

In computer vision, Fisher vectors are known to be good features for texture discrimination. Fisher vectors are a generalisation of the “Bag of Visual Words” model and are extensively used in computer vision. They went slightly out of vogue with the advent of deep learning, but they are now back in favour with a slight twist: instead of generating Fisher vectors on the raw images, they are generated from a layer of the neural network!
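
To illustrate the encoding itself, here is a hedged sketch of Fisher-vector computation over a set of local descriptors (for example, CNN activations at each spatial location), using a diagonal-covariance GMM from scikit-learn. The normalisation steps follow the standard improved Fisher vector recipe and are our assumptions, not a description of any specific implementation:

```python
# Sketch: Fisher-vector encoding of (T, D) local descriptors with a diagonal
# GMM. The descriptors could come from an intermediate CNN layer.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(descriptors, k=16):
    """Fit a K-component diagonal-covariance GMM to (N, D) descriptors."""
    return GaussianMixture(n_components=k, covariance_type="diag").fit(descriptors)

def fisher_vector(x, gmm):
    """Encode (T, D) descriptors as a 2*K*D Fisher vector."""
    T = x.shape[0]
    g = gmm.predict_proba(x)                    # (T, K) posteriors
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
    sigma = np.sqrt(var)                        # (K, D) standard deviations
    parts = []
    for k in range(gmm.n_components):
        d = (x - mu[k]) / sigma[k]              # normalised deviations, (T, D)
        gk = g[:, k][:, None]
        parts.append((gk * d).sum(0) / (T * np.sqrt(w[k])))               # mean part
        parts.append((gk * (d**2 - 1)).sum(0) / (T * np.sqrt(2 * w[k])))  # var part
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))      # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)    # L2 normalisation
```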

Nazr-CNN

Nazr-CNN is simply an integrated deep learning pipeline consisting of two parts. The first part takes a raw image and segments it: the output is a set of pixel groups, where each group represents either a house or background. For semantic segmentation we use DeepLab. The second part passes each segment through another deep network and extracts Fisher vectors from an intermediate layer. The Fisher vectors are then used to train a support vector machine (a twist within a twist!), which embodies the final classifier. For texture classification we use FV-CNN.
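
Putting the two parts together, the overall flow is roughly the sketch below. Here `segment_image` stands in for the DeepLab segmentation stage and `cnn_descriptors` for the FV-CNN feature extractor; both are hypothetical placeholders, and `fisher_vector` is the function from the sketch above:

```python
# Sketch of the two-part pipeline: segment, encode each segment as a Fisher
# vector, classify with an SVM. Placeholders are hypothetical, not our code.
import numpy as np
from sklearn.svm import SVC

def segment_image(image):
    """Placeholder for the DeepLab stage: returns a list of house segments."""
    raise NotImplementedError

def cnn_descriptors(segment):
    """Placeholder: (T, D) activations from an intermediate CNN layer."""
    raise NotImplementedError

def train_pipeline(images, labels_per_segment, gmm):
    """Fit the final SVM on Fisher vectors of all training segments."""
    X, y = [], []
    for image, seg_labels in zip(images, labels_per_segment):
        for segment, label in zip(segment_image(image), seg_labels):
            X.append(fisher_vector(cnn_descriptors(segment), gmm))
            y.append(label)
    return SVC(kernel="linear").fit(np.stack(X), y)

def assess(image, gmm, svm):
    """Return (segment, predicted damage level) pairs for a new image."""
    segments = segment_image(image)
    fvs = [fisher_vector(cnn_descriptors(s), gmm) for s in segments]
    return list(zip(segments, svm.predict(np.stack(fvs))))
```

Training the SVM on Fisher vectors rather than raw CNN outputs is what lets the small, hand-tagged data set go further, since the GMM and SVM have far fewer parameters to fit than a full deep network.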

Results

[Figure: preliminary results of the Fisher-vector classifier]

Overall, we are all super excited about Nazr-CNN. Please let us know if you have any questions or comments. We are currently refining the pipeline, so please stay tuned for more updates!

Thank you,

MicroMappers Team