We are still improving the Nazr-CNN algorithms. Please stay tuned!
Hurricane Harvey was the first major hurricane to make landfall in the United States since Wilma in 2005, ending a record 12-year period with no major hurricanes making landfall in the country. In a four-day period, many areas received more than 40 inches (1,000 mm) of rain as the system meandered over eastern Texas and adjacent waters, causing catastrophic flooding. The resulting floods inundated hundreds of thousands of homes, displaced more than 30,000 people, and prompted more than 17,000 rescues.
Qatar Computing Research Institute (QCRI) has been working on machine learning for UAV (unmanned aerial vehicle, i.e., drone) imagery using deep learning algorithms. For Hurricane Harvey, we were able to run our algorithm. Please see the result below. Here is our report that explains how this works.
We are still looking for more data to improve our model so that we can share it with the community.
We are happy to announce that our paper “Fine Grained Classification of UAV Imagery for Damage Assessment” will be presented at DSAA2017, the 4th IEEE International Conference on Data Science and Advanced Analytics.
We really appreciate your help with this research project, especially from the Standby Task Force volunteers and Digital Jedis. This paper is a very special milestone for the UAV research community because it is the first UAV imagery paper to use deep learning algorithms. We are now working hard on the next research paper, based on the Philippines expedition dataset.
Update: Completed the activation.
Thank you for your help! We have completed the activation. This dataset will be used to enhance the computer vision model. Please see below for the Digital Jedis’ final activity map.
Update: Our volunteers’ activity map
We need your help!
We are now launching the MicroMappers damage assessment expedition to the Philippines! The purpose of this deployment is to develop and deliver a cutting-edge machine learning algorithm that will automatically detect and categorize damaged infrastructure in UAV images taken in the aftermath of natural disasters. Your help is essential to this new deep learning algorithm, which will be released to the public as a final product for humanitarian purposes.
To start your digital humanitarian efforts, simply click on the link below:
Thanks for your help in supporting these important disaster relief efforts!
Your MicroMappers Team,
Earlier this year, we introduced MicroMappers’ new face, the MicroMappers Hub. Since the Hub’s release, we have been rolling out new features gradually. The current MicroMappers key features are:
Please see below for the GDELT Image Classifier features in MicroMappers.
Once you log in, look for the “Disaster Global Events” box, then click “Image Classifiers”.
Now you can see the current list of image classifiers. If you want to define your own, click “Request New Image Classifier”, which will redirect you to the configuration page. If you click “View Map”, you can see the map with images, as shown below.
This is the Image Classifier configuration page. Basically, you just need to fill out the form.
As you can see, there are many features here. If you are not sure where to start, please check the “Tutorial” first.
We want to hear about your experience and needs. Please visit the MicroMappers Hub and give us feedback.
Named Entity Recognition (NER) involves identifying named entities such as persons, locations, and organizations in text. NER is essential for a variety of Natural Language Processing (NLP), Information Retrieval (IR), and Social Computing (SC) applications. In this blog, I present QCRI’s state-of-the-art Arabic microblog NER system.
Microblog NER Challenges
NER on microblogs faces many challenges such as:
(1) Microblogs are often characterized by informal language, ubiquitous spelling mistakes, and the presence of Twitter name mentions (e.g., @someone), hashtags, and URLs;
(2) NEs are often abbreviated. For example, tweeps (tweet authors) may write “Real Madrid” as just “the Real”;
(3) Tweeps often use brief and choppy expressions and incomplete sentences;
(4) Word senses in tweets may differ from word senses in news. For example, “mary jane” in tweets likely refers to marijuana as opposed to a person’s name;
(5) Tweeps may use capitalization inconsistently (in English), where words that should be capitalized are not, and ALL-CAPS words are used for emphasis; and
(6) We observed that NEs often appear at the beginning or end of tweets, and they are often abbreviated.
As for Arabic microblogs, they exhibit more complications, namely:
Dialects introduce morphological variations with different prefixes and suffixes. For example, Egyptian and Levantine dialects tend to insert the letter ب (sounds like “ba”) before verbs in the present tense.
Most work on NER relies on a sequence labeler, such as a Conditional Random Fields (CRF) labeler, that uses a variety of contextual features and gazetteers, which are large lists of named entities. Our state-of-the-art NER system builds on this approach by presenting novel ways of building larger gazetteers, applying domain adaptation, using semi-supervised training, performing transliteration mining, and employing cross-lingual English-Arabic resources such as Wikipedia. We train a CRF sequence labeler with these enhancements.
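To make the setup concrete, here is a minimal sketch in Python of the kind of per-token feature dictionary a CRF sequence labeler consumes, with contextual, positional, and gazetteer-membership feature families. The feature names and gazetteer format are illustrative assumptions, not QCRI's actual feature set.

```python
def token_features(tokens, i, gazetteers):
    """Build a feature dict for the token at position i.

    `gazetteers` maps an entity type (e.g. "LOC") to a set of known names.
    Feature names here are illustrative, not QCRI's actual feature set.
    """
    tok = tokens[i]
    feats = {
        "word": tok,
        "prefix2": tok[:2],   # morphological clue (e.g. dialectal prefixes)
        "suffix2": tok[-2:],
        "is_first": i == 0,   # NEs often appear at the start/end of tweets
        "is_last": i == len(tokens) - 1,
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1]   # contextual features
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    for etype, names in gazetteers.items():  # gazetteer membership
        feats["in_gaz_" + etype] = tok in names
    return feats

# Usage with a toy gazetteer:
gaz = {"LOC": {"Doha", "Cairo"}, "PER": {"Hasan"}}
tokens = "Hasan flew to Doha".split()
print(token_features(tokens, 3, gaz))
```

A CRF then learns weights over such feature dictionaries jointly across the whole token sequence, rather than classifying each token independently.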
Using Arabic Wikipedia
Since building larger gazetteers can positively impact NER, we used Wikipedia to build large gazetteers. To do so, we used category names to filter Wikipedia titles that would constitute names of persons, locations, and organizations. Here are sample words (translated into English) that we used for filtering:
We also used page redirects (alternative page names) to expand the gazetteers. The resultant gazetteer had 70,908 locations, 26,391 organizations, and 81,880 persons.
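The category-based filtering described above can be sketched as follows. The English keyword lists are illustrative stand-ins for the actual Arabic filter words, and the `(title, categories)` input format is a hypothetical simplification of a Wikipedia dump.

```python
# Illustrative keyword lists; the real system filters Arabic category names.
CATEGORY_KEYWORDS = {
    "PER": ["births", "deaths", "people", "players"],
    "LOC": ["cities", "countries", "villages", "rivers"],
    "ORG": ["companies", "organizations", "clubs", "agencies"],
}

def build_gazetteers(pages):
    """`pages` is an iterable of (title, [category names]) pairs.

    A title is added to a gazetteer when any of its category names
    contains a keyword for that entity type.
    """
    gaz = {etype: set() for etype in CATEGORY_KEYWORDS}
    for title, categories in pages:
        cat_text = " ".join(categories).lower()
        for etype, keywords in CATEGORY_KEYWORDS.items():
            if any(kw in cat_text for kw in keywords):
                gaz[etype].add(title)
    return gaz

# Toy input standing in for a Wikipedia dump:
pages = [
    ("Doha", ["Capital cities in Asia"]),
    ("NASA", ["Space agencies"]),
    ("Hasan Ali", ["1990 births", "Qatari people"]),
]
gaz = build_gazetteers(pages)
```

Page redirects can then be folded in by adding each redirect title to the same gazetteer entry as its target page.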
DBpedia is a large collaboratively-built knowledge base in which structured information is extracted from Wikipedia, and it contains 6,157,591 Wikipedia titles belonging to 296 types. Types vary in granularity, with each Wikipedia title having one or more types. For example, NASA is assigned the following types: Agent, Organization, and Government Agency. In all, DBpedia includes the names of 764k persons, 573k locations, and 192k organizations. Of the Arabic Wikipedia titles, 254,145 have cross-lingual links to English Wikipedia, and of those English Wikipedia titles, 185,531 have entries in DBpedia. We used the DBpedia types as features for the NER system.
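The cross-lingual type lookup can be sketched as below, with toy dictionaries standing in for the real interlanguage links and DBpedia dump; the feature naming is an illustrative assumption.

```python
# Toy stand-ins for the real resources: interlanguage links map an
# Arabic Wikipedia title to its English title, and DBpedia maps the
# English title to one or more types.
ar_to_en = {"ناسا": "NASA"}
dbpedia_types = {"NASA": ["Agent", "Organization", "GovernmentAgency"]}

def dbpedia_type_features(arabic_title):
    """Return binary DBpedia-type features for a candidate named entity."""
    en_title = ar_to_en.get(arabic_title)
    types = dbpedia_types.get(en_title, []) if en_title else []
    return {"dbpedia_" + t: True for t in types}

print(dbpedia_type_features("ناسا"))
```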
As I mentioned earlier, Arabic lacks capitalization, and Arabic names are often common Arabic words. For example, the Arabic name “Hasan” means “good”. To capture cross-lingual capitalization, we used a machine translation phrase table that was built from large amounts of parallel Arabic-English text and where the case was not folded on the English side. Then, given an Arabic word, we would look up its English translations and observe the likelihood that the English translation is capitalized.
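A sketch of this capitalization feature, assuming a toy phrase table; a real table would carry translation probabilities learned from the parallel text.

```python
# Toy phrase table: Arabic word -> list of (English translation, probability),
# with the original English case preserved (not folded).
phrase_table = {
    "حسن": [("good", 0.6), ("Hasan", 0.35), ("nice", 0.05)],
}

def capitalization_likelihood(arabic_word):
    """Probability mass of translations that start with a capital letter."""
    entries = phrase_table.get(arabic_word, [])
    return sum(p for en, p in entries if en[:1].isupper())

print(capitalization_likelihood("حسن"))  # 0.35 with this toy table
```

A high value suggests the Arabic word, despite being a common word like “good”, is being used as a name in this context.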
Many named entities, particularly persons and locations, are often transliterated. We would look up the translations of Arabic words in the aforementioned phrase table and then use an in-house transliteration miner to determine whether the English and Arabic translation pairs are also transliterations of each other. If they are, we used the transliteration probability as a feature.
Using Domain Adaptation
Instead of training on tagged microblog text alone, we mixed in tagged news text to make use of the large amount of news training data.
Basically, we used our best NER system to tag a large corpus of microblogs. Our intuition was that if we automatically tag a large set of tweets, a NE may be tagged correctly multiple times. The automatically identified NEs can then be used as a “new gazetteer.”
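One way to sketch this mining step: count how often each phrase receives each entity tag from the automatic tagger, and keep phrases that are tagged often and consistently enough. The thresholds below are illustrative assumptions, not the paper's actual values.

```python
from collections import Counter, defaultdict

def mine_gazetteer(auto_tags, min_count=3, min_ratio=0.8):
    """`auto_tags` yields (phrase, entity_type) pairs produced by the
    existing tagger over a large tweet corpus. A phrase enters the new
    gazetteer when its dominant tag is frequent and consistent enough."""
    counts = defaultdict(Counter)
    for phrase, etype in auto_tags:
        counts[phrase][etype] += 1
    gazetteer = defaultdict(set)
    for phrase, by_type in counts.items():
        etype, n = by_type.most_common(1)[0]
        if n >= min_count and n / sum(by_type.values()) >= min_ratio:
            gazetteer[etype].add(phrase)
    return gazetteer

# Toy auto-tagger output:
tags = [("Doha", "LOC")] * 4 + [("Doha", "ORG"), ("maybe", "PER")]
print(dict(mine_gazetteer(tags)))
```

The mined entries then feed back into the labeler as gazetteer features, a simple form of semi-supervised training.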
How Good is our NER System?
The QCRI NER system is considered state-of-the-art for Arabic microblogs. Table 1 reports on the evaluation results for the NER system. We performed the evaluation on a set of 1,423 tweets containing nearly 26k tokens. The tweets were randomly selected from the period of Nov. 23-27, 2011.
Table 1. NER results
Obtaining and Citing our NER System
The system is part of the Farasa Arabic processing toolkit and is available under a research license from: http://qatsdemo.cloudapp.net/farasa/ . It is written entirely in Java and can be invoked as a stand-alone executable or through an API. Usage examples are available at: http://qatsdemo.cloudapp.net/farasa/usage.html .
For a detailed description of the system, please refer to the following two papers:
Kareem Darwish. 2013. Named Entity Recognition using Cross-lingual Resources: Arabic as an Example. ACL-2013.
Kareem Darwish, Wei Gao. 2014. Simple Effective Microblog Named Entity Recognition: Arabic as an Example. LREC-2014.