Simplified Twitter Microsyntax for the Haiti Earthquake
In this post, I have typeset many more sentences in bold than I usually do, so readers can quickly skim through it.
I applaud the efforts of U. Colorado’s EPIC Group to help victims of the Haiti earthquake call for help using Twitter, and to make their tweets discoverable and actionable. I just ran a Twitter search for #haiti -RT -http (all tweets tagged #haiti, excluding retweets and links) to inspect tweets that are directly related to happenings on the ground. As expected, they are only a minuscule percentage of the total number of tweets about #Haiti. A syntax is thus sorely needed to achieve a decent signal-to-noise ratio and assist relief efforts.
In my opinion, though, the current version of the tweet syntax seems too formal, too rigid, and a tad too complicated for victims or rescuers on the ground. I am a programmer, and even I had trouble mentally parsing a few of the examples provided. We must keep in mind that Haiti is a bi- or trilingual country (and none of its languages is English), so any syntactic terms used should preferably be semi-obvious to non-native English speakers as well as to rescuers.
Roles of Microsyntax
- Make tweets discoverable: Microsyntax can assist local search-and-rescue efforts and unaffected Twitter users in determining if a tweet is actionable. This task is partly a Signal Detection Task and partly a Data Mining problem. In both situations, microsyntax can prove helpful: all that’s needed is a single tag that emphasizes that a particular tweet is actionable (versus not), e.g. #haiti #rescue (or #haitirescue, to avoid having to type a second # (hash) sign). This will greatly increase the sensitivity parameter d’ of the signal detection task.
- Make data mining easier: Once a tweet has been detected to be actionable, its contents must be parsed into a form that local efforts can take action upon. While it’s true that all the other proposed microsyntactic tags make it easier for applications to parse the data, this is at the cost of requiring users to learn new syntax. This seems to me a little too much to expect from victims of a recent calamity of this scale as well as from rescue workers with other higher priorities. Instead, as long as our tools can identify relevant tweets, computers should be able to perform the second task of parsing locations, names, and verbs from tags quite easily.
Also, microsyntactic terms need not always be prefixed with # (hash) signs, which are often difficult to type on cell phone keyboards and, on some handsets, may hamper input methods such as T9. The intervening # signs also make tweets that use the proposed microsyntax harder to read for someone browsing through them.
To summarize, the proposed microsyntax imposes a heavy cognitive load on victims and search-and-rescue workers in order to make parsing easier for machines. However, the task of parsing details from tweets can also easily be performed by large numbers of humans, i.e. crowdsourcing, via volunteer efforts or tools such as Amazon’s Mechanical Turk.
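As a rough illustration of the detection step, here is a minimal Python sketch of a filter that flags a tweet as actionable when it carries either #haiti plus #rescue or the single #haitirescue tag. The function names and the tag sets are my own assumptions for illustration; they are not part of the EPIC proposal.

```python
import re

# Hypothetical tag combinations that mark a tweet as actionable.
ACTIONABLE_TAG_SETS = [{"haiti", "rescue"}, {"haitirescue"}]

# A hashtag must not be glued to a preceding word character,
# so the "#12" in an address like "6#12" is not treated as a tag.
TAG = re.compile(r"(?<!\w)#(\w+)")


def hashtags(tweet):
    """Return the lowercased hashtags in a tweet, without the leading '#'."""
    return {tag.lower() for tag in TAG.findall(tweet)}


def is_actionable(tweet):
    """True if the tweet carries one of the actionable tag combinations."""
    tags = hashtags(tweet)
    return any(required <= tags for required in ACTIONABLE_TAG_SETS)
```

Everything that passes this filter would then go on to parsing, whether by machine or by human volunteers.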
Simpler, Lighter Microsyntax
The following are examples of microsyntax that are more readable, yet still parseable by machines. All situations are taken from the original EPIC proposal, and most of the syntax is based directly on the EPIC microsyntax, with a few simplifications.
- Rule 1: Always write in the third person. This takes care of part of the name problem.
- Rule 2: Instead of using #loc for locations, use “at”. It’s much more natural and not much more difficult to parse.
- Rule 3: Verbs are actionable. Not syntactic verbs, but English (or French or Haitian Creole) verbs. It’s a trivial task to populate a tool with a dictionary to detect all word forms correctly.
- Rule 4: Anything that cannot be parsed ends up as the equivalent of the #info tag (see EPIC syntax).
- Rule 5: The entire text of the tweet should always be available to a human, so whatever information was incompletely parsed can be understood manually, and optionally added to the parsed version by a human.
The general aim is to require as little syntax knowledge as possible, and to keep as close as possible to the natural way people write tweets.
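To give a feel for how little machinery the rules above need, here is a minimal Python sketch of a parser. It is a naive heuristic, not a finished tool: the regular expressions, the function name `parse_tweet`, and the choice to treat everything after the location as the leftover "info" are all my own assumptions.

```python
import re

TAG = re.compile(r"(?<!\w)#(\w+)")
# Rule 2: the location is whatever follows the first standalone "at",
# up to the next comma or period.
LOCATION = re.compile(r"\bat\s+([^,.]+)")
# Fuzzy name guess: the first run of two or more capitalized words.
NAME = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+")


def parse_tweet(tweet):
    """Split a tweet into tags, a guessed name, a guessed location, and the rest."""
    tags = TAG.findall(tweet)
    body = TAG.sub("", tweet).strip()
    loc = LOCATION.search(body)
    name = NAME.search(body)
    # Rule 4: whatever we could not parse is kept as free-form info.
    # Simplification: here that is the text following the location.
    info = body[loc.end():].strip(" ,.") if loc else body
    return {
        "tags": tags,
        "who": name.group(0) if name else None,
        "where": loc.group(1).strip() if loc else None,
        "info": info,
        "text": tweet,  # Rule 5: the full tweet always stays available
    }
```

A heuristic this simple will misfire on some tweets (for example, when "at" appears twice), which is exactly why Rule 5 keeps the full text available for a human to correct.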
Examples
TWEET-BEFORE: Sherline Birotte aka Memen. Last seen at 19 Ruelle Riviere College University of Porter a 3 story schol building
TWEET-AFTER: #haiti #ruok #name Sherline Birotte aka Memen. Last seen #loc 19 Ruelle Riviere College University of Porter #info a 3 story schol building
Simplified Microsyntax: #haiti #rescue Looking for Sherline Birotte aka Memen. Last seen at 19 Ruelle Riviere College University of Porter, a 3 story school building
This tells us:
What = Looking for someone.
Who = Sherline Birotte aka Memen (identified fuzzily based on initial capital letters)
Where = 19 Ruelle Riviere College University of Porter (automatically parsed based on “at”)
What else = “a 3 story school building” (i.e. everything else in the tweet)
TWEET-BEFORE: Mirna Nazaire lives in P-A-P at Bizoton 6#12. Entire neighborhood without food. People are dying.
TWEET-AFTER: #haiti #need #food #name Mirna Nazaire lives in #loc PAP at Bizoton 6 #12 #info neighborhood w/o food. People dying
Simplified Microsyntax: #haiti #rescue Mirna Nazaire at PAP at Bizoton 6#12 needs food. Entire neighborhood without food. People dying.
This tells us:
What = needs food. (automatically detected from the verb in the sentence.)
What do they need = food (automatically detected from the object in the sentence.)
Who = Mirna Nazaire (heuristically determined from initial capital letters.)
Where = PAP at Bizoton 6#12 (detected from microsyntax “at”)
What else = “Entire neighborhood without food. People dying.” (Rest of the tweet, unfiltered.)
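Detecting the "what" from an ordinary verb (Rule 3) could look something like the Python sketch below. The tiny dictionary, the stop-word list, and the action labels are hypothetical placeholders; a real tool would need full word-form dictionaries for English, French, and Haitian Creole.

```python
import re

# Hypothetical, tiny dictionary mapping verb forms to an action label.
VERB_FORMS = {
    "need": "need", "needs": "need", "needed": "need",
    "offer": "offer", "offers": "offer", "offering": "offer",
    "looking": "search", "seeking": "search",
}

# Words skipped when guessing the object of the verb.
STOP_WORDS = {"for", "to", "a", "an", "the"}


def detect_action(text):
    """Return (action, object) for the first known verb, else (None, None)."""
    words = re.findall(r"[A-Za-z']+", text)
    for i, word in enumerate(words):
        action = VERB_FORMS.get(word.lower())
        if action:
            # The object is guessed as the next word that is not a stop word.
            obj = next(
                (w for w in words[i + 1:] if w.lower() not in STOP_WORDS),
                None,
            )
            return action, obj
    return None, None
```

Taking the next non-stop word as the object is crude, but it is enough to recover "food" from "needs food".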
TWEET-BEFORE: French hospital is now open and ready to receive the wounded at the french lycee in rue marcadieux bourdon
TWEET-AFTER: #haiti #offering #med #loc french lycee in rue marcadieux bourdon #num 30+ #info French hospital is open and ready 2 receive wounded
Simplified Microsyntax: #haiti #rescue French hospital ready to offer help to 30+ wounded at the french lycee in rue marcadieux bourdon
This tells us:
What = Hospital. Also, something to do with medical efforts. (No need to tag explicitly; we can infer that from “hospital”.)
Where = The french lycee in rue marcadieux bourdon. (Automatically parsed from microsyntax “at”.)
How many people = 30+. (It’s already a number, no need to state “#num” explicitly.)
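Picking up a count without a #num tag can be sketched in a few lines of Python. The pattern and the function name `find_count` are my own assumptions; the pattern takes the first standalone number, optionally followed by "+", and deliberately ignores digits glued to a # sign, as in an address like "6#12".

```python
import re

# A standalone number, optionally followed by "+", not touching a word
# character or a "#" on either side.
COUNT = re.compile(r"(?<![\w#])(\d+)(\+?)(?![\w#])")


def find_count(text):
    """Return (number, is_minimum) for the first standalone number, or None."""
    match = COUNT.search(text)
    if not match:
        return None
    return int(match.group(1)), match.group(2) == "+"
```

Street numbers and descriptions such as "3 story" will also match, so a count found this way is only a hint for a human reviewer to confirm.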
These are just a few suggestions. I will be contacting the PIs (principal investigators) of the EPIC project directly with some of my recommendations, but please continue to follow their syntax until they recommend anything different. The current syntax proposal isn’t perfect, but it is more important to avoid fragmenting the tagspace.






