Mentoring Undergrad Research Across Continents: An Experiment

19 Jun 2010

From July 2010 to June 2011, I will be mentoring 8 undergrad students from my alma mater, Fr. Conceição Rodrigues College of Engineering, for their Senior Projects (a.k.a. Final-Year projects). We plan to work together on two projects in the area of Personal Information Management. This is an experiment of sorts, because as far as I know, final-year projects in Bombay University have never been mentored remotely, and only a small proportion are research-oriented.

Why it’s the right time for this.

I’m doing this for a couple of reasons. Several years ago, a few of my friends from undergrad (Salil Wadhavkar, Ninad Pradhan, Vikram Iyengar, Noel Tide, Rahul Saxena, Raghu Cowlagi) and I had discussed mentoring our juniors to participate in tech contests. At the time, we had all just finished our Master’s degrees, and felt that such encouragement could nudge a few students to take up research and grad school. That conversation died down about 4-5 years ago, but the spark remained.

I enjoy building things. That’s why I had decided I wanted to be in industry (instead of academia) even before I started my Ph.D. program. But it’s also fun to conduct studies and find answers to interesting questions, and it’s incredibly hard to do the latter while my job at Google enables me to build Awesome Stuff™ full-time. Collaborating with students seems like a win-win situation for all of us: the undergrad students gain exposure to research, and we all are able to build, study, and publish what we find.

Proposing the collaboration

With these ideas in mind, I visited my undergrad college (under Bombay University) when I was in Bombay this February. I met the Principal, Dr. Srija Unnikrishnan, and the Head of the Department of Information Technology, Prof. Mahesh Sharma. Both found the idea promising, and Prof. Sharma asked me to address the 3rd year classes (junior-level in the US system) that were in progress at the time. Although I felt a little bad interrupting classes to deliver my 15-min spiel, the students, professors, head of the department and the principal were not only supportive but also enthusiastic about this. My concerns about the distance and my lack of physical availability in Bombay were brushed aside by Prof. Sharma (“when there’s chat and Skype, why do you have to be here personally?”).

Soon, two groups of four students contacted me, and we worked on defining the projects between February and now (June 2010). I was impressed by their initial emails, which clearly showed they were not only interested, but had also done their homework before proposing a project. Akash Singh, Abhishek Mishra, Shivam Mishra, Nitish Nadkarni, and I will be collaborating on an email-related project, and Rushabh Ajmera, Aneesh Datar, Bhavya Gandhi, Vimarsh Karbhari, and I will be collaborating on a task-management related project; both within Personal Information Management. (We will publish the details of these projects and the entire source code developed as part of this project as and when we have something to report.)

Other voices

Luis von Ahn from CMU recently published a blog post about outsourcing his research group. While he proposes doing this mainly for monetary reasons, I figured this could be more of an academically enriching, mutually beneficial experience. There are a few neat opinions expressed in his blog post, as well as in the comments.

Next steps

I expect to learn many valuable lessons about cross-continental collaboration from this process as much as I expect to learn from my new colleagues — who are no doubt better versed in technologies of the day than I am. I will continue to blog about our experiences as we proceed.

If you have any tips for us as we embark on this year-long experiment, please leave us a comment.

The 5 Stages of Driving in India

17 Feb 2010

(with apologies to Elisabeth Kübler-Ross)

  1. Denial: No way that guy’s gonna cut across in front of me.
  2. Anger: Whaa? WTF? Get out of my frikkin’ way!
  3. Bargaining: Maybe if I let him cut across, I could still retain a modicum of sanity.
  4. Depression: Screw this, it’s never gonna get any better.
  5. Acceptance: Oh well, when in Rome …

Simplified Twitter Microsyntax for the Haiti Earthquake

18 Jan 2010

In this post, I have typeset many more sentences in bold than I usually do, so readers can quickly skim through it.

I applaud the efforts of U. Colorado’s EPIC Group to help victims of the Haiti earthquake call for help using Twitter, and to make their tweets discoverable and actionable. I just performed a Twitter search for the terms #haiti -RT -http (includes all Tweets tagged #Haiti, except retweets or links) to inspect some of the tweets that are directly related to happenings on the ground, and they are (as expected) only a minuscule percentage of the total number of tweets about #Haiti. Syntax is thus sorely needed to achieve a decent signal-to-noise ratio to assist relief efforts.
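For anyone who has already collected tweet text and wants to apply a similar filter offline, here is a rough Python sketch. It only approximates the search above: the function name is made up for illustration, and the matching rules (case-insensitive, "RT" as a standalone token, any occurrence of "http") are my assumptions about how the search operators behave, not a description of Twitter's actual implementation.

```python
import re

def is_ground_report_candidate(tweet: str) -> bool:
    """Approximate the search `#haiti -RT -http` on raw tweet text.

    Hypothetical helper: keeps tweets tagged #haiti that are neither
    retweets nor link shares. The matching rules are assumptions.
    """
    lowered = tweet.lower()
    # Split into word-like tokens, keeping a leading # or @ attached.
    tokens = re.findall(r"[#@]?\w+", lowered)
    has_tag = "#haiti" in tokens      # exact tag, so #haitirescue does not count
    is_retweet = "rt" in tokens       # "RT" as a standalone token
    has_link = "http" in lowered      # any http:// or https:// link
    return has_tag and not is_retweet and not has_link

samples = [
    "#Haiti need water at Delmas 33",
    "RT @someone: #haiti thoughts and prayers",
    "#haiti donate here http://example.org",
]
for s in samples:
    print(is_ground_report_candidate(s), "|", s)
```

Even with such a filter, the tweets that pass are not necessarily actionable, which is the gap a dedicated tag is meant to close.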

In my opinion, though, the current version of the tweet syntax seems too formal, too rigid and a tad too complicated for victims or rescuers on the ground. I am a programmer, and even I had trouble mentally parsing a few of the examples provided. We must keep in mind that Haiti is a bi-/tri-lingual country (and none of its languages is English), so any syntactic terms used should preferably be semi-obvious to non-native speakers of the language as well as rescuers.

Roles of Microsyntax

  1. Make tweets discoverable: Microsyntax can assist local search-and-rescue efforts and unaffected Twitter users in determining if a tweet is actionable. This task is partly a Signal Detection Task and partly a Data Mining problem. In both situations, microsyntax can prove helpful: all that’s needed is a single tag that emphasizes that a particular tweet is actionable (versus not), e.g. #haiti #rescue (or #haitirescue, to avoid having to type a second # (hash) sign). This will greatly increase the sensitivity parameter d’ of the signal detection task.
  2. Make data mining easier: Once a tweet has been detected to be actionable, its contents must be parsed into a form that local efforts can take action upon. While it’s true that all the other proposed microsyntactic tags make it easier for applications to parse the data, this is at the cost of requiring users to learn new syntax. This seems to me a little too much to expect from victims of a recent calamity of this scale as well as from rescue workers with other higher priorities. Instead, as long as our tools can identify relevant tweets, computers should be able to perform the second task of parsing locations, names, and verbs from tags quite easily.
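To make the d' claim in point 1 concrete, here is a small worked sketch in Python. In signal detection theory, d' is the difference between the z-scores of the hit rate and the false-alarm rate. The rates below are made-up numbers chosen purely for illustration (they are not measurements from any Haiti data), and the function name is my own.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    for rate in (hit_rate, false_alarm_rate):
        if not 0.0 < rate < 1.0:
            # The inverse normal CDF is undefined at exactly 0 or 1.
            raise ValueError("rates must be strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates for a reader scanning the #haiti stream for
# actionable tweets, without and with a dedicated tag.
untagged = d_prime(hit_rate=0.60, false_alarm_rate=0.30)
tagged = d_prime(hit_rate=0.95, false_alarm_rate=0.05)
print(f"d' without a rescue tag: {untagged:.2f}")
print(f"d' with a rescue tag:    {tagged:.2f}")
```

The point is only directional: anything that raises hits and lowers false alarms (such as a single, consistently used tag) pushes d' up.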

Also, microsyntactic terms need not always be prefixed with # (hash) signs; these are often difficult to type on cell phone keyboards, and on some handsets may hamper input methods such as T9. The intervening # signs also make tweets that use the proposed microsyntax harder to read for someone browsing through them.

To summarize, the proposed syntax imposes a heavy cognitive load on victims and search-and-rescue efforts in exchange for making parsing easier for machines. However, the task of parsing details from tweets can also easily be performed by large numbers of humans, a.k.a. crowdsourcing, via volunteer efforts or via tools such as Amazon’s Mechanical Turk.

Simpler, Lighter Microsyntax

The following are examples of microsyntax that are more readable, yet also parseable by machines. All situations are based on the ones in the original proposed microsyntax. Most are directly based on the EPIC microsyntax, with a few simplifications.

  • Rule 1: Always write in the third-person. This takes care of part of the name problem.
  • Rule 2: Instead of using #loc for locations, use “at”. It’s much more natural and not much more difficult to parse.
  • Rule 3: Verbs are actionable. Not syntactic verbs, but English (or French or Haitian Creole) verbs. It’s a trivial task to populate a tool with a dictionary to detect all word forms correctly.
  • Rule 4: Anything that cannot be parsed ends up as the equivalent of the #info tag (see EPIC syntax).
  • Rule 5: The entire text of the tweet should always be available to a human, so whatever information was incompletely parsed can be understood manually, and optionally added to the parsed version by a human.

The general aim is to require as little syntax knowledge as possible, and to keep as close as possible to the natural way people write tweets.
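As a rough illustration of how little machinery Rules 2 to 5 need, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `ParsedTweet` and `parse_tweet` names, the tiny action dictionary, the "two or more capitalized words" heuristic for names, and the choice to treat whatever follows the location as the leftover #info text. A real tool would need proper dictionaries for English, French, and Haitian Creole, and much more robust heuristics.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, tiny action dictionary (Rule 3).
ACTIONS = {"looking for": "search", "needs": "need", "need": "need",
           "offers": "offer", "offer": "offer"}

_ACTION_ALT = "|".join(re.escape(p) for p in sorted(ACTIONS, key=len, reverse=True))
# Rule 2: a location follows "at" and runs until an action phrase,
# a comma or period, or the end of the tweet.
_WHERE_RE = re.compile(rf"\bat\s+(.+?)(?=\s+(?:{_ACTION_ALT})\b|[,.]|$)", re.IGNORECASE)
# Name heuristic: the first run of two or more capitalized words.
_WHO_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")

@dataclass
class ParsedTweet:
    text: str              # Rule 5: the full tweet is always kept
    what: List[str]        # action labels found via the dictionary
    who: Optional[str]
    where: Optional[str]
    info: str              # Rule 4: leftover text, here whatever follows the location

def parse_tweet(text: str) -> ParsedTweet:
    # Drop the leading tags (e.g. "#haiti #rescue") before parsing.
    body = re.sub(r"^(?:#\w+\s+)+", "", text.strip())
    lowered = body.lower()
    what = sorted({label for phrase, label in ACTIONS.items()
                   if re.search(rf"\b{re.escape(phrase)}\b", lowered)})
    who = _WHO_RE.search(body)
    where = _WHERE_RE.search(body)
    info = body[where.end():].strip(" ,.") if where else body
    return ParsedTweet(text=text, what=what,
                       who=who.group(0) if who else None,
                       where=where.group(1) if where else None,
                       info=info)

print(parse_tweet("#haiti #rescue Mirna Nazaire at PAP at Bizoton 6#12 needs food. "
                  "Entire neighborhood without food. People dying."))
```

Note that this sketch is deliberately cruder than a human reader: for instance, it would pick up only "Sherline Birotte" from "Sherline Birotte aka Memen". That is exactly why Rule 5 keeps the full tweet available for a person to fill in the rest.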

Examples

TWEET-BEFORE: Sherline Birotte aka Memen. Last seen at 19 Ruelle Riviere College University of Porter a 3 story schol building
TWEET-AFTER: #haiti #ruok #name Sherline Birotte aka Memen. Last seen #loc 19 Ruelle Riviere College University of Porter #info a 3 story schol building
Simplified Microsyntax: #haiti #rescue Looking for Sherline Birotte aka Memen. Last seen at 19 Ruelle Riviere College University of Porter, a 3 story school building

This tells us:
What = Looking for someone.
Who = Sherline Birotte aka Memen (identified fuzzily based on initial capital letters)
Where = 19 Ruelle Riviere College University of Porter (automatically parsed based on “at”)
What else = “a 3 story school building” (i.e. everything else in the tweet)

TWEET-BEFORE: Mirna Nazaire lives in P-A-P at Bizoton 6#12. Entire neighborhood without food. People are dying.
TWEET-AFTER: #haiti #need #food #name Mirna Nazaire lives in #loc PAP at Bizoton 6 #12 #info neighborhood w/o food. People dying
Simplified Microsyntax: #haiti #rescue Mirna Nazaire at PAP at Bizoton 6#12 needs food. Entire neighborhood without food. People dying.

This tells us:
What = needs food. (automatically detected from the verb in the sentence.)
What do they need = food (automatically detected from the object in the sentence.)
Who = Mirna Nazaire (heuristically determined from initial capital letters.)
Where = PAP at Bizoton 6#12 (detected from microsyntax “at”)
What else = “Entire neighborhood without food. People dying.” (Rest of the tweet, unfiltered.)

TWEET-BEFORE: French hospital is now open and ready to receive the wounded at the french lycee in rue marcadieux bourdon
TWEET-AFTER: #haiti #offering #med #loc french lycee in rue marcadieux bourdon #num 30+ #info French hospital is open and ready 2 receive wounded
Simplified Microsyntax: #haiti #rescue French hospital ready to offer help to 30+ wounded at the french lycee in rue marcadieux bourdon

This tells us:
What = Hospital. Also, something to do with medical efforts. (No need to tag explicitly, we can infer that from ‘hospital’.)
Where = The french lycee in rue marcadieux bourdon. (Automatically parsed from microsyntax “at”.)
How many people = 30+. (It’s already a number, no need to state “#num” explicitly.)

These are just a few suggestions. I will be contacting the PIs (principal investigators) of the EPIC project directly with some of my recommendations, but please continue to follow their syntax until they recommend anything different. The current syntax proposal isn’t perfect, but it is more important to avoid fragmenting the tagspace.
