Mentoring Undergrad Research Across Continents: An Experiment

From July 2010 to June 2011, I will be mentoring eight undergrad students from my alma mater, Fr. Conceicao Rodrigues College of Engineering, on their Senior Projects (a.k.a. final-year projects). We plan to work together on two projects in the area of Personal Information Management. This is an experiment of sorts because, as far as I know, final-year projects at Bombay University have never been mentored remotely, and only a small proportion are research-oriented.

Why it’s the right time for this

There are several reasons why I’m doing this. Several years ago, a few of my friends from undergrad (Salil Wadhavkar, Ninad Pradhan, Vikram Iyengar, Noel Tide, Rahul Saxena, Raghu Cowlagi) and I discussed mentoring our juniors to participate in tech contests. At the time, we had all just finished our Master’s degrees, and felt that such encouragement could nudge a few students to take up research and grad school. That conversation died down about 4-5 years ago for several reasons, but the spark remained.

I enjoy building things. That’s why I had decided I wanted to be in industry (instead of academia) even before I started my Ph.D. program. But it’s also fun to conduct studies and find answers to interesting questions, and it’s incredibly hard to do the latter while my job at Google enables me to build Awesome Stuff™ full-time. Collaborating with students seems like a win-win situation for all of us: the undergrad students gain exposure to research, and we all are able to build, study, and publish what we find.

Proposing the collaboration

With these ideas in mind, I visited my undergrad college (under Bombay University) when I was in Bombay this February. I met the Principal, Dr. Srija Unnikrishnan, and the Head of the Department of Information Technology, Prof. Mahesh Sharma. Both found the idea promising, and Prof. Sharma asked me to address the 3rd year classes (junior-level in the US system) that were in progress at the time. Although I felt a little bad interrupting classes to deliver my 15-minute spiel, the students, professors, head of the department and the principal were not only supportive but also enthusiastic about this. My concerns about the distance and my lack of physical availability in Bombay were brushed aside by Prof. Sharma (“when there’s chat and Skype, why do you have to be here personally?”).

Soon, two groups of four students contacted me, and we worked on defining the projects between February and now (June 2010). I was impressed by their initial emails, which clearly showed they were not only interested, but had also done their homework before proposing a project. I will be collaborating with one team on an email-related project, and with the other on a task-management project; both fall within Personal Information Management. (We will publish the details of these projects, and the entire source code developed as part of them, as and when we have something to report.)

Other voices

Luis von Ahn from CMU recently published a blog post about outsourcing his research group. While he proposes doing this mainly for monetary reasons, I figured this could be more of an academically enriching, mutually beneficial experience. There are a few neat opinions expressed in his blog post, as well as in the comments.

Next steps

I expect to learn many valuable lessons about cross-continental collaboration from this process, and just as many from my new colleagues, who are no doubt better versed in the technologies of the day than I am. I will continue to blog about our experiences as we proceed.

If you have any tips for us as we embark on this year-long experiment, please leave us a comment.

ReTweeting: Attribution for Discovery versus Attribution for Creation

During the past few months, I have found myself consuming more news and articles via recommendations from friends and those I follow on Twitter than via traditional source-based subscription (e.g. subscribing to specific feeds or newspapers). Social media discovery is here, and the best part of reTweeted links is that they have already gone through a round of peer review by peers I trust.

Often, I’m tempted to reTweet that content myself, or post it to Facebook, or share it via Google Reader. A few of these media keep attribution intact (e.g. Google Reader adds the “Shared by” metadata for each person in the chain that shared the content). Others, such as Twitter, are restricted by the length of the post, so the “RT @” list quickly gets too long and inevitably gets trimmed along the way.

But there’s no accepted practice for how this list should be trimmed. Should you keep the first Tweeter, even if that person is not the author of the content? (E.g. someone who read an NY Times article and tweeted about it.) Should you keep the last reTweeter, who was your direct link to the content in question? And what about multiple Tweeters re-posting links to the same content, so that the sharing history is no longer a single chain but a tangle of links (imagine a directed graph with edges denoting “shared by X to Y”)?
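To make the graph picture concrete, here is a minimal sketch, assuming the sharing history is available as a list of “shared by X to Y” pairs (Twitter does not hand you this; the function name and the data are made up for illustration). It walks backwards from the reader and collects everyone who introduced the link without having received it from someone else.

```php
<?php
// Hypothetical model: each edge [X, Y] means "shared by X to Y".
// Returns the people who first introduced the link into the reader's graph.
function originalSharers(array $edges, string $reader): array
{
    // Reverse index: for each person, who shared the link to them.
    $sharedBy = [];
    foreach ($edges as [$from, $to]) {
        $sharedBy[$to][] = $from;
    }

    $roots = [];
    $seen = [$reader => true];
    $queue = [$reader];
    while ($queue) {
        $person = array_shift($queue);
        $sources = $sharedBy[$person] ?? [];
        if ($sources === [] && $person !== $reader) {
            // Nobody shared it to this person, so they introduced the link.
            $roots[] = $person;
        }
        foreach ($sources as $source) {
            if (!isset($seen[$source])) {
                $seen[$source] = true;
                $queue[] = $source;
            }
        }
    }
    sort($roots);
    return $roots;
}
```

Note that even with the full graph in hand, the result can contain several “first” Tweeters, and none of them needs to be the author of the content.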

The problem is that by including attribution about the process of discovery, we end up attaching higher value to discovery than creation. When someone reTweets a secondary source of information, attribution for the primary source is often trimmed away. This is especially bad for Creative Commons works that require attribution when re-posted, but is bad in general for any kind of work and for authors of that work.

I have come to the conclusion that although attribution for discovery is important, it’s hard to apply consistently in fixed-character-length media. It’s a completely different story in the case of original content generated by the tweeter himself/herself: e.g. one-liners, or authors tweeting links to their (longer) content. Attribution for original content is vastly more meaningful than attribution for promoting someone else’s content (although the value of that act is substantial as well).

So from now on, I will only attribute original content in my tweets and Facebook updates. My intention is not to discount the value of the source that shared the content with me, but instead to promote the original author of that content wherever possible.

HOWTO Use custom DNS redirects to save browser keystrokes

Given the recent interest in DNS and its role in the public infrastructure of the Internet, sparked by the release of Google Public DNS, here’s a hack that can help you save keystrokes in the browser while accessing your favorite sites. Instead of typing in “youtube.com” or “twitter.com”, you can just type “yo” or “t”. If you’re looking for a map of San Francisco, CA, you can type “map/sf” and jump to the right place in Google Maps.

A bright bold blinking marquee disclaimer before we start: this is advanced territory. If you don’t know what sudo is and why 127.0.0.1 is special, be careful following these instructions because you may unintentionally destroy your ability to do anything at all on the Internet — including looking up instructions for getting unstuck. Also, these instructions only apply to Mac OS X and Linux, or other UNIX variants.

Redirect custom DNS hostnames to frequently-accessed sites

The file /etc/hosts on your machine is consulted by the DNS resolver before making a request to a DNS server. The idea is to add new DNS entries to the hosts file on your machine, pointing short domains such as g and t to 127.0.0.1. Now, whenever you type g or t into your browser, the hostname will be matched from your /etc/hosts file, instead of receiving an NXDOMAIN reply (i.e., this domain does not exist) from an upstream DNS provider. Since this request is received by your own machine, you can then handle it to do whatever you want, including, but not limited to, redirecting the user to the intended destination.
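The lookup order described above can be sketched roughly as follows. This is only an illustration, not how your operating system’s resolver is actually implemented: parseHosts() and lookup() are hypothetical names, and the “first entry wins” rule is an assumption made for the sketch.

```php
<?php
// Parse hosts-file text into a hostname => address map.
// Assumption for this sketch: the first entry for a name wins.
function parseHosts(string $text): array
{
    $map = [];
    foreach (preg_split('/\R/', $text) as $line) {
        // Drop comments, then split what is left on whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '') {
            continue;
        }
        $fields = preg_split('/\s+/', $line);
        $address = array_shift($fields);
        foreach ($fields as $name) {
            if (!isset($map[$name])) {
                $map[$name] = $address;
            }
        }
    }
    return $map;
}

// Look in the hosts map first; only ask the DNS fallback when there is no entry.
function lookup(string $name, array $hosts, callable $dnsFallback): ?string
{
    return $hosts[$name] ?? $dnsFallback($name);
}
```

With an entry such as “127.0.0.1 g” in place, lookup('g', ...) returns the loopback address without ever calling the fallback, which is the behavior the rest of this HOWTO relies on.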

This HOWTO assumes that Apache is installed and running on your system with PHP and mod_rewrite support.

Modify /etc/hosts

Open /etc/hosts in your favorite text editor, and add one line for each shortcut you’d like to set up. Leave everything else unchanged. (You will need sudo to edit this file.)

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1 localhost
127.0.0.1 c # for Calendar
127.0.0.1 f # for Facebook
127.0.0.1 g # for Google Search
127.0.0.1 m # for Mail
127.0.0.1 map # for Maps
127.0.0.1 t # for Twitter
127.0.0.1 w # for Wikipedia
127.0.0.1 y # for Yelp
127.0.0.1 yo # for YouTube

255.255.255.255	broadcasthost
::1             localhost
fe80::1%lo0	localhost

You can test that this change worked by typing the address (e.g. http://g/) in your browser. Instead of seeing a page that says that your browser “can’t find the server ‘g’”, you should now see a page saying that your server isn’t configured correctly, or welcome to Apache, or whatever you would see if you typed http://localhost/ instead. If that worked, proceed.

Configure Apache to handle requests for unknown domains/URIs

Edit the AllowOverride line in /etc/apache2/httpd.conf. The excerpt below shows plenty of context around the line you need to edit, so you can locate the relevant section in your file.


#
# This should be changed to whatever you set DocumentRoot to.
#
<Directory "/Library/WebServer/Documents">
    #
    # Possible values for the Options directive are "None", "All",
    # or any combination of:
    #   Indexes Includes FollowSymLinks SymLinksifOwnerMatch ExecCGI MultiViews
    #
    # Note that "MultiViews" must be named *explicitly* --- "Options All"
    # doesn't give it to you.
    #
    # The Options directive is both complicated and important.  Please see
    # http://httpd.apache.org/docs/2.2/mod/core.html#options
    # for more information.
    #
    Options +Indexes +FollowSymLinks +MultiViews

    #
    # AllowOverride controls what directives may be placed in .htaccess files.
    # It can be "All", "None", or any combination of the keywords:
    #   Options FileInfo AuthConfig Limit
    #
    # Change the following line from None to All.
    AllowOverride All

    #
    # Controls who can get stuff from this server.
    #
    Order allow,deny
    Allow from all

</Directory>

Locate your Apache document root. It’s usually /Library/WebServer/Documents on the Mac or /var/www in Ubuntu. If you’re unsure, check where it is by running the following command in a terminal (assuming you’re running Apache 2.x):

grep "DocumentRoot" /etc/apache2/httpd.conf

In that directory, save the following file. It should be named exactly .htaccess. (That’s htaccess with a period at the beginning, so it’s a hidden file on UNIX.) Save it as /Library/WebServer/Documents/.htaccess on Mac OS X or /var/www/.htaccess on Ubuntu.

<IfModule mod_rewrite.c>

RewriteEngine on
RewriteBase /

RewriteCond    %{REQUEST_FILENAME} !-f
RewriteCond    %{REQUEST_FILENAME} !-d
RewriteRule    (.*) /index.php [L]

</IfModule>

The actual redirection script

Here’s the script that I use for redirection, but you can roll your own, and do anything with each request you receive. (If you do something phenomenally awesome, I’d love to hear about it in your comments.) As you can see, it’s customized to the sites I frequent, including location preferences: for example, the Yelp shortcut takes me to Yelp San Francisco directly, and the search box is preconfigured for SF.

Save this as /Library/WebServer/Documents/index.php (on Mac OS X) or as /var/www/index.php on Ubuntu.

<?php
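  // Everything after the leading slash is treated as the search query.
  $uri = preg_replace('/^\//', '', $_SERVER['REQUEST_URI']);
  // The shortcut typed into the browser arrives as the requested hostname.
  switch($_SERVER['SERVER_NAME']) {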
    case 'c':
      redir('http://calendar.google.com/');
      break;
    case 'f':
      redir('http://facebook.com/');
      break;
    case 'g':
      redir('http://www.google.com/search?q=' . $uri);
      break;
    case 'm':
      redir('http://mail.google.com/');
      break;
    case 'map':
      redir('http://maps.google.com/?q=' . $uri);
      break;
    case 't':
      redir('http://twitter.com/');
      break;
    case 'w':
      if ('' === $uri) {
        redir('http://en.wikipedia.org/wiki/');
      } else {
        redir('http://en.wikipedia.org/wiki/Special:Search/' . $uri);
      }
      break;
    case 'y':
      if ('' === $uri) {
        redir('http://yelp.com/sf/');
      } else {
        redir('http://yelp.com/search?ns=1&find_loc=San%20Francisco,%20CA&find_desc=' . $uri);
      }
      break;
    case 'yo':
      if ('' === $uri) {
        redir('http://www.youtube.com/');
      } else {
        redir('http://www.youtube.com/results?search_query=' . $uri);
      }
      break;
  }

  // Send the redirect and stop, so nothing else is written to the response.
  function redir($url) {
    header('Location: ' . $url);
    exit;
  }
?>

That’s it. Now type your shortcuts into your browser instead of the longer URLs, and there you are. If you run into trouble, leave a comment and I’ll address it.
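If your list of shortcuts grows, one possible variation is to keep them in a table rather than a switch statement. The sketch below is not the script I run; targetFor() and the shape of the $shortcuts array are made up for illustration, and it assumes every shortcut has a home URL plus an optional search prefix.

```php
<?php
// Hypothetical table-driven variant of the redirection logic.
// $shortcuts maps hostname => [home URL, search URL prefix or null].
// Returns the redirect target, or null for an unknown hostname.
function targetFor(array $shortcuts, string $host, string $requestUri): ?string
{
    if (!isset($shortcuts[$host])) {
        return null;
    }
    [$home, $searchPrefix] = $shortcuts[$host];
    // REQUEST_URI arrives percent-encoded, so it is appended without re-encoding.
    $query = ltrim($requestUri, '/');
    if ($query === '' || $searchPrefix === null) {
        return $home;
    }
    return $searchPrefix . $query;
}

// Usage inside index.php might look like this:
//   $target = targetFor($shortcuts, $_SERVER['SERVER_NAME'], $_SERVER['REQUEST_URI']);
//   if ($target !== null) { header('Location: ' . $target); exit; }
```

Keeping the URL construction in a pure function like this also makes it easy to check a new shortcut from the command line before pointing your browser at it.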

The only downside of this approach

Redirecting involves an additional HTTP request to your machine, which introduces additional latency. The request, however, goes from your machine to itself, so no network round trip is involved. Personally, I feel that the keystrokes I save would have taken longer to type than the extra latency this method adds. And you don’t lose anything if you set this up and don’t use it: just continue to type entire URLs and you will never pay the latency penalty.

One-button Phone Number Sharing

Send this Phone Number to the Current Caller

How often have you found yourself calling a friend to get the phone number of a mutual friend? And then having to hold the phone while your friend pulls up the contact list on their phone, then recites the number to you, and then you write it on paper because your phone won’t let you add contacts while you’re on a call, and then you misplace the number you wrote on paper, ad nauseam. Why isn’t there a single button that says “Send this Phone Number to the Current Caller”?

It’s a common problem. You’re out and about, and realize you need to call a specific person, but you don’t have their phone number (or, more often, you have it on your desktop computer or your laptop, but that doesn’t do you any good in the current situation). So you decide that the best thing to do is to call a mutual friend and ask them.

When they receive a phone call from you, they’re fumbling to hold the call while they look in their address book. (That is, if they’re lucky, and if their phone actually lets them open the contact list while they’re on a call.) More often, what happens is that they tell you to hang up while they consult their address book. And then you have to hunt for a piece of scrap paper because your phone won’t let you add a number to the list like that.

What the world needs is a button next to each phone number in the contact list that appears only while you’re on a call. The button, when pressed, sends an SMS from you to the current caller, containing the information from the contact record you just selected. It doesn’t have to be fancy; a minimal VCF (vCard) record with just a name and a number should do nicely.

If the recipient’s phone understands this method of contact transfer, it can prompt the user and import it automatically. If not, the user can still read the SMS herself, and dial the number. No more paper, no more fumbling, no more “let me call you back”.
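For illustration, here is roughly what such a record could look like when built in code. This is a sketch under assumptions: it uses the vCard 3.0 layout, the function names are made up, and it lazily puts the whole name into the family-name slot of the N field. Actually sending the SMS is left out, since that depends entirely on the phone platform.

```php
<?php
// Escape characters that are special in vCard 3.0 text values.
function vcardEscape(string $value): string
{
    // Backslash first, so the escapes added below are not escaped again.
    $value = str_replace('\\', '\\\\', $value);
    $value = str_replace([';', ','], ['\\;', '\\,'], $value);
    return str_replace(["\r\n", "\n"], '\\n', $value);
}

// Build a minimal vCard holding one name and one phone number.
function contactToVcard(string $fullName, string $phone): string
{
    $lines = [
        'BEGIN:VCARD',
        'VERSION:3.0',
        'FN:' . vcardEscape($fullName),
        // Simplification: the full name goes into the family-name component.
        'N:' . vcardEscape($fullName) . ';;;;',
        // Assumes $phone contains only digits and an optional leading "+".
        'TEL;TYPE=CELL:' . $phone,
        'END:VCARD',
    ];
    // vCard lines are separated by CRLF.
    return implode("\r\n", $lines) . "\r\n";
}
```

A record like this for a typical name and number is only about a hundred characters long, and even a phone that cannot import it still shows the name and number in readable form.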

It’s so easy, a caveman could do it. If only phones implemented it!

Book-as-Blog: Encouraging Reading by Posting a Chapter at a Time

I realized I haven’t picked up a book (a non-academic book, that is) in weeks, but I’ve read more than my fair share of blogs in that same time. I wonder if part of the reason is the longer time commitment required by a book. This prevents it from being read quickly and keeps it forever on my wish list. If so, then how about a service that breaks down books into blog-post-sized chunks and publishes them every few days?

The idea is inspired by (nay, stolen from) Kevin Kelly, who is reissuing his 10-year-old book as a blog (hat-tip to Seth Godin’s post on the topic). His reasons are different, though. The book is out of print, and is already available as a downloadable PDF from his web site. Making it available as a blog is just another way of spreading his ideas wider, which is a great idea.

But apart from that, I like the idea of chopping up a book into chapter-sized chunks and making them available to readers one at a time. Not for any economic reasons, but because attentional resources are so scarce these days. A few times during the day, I have some free time which I use to read a few blog posts. If I ever thought about picking up a book during these breaks, I wouldn’t do it, simply because of the (arguably artificial) time commitment issues it raises in my mind. But talk about a chapter-sized, or even smaller blog post, and I’d read it.
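The chopping itself is the easy part. As a rough sketch (splitIntoPosts() is a made-up name, and the assumption is that the book is plain text with blank lines between paragraphs), a service could group whole paragraphs into posts under a word budget:

```php
<?php
// Count whitespace-separated words in a piece of text.
function countWords(string $text): int
{
    return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}

// Group paragraphs into posts of at most $maxWords words each.
// A single paragraph longer than the budget becomes a post on its own.
function splitIntoPosts(string $text, int $maxWords): array
{
    // Paragraphs are assumed to be separated by one or more blank lines.
    $paragraphs = preg_split('/\R\s*\R/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $posts = [];
    $current = [];
    $count = 0;
    foreach ($paragraphs as $paragraph) {
        $words = countWords($paragraph);
        if ($current !== [] && $count + $words > $maxWords) {
            // Adding this paragraph would exceed the budget: close the post.
            $posts[] = implode("\n\n", $current);
            $current = [];
            $count = 0;
        }
        $current[] = trim($paragraph);
        $count += $words;
    }
    if ($current !== []) {
        $posts[] = implode("\n\n", $current);
    }
    return $posts;
}
```

Splitting only at paragraph boundaries is a deliberate choice here: a post that ends mid-thought would make it even harder for the reader to pick the thread back up a few days later.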

Of course, not all book content has an affordance for this kind of slicing and dicing. If it takes several minutes for a reader to re-establish context from the last blog post, the purpose is lost. Some authors would consider their books a work of art too precious(ssss) to split up into anything smaller. That’s also the reason why bands are often reluctant to sell singles instead of entire albums (apart from the record labels preferring to sell you 9 lame tracks bundled with 1 great track for $10 instead of $1, thank you very much). But several non-fiction books could verily adapt to such a format.

The book-as-blog need not be free (as in no charge). Sure, charge me for it. Implementation would be easy: charge me a micropayment and give me a secret watermarked feed URL. With so much new content licensed under a Creative Commons attribution license, it’s also possible to develop a web service that does this for liberally-licensed and public domain works. Such a service would be compatible with the Creative Commons Attribution (BY), Attribution-ShareAlike (BY-SA), Attribution-NonCommercial (BY-NC), and Attribution-NonCommercial-ShareAlike (BY-NC-SA) licenses (but I’m not a lawyer, this is not legal advice, blah blah).
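One way the secret feed URL could work, sketched under assumptions: the service keeps a server-side secret and tags each subscriber’s URL with an HMAC of their subscriber ID, so a leaked URL can be traced back to the account it was issued to. The function names, the parameter names “sub” and “tag”, and the example.com address are all made up for illustration.

```php
<?php
// Build a per-subscriber feed URL carrying an HMAC tag derived from a server-side secret.
function feedUrlFor(string $baseUrl, string $subscriberId, string $secret): string
{
    $tag = hash_hmac('sha256', $subscriberId, $secret);
    return $baseUrl . '?' . http_build_query(['sub' => $subscriberId, 'tag' => $tag]);
}

// Check that a requested subscriber/tag pair was issued with the same secret.
function isValidFeedRequest(string $subscriberId, string $tag, string $secret): bool
{
    $expected = hash_hmac('sha256', $subscriberId, $secret);
    // hash_equals() compares the two strings in constant time.
    return hash_equals($expected, $tag);
}

// Example: a URL for a hypothetical subscriber "reader-42".
// echo feedUrlFor('http://example.com/feed', 'reader-42', 'keep-this-secret');
```

Because the tag is derived from the subscriber ID, the service does not need to store one random URL per reader, and a tag copied onto a different subscriber ID simply fails the check.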

Maybe something like this will finally get me back to the several-books-a-month club I used to be a member of, until I discovered this newfangled shiny thing called the Internet.
