My Research Philosophy

8 Feb, 2009 — Academic, Thoughts

I wrote this recently, not as a blog post, but for another purpose. I figured I’d post it here like I do everything else.

Re•search: noun. Investigation or experimentation aimed at the discovery and interpretation of facts, revision of accepted theories or laws in the light of new facts, or practical application of such new or revised theories or laws.

—Merriam-Webster Dictionary.

The last part of that definition has always been the chief motivator for me in my research — practical application. While all research seeks to discover universal truths and deeper meaning, I strongly believe that researchers have a responsibility to contribute to society in other tangible ways as well.

Just as Gutenberg’s invention of the printing press in 1439 made literary works accessible to everyone, the Internet is likewise speeding up the propagation of knowledge now and will continue to do so in the decades to come. We are on the brink of a cultural revolution where ideas, prototypes, discussion and research know no boundaries of location or time. The Free Software Movement is promoting users’ freedom to understand and explore computer programs. The Creative Commons project encourages authors, scientists, artists and educators to distribute their creations under licenses that foster the sharing of ideas, encourage discussion, engender a culture of openness, and speed up innovation. This provides enormous opportunities for researchers to collaborate in real-time across institutions, countries and continents, and to serve the community by disseminating their research results via public blogs, videos, slides, prototypes and designs.

As a researcher in Human-Computer Interaction at the Dept. of Computer Science at Virginia Tech, I have developed several tools and prototypes that would be of benefit not just to researchers but also to computer users. It is by studying their habits that I designed these tools — to them, I owe these tools. I work in the area of Personal Information Management (PIM), and study how users access and manage information such as files, calendars, email messages, contacts and bookmarks on multiple devices. I release all such tools and software to the world at my web site under licenses that permit anyone to inspect the source code, build upon it, and benefit from it.

In the process of my research, I developed a program to access Google Calendar which now has over 25,000 users. A calendar converter program I wrote is used by an average of more than 200 users per day. During Sustainability Week 2008 at Virginia Tech, I released a Blacksburg Transit Schedule application for cell phones to encourage Blacksburg citizens to take the bus instead of driving. It is used by about 300 users every month and growing.

In the spirit of working on real products that are used by real people, I interned at Google three times during my Ph.D. (2005, 2006, 2007.) In 2007, my project enabled users of Google Book Search to clip personalized content from books and embed that into their own web site or blog. This enables teachers to excerpt from literary classics for their class home page, for literature scholars to debate the nuances of texts, and for commentators to dissect parts of books. My intern work was covered by several news outlets, chief among them, at Google’s Corporate Blog.

Academia encourages published work—publish or perish, they say—while original contributions such as new ideas and untested directions are undervalued in the traditional ways of evaluating research. The Internet changes that too. Several times when I have come up with ideas that may or may not be viable research projects, I have written about them on my blog. The public scrutiny and invaluable feedback I’ve received made it easy to separate the wheat from the chaff. My advisor has always been supportive of those ideas that were encouraging research directions: the latest among them resulted in a paper that has been nominated for the ACM SIGCHI Student Research Competition 2009.

An area that I have recently been concerned about is the open publication of raw data sets. I perform human experiments which are reviewed by the Institutional Review Board (IRB) for ethical compliance. There is inherent tension between the privacy implications of human experiments and the Open Science dream of being able to publish all experimental data publicly so that others may analyze it in novel ways. I plan to investigate the ethical, moral and legal responsibilities of such an endeavor, recognizing that we as researchers owe two allegiances: to our experiment participants and to the scientific community, in that order.

I am happy to be a researcher at a time in our history when competitive collaboration trumps closed confidentiality. Science and innovation can only progress faster when information is freely shared among researchers, scholars and citizens.

Book-as-Blog: Encouraging Reading by Posting a Chapter at a Time

I realized I haven’t picked up a book in weeks, (non-academic book, that is), but I’ve read more than my fair share of blogs in that same time. I wonder if part of the reason is the longer time commitment required by a book. This prevents it from being read quickly and keeps it forever on my wish list. If so, then how about a service that breaks down books into blog-post-sized chunks and publishes them every few days?

The idea is inspired by, — nay, stolen from — Kevin Kelly, who is reissuing his 10-yr old book as a blog (hat-tip to Seth Godin’s post on the topic). His reasons are different, though. The book is out-of-print, and is already available as a downloadable PDF from his web site. Making it available as a blog is just another way of spreading his ideas wider, which is a great idea.

But apart from that, I like the idea of chopping up a book into chapter-sized chunks and making them available to readers one at a time. Not for any economic reasons, but because attentional resources are so scarce these days. A few times during the day, I have some free time which I use to read a few blog posts. If I ever thought about picking up a book during these breaks, I wouldn’t do it, simply because of the (arguably artificial) time commitment issues it raises in my mind. But talk about a chapter-sized, or even smaller blog post, and I’d read it.

Of course, not all book content has an affordance for this kind of splicing and dicing. If it takes several minutes for a reader to re-establish context from the last blog post, the purpose is lost. Some authors would consider their books a work of art too precious(ssss) to split it up into anything smaller. That’s also the reason why bands are often reluctant to sell singles instead of entire albums (apart from the record labels preferring to sell you 9 lame tracks bundled with 1 great track for $10 instead of $1, thank you very much.) But several non-fiction books could verily adapt to such a format.

The book-as-blog need not be free (as in no charge.) Sure, charge me for it. Implementation would be easy, charge me a micropayment and give me a secret watermarked feed URL. With so much new content licensed under a Creative Commons attribution license, it’s also possible to develop a web service that does this for liberally-licensed and public domain works. This is compatible with Creative Commons Attribution (BY), Attribution-ShareAlike (BY-SA), Attribution-Noncommercial (BY-NC), and Attribution Non-commercial Share-Alike (BY-NC-SA) licenses (but I’m not a lawyer, this is not legal advice, blah blah.)

Maybe something like this will finally get me back to the several-books-a-month club I used to be a member of, until I discovered this newfangled shiny thing called the Internet.

Email should have Expiration Dates

The entire idea behind this blog post has been summed up in the title, so all I need to do now is to explain why I think email should have expiration dates, and how that would make personal information management better.

Email, as we all know, started off as a way of sending short messages to colleagues within a department. It has since evolved into a monster of a tool that does everything it was never designed to do. The paradox is that it is exactly the kinds of messages that email was designed to handle that cause me the most trouble these days.

  1. I often receive email from my friends about meeting up for lunch. This is important, but only for that particular day (and that too, if I receive it before lunch time).
  2. My research collaborators send me email when a paper submission deadline is near, with the draft attached to it. Those emails are not nearly as important after the deadline.
  3. My friends and I exchange travel plans over email, but is it as useful after the trip is done?

These are the kinds of messages I’m talking about: important but time-sensitive. Then there are others which are not really important, but simply one-time notifications that I can take action on and then forget (“bill is due in 2 days”, “X added you as a friend”, “your order was received”, “your package has shipped”, “free donuts in break room”, “we are not meeting today”, etc.)

Why do they linger on in my mailbox for years? They become indistinguishable from the really important email that I need to save for years, such as some very interesting and intelligent discussions I have had with others. Note that I’m not including spam in this discussion, because in my opinion, there are adequate spam-filtering tools circa 2008 that perform well enough for most users for the most part with an acceptable false positive rate. Not perfect, but acceptable.

The Keeping Problem

Email is no longer ephemeral — people hold on to their email for years. This is what results in the Keeping Problem in Personal Information Management: there is so much of information coming at us that we don’t want to spend the time to decide what to keep and what to trash, so we end up keeping all of it. We hope we never have to do spring cleaning, and instead rely on search to find what we want.

Filing is not the answer

Many people file and tag their email, but the question is, is the cost of doing so (time as well as attention) worth the payoff at the end? Consider the two alternatives: spending 10 minutes each day filing your email, versus spending an hour a month looking for that one email. Pretty soon, the second alternative starts looking better while swimming in a sea of email with no signs of abating.

Same needle, bigger haystack

The bigger the haystack grows, the harder it is to find the needle. The solution is to reduce the size of the haystack. Automatically. Most other solutions empower the user to filter, sort, file, tag and do other sorts of things to their email that do not scale very well. That’s where Email Expiration Dates come into play. For it to work, they need to be (1) defined and (2) honored.

Defining an Email Expiration Tag

Email expiration tags can be defined in several ways by several entities that handle the email message at some point of time in transit.

  1. By the sender of that email who cares about the recipients;
  2. By the email client (MUA) used by the sender, automatically inferring from certain common-sense words; e.g. subject contains lunch and body is less than 100 bytes;
  3. By the email server software that intelligently tags email based on common patterns seen across multiple users;
  4. By the recipient’s email client, based on heuristics;
  5. By the recipient’s email client, based on a user-defined rule set;
  6. Or explicitly by the recipient in a spring cleaning session.

Honoring an Email Expiration Tag : Fully standards-compliant

RFC 822 allows custom tags (Sec. 4.7.5). These are commonly referred to as X- headers, since the specification requires that all such tags be prefixed with “X-”. Many applications built on email make use of such tags: mailing lists use the X-List-* headers to specify the list name, subscribe URL and unsubscribe URL in a mail message. Spam filtering software such as SpamAssassin assigns a score to each email, saved as an X- header. Mail clients are free to interpret these tags as they see fit.

An expired email will not be automatically deleted if the user does not want it to be. This is important for archival purposes and to satisfy the stringent reporting requirements of the Sarbanes-Oxley Act. But now the user can make a one-button choice about whether or not expired emails be deleted, archived, moved away or kept around.

With help from legitimate bulk email senders (not spammers)

Bulk mail such as Facebook notifications could have expiration dates set to “one week after receipt”. Bill reminders could set the expiration date to be “2 days past deadline” (and then send another notification if payment is not received by then.) Donut announcements could expire at the end of the day. Talk announcements could expire at the end of the talk.

Fixing the post-vacation blues

Returning from a vacation is no longer refreshing, as we are thinking about the sheer volume of email we need to process once we get home. If I was on vacation when the donuts were on the table, I should not be bothered about it when I return. Go away! If it’s an invitation to a talk that happened while I was away, I don’t need to hear about it now.

What will it take for adoption?

Defining a standard is no use if it isn’t used. The best way for such a solution to be adopted is for a major email provider implement it themselves, perhaps in a limited beta? On the interface side, this requires two additions: one for sending, one for processing received messages. The widget at the sender’s end is simply a calendar picker, or a drop-down with relative dates (“tomorrow”, “next week”, etc.) At the receiving end, it’s a three-way radio button that lets users “Delete”, “Archive” or “Leave alone” expired messages.

Till then, it’s back to manual spring cleaning. Oh well.

Acknowledgments: I have had several stimulating discussions with my advisor, Manuel Pérez-Quiñones, and my colleague, Pardha Pyla, about our respective email filing strategies, (that mostly began as venting sessions). This idea no doubt borrows from my analysis and conclusions based on some of those conversations.

Why I love working here!

27 Aug, 2008 — Academic, Funny, Life

When most professors have closed-door policies and need weeks of lead time before being able to schedule a meeting, here’s why I love working here!

Who's Online?

Separating Phone Numbers from Phones

17 Jan, 2008 — Academic, Design & Usability, Thoughts

Last night, I left my cell phone in my car. As with most of my follies, I realized it a few oh-no-seconds after I got home, but only after I’d taken off my jacket, gloves, cap, shoes and socks. It was an unnecessary walk in below-zero temperatures, but it got me thinking about phones, identities, what’s wrong about it all, and how it could be made better.

The problem is this: phones and phone numbers are tightly coupled together [1]. No wonder people keep their phones close to their heart — their personal identity is locked in it. If I don’t carry my phone, there’s no way to answer calls that I receive at that phone number. I can perhaps check voicemail from another phone, but still cannot make and receive phone calls under my own phone number.

Now compare this to email: if you go on a vacation without your own laptop computer, it is still possible to “borrow” someone’s random computer and check your messages. The messages you send will have your ID (your email address) attached to them, and the people you interact with will have no idea what machine you used (and there is no need for them to know.)

Why can’t we have a phone identity (our phone number) separate from the device (our phone) that is used to access it? If I forget my phone in the car overnight, I should be able to just add my phone identity to the home phone. That way, all calls that would have been received by my handset in the car will now be received at my home phone, and callers/callees will not know a thing. The next morning, I would re-establish my identity on my cell phone, and things will be back to usual.

I’m not a big fan of call redirects: that puts a temporary bandage on the problem instead of actually solving it. I don’t want my identity routed to another identity: I want to be able to use my own identity wherever.

This would also open up the market for multiple-identity phones. A couple can add both their identities to a single home phone in the evening, while they carry individual cell phones during the day. Forgot your cell phone at home? No problem, just borrow a loaner phone from the office receptionist and use it all day long (just as you would borrow a loaner security badge if you forgot yours). It would also make it easy for a group of people to be able to respond to a single phone call, e.g. despatch services for emergencies. A group of doctors could share a single phone number. Whoever is on emergency call duty would add the group phone number to his/her cell phone, and remove it after the duty ends.

Historically, a phone number has been tied to a phone, mostly because of technical constraints, beginning with the days of the human-operated telephone exchange. Email has shown that identities (email addresses) can be independent of devices (computers), that many identities can share a device, and many devices can be used by a single identity.

It’s an easy conceptual step forward to move to the many-to-many model instead of the current one-to-one. But there is a tremendous amount of change required of the infrastructure, and it won’t be cheap. But since I don’t happen to be in the business of implementing it (at least not yet!), so I’ll just write about this idea and hope that someone picks it up. Maybe someone will listen, and like it, and implement it.

Then I won’t have to walk out in the $#@*%$#^ snow to fetch a %$#%#$* cell phone.

[1] The more pedantic among us will point out that GSM phones keep the user’s identity on a SIM card, and CDMA phones maintain a single ID tied to the IMEI number of a phone. Although possible, that does not make swapping identities across phones easy: in the first case, you must have your current phone handy, which does not help solve my problem of having left the phone in the car overnight, and the second one requires a long phone call to the carrier to make the change. Neither is as quick or handy as the method I envision.

« Previous PageNext Page »