Can Security Questions be Subliminally Discriminatory?

It’s not funny how many cultural, socio-economic, and even religious assumptions can be implicit in the design of a simple form. Here’s the form I was greeted with today when I tried to log on to ShareBuilder. Note, I don’t want to single out ShareBuilder here; many other companies have such silly forms as well. But it just so happens to be the form I chanced upon today.

Security Questions

Here’s a list of the assumptions made by whoever was tasked with designing this particular form. It’s quite easy to see why these assumptions are not universally applicable. Though not outright discriminatory, these questions suggest a subliminal preference for married, car-owning people, both of whose parents are currently alive. And no, you cannot create your own questions.

  1. “What was the make (Chevy, Ford, Honda, etc) of your first vehicle?”
    This question assumes that you have owned a vehicle. Many people worldwide do not own a vehicle, either by choice, or because their governments had the good sense to invest in public transportation instead of highways.
  2. In what city does your father currently live?
    This one assumes that your father is currently alive. This doesn’t sound discriminatory until you realize that it is far more likely for younger users’ fathers to be alive than older ones’. There you go: it’s ageist.
  3. What is the first name of the maid of honor at your wedding?
    This question assumes (1) that you’re married and (2) that you follow a religious tradition where the concept of ‘maids-of-honor’ exists in relation to weddings.
  4. What is your mother’s father’s first name?
    This is probably the only question that’s universally answerable.
  5. What is your father’s middle name?
    This particular question assumes that everyone has a middle name. I know people from a lot of communities where there is no concept of a middle name.
  6. What was the first name of your manager at your first full-time job?
    This question whispers: ‘Hey students, hope you’ve had at least one job so far in your career, else we don’t quite want you here. Now go away!’
  7. In what city were you married? (Enter full name of city)
    This one (again!) assumes you’re married. If you happen to be gay or lesbian in the wrong state or in the wrong country, you’re not even granted the right to marry, so making an assumption about marriage is adding insult to injury.
  8. What was the name of your first pet?
    Everyone’s had a pet at some point in their lives, right? </sarcasm>
  9. What color was your first vehicle?
    Again, this assumes you have owned a car in the past.
  10. In what city does your mother currently live?
    Finally, this one assumes that your mother is currently alive. (again, ageist as in the second question.)

We have, thankfully, grown out of the age of blatant racial or gender discrimination (for the most part). But behind every user interface widget and every design decision we make is an invisible representation of the subconscious biases we hold in our minds. If you build a team composed only of like-minded individuals from similar backgrounds, this is the kind of sign-up form you get. If your team includes people with a rich diversity of life experiences, you can bet their designs will be much more universal.

Mentoring Undergrad Research Across Continents: An Experiment

From July 2010 to June 2011, I will be mentoring 8 undergrad students from my alma mater, Fr. Conçeicao Rodrigues College of Engineering for their Senior Projects (a.k.a. Final-Year projects). We plan to work together on two projects in the area of Personal Information Management. This is an experiment of sorts, because as far as I know, final-year projects in Bombay University have never been mentored remotely, and only a small proportion are research-oriented.

Why it’s the right time for this.

There are several reasons why I’m doing this. Several years ago, a few of my friends from undergrad (Salil Wadhavkar, Ninad Pradhan, Vikram Iyengar, Noel Tide, Rahul Saxena, Raghu Cowlagi) and I had discussed mentoring our juniors to participate in tech contests. At the time, we had all just finished our Master’s degrees, and felt that such encouragement could nudge a few students to take up research and grad school. That conversation died down for several reasons about 4-5 years ago, but the spark remained.

I enjoy building things. That’s why I had decided I wanted to be in industry (instead of academia) even before I started my Ph.D. program. But it’s also fun to conduct studies and find answers to interesting questions, and it’s incredibly hard to do the latter while my job at Google enables me to build Awesome Stuff™ full-time. Collaborating with students seems like a win-win situation for all of us: the undergrad students gain exposure to research, and we all are able to build, study, and publish what we find.

Proposing the collaboration

With these ideas in mind, I visited my undergrad college (under Bombay University) when I was in Bombay this February. I met the Principal, Dr. Srija Unnikrishnan, and the Head of the Department of Information Technology, Prof. Mahesh Sharma. Both found the idea promising, and Prof. Sharma asked me to address the 3rd year classes (junior-level in the US system) that were in progress at the time. Although I felt a little bad interrupting classes to deliver my 15-min spiel, the students, professors, head of the department, and the principal were not only supportive but also enthusiastic about this. My concerns about the distance and my lack of physical availability in Bombay were brushed aside by Prof. Sharma (“when there’s chat and Skype, why do you have to be here personally?”).

Soon, two groups of four students contacted me, and we worked on defining the projects between February and now (June 2010). I was impressed by their initial emails, which clearly showed they were not only interested, but had also done their homework before proposing a project. I will be collaborating with one team on an email-related project, and on a task-management related project with another; both within Personal Information Management. (We will publish the details of these projects and the entire source code developed as part of this project as and when we have something to report.)

Other voices

Luis von Ahn from CMU recently published a blog post about outsourcing his research group. While he proposes doing this mainly for monetary reasons, I figured this could be more of an academically-enriching mutually-beneficial experience. There are a few neat opinions expressed in his blog post, as well as in the comments.

Next steps

I expect to learn many valuable lessons about cross-continental collaboration from this process as much as I expect to learn from my new colleagues — who are no doubt better versed in technologies of the day than I am. I will continue to blog about our experiences as we proceed.

If you have any tips for us as we embark on this year-long experiment, please leave us a comment.

The 5 Stages of Driving in India

17 Feb, 2010 — Thoughts

(with apologies to Elisabeth Kübler-Ross)

  1. Denial: No way that guy’s gonna cut across in front of me.
  2. Anger: Whaa? WTF? Get out of my frikkin’ way!
  3. Bargaining: Maybe if I let him cut across, I could still retain a modicum of sanity.
  4. Depression: Screw this, it’s never gonna get any better.
  5. Acceptance: Oh well, when in Rome …

Simplified Twitter Microsyntax for the Haiti Earthquake

18 Jan, 2010 — Academic, Design & Usability, HCI, Thoughts

In this post, I have typeset many more sentences in bold than I usually do, so readers can quickly skim through it.

I applaud the efforts of U. Colorado’s EPIC Group in helping the victims of the Haiti earthquake call for help using Twitter, and in making their tweets discoverable and actionable. I just performed a Twitter search for the terms #haiti -RT -http (includes all Tweets tagged #Haiti, except retweets or links) to inspect some of the tweets that are directly related to happenings on the ground, and they are (as expected) only a minuscule percentage of the total number of tweets about #Haiti. Syntax is thus sorely needed to achieve a decent signal-to-noise ratio to assist relief efforts.

In my opinion, though, the current version of the tweet syntax seems too formal, too rigid, and a tad too complicated for victims or rescuers on the ground. I am a programmer, and even I had trouble mentally parsing a few of the examples provided. We must keep in mind that Haiti is a bi-/tri-lingual country (and none of its languages is English), so any syntactic terms used should preferably be semi-obvious to non-native speakers of English as well as to rescuers.

Roles of Microsyntax

  1. Make tweets discoverable: Microsyntax can assist local search-and-rescue efforts and unaffected Twitter users in determining if a tweet is actionable. This task is partly a Signal Detection Task and partly a Data Mining problem. In both situations, microsyntax can prove helpful: all that’s needed is a single tag that emphasizes that a particular tweet is actionable (versus not), e.g. #haiti #rescue (or #haitirescue, to avoid having to type a second # (hash) sign). This will greatly increase the sensitivity parameter d’ of the signal detection task.
  2. Make data mining easier: Once a tweet has been detected to be actionable, its contents must be parsed into a form that local efforts can take action upon. While it’s true that all the other proposed microsyntactic tags make it easier for applications to parse the data, this is at the cost of requiring users to learn new syntax. This seems to me a little too much to expect from victims of a recent calamity of this scale as well as from rescue workers with other higher priorities. Instead, as long as our tools can identify relevant tweets, computers should be able to perform the second task of parsing locations, names, and verbs from tags quite easily.
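As a rough illustration of the first role, here is a minimal Python sketch of a filter that flags a tweet as actionable when it carries both an event tag and a rescue tag. The tag sets and the function names (`hashtags`, `is_actionable`) are assumptions made up for this example, based on the #haiti #rescue / #haitirescue suggestion above; they are not part of the EPIC proposal or of any Twitter API.

```python
import re

# Assumed tag vocabulary, taken from the suggestion in this post (not an official standard).
EVENT_TAGS = {"#haiti", "#haitirescue"}
ACTIONABLE_TAGS = {"#rescue", "#haitirescue"}

def hashtags(tweet):
    """Return the set of hashtags in a tweet, lowercased for comparison."""
    return {tag.lower() for tag in re.findall(r"#\w+", tweet)}

def is_actionable(tweet):
    """A tweet is treated as actionable if it has an event tag AND a rescue tag."""
    tags = hashtags(tweet)
    return bool(tags & EVENT_TAGS) and bool(tags & ACTIONABLE_TAGS)

if __name__ == "__main__":
    print(is_actionable("#haiti #rescue Looking for Sherline Birotte aka Memen."))
    print(is_actionable("#haiti Thinking of everyone affected today"))
```

Everything beyond this single check (locations, names, needs) is left to later parsing or to human readers, which is the point of keeping the required syntax to one tag.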

Also, microsyntactic terms need not always be prefixed with # (hash) signs; these are often difficult to type on cell phone keyboards and, on some handsets, may hamper input methods such as T9. The intervening # signs also make tweets that use the proposed microsyntax harder to read for someone browsing through them.

To summarize, the proposed syntax imposes a heavy cognitive load on victims and search-and-rescue workers in order to make parsing easier for machines. However, the task of parsing details from tweets can also easily be performed by large numbers of humans, i.e. crowdsourced via volunteer efforts or via tools such as Amazon’s Mechanical Turk.

Simpler, Lighter Microsyntax

The following are examples of microsyntax that are more readable, yet also parseable by machines. All situations are based on the ones in the original proposed microsyntax. Most are directly based on the EPIC microsyntax, with a few simplifications.

  • Rule 1: Always write in the third-person. This takes care of part of the name problem.
  • Rule 2: Instead of using #loc for locations, use “at”. It’s much more natural and not much more difficult to parse.
  • Rule 3: Verbs are actionable. Not syntactic verbs, but English (or French or Haitian Creole) verbs. It’s a trivial task to populate a tool with a dictionary to detect all word forms correctly.
  • Rule 4: Anything that cannot be parsed ends up as the equivalent of the #info tag (see EPIC syntax).
  • Rule 5: The entire text of the tweet should always be available to a human, so whatever information was incompletely parsed can be understood manually, and optionally added to the parsed version by a human.

The general aim is to require as little syntax knowledge as possible, and to keep as close as possible to the natural way people write tweets.
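To make the rules a little more concrete, here is a minimal Python sketch of a parser that follows them loosely. Everything in it is an assumption for illustration: the `parse_tweet` function, the tiny `ACTION_VERBS` dictionary, and the three regular expressions are hypothetical and deliberately crude, not a tested tool.

```python
import re

# Hypothetical, tiny verb dictionary (Rule 3); a real tool would need full
# word-form lists for English, French, and Haitian Creole.
ACTION_VERBS = {"looking": "search", "needs": "need", "need": "need",
                "offer": "offer", "offering": "offer"}

TAG = re.compile(r"(?<!\S)#\w+")                        # hashtags at the start of a token only
NAME = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+")  # two or more capitalized words in a row
PLACE = re.compile(r"\bat\s+([^.,]+)")                  # Rule 2: text after "at", up to "." or ","

def parse_tweet(text):
    """Best-effort split of a tweet into what / who / where / info."""
    tags = [t.lower() for t in TAG.findall(text)]
    body = TAG.sub("", text).strip()
    words = re.findall(r"[A-Za-z]+", body.lower())
    action = next((ACTION_VERBS[w] for w in words if w in ACTION_VERBS), None)
    name = NAME.search(body)
    place = PLACE.search(body)
    # Rule 4 (simplified): whatever follows the location is kept as free-form info.
    rest = body[place.end():].strip(" .,") if place else body
    return {
        "tags": tags,
        "what": action,
        "who": name.group(0) if name else None,
        "where": place.group(1).strip() if place else None,
        "info": rest or None,
        "text": text,  # Rule 5: the full tweet always stays available to a human
    }

if __name__ == "__main__":
    tweet = ("#haiti #rescue Looking for Sherline Birotte aka Memen. Last seen at "
             "19 Ruelle Riviere College University of Porter, a 3 story school building")
    for key, value in parse_tweet(tweet).items():
        print(key, "=", value)
```

Heuristics this simple will misfire: for a tweet such as "Mirna Nazaire at PAP at Bizoton 6#12 needs food.", the "at" rule captures everything up to the period, including "needs food". That is exactly why Rule 5 keeps the whole tweet in front of a human.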

Examples

TWEET-BEFORE: Sherline Birotte aka Memen. Last seen at 19 Ruelle Riviere College University of Porter a 3 story schol building
TWEET-AFTER: #haiti #ruok #name Sherline Birotte aka Memen. Last seen #loc 19 Ruelle Riviere College University of Porter #info a 3 story schol building
Simplified Microsyntax: #haiti #rescue Looking for Sherline Birotte aka Memen. Last seen at 19 Ruelle Riviere College University of Porter, a 3 story school building

This tells us:
What = Looking for someone.
Who = Sherline Birotte aka Memen (identified fuzzily based on initial capital letters)
Where = 19 Ruelle Riviere College University of Porter (automatically parsed based on “at”)
What else = “a 3 story schol building” (i.e. everything else in the tweet)

TWEET-BEFORE: Mirna Nazaire lives in P-A-P at Bizoton 6#12. Entire neighborhood without food. People are dying.
TWEET-AFTER: #haiti #need #food #name Mirna Nazaire lives in #loc PAP at Bizoton 6 #12 #info neighborhood w/o food. People dying
Simplified Microsyntax: #haiti #rescue Mirna Nazaire at PAP at Bizoton 6#12 needs food. Entire neighborhood without food. People dying.

This tells us:
What = needs food. (automatically detected from the verb in the sentence.)
What do they need = food (automatically detected from the object in the sentence.)
Who = Mirna Nazaire (heuristically determined from initial capital letters.)
Where = PAP at Bizoton 6#12 (detected from microsyntax “at”)
What else = “Entire neighborhood without food. People dying.” (Rest of the tweet, unfiltered.)

TWEET-BEFORE: French hospital is now open and ready to receive the wounded at the french lycee in rue marcadieux bourdon
TWEET-AFTER: #haiti #offering #med #loc french lycee in rue marcadieux bourdon #num 30+ #info French hospital is open and ready 2 receive wounded
Simplified Microsyntax: #haiti #rescue French hospital ready to offer help to 30+ wounded at the french lycee in rue marcadieux bourdon

This tells us:
What = Hospital. Also, something to do with medical efforts. (no need to tag explicitly, we can infer that from ‘hospital’.)
Where = The french lycee in rue marcadieux bourdon. (Automatically parsed from microsyntax “at”.)
How many people = 30+. (It’s already a number, no need to state “#num” explicitly.)

These are just a few suggestions. I will be contacting the PIs (principal investigators) of the EPIC project directly with some of my recommendations, but please continue to follow their syntax until they recommend anything different. The current syntax proposal isn’t perfect, but it is more important to avoid fragmenting the tagspace.

ReTweeting: Attribution for Discovery versus Attribution for Creation

During the past few months, I have found myself consuming more news and articles via recommendations from friends and those I follow on Twitter than via traditional source-based subscription (e.g. subscribing to specific feeds or newspapers). Social media discovery is here, and the best part of reTweeted links is that they have already gone through a round of peer review by peers I trust.

Often, I’m tempted to reTweet that content myself, or post it to Facebook, or share it via Google Reader. A few of these media keep attribution intact (e.g. Google Reader adds the “Shared by” metadata for each person in the chain that shared the content.) Others such as Twitter are restricted by the length of the post, so the “RT @” list quickly gets too long and inevitably gets trimmed along the way.

But there’s no accepted practice for how this list should be trimmed. Should you keep the first Tweeter, even if that person is not the author of the content? (E.g. someone who read an NY Times article and tweeted about it.) Should you keep the last reTweeter, who was your direct link to the content in question? What about multiple Tweeters re-posting links to the same content, so it’s not a tree any more, but a forest of links (imagine a directed graph with edges denoting “shared by X to Y”).
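The directed graph mentioned above can be sketched in a few lines of Python. The account names and the `shares` list are invented for illustration, and the two helper functions are hypothetical; the sketch only shows that "first Tweeter" and "last reTweeter" are both easy to compute, and that neither is necessarily the author of the content.

```python
# Hypothetical sharing data: each pair (x, y) is an edge "shared by x to y".
shares = [
    ("nytimes_reader", "alice"),
    ("alice", "bob"),
    ("carol", "bob"),   # a second, independent path to the same link
    ("bob", "me"),
]

def first_sharers(edges):
    """Accounts that shared the link but did not receive it from anyone in the graph."""
    senders = {x for x, _ in edges}
    receivers = {y for _, y in edges}
    return sorted(senders - receivers)

def direct_sources(edges, user):
    """Accounts that shared the link directly with the given user."""
    return sorted(x for x, y in edges if y == user)

if __name__ == "__main__":
    print(first_sharers(shares))         # candidates for "first Tweeter"
    print(direct_sources(shares, "me"))  # my direct link(s) to the content
```

With two independent starting points, there is no single "first Tweeter" to keep, and the original author does not appear in the graph at all.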

The problem is that by including attribution about the process of discovery, we end up attaching higher value to discovery than creation. When someone reTweets a secondary source of information, attribution for the primary source is often trimmed away. This is especially bad for Creative Commons works that require attribution when re-posted, but is bad in general for any kind of work and for authors of that work.

I have come to the conclusion that although attribution for discovery is important, it’s hard to apply consistently in fixed-character-length media. It’s a completely different story in the case of original content generated by the tweeter himself/herself: e.g. one-liners, or authors tweeting links to their (longer) content. Attribution for original content is vastly more meaningful than attribution for promoting someone else’s content (although the value of that act is substantial as well).

So from now on, I will only attribute original content in my tweets and Facebook updates. My intention is not to discount the value of the source that shared the content with me, but instead to promote the original author of that content wherever possible.
