URL Design Sins: 16 things that don’t belong in URLs

5 Mar, 2011 — Design & Usability, HCI, Thoughts

(Because 16 is as good a number as any.)

Much has been said for a long time about making your URLs easy to use, remember, type, hack, and spread virally. There is still no dearth of ugly URLs all over the Web. A few very popular content management systems also engage in dirty URL practices, and it’s a shame. To aid you in cleaning up your URLs, here’s a list of specific things that do not belong in a URL.

  1. www. We’ve spent enough time with the World Wide Web to know that web pages reside on the WWW. Adding those four characters to the beginning of every single URL not only requires users to type them in every time, but also requires 4 extra bytes in every single database that stores URLs. Think for a moment how many bytes that would be. Get rid of them! And after you do that, make sure all your www. URLs redirect to the non-www. version.
  2. Port numbers. Unless your site is under test, there is no valid reason for hosting it on a non-default port (i.e., a port other than 80.) Apache on Mac OS X has a performance cache that runs on port 16080, and makes every URL of the form http://your-site.com:16080/. Unless you find a mechanism to run the performance cache on port 80, it is a good idea to dump the cache. It’s not worth the confusing URL (to most users, if not to you.) Standard well-known port numbers are there for a reason.
  3. Index filenames. Filenames such as index.php and default.asp do not give us any more information than the rest of the URL. Drop them.
  4. Details of the server-side technology. Your users don’t need to know what software you’re running behind the scenes. They couldn’t care less about whether your pages are .php, .jsp, .aspx or .do. It’s best to configure your server to hide these extensions, and then make sure none of your URLs contain them.
  5. Special directories for special scripts. You no longer need to place your scripts in a cgi-bin. Get rid of that directory and any others like that. If your server requires you to do something like that, either find a way to configure it correctly, or upgrade to one that will let you do that.
  6. Document maintainers’ names. Often, when each document has an assigned maintainer for some duration of time, those documents end up being in that particular person’s web space. Later, when the maintainer moves on or someone else takes over the maintenance, you’re left with a different URL than what you started with. To avoid this, it’s best to categorize documents by topic and subject instead of under ~username/document.html.
  7. Internal database IDs. Sure, your content management system needs those IDs to locate your content, but your users don’t need to know. If it takes an extra database lookup to get the ID from the URL, then so be it.
  8. CMS Module Names. Use a CMS that is intelligent enough to render a page without needing all sorts of information stored in the URL. Joomla is particularly notorious for this. What does this URL tell you about where it will take you?

    http://www.joomla.org/content/section/1/74/

    Now what if it were:

    http://joomla.org/news

  9. MiXeD-CaSe NaMeS. Don’t confuse your users by-Mixing-Upper-case-and-Lower-case-Characters-in-the-URL. Stick to lower-case letters, and don’t make them guess. If your user actually types in a URL in mixed case, normalize it on the server and serve the appropriate case.
  10. Random gunk. Unless you are a URL-compressor service such as Tiny URL or SnipURL, forget using random characters in your URL. Nobody wants to visit http://yourdomain.com/WijHyYQnVPWNs and guess what it might lead to.
  11. Session IDs. Make sure no user-session-specific identifiers end up in your URLs. That way, users can pass URLs on to others via email or IM, bookmark them, and be sure that each one represents a single resource. There are better places to keep session state.
  12. Punctuation. Avoid punctuation that might make it difficult for people to tell others about your wonderful site over the phone. The only punctuation you may have is a hyphen ("-") and the characters that have special meaning in a URL (e.g. ?, #, :, + and @). No underscores, commas, periods, brackets, parentheses, braces, quotes, less-than, greater-than, equals, or pipes.
  13. Database query details. If your web pages have even a hint of database query language in the URLs, you should be on The Daily WTF.
  14. Repeated domain name. If the address of your web site looks like http://your-site.com/your-site/your-page.html, then you should have a chat with your web hosting provider about how to shorten it to http://your-site.com/your-page.html.
  15. Inconsistent naming. If you sell several products, then make the subdirectories below each product name exactly identical. If someone were to replace a product name by another, the rest of the URL structure should still continue to function. In other words, strive for consistency in naming.
  16. Missing content at each level. When a URL is several levels deep, users should be able to chop off parts at the end ("hack the URL") and still be able to get to a usable page. E.g. if you’re a news site, and if an example URL looks like: http://my-news-site.com/2008/05/21/news-story.html, make sure you include a list of news articles from 21 May 2008 at http://my-news-site.com/2008/05/21/, and a list of links to daily articles for the entire month of May 2008 at http://my-news-site.com/2008/05/.

There are some easy technological solutions to make this work. Many of these do not require you to change the underlying file system structure or database structure.

But most of this comes with discipline: there is nothing here that is technology magic. It is just an application of common sense to a common domain (no pun intended.) Google mod_rewrite and content negotiation to get started.
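To make a few of these rules concrete, here is a minimal Python sketch of the clean-up that a rewrite rule would normally do on the server. It only covers sins 1, 2, 3, 4 and 9. The function name canonical_url and the lists of index filenames and extensions are my own assumptions for illustration, not part of any server or CMS.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical lists for illustration; adjust them to your own site.
INDEX_FILES = {"index.php", "index.html", "index.htm", "default.asp"}
HIDDEN_EXTENSIONS = (".php", ".jsp", ".aspx", ".do", ".html")
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(url: str) -> str:
    """Return the clean URL that the given URL should redirect to."""
    parts = urlsplit(url)

    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]              # sin 1: www.
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{port}"                # sin 2: only the default port is dropped

    segments = parts.path.lower().split("/")   # sin 9: mixed case
    if segments[-1] in INDEX_FILES:
        segments[-1] = ""                      # sin 3: index filenames
    else:
        for ext in HIDDEN_EXTENSIONS:
            if segments[-1].endswith(ext):
                segments[-1] = segments[-1][: -len(ext)]  # sin 4: technology details
                break

    path = "/".join(segments) or "/"
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
```

A server would compare the requested URL with canonical_url(requested) and issue a permanent redirect whenever the two differ. Lower-casing the whole path is only safe if none of your resources rely on case.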

Simplified Twitter Microsyntax for the Haiti Earthquake

18 Jan, 2010 — Academic, Design & Usability, HCI, Thoughts

In this post, I have typeset many more sentences in bold than I usually do, so readers can quickly skim through it.

I applaud the efforts of U. Colorado’s EPIC Group to help the victims of the Haiti earthquake call for help using Twitter, and to make their tweets discoverable and actionable. I just performed a Twitter search for the terms #haiti -RT -http (includes all Tweets tagged #Haiti, except retweets or links) to inspect some of the tweets that are directly related to happenings on the ground, and they are (as expected) only a minuscule percentage of the total number of tweets about #Haiti. A syntax is thus sorely needed to achieve a decent signal-to-noise ratio to assist relief efforts.

Still, in my opinion, the current version of the tweet syntax seems too formal, too rigid and a tad too complicated for victims or rescuers on the ground. I am a programmer, and even I had trouble mentally parsing a few of the examples provided. We must keep in mind that Haiti is a bi-/tri-lingual country (and none of its languages is English), so any syntactic terms used should preferably be semi-obvious to non-native speakers of the language as well as rescuers.

Roles of Microsyntax

  1. Make tweets discoverable: Microsyntax can assist local search-and-rescue efforts and unaffected Twitter users in determining if a tweet is actionable. This task is partly a Signal Detection Task and partly a Data Mining problem. In both situations, microsyntax can prove helpful: all that’s needed is a single tag that emphasizes that a particular tweet is actionable (versus not), e.g. #haiti #rescue (or #haitirescue, to avoid having to type a second # (hash) sign). This will greatly increase the sensitivity parameter d’ of the signal detection task.
  2. Make data mining easier: Once a tweet has been detected to be actionable, its contents must be parsed into a form that local efforts can take action upon. While it’s true that all the other proposed microsyntactic tags make it easier for applications to parse the data, this is at the cost of requiring users to learn new syntax. This seems to me a little too much to expect from victims of a recent calamity of this scale as well as from rescue workers with other higher priorities. Instead, as long as our tools can identify relevant tweets, computers should be able to perform the second task of parsing locations, names, and verbs from tags quite easily.
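For readers unfamiliar with d’: it is the distance between the z-scores of the hit rate and the false-alarm rate. The sketch below computes it in Python. The two pairs of rates are made-up numbers purely for illustration, not measurements from any tweet corpus.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index: z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Illustrative, invented rates: a reader spotting actionable tweets
# without and with a dedicated tag such as #rescue.
without_tag = d_prime(hit_rate=0.60, false_alarm_rate=0.40)
with_tag = d_prime(hit_rate=0.90, false_alarm_rate=0.10)
```

The further apart the two rates are, the larger d’ becomes, which is the effect a single unambiguous tag is meant to have.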

Also, microsyntactic terms need not always be prefixed with # (hash) signs; they are often difficult to type using cell phone keyboards, and on some handsets, may hamper input methods such as T9. The intervening # signs also make tweets that use the proposed microsyntax harder to read for someone browsing through them.

To summarize, the proposed syntax imposes a heavy cognitive load on victims and search-and-rescue workers in order to make parsing easier for machines. However, the task of parsing details from tweets can also easily be performed by large numbers of humans a.k.a. crowdsourcing via volunteer efforts or via tools such as Amazon’s Mechanical Turk.

Simpler, Lighter Microsyntax

The following are examples of microsyntax that are more readable, yet also parseable by machines. All situations are based on the ones in the original proposed microsyntax. Most are directly based on the EPIC microsyntax, with a few simplifications.

  • Rule 1: Always write in the third-person. This takes care of part of the name problem.
  • Rule 2: Instead of using #loc for locations, use “at”. It’s much more natural and not much more difficult to parse.
  • Rule 3: Verbs are actionable. Not syntactic verbs, but English (or French or Haitian Creole) verbs. It’s a trivial task to populate a tool with a dictionary to detect all word forms correctly.
  • Rule 4: Anything that cannot be parsed ends up as the equivalent of the #info tag (see EPIC syntax).
  • Rule 5: The entire text of the tweet should always be available to a human, so whatever information was incompletely parsed can be understood manually, and optionally added to the parsed version by a human.

The general aim is to require as little syntax knowledge as possible, and to keep as close as possible to the natural way people write tweets.

Examples

TWEET-BEFORE: Sherline Birotte aka Memen. Last seen at 19 Ruelle Riviere College University of Porter a 3 story schol building
TWEET-AFTER: #haiti #ruok #name Sherline Birotte aka Memen. Last seen #loc 19 Ruelle Riviere College University of Porter #info a 3 story schol building
Simplified Microsyntax: #haiti #rescue Looking for Sherline Birotte aka Memen. Last seen at 19 Ruelle Riviere College University of Porter, a 3 story school building

This tells us:
What = Looking for someone.
Who = Sherline Birotte aka Memen (identified fuzzily based on initial capital letters)
Where = 19 Ruelle Riviere College University of Porter (automatically parsed based on “at”)
What else = “a 3 story school building” (i.e. everything else in the tweet)

TWEET-BEFORE: Mirna Nazaire lives in P-A-P at Bizoton 6#12. Entire neighborhood without food. People are dying.
TWEET-AFTER: #haiti #need #food #name Mirna Nazaire lives in #loc PAP at Bizoton 6 #12 #info neighborhood w/o food. People dying
Simplified Microsyntax: #haiti #rescue Mirna Nazaire at PAP at Bizoton 6#12 needs food. Entire neighborhood without food. People dying.

This tells us:
What = needs food. (automatically detected from the verb in the sentence.)
What do they need = food (automatically detected from the object in the sentence.)
Who = Mirna Nazaire (heuristically determined from initial capital letters.)
Where = PAP at Bizoton 6 #12 (detected from microsyntax “at”)
What else = “neighborhood w/o food. People dying.” (Rest of the tweet, unfiltered.)

TWEET-BEFORE: French hospital is now open and ready to receive the wounded at the french lycee in rue marcadieux bourdon
TWEET-AFTER: #haiti #offering #med #loc french lycee in rue marcadieux bourdon #num 30+ #info French hospital is open and ready 2 receive wounded
Simplified Microsyntax: #haiti #rescue French hospital ready to offer help to 30+ wounded at the french lycee in rue marcadieux bourdon

This tells us:
What: Hospital. Also, something to do with medical efforts. (no need to tag explicitly, we can infer that from ‘hospital’.)
Where: The french lycee in rue marcadieux bourdon. (Automatically parsed from microsyntax “at”.)
How many people: 30+. (It’s already a number, no need to state “#num” explicitly.)
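The parses above can be roughly sketched in Python. This is a toy under stated assumptions, not the EPIC tooling. The verb dictionary, the regular expressions and the name parse_tweet are all hypothetical, and a real tool would need French and Haitian Creole word lists as well.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical verb dictionary (Rule 3), mapping a phrase to a coarse action.
ACTION_VERBS = {"looking for": "search", "needs": "need", "offer": "offer"}

TAG = re.compile(r"(?<!\S)#(\w+)")                      # "#haiti", but not "6#12"
NAME = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+")  # two or more capitalized words
# Rule 2: a place follows "at" and runs up to a known verb, a comma, a period or the end.
PLACE = re.compile(r"\bat\s+(.+?)(?=\s+(?:needs|offer)\b|[,.]|$)")


class Parsed(NamedTuple):
    actionable: bool
    what: Optional[str]
    who: Optional[str]
    where: Optional[str]
    text: str  # Rule 5: the full tweet always stays available to a human


def parse_tweet(tweet: str) -> Parsed:
    tags = {t.lower() for t in TAG.findall(tweet)}
    body = TAG.sub("", tweet).strip()
    lowered = body.lower()
    what = next(
        (label for verb, label in ACTION_VERBS.items()
         if re.search(rf"\b{re.escape(verb)}\b", lowered)),
        None,
    )
    name = NAME.search(body)
    place = PLACE.search(body)
    return Parsed(
        actionable="rescue" in tags,
        what=what,
        who=name.group(0) if name else None,
        where=place.group(1) if place else None,
        text=tweet,
    )
```

Anything the heuristics miss (Rule 4) simply stays in the text field for a human to read. The capital-letter heuristic is deliberately crude and will miss single-word names.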

These are just a few suggestions. I will be contacting the PIs (principal investigators) of the EPIC project directly with some of my recommendations, but please continue to follow their syntax until they recommend anything different. The current syntax proposal isn’t perfect, but it is more important to avoid fragmenting the tagspace.

The query: Protocol

Update: I implemented this idea at http://queryprotocol.appspot.com. Comments, questions, and suggestions are welcome!

When trying to explain a concept to others over email, I often find myself linking to a search engine’s result pages for a specific query, instead of a single destination URL. These are non-navigational queries, and there is no single result that I expect to be the most important one. Instead, my intention is to provide the reader a variety of links on the topic so that they may draw their own conclusions, or solve their own problem; all they need is a nudge towards the right query term to use. If, over time, better search results are available for the same query, then future readers get the benefit of automatically updated results.

E.g. Q: Where can I find the latest numbers related to the spread of the Swine Flu?
A: Try [H1N1 update].

To do this today, I simply link to my favorite search engine, Google. But that does not seem fair to fans of other search engines: Bing, Yahoo!, Altavista, and others. I would prefer to use a notation that allows the reader to use their choice of search engine to obtain the results. Just as we specify our default browser and default email client, we should be able to pick our default search engine.

We have already solved the first two problems (picking default browsers and email clients) using protocol handlers in the operating system. When I pass around a link to a web page, starting with http://, I do not specify the browser it should open in. Your operating system determines that it’s a link to a hyper-text transfer protocol (HTTP) document, and invokes your default browser. Similarly, for emails, the mailto: protocol provides for an application-agnostic way to invoke the user’s default email client to send an email.

It is easy to see how a query: protocol could be implemented similarly. To point you to the search results for a particular term, I would send you the following link: (don’t click on it, it won’t work — at least as of this writing.)

[h1n1 update]

The URL that the above links to is query:h1n1+update. Note there’s no HTTP protocol marker specified. If the OS wanted, it could provide local results as well. This means that the protocol extends seamlessly to Desktop Search as well.

Syntactically, this validates as a URI. Just as the mailto: protocol handler defines standard parameter names, subject, cc, and bcc, similar parameters can be standardized for the query: protocol. These may include corpus restricts (corpus={web, images, desktop, ...}), pagination controls (start=0, num=10), or domain restricts (site=manas.tungare.name).

Implementation is simple: all operating systems and major browsers support external custom protocol handlers. They can be configured as follows:

Protocol Prefix: query
Application Name: /Path/to/Application

The application does not need to be very complicated. It’s a mere stub, which, depending upon the user’s preferred search engine, converts a URI of the form query:h1n1+update to http://google.com/search?q=h1n1+update or http://bing.com/search?q=h1n1+update and opens that link in the user’s default browser.
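Here is a minimal sketch of what such a stub might look like in Python. The function name resolve_query_uri and the engine keys are my own assumptions. The two URL templates are the ones mentioned above.

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit

# Search URL templates keyed by a hypothetical user-preference name.
ENGINES = {
    "google": "http://google.com/search?q={terms}",
    "bing": "http://bing.com/search?q={terms}",
}


def resolve_query_uri(uri: str, engine: str = "google") -> str:
    """Convert a query: URI into a search URL for the preferred engine."""
    parts = urlsplit(uri)
    if parts.scheme != "query":
        raise ValueError(f"not a query: URI: {uri!r}")
    terms = unquote_plus(parts.path)  # "h1n1+update" becomes "h1n1 update"
    return ENGINES[engine].format(terms=quote_plus(terms))
```

The registered handler would pass the result to something like webbrowser.open() from the Python standard library, so that the link opens in the default browser. Parameters such as corpus or site are left out of this sketch.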

Eventually, if browsers understand the query: protocol, there is no need for the stub application, and users may be able to share and exchange queries and yet seek results using their favorite search engines.

(The opinions expressed in this blog post are solely my own, and may not reflect the opinions of my employer, Google.)

Email should have Expiration Dates

The entire idea behind this blog post has been summed up in the title, so all I need to do now is to explain why I think email should have expiration dates, and how that would make personal information management better.

Email, as we all know, started off as a way of sending short messages to colleagues within a department. It has since evolved into a monster of a tool that does everything it was never designed to do. The paradox is that it is exactly the kinds of messages that email was designed to handle that cause me the most trouble these days.

  1. I often receive email from my friends about meeting up for lunch. This is important, but only for that particular day (and that too, if I receive it before lunch time).
  2. My research collaborators send me email when a paper submission deadline is near, with the draft attached to it. Those emails are not nearly as important after the deadline.
  3. My friends and I exchange travel plans over email, but is it as useful after the trip is done?

These are the kinds of messages I’m talking about: important but time-sensitive. Then there are others which are not really important, but simply one-time notifications that I can take action on and then forget (“bill is due in 2 days”, “X added you as a friend”, “your order was received”, “your package has shipped”, “free donuts in break room”, “we are not meeting today”, etc.)

Why do they linger on in my mailbox for years? They become indistinguishable from the really important email that I need to save for years, such as some very interesting and intelligent discussions I have had with others. Note that I’m not including spam in this discussion, because in my opinion, there are adequate spam-filtering tools circa 2008 that perform well enough for most users for the most part with an acceptable false positive rate. Not perfect, but acceptable.

The Keeping Problem

Email is no longer ephemeral: people hold on to their email for years. This is what results in the Keeping Problem in Personal Information Management: there is so much information coming at us that we don’t want to spend the time to decide what to keep and what to trash, so we end up keeping all of it. We hope we never have to do spring cleaning, and instead rely on search to find what we want.

Filing is not the answer

Many people file and tag their email, but the question is, is the cost of doing so (time as well as attention) worth the payoff at the end? Consider the two alternatives: spending 10 minutes each day filing your email, versus spending an hour a month looking for that one email. Pretty soon, the second alternative starts looking better, especially when you are swimming in a sea of email that shows no signs of abating.

Same needle, bigger haystack

The bigger the haystack grows, the harder it is to find the needle. The solution is to reduce the size of the haystack. Automatically. Most other solutions empower the user to filter, sort, file, tag and do other sorts of things to their email that do not scale very well. That’s where Email Expiration Dates come into play. For it to work, they need to be (1) defined and (2) honored.

Defining an Email Expiration Tag

Email expiration tags can be defined in several ways by several entities that handle the email message at some point of time in transit.

  1. By the sender of that email who cares about the recipients;
  2. By the email client (MUA) used by the sender, automatically inferring from certain common-sense words; e.g. subject contains lunch and body is less than 100 bytes;
  3. By the email server software that intelligently tags email based on common patterns seen across multiple users;
  4. By the recipient’s email client, based on heuristics;
  5. By the recipient’s email client, based on a user-defined rule set;
  6. Or explicitly by the recipient in a spring cleaning session.

Honoring an Email Expiration Tag: Fully standards-compliant

RFC 822 allows custom tags, i.e. user-defined header fields (Sec. 4.7.5). These are commonly referred to as X- headers, since they are conventionally prefixed with “X-”, a prefix the specification sets aside so that such fields never clash with standard ones. Many applications built on email make use of such tags: mailing lists use the X-List-* headers to specify the list name, subscribe URL and unsubscribe URL in a mail message. Spam filtering software such as SpamAssassin assigns a score to each email, saved as an X- header. Mail clients are free to interpret these tags as they see fit.
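As a rough sketch of how a client could write and read such a tag, here is a Python example using the standard email package. The header name X-Expiration-Date is a hypothetical name I picked for illustration; it is not part of any standard. The datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage
from email.utils import format_datetime, parsedate_to_datetime

EXPIRY_HEADER = "X-Expiration-Date"  # hypothetical header name, not a standard


def stamp_expiry(msg: EmailMessage, expires_at: datetime) -> None:
    """Tag a message with the (timezone-aware) moment it stops being useful."""
    del msg[EXPIRY_HEADER]  # drop any earlier tag; no error if there is none
    msg[EXPIRY_HEADER] = format_datetime(expires_at)


def is_expired(msg: EmailMessage, now: datetime) -> bool:
    """A message without the tag never expires."""
    value = msg[EXPIRY_HEADER]
    if value is None:
        return False
    return parsedate_to_datetime(value) <= now


# A lunch invitation sent at 9 am that is useless after 1 pm.
sent = datetime(2008, 5, 21, 9, 0, tzinfo=timezone.utc)
lunch = EmailMessage()
lunch["Subject"] = "Lunch today?"
stamp_expiry(lunch, sent + timedelta(hours=4))
```

Clients that do not know the header simply ignore it, which is what makes this approach backwards-compatible.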

An expired email will not be automatically deleted if the user does not want it to be. This is important for archival purposes and to satisfy the stringent reporting requirements of the Sarbanes-Oxley Act. But now the user can make a one-button choice about whether expired emails are deleted, archived, moved away or kept around.

With help from legitimate bulk email senders (not spammers)

Bulk mail such as Facebook notifications could have expiration dates set to “one week after receipt”. Bill reminders could set the expiration date to be “2 days past deadline” (and then send another notification if payment is not received by then.) Donut announcements could expire at the end of the day. Talk announcements could expire at the end of the talk.

Fixing the post-vacation blues

Returning from a vacation is no longer refreshing, as we are thinking about the sheer volume of email we need to process once we get home. If I was on vacation when the donuts were on the table, I should not be bothered about it when I return. Go away! If it’s an invitation to a talk that happened while I was away, I don’t need to hear about it now.

What will it take for adoption?

Defining a standard is no use if it isn’t used. The best way for such a solution to be adopted is for a major email provider to implement it themselves, perhaps in a limited beta. On the interface side, this requires two additions: one for sending, one for processing received messages. The widget at the sender’s end is simply a calendar picker, or a drop-down with relative dates (“tomorrow”, “next week”, etc.) At the receiving end, it’s a three-way radio button that lets users “Delete”, “Archive” or “Leave alone” expired messages.
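The logic behind those two widgets is small. Below is a minimal Python sketch. The Mail class, the sweep function and the contents of the drop-down are hypothetical names of mine, and a real client would work on actual messages rather than this toy record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional, Tuple

# Hypothetical choices for the sender's drop-down of relative dates.
RELATIVE_DATES = {"tomorrow": timedelta(days=1), "next week": timedelta(weeks=1)}


class Policy(Enum):
    """The recipient's three-way radio button."""
    DELETE = "Delete"
    ARCHIVE = "Archive"
    LEAVE_ALONE = "Leave alone"


@dataclass
class Mail:
    subject: str
    expires_at: Optional[datetime] = None  # None means the mail never expires


def sweep(inbox: List[Mail], policy: Policy, now: datetime) -> Tuple[List[Mail], List[Mail]]:
    """Apply the policy and return the new (inbox, archive) pair."""
    def expired(mail: Mail) -> bool:
        return mail.expires_at is not None and mail.expires_at <= now

    if policy is Policy.LEAVE_ALONE:
        return list(inbox), []
    kept = [m for m in inbox if not expired(m)]
    gone = [m for m in inbox if expired(m)]
    return kept, (gone if policy is Policy.ARCHIVE else [])
```

With Policy.DELETE, the expired messages are simply dropped from both lists. Nothing is ever removed unless the user has picked that option.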

Till then, it’s back to manual spring cleaning. Oh well.

Acknowledgments: I have had several stimulating discussions with my advisor, Manuel Pérez-Quiñones, and my colleague, Pardha Pyla, about our respective email filing strategies (discussions that mostly began as venting sessions). This idea no doubt borrows from my analysis and conclusions based on some of those conversations.

How do I eat Pringles chips out of a can?

29 Oct, 2007 — Design & Usability, HCI, Thoughts

I ask you, the blogosphere, to enlighten me on the best way to eat Pringles that does not involve a bowl. The Pringles can is one of the iconic designs of modern times — uniformly-shaped potato chips in a tube — that seems to value form over function.

Let’s admit: eating chips is a secondary task for most Americans. These are snacks people munch on when they’re doing other things. Thus, these chips should be easy to grab with one hand and have the other hand free for the television remote, steering wheel or keyboard/mouse. At the same time, it is important that chips don’t spill, or worse yet, crumble in your hand. So what’s the best way to eat them without needing a bowl? (because using a bowl would just be weaseling out of this problem into one already solved in The Textbook.)

The first few chips are easy. (Isn’t that the case with everything? :) ) They’re within the grasp of your fingers, so it’s no different than plucking a few chips from a bag. It’s after the top few disappear that the problem starts. Should I force my hand into the can? Should I invert the can so the chips fall out into my hand? Should I tilt the can ever so slightly and tap on the side to have the chips exit one by one instead of stampeding all over themselves?

I’ve tried to dig in with my hand to get to the next few, but my hand is too big to fit inside the can, and it’s probably not a good idea anyway. I shudder to think of the day I’m in an Emergency Room with a Pringles can wrapped around my wrist, with $200/hour doctors cutting off an embarrassing roll of cardboard from the one organ that distinguishes men from apes. No, excavating anything but the top few is a job for professional archaeologists.

I’ve tried inverting the can with the lid on, so (I hoped) the chips would all accumulate on the lid, and then I could simply open it up and eat a few. The problem is, the quantum stable state for potato chips is a pile of crumbs. Inverting the can gets all the crumbs to the bottom of the can, and when the lid is opened, that’s what comes out first.

I’ve tried tilting the can at a precise angle and knocking on the side until the top few chips make their way slowly out the door. This sometimes works, but takes a long time, and very skillful knocking/tapping/flicking to get the right number of chips out of the can. Often, you’ll spend five minutes tapping unsuccessfully, then, in a burst of frustration, you’ll tap just a little bit harder, and have Pringles rain upon you. No go.

Dear Mommy taught me to search the Web before posting random questions to total strangers, so I did my homework. Here’s an innovative method of eating Pringles, but I’m no chopsticks ninja. And eating chips with chopsticks vaguely reminds me of the Seinfeld episode with George eating Snickers with a knife. You get the point, sort of.

So my question to you is, what’s the best way you’ve found to eat Pringles out of a can without spilling any crumbs, using a minimum number of hands to do it? A second, deeper, question, from my obvious position as a design and HCI person is, why has such a design resisted change over so many years despite being so hard to eat from?
