Manas Tungare

CD / DVD Spindle Search

I've been a huge fan of Google Desktop Search, especially seeing how much faster it is than the traditional file system search in Windows. What I noticed it lacked, though, was the ability to index removable media such as CDs, DVDs and external hard disks. Even if it did index them, there would always be the problem that a file might be unavailable when you try to retrieve it, simply because the CD is on a spindle, not in the drive.

But Google had a solution: they made their API accessible, so I wrote a plugin for it. Spindle Search now lets you add CDs, DVDs and other media to your Google index, and then shows a dialog when a search matches one of them, telling you where to locate the file and the disk.

Go take a look, tell me what you think.


Jurassic Park ... what the movie left out

I used a day of Spring Break to catch up on some reading. Jurassic Park, the novel I'd wanted to read for a long time, finally came up on my radar. I knew Crichton, I knew how the scenes would be described, I knew the subtle theoretical underpinnings to expect from any Crichton creation, but his treatment of Ian Malcolm's character was absolutely fantastic.

Malcolm is the mathematician John Hammond recruited to analyse his Park, though Hammond had been unhappy with his skepticism since Day 1. Malcolm's character was underplayed in the movie in the interest of, I presume, keeping it simple. But his application of Chaos Theory to Jurassic Park made for the best reading.

An excerpt, a pretty long one: (fair use, of course, as permitted by copyright law.)

"You know what's wrong with scientific power? It's a form of inherited wealth. And you know what assholes congenitally rich people are. It never fails. [...] Most kinds of power require a substantial sacrifice by whoever wants that power. There is an apprenticeship, a discipline lasting many years. Whatever kind of power you want. President of the company. Black belt in karate. Spiritual guru. Whatever it is you seek, you have to put in the time, the practice, the effort. You must give up a lot to get it. It has to be very important to you. And once you've attained it, it is your power. It can't be given away: it resides in you. It is literally the result of your discipline."

"Now what is interesting about this process is, by the time someone has acquired the ability to kill with his bare hands, he has also matured to the point where he won't use it unwisely. So that kind of power has a built-in control. The discipline of getting the power changes you so that you won't abuse it."

"But scientific power is like inherited wealth: attained without discipline. You read what others have done, and you take the next step. You can do it very young. You can make progress very fast. There is no discipline lasting many decades. There is no mastery: old scientists are ignored. There is no humility before nature. There is only a get-rich-quick, make-a-name-for-yourself-fast philosophy. Cheat, lie, falsify -- it doesn't matter. Not to you, or to your colleagues. No one will criticize you. No one has any standards. They are all trying to do the same thing: to do something big, and do it fast."

"And because you can stand on the shoulders of giants, you can accomplish something quickly. You don't even know exactly what you have done, but already you have reported it, patented it, and sold it. And the buyer will have even less discipline than you. The buyer simply purchases the power, like any commodity. The buyer doesn't even conceive that any discipline might be necessary."


Data Backup for Home Users

<rant>Why isn't there a decent piece of software that lets home users back up their hard drives every once in a while? Or if there is one, why can't I find it anywhere? I can't hire an IT department, so anything more than 3 or 4 clicks is not worth it.</rant>

So I finally gave up and wrote a homegrown tool - the bit about programs scratching a developer's personal itch is so true! It's trivial enough to not call it a piece of "software" or even a "utility". I had the following objectives in mind for my backup strategy:

Unfortunately, I found none that satisfied the given criteria. It didn't help that my primary platform is still Windows. I found Mike Rubel's article on Incremental Backups with rsync nice and informative, but useless on Windows because NTFS won't support hard links.

So I ended up writing this tool that looks at the last-modified date and copies over everything modified after a given date X to a temporary backup directory. Then, I just burn that to a DVD. Who said good solutions are complicated?
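The idea is small enough to sketch in a few lines. What follows is not my actual tool, just a minimal Python illustration of the same approach; the function name `copy_modified_since` and its parameters are made up for this example, and it assumes the staging directory lives outside the source tree.

```python
import shutil
from pathlib import Path


def copy_modified_since(source: Path, staging: Path, cutoff: float) -> list[Path]:
    """Copy every file under `source` modified after `cutoff` into `staging`.

    `cutoff` is a timestamp in seconds since the epoch (the "date X").
    The directory layout below `source` is preserved in `staging`.
    Returns the relative paths of the files that were copied.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue  # skip directories; they are created on demand below
        if path.stat().st_mtime <= cutoff:
            continue  # not modified since the cutoff, so not part of this backup
        relative = path.relative_to(source)
        target = staging / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also keeps the last-modified time
        copied.append(relative)
    return copied
```

A cutoff can be built with something like `datetime(2005, 3, 1).timestamp()`. Whatever ends up in the staging directory is what gets burned to the DVD.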


Spam Filter FAQ

Q: What is this spam filter FAQ about?

A: This page explains why I might not have replied to your email and, instead, puts the blame on my spam filter. ;-) It also tells what you can do to make sure I get your email, and what makes my spam filter think that your message was an unsolicited commercial email. It offers tips on what common message characteristics you should avoid, so that other people's spam filters will not block your legitimate messages.

Q: Why am I seeing this page?

A: Probably because you clicked on a link to this page from an email I sent you. I might have included a link to this page in my email signature.

Q: So, why do you use a spam filter in the first place?

A: Because of the huge amount of spam I receive (more than 500 messages a day), it has been impossible to read every email manually and decide whether it is spam. Therefore, I switched to a spam filter (SpamAssassin, a free product) some time ago. For a long time after that, I had it configured to only mark each suspicious message with "[Spam]" rather than delete such emails automatically. I also use the junk mail controls built into the Mozilla Thunderbird email client.

I used to filter these emails to another folder, and whenever I had some free time, I checked that folder on a regular basis.

Q: What do you do now, that made you write this FAQ?

A: Since around May 2003, it has been impossible to inspect my Spam folder to weed out the good from the bad. So I began to delete all such emails without looking at every single one. There are only 24 hours in a day, and it has become humanly impossible to go through this process every day.

Also, the spam filter I use has been quite accurate in identifying spam: the number of legitimate messages that get flagged as spam ("false positives") has dropped to around 1 in every 200, so checking the folder by hand is no longer worth the time.

Q: How will I know if my email was deleted by your spam filter?

A: The short answer is, if I do not reply to you within a reasonable amount of time, you should assume that your message was inadvertently deleted. I apologize for having had to do this, but as you've seen from the questions above, there is no other alternative. :-(

Q: What should I do so that my emails are not flagged as spam?

A: I can suggest a few rules that SpamAssassin uses to identify spam, so that you may make sure your messages do not satisfy them. That should help get your email to pass through most other spam filters too, so if you have been having trouble sending email to people in certain organizations where spam filtering is enforced, you have come to the right page.

Q: Just tell me what I need to do:

A: OK, let's start. I will list the rules here, beginning with those that are easiest to follow and moving on to those that are more difficult. Before I begin, I need to tell you how a message is flagged. SpamAssassin performs each of these tests on every message received and assigns a score (positive or negative) for each test. It then adds up the scores for the message, and if the total is beyond a particular threshold, it flags the message as spam. So, if your message satisfies two or more rules from the list below, it is more likely to be classified as spam.
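For the curious, the scoring idea can be sketched roughly as follows. This is only an illustration in Python: the rule names, the scores and the threshold are invented for the example and are not SpamAssassin's real rules or weights, and a message is assumed to be a plain dict with "to", "subject" and "body" keys.

```python
from dataclasses import dataclass
from typing import Callable


def is_shouted(line: str) -> bool:
    """True if the line has at least five letters and all of them are capitals."""
    letters = [c for c in line if c.isalpha()]
    return len(letters) >= 5 and all(c.isupper() for c in letters)


@dataclass
class Rule:
    name: str
    score: float                     # positive = more spam-like, negative = less
    matches: Callable[[dict], bool]  # returns True when the rule fires on a message


# Hypothetical rules and scores, for illustration only.
RULES = [
    Rule("html_body", 1.0, lambda m: "<html" in m["body"].lower()),
    Rule("shouted_line", 1.5, lambda m: any(is_shouted(l) for l in m["body"].splitlines())),
    Rule("empty_to", 2.0, lambda m: not m["to"].strip()),
    Rule("subject_all_caps", 1.5, lambda m: is_shouted(m["subject"])),
]
THRESHOLD = 3.0  # assumed value; a real filter makes this configurable


def classify(message: dict, rules=RULES, threshold: float = THRESHOLD):
    """Run every rule, add up the scores of those that fire, compare to the threshold."""
    fired = [rule for rule in rules if rule.matches(message)]
    total = sum(rule.score for rule in fired)
    return total, total >= threshold, [rule.name for rule in fired]
```

With these made-up numbers, a message that trips only one rule stays under the threshold, while one that trips several is flagged. That is why each rule below matters more in combination with the others.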

Rule: HTML Email
Why it is treated as spam: When you send email formatted as HTML instead of plain-text, SpamAssassin thinks it is spam. The more colors, images and fancy text you have, the more it thinks it is spam.
What you should do: Try to configure your email client to send all messages in plain-text. It also requires less space in the inbox if you use only plain-text for email. You will not be able to have nicely formatted text such as bold or italic, but if your message is not at all received, who cares about the formatting?

Rule: Yelling (writing text in ALL CAPITALS)
Why it is treated as spam: When SpamAssassin detects a full line in capital letters (considered 'yelling') it assigns it a positive score (more likely to be spam.)
What you should do: Do not use ALL CAPS in your email. If you really need to yell at me, please take a number and wait in line for your turn. ;-)

Rule: More lines of yelling
Why it is treated as spam: The more lines you have in all-capitals, the higher score your message gets.
What you should do: Same as above, write everything in usual sentence case. It is also good netiquette to not write everything in all-capital letters.

Rule: To: is empty
Why it is treated as spam: If you do not address the message specifically to me, or anyone in particular, there is reason enough to believe it is spam.
What you should do: Always put the intended primary recipients' names in the To: field, and the secondary recipients' names in the Cc: or Bcc: fields as appropriate.

Rule: "Undisclosed-Recipients"
Why it is treated as spam: If you (or your mail client) enters "Undisclosed-Recipients" in the To: field, it is 99% likely to be treated as spam.
What you should do: Make sure your mail client does not do things like this!

Rule: Subject has a unique ID
Why it is treated as spam: If you include a unique ID (e.g. GD93jij83) in the subject line, this tactic is commonly used by spammers to check which email accounts are active and which are not.
What you should do: Have a simple, though meaningful, subject line. Do not include weird-looking numbers.

Rule: Subject is in ALL CAPS
Why it is treated as spam: Same as for the lines in all-capitals in the body of the message.
What you should do: Refrain from all-capitals in the subject line.

Rule: you@you.com or similar
Why it is treated as spam: If you send an email with a subject like this, it is obvious that it is fake!
What you should do: Use real names or saved aliases while sending email.

Rule: Sent to too many people at once
Why it is treated as spam: If you send one of those ubiquitous "forwards" to everyone on your address list, it is probably not an important email.
What you should do: Do not send forwards. They only waste valuable time, money and Internet bandwidth. If you do need to send important announcements to all your friends, put their names in the Bcc: field instead of in the To: or Cc: fields.

Rule: Recipient List is sorted by address
Why it is treated as spam: Spammers usually sort their lists alphabetically.
What you should do: Do not include too many recipients in the same message.

Rule: Typical spam features
Why it is treated as spam: Things like the Nigerian spam, messages offering free stuff, guaranteed results, investment suggestions, full refund, free trial, 100%, call now, direct email marketing, university diplomas, penis/breast enlargement, mortgages, money-back guarantee, porn site advertisements, remove now, unsubscribe, complies with Senate bill so-and-so, claims that you asked for this email, or you willingly subscribed to this list, opted-in, or provided permission to be spammed
What you should do: Do nothing. If your email contains any of these terms, I am not gonna look at it anyway.

A novel way of almost guaranteeing that your email is never flagged as spam is to incorporate Habeas headers into your message. For more information, please check out their FAQ page. It is free for personal use, hence a good choice for a cash-strapped individual email user like me.

Disclaimer: This is not an official or exhaustive document. The SpamAssassin tests page was used as a reference, and that page should be consulted for any clarification of these rules. This FAQ is intended for the common man concerned about his/her email being flagged as spam.


Handy Hints for Web Designers

This article on Web usability was written in 1999, when the Web was very different from what it is today. It was widely republished in 1999 and 2000 on various sites catering to the web developer crowd and by universities starting to teach web development. This version is the exact version as written in 1999, with no updates since then, so you might see a few concepts that are no longer relevant. You'll also see historical references to dial-up connections and table-based layouts, and rejoice with me as these concerns start to fade away.

Web Designing is as easy as 1-2-3, claim some of the software tools on the market that "generate" your pages for you. Unfortunately, many web designers today have fallen prey to this marketing gimmick - and the results are obvious. Every now and then, one comes across a website that looks good with a particular browser and a particular screen-resolution; but view it with a different browser, and you can't even read the plain text on the page. Worse still, given the number of operating systems used by netizens worldwide, these pages will never be seen properly by more than half of the intended surfers.

Now let's assume that this web page belongs to a site that sells stuff online. The very fact that half the users cannot even see the page, translates into losses worth half the amount straightaway (perhaps, even more!) I guess that makes a good case for the raison d'être of this article! Web Designing is, in my opinion, a cocktail of creative skills & technical prowess — and one is no less important than the other.

In the following lines, I have jotted down a few points that I noticed during my online journeys, important from the point of view of web designers. Some of them may be taken with a pinch of salt, for it is not possible to please everyone every time. But most of them are simple enough to be used as a rule of thumb.

  1. A picture, they say, is worth a thousand words. A picture file, alas, is also almost as big. Images, no doubt, enhance the look of a page, but it is not advisable to go overboard in stuffing your page with a truckload of images. Most net-surfers use a dial-up connection, and the average time to load a page should be no longer than 5 seconds. If it's longer, the surfer will most probably click away elsewhere. So, within this time, all the images on a page must be loaded as well. So, as a rough yardstick, keep the aggregate page size less than 30k.

  2. Another important point to note is that each file on the page requires a separate HTTP request to the server. So a lot of small images - even if they do not add up to a lot in terms of bytes - will slow down the loading a lot.

    Even when you must use images for navigation, please give a second thought to the users who will not be seeing those jazzy, fantastic & truly amazing buttons that you spent hours to design. Yes, I'm talking of the ALT text attribute of the IMG tag. Do not forget to provide an Alternate Text for each image that you use for navigation. (It may be left blank for certain images that are purely for aesthetic reasons, but let that be an exception, rather than the rule.) Though not obviously apparent, ALT text can help such users immensely.

    Modern browsers offer users a choice to turn off images. This gives an idea of how troublesome the unwanted images could be.

    A couple more attributes that make your pages load faster are HEIGHT and WIDTH. Without these, the browser must wait for each image to download, since it cannot know how much space to leave for it!

  3. Navigability & functionality come before artistic excellence. It is no use making your site a masterpiece of art if users cannot navigate around it - even after they reach the main page, they have no clue as to how to go where they want to go.

  4. Especially common, is a kind of navigation that some people call Mystery Meat Navigation. That means, that unless your mouse moves over an image, you have no idea where that link might take you. Only when the mouse hovers do you see the actual link. This is cumbersome because users need to move their mouse all over the place to find out which part is a link and which is not.

  5. Follow the K.I.S.S. principle: Keep it simple, stupid!

  6. Next is a very important practical suggestion: whenever your whole page is within a TABLE, the page cannot render (i.e., the page does not show on the screen) unless the entire table is downloaded. You might have noticed this on several websites, when there is no activity for a long time, and suddenly the entire page is visible. Hence, to avoid such a situation, what you should do is this: Split the table up into two tables one below the other, and let the top one be a short table that displays just the page header and a few navigation links. So now, immediately upon downloading this part of the page, users can see the page header — and this prepares them for the long wait ahead, as well as keeps them from leaving your site to go to other sites, in case of a slow connection.

  7. The ongoing browser wars have left only one casualty — the user. As a word of caution, stay away from all browser-specific functions. Because if a certain feature is supported by one browser, it will most definitely not be supported by another. Where you must use such features, it should not hamper the display of the page in the other browser which does not support such functionality. In other words, your page should degrade gracefully.

  8. Creating a new browser window should be the authority of the user only. Do not try to popup new windows to clutter the user's screen. All links must open in the same window by default. An exception, however, may be made for pages containing a links list. It is convenient in such cases to open links in another window, so that the user can come back to the links page easily. Even in such cases, it is advisable to give the user a prior note that links would open in a new window.

  9. Keep in mind the fonts-challenged users too. The ultra-jazzy "Cloister Black MT Light" font that looks so amazing on your machine may well be degraded into plain old Times New Roman on your user's machine. The reason? He/she does not have the font installed on his/her machine - and one thing's obvious - there's nothing you can do about the situation, sitting halfway across the globe from them.

  10. Stay clear of out-of-the-way hard-to-find fonts. Use plain vanilla fonts like Arial, Verdana, Tahoma, and Courier. If need be, make your jazzy fonts into an image and put that on the page. (and while you're there, do not forget Tip #1.)

  11. A new design trick that is increasingly being used on the web has caught my fancy: It is a very functional navigation bar that guides you across all possible paths within the site. It looks something like this:

    Home > Section > Subsection > Page

    What better than to give your users a handy way of visiting just about any other page on your own site, and informing them where they are!

  12. Another new trend on the web is not all that inviting - various vendors come up with "revolutionary plug-ins" and undoubtedly, most amateur web designers jump up to spruce up their pages using them. The reality is that most people won't have them installed, and wouldn't care about it anyway. Come to think of it, have you seen plug-ins on any of the most popular sites, including Yahoo.com, Amazon.com or Google.com? It's simply not the best thing to do. Mention must be made here of Macromedia's Shockwave Flash plug-in, which has now made its way onto most computers, and thus presents no harm in using vector animation on your site.

  13. Java is yet another often-misused technology on webpages. Use Java as a utilitarian programming language, not as a graphics front-end for your photos/images. There are various things you can do with Java; that does not mean you should do all of them. Java applets are known to run slower, so users experience a certain sluggishness in performance. And worse still, Java has been known to crash certain browsers. This is not something everyone likes, especially if it is done for the sole purpose of showing a set of images in a slideshow!

    The moral: Use it, but with discretion.

  14. Never underestimate the importance of those META tags. They can make all the difference between your users coming to your site and going to your competitor's - just because they couldn't find yours. Search Engines heavily rely upon the Keywords & Description Meta tags to populate their search database. And once again, use discretion in writing these. Including a huge number of keywords for the same page can spell trouble. The description should be a small, meaningful summary of the whole page that makes sense even when seen out-of-context of the webpage itself, say, in a listing of search engine results.

  15. And the final point that summarizes all the points so far: Write for all browsers, all resolutions, and all color-depths. If you show people pages that look best with their own browser and their own resolution, that makes them feel "at home", and you get a better response. Compare this with a website that proclaims "Viewed best with Browser X at a resolution of 1024x768." I'll give you a choice between two options when you see such a page: download the suggested browser (which might well be over 50 Megs), then go get a new monitor that supports the high resolution, and then adjust your screen setting so you get the perfect picture. Or simply click away to another site. Which do you prefer?
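The image tips above (count your images, give them ALT text, and declare HEIGHT and WIDTH) are easy to check mechanically. Here is a small sketch in Python using the standard library's `html.parser`; the names `ImageAudit` and `audit_page` are made up for this example, and it only inspects the markup it is given, not the image files themselves.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Count <img> tags and note which ones lack ALT, WIDTH or HEIGHT."""

    REQUIRED = ("alt", "width", "height")

    def __init__(self):
        super().__init__()
        self.image_count = 0
        self.problems = []  # list of (src, [names of missing attributes])

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG ALT=...> works too.
        if tag != "img":
            return
        self.image_count += 1
        present = {name for name, _ in attrs}
        missing = [name for name in self.REQUIRED if name not in present]
        if missing:
            self.problems.append((dict(attrs).get("src", "?"), missing))


def audit_page(html: str) -> dict:
    """Return the number of images (each one is a separate HTTP request) and any problems."""
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return {"images": parser.image_count, "problems": parser.problems}
```

An empty `alt=""` counts as present here, which matches the allowance for purely decorative images. Feeding a page's HTML to `audit_page` gives a quick list of images worth a second look.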

The web waits for no one. And furthermore, the user is king. Try your best to keep the user happy. And to keep all users happy. For, a good website is like a good storefront - it can mean all the difference between a casual surfer and a serious customer.

