Final Exam / Answer No. 5

5) Identify the two most common errors associated with keyword searching across e-mail messages.

The two most common ways to search for keywords incorrectly are ignoring case significance and stemming words improperly.

Case significance is the easy one to fix. Most keyword-searching tools are case-sensitive by default, so a search for 'confidential' will miss 'Confidential' and 'CONFIDENTIAL.' You have to turn off case significance any time you're doing policy-based keyword searches. Forgetting to do so is the number one error most people make.
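A minimal Python sketch of the difference, assuming the messages are already available as plain strings. The function name `find_keyword` and the sample message are hypothetical; the case handling comes from the standard `re` module's `re.IGNORECASE` flag.

```python
import re

def find_keyword(text: str, keyword: str, case_sensitive: bool = False) -> list[str]:
    """Return every occurrence of keyword in text (hypothetical helper).

    Case significance is off by default, as policy-based searches require.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape keeps characters such as '.' or '+' in the keyword literal.
    return re.findall(re.escape(keyword), text, flags)

message = "CONFIDENTIAL: do not forward. This confidential memo is Confidential."

print(find_keyword(message, "confidential", case_sensitive=True))
# ['confidential']
print(find_keyword(message, "confidential"))
# ['CONFIDENTIAL', 'confidential', 'Confidential']
```

With case significance left on, two of the three occurrences are silently missed, which is exactly the error described above.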

Stemming is a more significant problem, and one that is not easily handled. Without stemming, you have to search for every variation of the word you're looking for. For example, you can't simply search for 'poop,' because you won't catch the important variations 'poopy,' 'poops,' 'pooped' and 'pooping.' But if you ignore the spaces on either side of the word (or, more precisely, the white space, which can include line breaks, tabs and other formatting characters) and match it anywhere, you'll end up with every word that has 'poop' in it, such as 'nincompoop' (used to describe the person who wanted you to search for poop). Good search engines handle word stemming automatically for you; with regular expressions and more primitive tools, you have to handle this kind of stemming yourself.
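The sketch below contrasts three ways of searching, using Python's standard `re` module, where `\b` marks a word boundary (white space or punctuation). The function names, the sample text, and the `SUFFIXES` list are hypothetical. The suffix list is a hand-rolled stand-in for stemming that covers only the variations named above; it is not a real stemming algorithm.

```python
import re

# Hypothetical suffix list: a crude stand-in for real stemming that only
# covers the variations mentioned in the text.
SUFFIXES = ("s", "y", "ed", "ing")

def substring_matches(text: str, keyword: str) -> list[str]:
    """Ignore word boundaries: any word containing the keyword matches."""
    words = re.findall(r"\w+", text)
    return [w for w in words if keyword.lower() in w.lower()]

def exact_word_matches(text: str, keyword: str) -> list[str]:
    """Respect word boundaries but do no stemming: only the bare word matches."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.findall(pattern, text, re.IGNORECASE)

def stemmed_word_matches(text: str, keyword: str) -> list[str]:
    """Respect word boundaries and also accept the keyword plus a known suffix."""
    suffixes = "|".join(map(re.escape, SUFFIXES))
    pattern = rf"\b{re.escape(keyword)}(?:{suffixes})?\b"
    return re.findall(pattern, text, re.IGNORECASE)

text = "Poop, poops, pooped, pooping, poopy... said the nincompoop."

print(substring_matches(text, "poop"))
# ['Poop', 'poops', 'pooped', 'pooping', 'poopy', 'nincompoop']
print(exact_word_matches(text, "poop"))
# ['Poop']
print(stemmed_word_matches(text, "poop"))
# ['Poop', 'poops', 'pooped', 'pooping', 'poopy']
```

The substring search over-matches, the exact whole-word search under-matches, and only the boundary-plus-suffix search returns the intended set. A fixed suffix list breaks down quickly on doubled consonants and irregular forms, which is why search engines rely on a proper stemming algorithm (the Porter stemmer is a common choice) rather than a list like this one.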



This was first published in April 2005
