Creating a Search Application
(Page 13 of 29)
When gathering keywords this way, there will always be words that you do not want in your index. A word shorter than three characters is not normally something we want to search on. We also do not want to include strings that contain odd symbols and are not really words.
To eliminate unwanted words, we will use the array_walk() function to examine each element of the $words array. The _prune() function of our class will be supplied to array_walk() as the callback. The first thing this function does is convert the keyword to lowercase. This is done for consistency, so that we always know the case of the words in our index.
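As a minimal sketch of how a class method can be handed to array_walk(), consider the example below. The class name KeywordIndexer and the methods lowercase() and normalize() are hypothetical and are not part of our search class; only the use of array_walk() with a by-reference $item reflects what is described above.

```php
<?php
// Hypothetical class for illustration only.
class KeywordIndexer
{
    // array_walk() passes each value (by reference here) and its key.
    public function lowercase(&$item, $key)
    {
        $item = strtolower($item);
    }

    public function normalize(array $words)
    {
        // array($this, 'lowercase') tells array_walk() to call a method
        // on this object instead of a plain function.
        array_walk($words, array($this, 'lowercase'));
        return $words;
    }
}

$indexer = new KeywordIndexer();
$result  = $indexer->normalize(array('PHP', 'Search', 'INDEX'));
print_r($result);
```

Because $item is declared with &, the change made inside the callback is written back into the array being walked.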
Then, the function determines if a word is under three characters, or if it contains odd symbols. We also check to see if the word is in the class variable named $_stopwords. The $_stopwords array contains words that we do not want to index. Common examples would be and, but, are, and the. These are words that would make it through the other checks but are so common that indexing them in most circumstances would be useless.
The function also checks to see if the word exists in the $_allowwords class variable. This array holds words that would normally be pruned out, but that we would like to include in the index. Examples would be c++ and vb.
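The checks described so far can be sketched as a single yes/no test. The function name should_prune() is hypothetical, and the word lists are passed in as arguments rather than read from class variables so that the example runs on its own.

```php
<?php
// Hypothetical standalone version of the rejection test.
function should_prune($word, array $stopwords, array $allowwords)
{
    $word = strtolower($word);

    // Allowed words are always kept, even if they fail the checks below.
    if (in_array($word, $allowwords)) {
        return false;
    }

    $hasOddSymbols = preg_match("/[^a-z0-9'?!-]/", $word) === 1;
    $tooShort      = strlen($word) < 3;
    $isStopword    = in_array($word, $stopwords);

    return $hasOddSymbols || $tooShort || $isStopword;
}

$stopwords  = array('and', 'but', 'are', 'the');
$allowwords = array('c++', 'vb');

var_dump(should_prune('the', $stopwords, $allowwords));     // stop word: pruned
var_dump(should_prune('it', $stopwords, $allowwords));      // too short: pruned
var_dump(should_prune('foo@bar', $stopwords, $allowwords)); // odd symbol: pruned
var_dump(should_prune('vb', $stopwords, $allowwords));      // allowed: kept
var_dump(should_prune('search', $stopwords, $allowwords));  // ordinary word: kept
```

Checking the allow list first gives the same result as combining the three rejection tests and then negating the allow-list test, but it reads more directly.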
If a word does not meet our criteria, we will use the unset() function to remove it from the array. This function destroys a given variable so that it is no longer set.
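To see what unset() does to an array element, here is a small sketch that is separate from our class. The helper names remove_key() and remove_key_from_copy() are hypothetical; they only illustrate that unset() changes the caller's array when the array parameter is received by reference.

```php
<?php
$words = array('php', 'is', 'great');

// unset() removes the element but does not renumber the remaining keys.
unset($words[1]);

// array_values() rebuilds the array with consecutive keys, if that matters.
$reindexed = array_values($words);

// The & makes $array refer to the caller's array.
function remove_key(array &$array, $key)
{
    unset($array[$key]);
}

// Without the &, only a local copy is changed.
function remove_key_from_copy(array $array, $key)
{
    unset($array[$key]);
}

$a = array('one', 'two');
remove_key_from_copy($a, 0);
$countAfterCopy = count($a); // the caller's array is untouched
remove_key($a, 0);
$countAfterRef = count($a);  // one element has been removed
```

The PHP manual also notes that a callback given to array_walk() should only change values, and that adding or unsetting elements of the walked array from inside the callback has undefined behavior. Collecting the words to keep with array_filter() is one alternative worth considering.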
If the word does pass the tests, we strip any characters that are not alphanumeric, a literal single quote, or a dash. We then run the word through the addslashes() function to escape any single quotes.
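The clean-up step can be tried in isolation with a short sketch. The helper name clean_keyword() is hypothetical; the two calls inside it mirror the strip-then-escape sequence described above.

```php
<?php
// Hypothetical helper for illustration only.
function clean_keyword($word)
{
    // Drop everything except letters, digits, single quotes, and dashes.
    $stripped = preg_replace("/[^a-z0-9'-]/i", '', $word);

    // addslashes() puts a backslash in front of single quotes
    // (as well as double quotes, backslashes, and NUL bytes).
    return addslashes($stripped);
}

$a = clean_keyword("don't");  // the quote is kept and escaped
$b = clean_keyword('what?!'); // the punctuation is stripped
$c = clean_keyword('e-mail'); // the dash is kept
$d = clean_keyword('c++');    // the plus signs are stripped

echo $a, "\n", $b, "\n", $c, "\n", $d, "\n";
```

Note that with this pattern a word such as c++ is reduced to c, so a word kept because of the allow list may still lose its symbols at this step.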
<?php
function _prune(&$item, $key, $array)
{
    // Normalize case so the index is consistent.
    $item = strtolower($item);

    if ((preg_match("/[^a-z0-9'\?!-]/", $item)
            || strlen($item) < 3
            || in_array($item, $this->_stopwords))
        && !in_array($item, $this->_allowwords)) {
        // Rejected: remove the word from the array.
        unset($array[$key]);
    } else {
        // Kept: strip stray characters, then escape single quotes.
        $item = addslashes(preg_replace("/[^a-z0-9'-]/i", '', $item));
    }
}
?>
More By Matt Wade