Creating a Search Application
By this point, you should have a general understanding of how databases work. This overview of databases was necessary because we will be using them extensively in our search application. You may be wondering why we would use a database rather than a flat file to store our data. The reason is that we will be storing quite a bit of data and will need to perform regular searches on that data. Doing this with flat files would add overhead that complicates our search application and likely slows it down.
Now, let's move on to the search application itself and talk a little bit about how it will work. We will continue building this application using classes, as we started to do with the database class. By using classes, we are able to move the search logic out of the main script and provide an interface that anyone can use, whether or not they know the internal workings of the search class. This allows someone implementing a search with this application to worry about displaying the results rather than how they were obtained.
Our search application will have two main scripts. The first, called harvest.php, will allow us to specify the URLs from which we would like to collect keywords to build our index. In a nutshell, we will grab the source for the supplied URLs and break it into individual words. Each of those words will then be stored in the database along with a reference to the URL it was found on. If a word is found multiple times on a single URL, we will store it multiple times. This will allow us to return search results with a relevancy factor that reflects how often the search terms appear on a page.
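To make the harvesting idea concrete, here is a minimal sketch of the word-splitting step. It is not the actual harvest.php; the function names extractWords() and buildRows(), the example URL, and the array layout are assumptions made for illustration, and the sketch works on a string of page source rather than fetching a URL or writing to the database.

```php
<?php
// Hypothetical helper (not from the tutorial): break page source into words.
function extractWords(string $html): array
{
    // Remove HTML tags and lowercase the text. A real harvester would
    // probably also need to discard the contents of script and style blocks.
    $text = strtolower(strip_tags($html));

    // Split on any run of characters that is not a letter, digit, or apostrophe.
    return preg_split("/[^a-z0-9']+/", $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: one row per word occurrence, so a word that appears
// twice on a page produces two rows for that URL.
function buildRows(string $url, string $html): array
{
    $rows = [];
    foreach (extractWords($html) as $word) {
        $rows[] = ['word' => $word, 'url' => $url];
    }
    return $rows;
}

// Example input; the URL is a placeholder.
$rows = buildRows('http://example.com/', '<p>PHP search: a search engine in PHP.</p>');
```

Each element of $rows corresponds to one record that would be stored in the database. Keeping duplicate occurrences is what later lets the number of matching rows serve as a relevancy score.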
The second script in our search application will be named search.php. In this script we will provide a form where keywords can be entered for searching. The script will first try to find exact keyword matches. If there are no exact matches, it will attempt to find similar words in the database using PHP's similar_text() function.
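As a rough illustration of the fallback step, the sketch below compares a search term against a list of known words with similar_text(). The function name findSimilarWords(), the 70 percent threshold, and the in-memory word list are assumptions for this example; the real script would read its words from the database.

```php
<?php
// Hypothetical helper (not from the tutorial): return the known words that
// are at least $minPercent similar to $term, most similar first.
function findSimilarWords(string $term, array $knownWords, float $minPercent = 70.0): array
{
    $matches = [];
    foreach (array_unique($knownWords) as $word) {
        // similar_text() writes the similarity percentage into its third argument.
        similar_text($term, $word, $percent);
        if ($percent >= $minPercent) {
            $matches[$word] = $percent;
        }
    }

    // Sort by similarity, highest first, keeping the word as the key.
    arsort($matches);
    return $matches;
}

// A misspelled search term checked against a small example word list.
$similar = findSimilarWords('serch', ['search', 'engine', 'database']);
```

The threshold is a tuning choice: a lower value returns more candidates at the cost of less relevant matches.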
As we discussed earlier in the tutorial, there are four tasks we need to accomplish in order to create a search application:
Create Database Tables - Set up the tables where we will store the keywords and URLs.
Harvest Keywords - Before we can search anything, we need to gather the keywords and store them. We will develop a basic script to harvest keywords from given pages.
Exact Keyword Search - The first portion of our search script will provide exact keyword matching. Pages with a greater number of matches will be returned first.
Fuzzy Searching - If the exact keyword search provides no results, we will then search for similar words and return pages that contain them.
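The ranking rule for the exact keyword search (pages with more matches come first) can be sketched as follows. This is only an illustration under stated assumptions: the function name rankPages(), the row layout, and the example URLs are hypothetical, and the rows are held in a plain array here rather than queried from the database tables.

```php
<?php
// Hypothetical helper (not from the tutorial): count exact matches per URL
// and return the URLs ordered by match count, highest first.
function rankPages(array $rows, array $terms): array
{
    // Stored words are assumed to be lowercase, so normalize the search terms.
    $terms = array_map('strtolower', $terms);

    $counts = [];
    foreach ($rows as $row) {
        if (in_array($row['word'], $terms, true)) {
            $counts[$row['url']] = ($counts[$row['url']] ?? 0) + 1;
        }
    }

    // More matches means a more relevant page.
    arsort($counts);
    return $counts;
}

// Example rows: one entry per word occurrence on a page.
$rows = [
    ['word' => 'php',    'url' => 'http://example.com/a'],
    ['word' => 'search', 'url' => 'http://example.com/a'],
    ['word' => 'php',    'url' => 'http://example.com/b'],
    ['word' => 'php',    'url' => 'http://example.com/b'],
    ['word' => 'mysql',  'url' => 'http://example.com/c'],
];

$ranked = rankPages($rows, ['PHP']);
```

If rankPages() returns an empty array, that is the point at which the fuzzy search would take over.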