Searching with Strings - 5.15 Using FULLTEXT Searches
Problem
You want to search through a lot of text.
Solution
Use a FULLTEXT index.
Discussion
You can use pattern matches to look through any number of rows, but as the amount of text goes up, the match operation can become quite slow. It’s also common to look for the same text in several string columns, which with pattern matching tends to result in unwieldy queries:
SELECT * FROM tbl_name
WHERE col1 LIKE 'pat' OR col2 LIKE 'pat' OR col3 LIKE 'pat' ...
A useful alternative is FULLTEXT searching, which is designed for looking through large amounts of text and can search multiple columns simultaneously. To use this capability, add a FULLTEXT index to your table, and then use the MATCH operator to look for strings in the indexed column or columns. FULLTEXT indexing can be used with MyISAM tables for nonbinary string data types (CHAR, VARCHAR, or TEXT); as of MySQL 5.6, InnoDB tables support FULLTEXT indexes as well.
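To make the contrast with the LIKE query concrete, here is a minimal sketch, assuming tbl_name is a MyISAM table and col1, col2, and col3 are the same placeholder columns used above (they are not part of any real schema):

```sql
# Sketch only: tbl_name, col1, col2, and col3 are placeholder names.
# Assumes a MyISAM table whose columns are nonbinary strings.
ALTER TABLE tbl_name ADD FULLTEXT (col1, col2, col3);

# A single MATCH expression replaces the chain of ORed LIKE tests.
# The MATCH column list should name exactly the columns of the
# FULLTEXT index defined above.
SELECT * FROM tbl_name
WHERE MATCH(col1, col2, col3) AGAINST('pat');
```

MATCH looks for words rather than comparing whole column values, so its results are not identical to those of the LIKE query. For example, in natural-language mode MyISAM ignores words shorter than the ft_min_word_len setting (4 characters by default).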
FULLTEXT searching is best illustrated with a reasonably good-sized body of text. If you don’t have a sample dataset, you can find several repositories of freely available electronic text on the Internet. For the examples here, I’ve chosen the complete text of the King James Version of the Bible (KJV), which is relatively large and has the useful property of being nicely structured by book, chapter, and verse. Because of its size, this dataset is not included with the recipes distribution, but is available separately as the mcb-kjv distribution at the MySQL Cookbook web site (see Appendix A). The mcb-kjv distribution includes a file kjv.txt that contains the verse records. Some sample records look like this:
O   Genesis   1   1   1   In the beginning God created the heaven and the earth.
O   Exodus    2  20  13   Thou shalt not kill.
N   Luke     42  17  32   Remember Lot's wife.
Each record contains the following fields:
Book section. This is either O or N, signifying the Old or New Testament.
Book name and corresponding book number, from 1 to 66.
Chapter and verse numbers.
Text of the verse.
To import the records into MySQL, create a table named kjv that looks like this:
CREATE TABLE kjv
(
  bsect ENUM('O','N') NOT NULL,     # book section (testament)
  bname VARCHAR(20) NOT NULL,       # book name
  bnum  TINYINT UNSIGNED NOT NULL,  # book number
  cnum  TINYINT UNSIGNED NOT NULL,  # chapter number
  vnum  TINYINT UNSIGNED NOT NULL,  # verse number
  vtext TEXT NOT NULL               # text of verse
) ENGINE = MyISAM;
Then load the kjv.txt file into the table using this statement:
mysql> LOAD DATA LOCAL INFILE 'kjv.txt' INTO TABLE kjv;
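Because this statement has no FIELDS or LINES clause, MySQL reads kjv.txt using its defaults: tab-separated fields, one record per line, assigned to the table columns in order. The following checks are a suggested sketch, not part of the original recipe, for confirming that the load went as expected:

```sql
# Show any values that were truncated or adjusted during the load.
# Run this immediately after LOAD DATA, before any other statement.
SHOW WARNINGS;

# Rows per testament: 'O' for the Old Testament, 'N' for the New.
SELECT bsect, COUNT(*) FROM kjv GROUP BY bsect;

# Book numbers run from 1 to 66, so a complete load reports 66 here.
SELECT COUNT(DISTINCT bnum) FROM kjv;

# Nonzero if the file had CRLF line endings, which would leave a
# carriage return at the end of each vtext value.
SELECT COUNT(*) FROM kjv WHERE vtext LIKE '%\r';

# Fetch the first sample record (Genesis 1:1).
SELECT bname, cnum, vnum, vtext FROM kjv
WHERE bnum = 1 AND cnum = 1 AND vnum = 1;
```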
You’ll notice that the kjv table contains columns both for book names (Genesis, Exodus, ...) and for book numbers (1, 2, ...). The names and numbers have a fixed correspondence, so one can be derived from the other, and that redundancy means the table is not in normal form. It’s possible to eliminate the redundancy by storing just the book numbers (which take less space than the names) and producing the names when necessary in query results by joining the numbers to a small mapping table that associates each book number with the corresponding name. But I want to avoid using joins at this point. Thus, the table includes book names so that search results can be interpreted more easily, and book numbers so that the results can be sorted easily into book order.
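For reference, the normalized alternative described in the previous paragraph might look like the following sketch. It is not used in this recipe, and the lookup table name kjv_book is hypothetical:

```sql
# Hypothetical lookup table (not used in this recipe) that maps each
# book number to its name.
CREATE TABLE kjv_book
(
  bnum  TINYINT UNSIGNED NOT NULL PRIMARY KEY,  # book number (1 to 66)
  bname VARCHAR(20) NOT NULL                    # book name
) ENGINE = MyISAM;

# Populate it from the redundant columns already present in kjv.
INSERT INTO kjv_book (bnum, bname)
  SELECT DISTINCT bnum, bname FROM kjv;

# If the verse table stored only bnum, a join would recover the names.
SELECT b.bname, k.cnum, k.vnum, k.vtext
FROM kjv AS k
  INNER JOIN kjv_book AS b ON b.bnum = k.bnum
WHERE k.bnum = 1 AND k.cnum = 1
ORDER BY k.vnum;
```

The design trade-off is the one the recipe points out: the join removes the duplicated names at the cost of a more complex query, which is why the kjv table keeps both columns for now.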
After populating the table, prepare it for use in FULLTEXT searching by adding a FULLTEXT index. This can be done using an ALTER TABLE statement: