Querying XML: Use Cases
Use case “SGML”: Standard Generalized Markup Language.
The example document and queries in this use case were first created for a 1992 conference on Standard Generalized Markup Language (SGML). For the W3C use case, the Document Type Definition (DTD) and example document were translated from SGML to XML.
This chapter does not implement these queries because they are not significantly different from queries in other use cases.
Use case “TEXT”: full-text search.
This use case is based on company profiles and a set of news documents that contain data for PR, mergers, and acquisitions. Given a company, the use case illustrates several queries that search the text of news documents, and several ways of building query results by matching information from the company profile against the news content.
In this use case, searches for company names are interpreted as word-based. The words in a company name may be in any case and separated by any kind of whitespace.
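In XSLT 1.0, contains() is case-sensitive and compares whitespace literally, so honoring these rules takes some extra work. A minimal sketch, assuming ASCII-only case folding (XSLT 1.0 has no lower-case() function) and a hypothetical top-level parameter named company: fold both the search term and each title with normalize-space() and translate() before calling contains().

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: the company name to search for. -->
  <xsl:param name="company" select="'Foobar Corporation'"/>

  <!-- ASCII-only case folding; XSLT 1.0 has no lower-case(). -->
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <!-- Fold the search term once: collapse whitespace, then lowercase. -->
  <xsl:variable name="needle"
      select="translate(normalize-space($company), $upper, $lower)"/>

  <xsl:template match="news">
    <result>
      <!-- Fold each title the same way. normalize-space() turns any run
           of spaces, tabs, or line breaks into a single space, so a title
           that splits FOOBAR and corporation across two lines still matches. -->
      <xsl:copy-of select="news_item/title[
          contains(translate(normalize-space(.), $upper, $lower), $needle)]"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

This still does not enforce word boundaries: a title containing "Foobar Corporations" would also match. Enforcing boundaries requires tokenizing the text, which is beyond this sketch.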
All queries can be expressed in XSLT 1.0. However, doing so can require a lot of text-search machinery. For example, the most difficult queries need a way to test whether any member of a set of text values occurs in another string. Furthermore, many queries must test text subunits, which means detecting boundaries such as those between sentences.
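For the set-membership test, XSLT 1.0 does offer a compact idiom when the candidate values are available as nodes: a predicate over the node-set of candidates, evaluated with each candidate as the context node, is true if any candidate occurs in the string. The sketch below assumes a hypothetical term list embedded in the stylesheet (read back with document('')) under a hypothetical urn:example:search-terms namespace; in the use case itself, the candidate names would more likely come from the company profile document.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="urn:example:search-terms"
    exclude-result-prefixes="my">

  <!-- Hypothetical term list. Top-level elements in a non-XSLT
       namespace are ignored by the processor but readable as data. -->
  <my:terms>
    <term>Foobar Corporation</term>
    <term>Gorilla Corporation</term>
  </my:terms>

  <!-- document('') loads the stylesheet itself as a source tree. -->
  <xsl:variable name="terms" select="document('')/*/my:terms/term"/>

  <xsl:template match="news">
    <result>
      <xsl:for-each select="news_item">
        <!-- Bind the text to search first: inside the predicate below,
             "." is the term being tested, not the news item. -->
        <xsl:variable name="text" select="string(content)"/>
        <!-- A non-empty node-set converts to true, so this asks
             whether at least one term occurs in $text. -->
        <xsl:if test="$terms[contains($text, .)]">
          <xsl:copy-of select="title"/>
        </xsl:if>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Note that this idiom only covers plain substring matching; it does nothing about case, whitespace, or sentence boundaries.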
Given the techniques covered in Chapter 1, it should be clear that these problems have solutions in XSLT. However, if you will do a lot of text querying in XSLT, you will need a generic library of text-search utilities. Developing generic libraries is the focus of Chapter 14, which revisits some of the most complex full-text queries. For now, you will solve two of the most straightforward text-search problems in the W3C document. This chapter lists the others to give a sense of why these queries can be challenging for XSLT 1.0; the difficult parts are emphasized.
Question 1. Find all news items in which the name "Foobar Corporation" appears in the title:
<xsl:template match="news">
  <result>
    <xsl:copy-of select="news_item/title[contains(., 'Foobar Corporation')]"/>
  </result>
</xsl:template>
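The template copies only the matching title elements. If you would rather return each matching news item in full, a minimal variant moves the test into a predicate on news_item; the stylesheet wrapper is included so the sketch runs on its own.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="news">
    <result>
      <!-- Test the title, but copy the whole news_item element. -->
      <xsl:copy-of select="news_item[contains(title, 'Foobar Corporation')]"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Here contains(title, ...) converts the title node-set to the string value of its first node, which is enough on the assumption that each news item has a single title.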
Question 2. For each news item that is relevant to the Gorilla Corporation, create an "item summary" element. The content of the item summary is the title, date, and first paragraph of the news item, separated by periods. A news item is relevant if the name of the company is mentioned anywhere within the content of the news item:
<xsl:template match="news">
  <result>
    <xsl:for-each select="news_item[contains(content,'Gorilla Corporation')]">
      <item_summary>
        <xsl:value-of select="normalize-space(title)"/>. <xsl:text/>
        <xsl:value-of select="normalize-space(date)"/>. <xsl:text/>
        <xsl:value-of select="normalize-space(content/par[1])"/>
      </item_summary>
    </xsl:for-each>
  </result>
</xsl:template>
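The template ends each literal ". " with an empty <xsl:text/> element. That keeps the ". " text node (which is not whitespace-only, so it is preserved, trailing space included) separate from the line break that follows, which is whitespace-only and therefore stripped from the stylesheet. A variant that may be easier to maintain builds the whole summary with a single concat() call, so there are no literal text nodes to protect; as far as I can tell it produces the same string as the template above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="news">
    <result>
      <xsl:for-each select="news_item[contains(content, 'Gorilla Corporation')]">
        <item_summary>
          <!-- One expression yields "title. date. first paragraph";
               whitespace between instructions no longer matters. -->
          <xsl:value-of select="concat(normalize-space(title), '. ',
                                       normalize-space(date), '. ',
                                       normalize-space(content/par[1]))"/>
        </item_summary>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```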
This article is excerpted from chapter nine of the XSLT Cookbook, Second Edition, written by Sal Mangano (O'Reilly; ISBN: 0596009747). Copyright © 2007 O'Reilly Media, Inc.