RSS 2.0 - The Basic Structure
The top level of an RSS 2.0 document is the <rss version="2.0"> element. This is followed by a single channel element. The channel element contains the entire feed contents and all associated metadata.
Required Channel Subelements
There are 3 required and 16 optional subelements of channel within RSS 2.0. Here are the required subelements:
title
The name of the feed. In most cases, this is the same name as the associated web site or service.
<title>RSS and Atom</title>
link
A URL pointing to the associated resource, usually a web site. The link must use an IANA-registered URI scheme, such as http://, https://, news://, or ftp://, though it isn't necessary for an application developer to support all of these by default. The most common by a large margin is http://. For example:
<link>http://www.benhammersley.com</link>
description
Some words to describe your channel.
<description>This is a nice RSS 2.0 feed of an even nicer weblog</description>
Although it isn't explicitly stated in the specification, it is highly recommended that you do not put anything other than plain text in the channel/title or channel/description elements. There are some existing feeds with HTML within those elements, but these cause a considerable amount of wailing, and at least a small amount of gnashing of teeth. Do not do it. Use plain text only in these elements. The following sidebar, "Including HTML Within title or description," gives a fuller account of this, but in my opinion it's a bad idea.
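As a minimal sketch of the structure described so far, the following Python builds a feed that contains only the three required channel subelements. The function name build_minimal_feed is hypothetical, and the standard library's xml.etree.ElementTree is just one convenient way to produce the XML; a template or any other XML writer would serve equally well.

```python
import xml.etree.ElementTree as ET


def build_minimal_feed(title, link, description):
    """Return an RSS 2.0 document with only the required channel subelements."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Plain text only: ElementTree escapes any markup characters it is given.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_minimal_feed(
    "RSS and Atom",
    "http://www.benhammersley.com",
    "This is a nice RSS 2.0 feed of an even nicer weblog",
)
print(feed_xml)
```

Because the text is assigned through the text attribute, any stray angle brackets or ampersands in a title are escaped rather than emitted as markup.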
Optional Channel Subelements
There are 16 optional channel subelements of RSS 2.0. Technically speaking, you can leave these out altogether. However, I encourage you to add as many as you can. Much of this stuff is static; the content of the element never changes. Placing it into your RSS template or adding another line to a script is little work for the additional value of your feed's metadata. This is especially true for the first three subelements listed here:
Including HTML Within title or description
Since the early days of RSS 0.91, there's been an ongoing debate about whether the item/title or item/description elements may, or should, contain HTML. In my opinion, they should not, for both practical and philosophical reasons. Practically speaking, including HTML markup requires the client software to be able to parse or filter it. While this is fine with many desktop agents, it restricts developers looking for other uses of the data. This brings us to the philosophical aspect. RSS's second use, after providing headlines and content to desktop readers and sites, is to provide indexable metadata. By combining presentation and content (i.e., by including HTML markup within the description element), you could disable this feature.
However, my opinion lost out on this one. RSS 2.0 now allows for entity-encoded HTML within the item/description tag. It doesn't mention anything, in either direction, regarding item/title, and people are basically making it up as they go along. With that in mind, I still state that item/title at least should be considered plain text.
If you want to put HTML within the item/description element, you can do it in two ways:
Entity encoding
With entity encoding, the angle brackets of HTML tags are converted to their respective HTML entities, &lt; and &gt;. If you need to show angle brackets as literal characters, the ampersand character itself should be encoded as well:
This is a &lt;em&gt;lovely left angle bracket:&lt;/em&gt; &amp;lt;
Within a CDATA block
The alternative is to enclose the HTML within a CDATA block. This removes one level of entity encoding, as in:
<![CDATA[This is a <em>lovely left angle bracket:</em> &lt;]]>
Either approach is acceptable according to the specification, and there is no way for a program to tell the difference between the two, or to tell if the description is actually just plain text that resembles encoded HTML. This is a major problem with the RSS 2.0 specification, as you'll see when we talk about parsing feeds. Atom and RSS 1.0 both have their own ways around this issue.
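To illustrate why a program cannot tell the two forms apart, here is a small sketch using Python's standard xml.etree.ElementTree parser. The two description strings are illustrative: one is entity-encoded and one uses a CDATA block. After parsing, both yield exactly the same text, and nothing in that text says whether it is HTML or plain text that merely looks like HTML.

```python
import xml.etree.ElementTree as ET

# The same content, once entity-encoded and once wrapped in CDATA.
entity_encoded = (
    "<description>This is a &lt;em&gt;lovely left angle bracket:"
    "&lt;/em&gt; &amp;lt;</description>"
)
cdata_wrapped = (
    "<description><![CDATA[This is a <em>lovely left angle bracket:"
    "</em> &lt;]]></description>"
)

entity_text = ET.fromstring(entity_encoded).text
cdata_text = ET.fromstring(cdata_wrapped).text

print(entity_text)                # the HTML fragment, one level decoded
print(entity_text == cdata_text)  # both forms parse to the same string
```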
language
The language the feed is written in. This allows aggregators to index feeds by language and should contain the standard Internet language codes as per RFC 1766.
<language>en-US</language>
copyright
A copyright notice for the content in the feed:
<copyright>Copyright 2004 Ben Hammersley</copyright>
managingEditor
The email address of the person to contact for editorial enquiries. It should be in the format name@example.com (FirstName LastName):
<managingEditor>ben@benhammersley.com (Ben Hammersley)</managingEditor>
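A consumer that wants the address and the name separately could split the value along these lines. This is only a sketch: split_editor is a hypothetical helper, and it relies on the standard library's email.utils.parseaddr, which treats the parenthesized part as the display name.

```python
from email.utils import parseaddr


def split_editor(value):
    """Split an 'address (Real Name)' string into (name, address)."""
    name, address = parseaddr(value)
    return name, address


name, address = split_editor("ben@benhammersley.com (Ben Hammersley)")
print(name)
print(address)
```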
webMaster
The email address of the person responsible for technical issues with the feed:
<webMaster>techsupport@benhammersley.com (Geek McNerdy)</webMaster>
pubDate
The publication date of the content within the feed. For example, a daily morning newspaper publishes at a certain time early every morning. Technically, any information in the feed should not be displayed until after the publication date, so you can set pubDate to a time in the future and expect that the feed won't be displayed until after that time. Few existing RSS readers take any notice of this element in this way, however. Nevertheless, it should be in the format outlined in RFC 822:
<pubDate>Sun, 12 Sep 2004 19:00:40 GMT</pubDate>
lastBuildDate
The date and time, RFC 822-style, when the feed last changed. Note the difference between this and channel/pubDate. lastBuildDate must be in the past. It is this element that feed applications should take as the "last time updated" value and not channel/pubDate.
<lastBuildDate>Sun, 12 Sep 2004 19:01:55 GMT</lastBuildDate>
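RFC 822-style dates such as the two shown above can be read and written with Python's standard email.utils module. The sketch below is one possible approach; the helper names parse_feed_date and format_feed_date are hypothetical.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def parse_feed_date(value):
    """Parse an RFC 822-style date string into a timezone-aware datetime."""
    return parsedate_to_datetime(value)


def format_feed_date(moment):
    """Format an aware datetime as an RFC 822-style string in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


pub_date = parse_feed_date("Sun, 12 Sep 2004 19:00:40 GMT")
last_build = parse_feed_date("Sun, 12 Sep 2004 19:01:55 GMT")

print(last_build > pub_date)         # the build happened after publication
print(format_feed_date(last_build))  # round-trips to the original string
```

Working with aware datetimes rather than strings makes it straightforward to check that lastBuildDate is in the past or to compare it with channel/pubDate.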
category
Identical in syntax to the item/category element you'll see later. This takes one optional attribute, domain. The value of category should be a forward-slash-separated string that identifies a hierarchical location in a taxonomy represented by the domain attribute. Sadly, there is no consensus either within the specification or in the real world as to any standard format for the domain attribute. It would seem most sensible to restrict it to a URL; however, it needn't necessarily be so.
<category domain="Syndic8">1765</category>
generator
This should contain a string indicating which program created the RSS file:
<generator>Movable Type v3.1b3</generator>
docs
A URL that points to an explanation of the standard for future reference. This should point to http://blogs.law.harvard.edu/tech/rss:
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
cloud
The <cloud/> element enables a rarely used feature known as "Publish and Subscribe," which we shall investigate fully in Chapter 9. It takes no value itself, but it has five mandatory attributes, themselves also explained in Chapter 9: domain, path, port, registerProcedure, and protocol.
<cloud domain="rpc.sys.com" port="80" path="/RPC2" registerProcedure="pingMe" protocol="soap"/>
ttl
ttl, short for Time-to-Live, should contain a number, which is the minimum number of minutes the reader should wait before refreshing the feed from its source. Feed authors should adjust this figure to reflect the time between updates and the number of times they wish their feed to be requested, versus how up to date they need their consumers to be.
<ttl>60</ttl>
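On the reader side, honoring ttl amounts to comparing the time since the last fetch with the stated number of minutes. A minimal sketch, assuming the reader records when it last fetched the feed (should_refresh is a hypothetical helper, not part of any RSS library):

```python
from datetime import datetime, timedelta, timezone


def should_refresh(last_fetched, ttl_minutes, now):
    """Return True once at least ttl_minutes have passed since the last fetch."""
    return now - last_fetched >= timedelta(minutes=ttl_minutes)


fetched = datetime(2004, 9, 12, 19, 0, tzinfo=timezone.utc)
print(should_refresh(fetched, 60, fetched + timedelta(minutes=30)))  # too soon
print(should_refresh(fetched, 60, fetched + timedelta(minutes=60)))  # ttl elapsed
```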
image
This describes a feed's accompanying image. It's optional, but many aggregators look prettier if you include one. It has three required and two optional subelements of its own:
url
The URL of a GIF, JPG, or PNG image that corresponds to the feed. It is, quite obviously, required.
title
A description of the image, normally used within the ALT attribute of HTML's <img> tag. It is required.
link
The URL to which the image should be linked. This is usually the same as the channel/link.
width and height
The width and height of the icon, in pixels. The icons should be a maximum of 144 pixels wide by 400 pixels high. The emergent standard is 88 pixels wide by 31 pixels high. Both elements are optional.
<image>
  <title>RSS2.0 Example</title>
  <url>http://www.exampleurl.com/example/images/logo.gif</url>
  <link>http://www.exampleurl.com/example/index.html</link>
  <width>88</width>
  <height>31</height>
  <description>The Worlds Leading Technical Publisher</description>
</image>
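The size limits described above are easy to check mechanically. The following sketch parses an image element and reports whether the required subelements are present and the dimensions are within 144 by 400 pixels. The helper name check_image is hypothetical, and falling back to 88 by 31 when width or height is omitted is an assumption of this sketch rather than something the text above prescribes.

```python
import xml.etree.ElementTree as ET

MAX_WIDTH, MAX_HEIGHT = 144, 400
# Assumption: use the common 88x31 size when width or height is omitted.
FALLBACK_WIDTH, FALLBACK_HEIGHT = 88, 31


def check_image(image_xml):
    """Return (width, height, ok) for an <image> element given as a string."""
    image = ET.fromstring(image_xml)
    width = int(image.findtext("width", default=str(FALLBACK_WIDTH)))
    height = int(image.findtext("height", default=str(FALLBACK_HEIGHT)))
    required_present = all(
        image.find(tag) is not None for tag in ("url", "title", "link")
    )
    ok = required_present and 0 < width <= MAX_WIDTH and 0 < height <= MAX_HEIGHT
    return width, height, ok


example = """<image>
  <title>RSS2.0 Example</title>
  <url>http://www.exampleurl.com/example/images/logo.gif</url>
  <link>http://www.exampleurl.com/example/index.html</link>
  <width>88</width>
  <height>31</height>
</image>"""
print(check_image(example))
```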
rating
The PICS rating for the feed; it helps parents and teachers control what children access on the Internet. More information on PICS can be found at http://www.w3.org/PICS/. This labeling scheme is little used at present, but an example of a PICS rating would be:
<rating>(PICS-1.1 "http://www.gcf.org/v2.5" labels on "1994.11.05T08:15-0500" until "1995.12.31T23:59-0000" for "http://w3.org/PICS/Overview.html" ratings (suds 0.5 density 0 color/hue 1))</rating>
textInput
An element that lets RSS feeds display a small text box and Submit button, and associates them with a CGI application. Many RSS parsers support this feature, and many sites use it to offer archive searching or email newsletter sign-ups, for example. textInput has four required subelements:
title
The label for the Submit button. It can have a maximum of 100 characters.
description
Text to explain what the textInput actually does. It can have a maximum of 500 characters.
name
The name of the text object that is passed to the CGI script. It can have a maximum of 20 characters.
link
The URL of the CGI script.
<textInput>
  <title>Search</title>
  <description>Search the Archives</description>
  <name>query</name>
  <link>http://www.exampleurl.com/example/search.cgi</link>
</textInput>
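A reader that supports textInput might validate the four subelements and then build the request along these lines. Both helpers are hypothetical, and sending the value as a GET query string is an assumption of this sketch; the text above says only that the named text object is passed to the CGI script.

```python
from urllib.parse import urlencode

# Maximum lengths given above for title, description, and name.
LIMITS = {"title": 100, "description": 500, "name": 20}
REQUIRED = ("title", "description", "name", "link")


def validate_text_input(fields):
    """Return the names of fields that are missing or exceed their length limit."""
    problems = [key for key in REQUIRED if not fields.get(key)]
    problems += [
        key for key, limit in LIMITS.items() if len(fields.get(key, "")) > limit
    ]
    return problems


def text_input_url(link, name, value):
    """Build the URL to request on submit (assumes a GET query string)."""
    return f"{link}?{urlencode({name: value})}"


search_box = {
    "title": "Search",
    "description": "Search the Archives",
    "name": "query",
    "link": "http://www.exampleurl.com/example/search.cgi",
}
print(validate_text_input(search_box))
print(text_input_url(search_box["link"], search_box["name"], "rss 2.0"))
```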
skipDays and skipHours
A pair of elements that tell a client when not to retrieve the feed. skipDays can contain up to seven day subelements: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, or Sunday. skipHours contains up to 24 hour subelements, each a number from 0 to 23 representing an hour in Greenwich Mean Time (GMT), with 0 being the hour that begins at midnight. The client should not retrieve the feed during any day or hour listed within these two elements. The elements are ORed, not ANDed: in the example here, the application is instructed not to request the feed during the 8 p.m. hour on any day, and never on a Monday:
<skipDays><day>Monday</day> </skipDays><skipHours><hour>20</hour></skipHours>
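The OR logic can be sketched in a few lines of Python. The helper should_skip is hypothetical, and the sketch assumes hour values line up with Python's datetime.hour, so that 20 means the 8 p.m. hour as in the example above.

```python
from datetime import datetime, timezone

DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def should_skip(now, skip_days, skip_hours):
    """Return True if the feed should not be fetched at this moment (GMT)."""
    now_gmt = now.astimezone(timezone.utc)
    day_listed = DAY_NAMES[now_gmt.weekday()] in skip_days
    hour_listed = now_gmt.hour in skip_hours
    # ORed, not ANDed: a match on either element is enough to skip.
    return day_listed or hour_listed


skip_days = {"Monday"}
skip_hours = {20}

monday_morning = datetime(2004, 9, 13, 9, 0, tzinfo=timezone.utc)
sunday_evening = datetime(2004, 9, 12, 20, 30, tzinfo=timezone.utc)
tuesday_morning = datetime(2004, 9, 14, 9, 0, tzinfo=timezone.utc)

print(should_skip(monday_morning, skip_days, skip_hours))   # skipped: Monday
print(should_skip(sunday_evening, skip_days, skip_hours))   # skipped: 8 p.m. hour
print(should_skip(tuesday_morning, skip_days, skip_hours))  # fetch allowed
```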
This article is excerpted from chapter four of the book Developing Feeds with RSS and Atom, written by Ben Hammersley (O'Reilly; ISBN: 0596008813). Check it out today at your favorite bookstore. Buy this book now.