Managing robots.txt using PHP: Generating Dynamic Syntax
By: Codex-M
    2010-07-14

    Table of Contents:
  • Managing robots.txt using PHP: Generating Dynamic Syntax
  • Robots.txt using PHP example
  • Creating the PHP file and the static syntax
  • Upload the complete myrobots.txt.php to the website's root directory
  • Revise .htaccess to rewrite myrobots.txt.php to robots.txt


    Managing robots.txt using PHP: Generating Dynamic Syntax - Creating the PHP file and the static syntax


    (Page 3 of 5)

    Let's name the file myrobots.txt.php (you can name it anything you like). The following is the initial syntax, taken from the existing robots.txt shown earlier -- except for the year syntax, which needs special processing:

    <?php

    header("Content-Type: text/plain");

    $currentsyntax='User-agent: *
    Disallow: */trackback
    Disallow: */feed
    Disallow: /searchresultpages
    Disallow: /wp-
    Disallow: /*?
    Disallow: /xmlrpc.php
    Disallow: /blockedbyrobots.php
    Allow: /wp-content/uploads/scripts/PHP-Server-Array-Variables.php
    Disallow: /postpdfcreator.php
    Disallow: /search/
    Disallow: /search
    Disallow: /*.js$
    Disallow: /antibot.php
    Disallow: /ajaxwebform/captcha.php
    Disallow: /*.jpg$
    Disallow: /ajaxwebform/ajaxvalidate.php
    Disallow: /hiddentextexample.php
    Disallow: /searchresultpages/
    Allow: /index.php?page_id=123&pg=2
    Allow: /site-map/?pg=2';

    echo $currentsyntax;

    //put step 2 php code here

    This script assigns the part of the existing robots.txt syntax that does not require dynamic editing to a PHP variable, $currentsyntax, and then echoes it. Because of the Content-Type: text/plain header, the browser or crawler receives the output as a plain text file.

    This script is not yet complete, as it does not yet block the problematic "year" directory.
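    One way to keep the static part easy to maintain is to hold the rules in an array and join them with newlines, so that no stray indentation or blank lines end up in the output. This is only a sketch: the function name build_static_rules() and the shortened rule list are assumptions made for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: joins a user-agent line and a list of rules
// into one robots.txt group, one directive per line.
function build_static_rules(string $userAgent, array $rules): string
{
    $lines = ["User-agent: " . $userAgent];
    foreach ($rules as $rule) {
        // Each rule is a [directive, path] pair, e.g. ["Disallow", "/wp-"].
        $lines[] = $rule[0] . ": " . $rule[1];
    }
    return implode("\n", $lines) . "\n";
}

// Shortened rule list taken from the static syntax above.
$rules = [
    ["Disallow", "/wp-"],
    ["Disallow", "/xmlrpc.php"],
    ["Allow", "/site-map/?pg=2"],
];

// In the real script, header("Content-Type: text/plain") would be sent first.
echo build_static_rules("*", $rules);
```

    Adding or removing a rule then means editing one array entry instead of a long quoted string.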

    Dynamically generate syntax to block the "Year" directory

    In WordPress, you can use PHP to query the MySQL database and retrieve the post dates from the post_date column of the wp_posts table. You can then use string manipulation to extract the year. If you are not using WordPress, you can do the same thing by following techniques similar to those discussed in this tutorial.
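    The year-extraction step can be tried out without a database. The sketch below assumes that post_date arrives as a MySQL DATETIME string such as "2010-07-14 09:30:00"; the function name extract_year() and the sample value are hypothetical.

```php
<?php
// Hypothetical helper: returns the four-digit year of a MySQL DATETIME
// string ("YYYY-MM-DD HH:MM:SS") as an integer.
function extract_year(string $postDate): int
{
    // Everything before the first dash is the year.
    $dashPos = strpos($postDate, "-");
    if ($dashPos === false) {
        throw new InvalidArgumentException("Unexpected date format: " . $postDate);
    }
    return (int) substr($postDate, 0, $dashPos);
}

echo extract_year("2010-07-14 09:30:00"), "\n"; // prints 2010
```

    Casting to an integer makes the later comparison and increment of the year explicit instead of relying on PHP's handling of numeric strings.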

    Once the year has been extracted, it is concatenated with the robots.txt Disallow directive. For an explanation of the code below, refer to the comments in the code:

    //connect to WordPress MySQL database
    //Note: the mysql_* functions used below were removed in PHP 7; on current PHP versions use mysqli or PDO instead

    $username = "xxxxxx";
    $password = "xxxxxx";
    $hostname = "xxxxxx";
    $database = "xxxxxxx";

    $dbhandle = mysql_connect($hostname, $username, $password)
    or die("Unable to connect to MySQL");

    $selected = mysql_select_db($database,$dbhandle)
    or die("Could not select $database");

    //Retrieve the latest post date from the wp_posts table post_date column

    $latestpostdate = mysql_query("SELECT max(post_date) from `wp_posts`") or die(mysql_error());
    $rowlatestpostdate = mysql_fetch_array($latestpostdate) or die("Invalid query" . mysql_error());
    $latestpostdatedata = $rowlatestpostdate['max(post_date)'];

    //Retrieve the first post date from the wp_posts table post_date column

    $firstpostdate = mysql_query("SELECT min(post_date) from `wp_posts`") or die(mysql_error());
    $rowfirstpostdate = mysql_fetch_array($firstpostdate) or die("Invalid query" . mysql_error());
    $firstpostdatedata = $rowfirstpostdate['min(post_date)'];

    //Do PHP string manipulation to extract the post year of both the first and the latest post

    $firstdashlatest = strpos($latestpostdatedata,"-");
    $countlatestyear=strlen($latestpostdatedata);
    $substrcountlatest= ($countlatestyear-$firstdashlatest)*(-1);
    $latestpostyear = substr($latestpostdatedata,0,$substrcountlatest);

    $firstdashfirst = strpos($firstpostdatedata,"-");
    $countfirstyear=strlen($firstpostdatedata);
    $substrcountfirst= ($countfirstyear-$firstdashfirst)*(-1);
    $firstpostyear = substr($firstpostdatedata,0,$substrcountfirst);

    //Initialize the WHILE loop and assign the first year as the initial value.
    //Repeat the loop until the latest year has been reached.

    $i=$firstpostyear;

    echo strip_tags(nl2br("\r\n# Start of dynamically generated robots.txt syntax"));

    while ($i<=$latestpostyear) {
    echo strip_tags(nl2br("\r\nDisallow: /".$i++."/"));
    }

    echo strip_tags(nl2br("\r\n# End of dynamically generated robots.txt syntax"));

    Each line is wrapped in strip_tags(nl2br("Contents to render in the browser...")). nl2br() inserts a <br /> tag before every line break in the string, and strip_tags() then removes HTML tags such as <br /> so that they do not appear in the text output. The line breaks themselves come from the "\r\n" at the start of each string, which is what keeps the robots.txt syntax clean and readable in the browser.
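    The effect of the wrapper can be checked on a single string. This is a sketch with a made-up sample line: nl2br() places a <br /> tag in front of the line break and strip_tags() removes it again, so the text that reaches the crawler is the same as the input string.

```php
<?php
$line = "\r\nDisallow: /2010/";

$withBreaks = nl2br($line);            // "<br />" is inserted before "\r\n"
$clean      = strip_tags($withBreaks); // the "<br />" tag is removed again

var_dump($withBreaks === "<br />\r\nDisallow: /2010/"); // bool(true)
var_dump($clean === $line);                              // bool(true)
```

    For strings that contain no HTML, echoing the string directly would therefore send the same bytes under the text/plain header.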

    Finally, you can add the last two remaining pieces of PHP code for the sitemap reference in the robots.txt file:

    echo strip_tags(nl2br("\r\n"));
    echo strip_tags(nl2br("\r\nSitemap: http://www.php-developer.org/sitemap.xml"));

    To add a blank line to the robots.txt output (the plain-text counterpart of <br /> in HTML), this line is used:

    echo strip_tags(nl2br("\r\n"));

    The most important line in the PHP script mentioned above is this:

    echo strip_tags(nl2br("\r\nDisallow: /".$i++."/"));

    This will generate the actual robots.txt "Disallow" syntax for the year directory. So if your WordPress-based site has been in existence since 2005 and you've been updating its content through the present (2010), the generated syntax will be:

    Disallow: /2005/
    Disallow: /2006/
    Disallow: /2007/
    Disallow: /2008/
    Disallow: /2009/
    Disallow: /2010/
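    The same loop can be written as a small function that can be tested without a database. This is a sketch, not the author's code: year_disallow_lines() is a hypothetical name, and the years are passed in directly instead of being read from wp_posts.

```php
<?php
// Hypothetical helper: one "Disallow: /YYYY/" line per year in the range.
// Returns an empty array when $firstYear is greater than $latestYear.
function year_disallow_lines(int $firstYear, int $latestYear): array
{
    $lines = [];
    for ($year = $firstYear; $year <= $latestYear; $year++) {
        $lines[] = "Disallow: /" . $year . "/";
    }
    return $lines;
}

// Prints the six lines for 2005 through 2010, one per line.
echo implode("\r\n", year_disallow_lines(2005, 2010)), "\r\n";
```

    Because the range is computed from the first and latest post years each time the file is requested, a new year directory is covered as soon as the first post of that year exists.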

    Warning: Do not forget to back up any existing files (robots.txt, .htaccess, etc.) before editing or uploading anything that could overwrite them.


    © 2003-2012 by Developer Shed. All rights reserved.