Intro to URL Rewriting with Apache’s .htaccess

02/28/2008

I have written an .htaccess file to handle URL rewriting for every site I’ve ever built. If you’re not familiar with URL rewriting, it modifies a requested URL, or redirects the user, before the requested resource is fetched. One of its major uses is making URLs human-readable: your users can visit a pretty URL like http://www.mystore.com/shoes/ and have the server interpret it as http://www.mystore.com/shop.php?category=shoes.
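As a minimal sketch of the store example above, assume a hypothetical shop.php in the document root that reads a category query parameter, with the .htaccess file in that same document root. The rewrite could then look something like this:

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # leave requests for real files and directories alone
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  # /shoes/ (or /shoes) is served internally by shop.php?category=shoes;
  # the browser's address bar keeps showing the pretty URL
  # (shop.php and its category parameter are hypothetical)
  RewriteRule ^([A-Za-z0-9-]+)/?$ shop.php?category=$1 [QSA,L]
</IfModule>
```

The two RewriteCond lines keep real files and folders (images, stylesheets, existing directories) from being routed to shop.php. The [QSA] flag appends any existing query string, so /shoes/?page=2 becomes shop.php?category=shoes&page=2.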

Most of the time, this file can be relatively simple. I always recommend using one for URL canonicalization, which is a fancy term for making sure each page has exactly one URL. For example, lumidant.com redirects to www.lumidant.com. This helps with SEO because it keeps search engines from splitting your ranking between two URLs that are really the same page.
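A minimal sketch of that kind of canonicalization might look like the following. It uses the hypothetical www.mystore.com store from earlier rather than this site’s own rules, and it assumes the file sits in the document root. The rule sends a permanent redirect from the bare domain to the www host and keeps the requested path:

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # match the bare domain in any letter case (mystore.com is hypothetical)
  RewriteCond %{HTTP_HOST} ^mystore\.com$ [NC]
  # 301 to the same path on the www host; the query string is passed through
  RewriteRule ^(.*)$ http://www.mystore.com/$1 [R=301,L]
</IfModule>
```

With this in place, a request for http://mystore.com/shoes/ is answered with a 301 pointing at http://www.mystore.com/shoes/.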

The code below is the .htaccess file from this site. The patterns in the RewriteCond and RewriteRule lines are regular expressions, so you may want a quick refresher if you’re not familiar with them. A few flags are also worth knowing. [NC] stands for “no case” and makes the match case-insensitive. [R=301] tells the server to send a 301 (permanent) redirect. [L] marks the last rule, telling the server to stop processing the remaining rules on this pass.

<IfModule mod_rewrite.c>

  RewriteEngine on

  # rewrite all lumidant.com requests to the lumidant subdirectory
  RewriteCond %{HTTP_HOST} ^(www\.)?lumidant\.com$ [NC]
  # this is needed to stop infinite looping
  RewriteCond %{REQUEST_URI} !^/lumidant/.*$
  # don't redirect these directories to the lumidant subdirectory
  RewriteCond %{REQUEST_URI} !^/pinknews/.*$
  RewriteRule ^(.*)$ /lumidant/$1

  # if you're asking for a directory and there is no trailing slash then add one
  RewriteCond %{REQUEST_FILENAME} -d
  RewriteCond %{REQUEST_URI} !^.*/$
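  # .htaccess patterns never see a leading slash, so match ^lumidant/ and rebuild
  # the public URL from $1 so the /lumidant prefix never leaks into the redirect
  RewriteRule ^lumidant/(.*)$ http://www.lumidant.com/$1/ [R=301,L]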

  # add a www if there's not one
  RewriteCond %{HTTP_HOST} ^lumidant\.com$ [NC]
  RewriteCond %{REQUEST_URI} !^/blog.*$
  RewriteRule ^lumidant/(.*)$ http://www\.lumidant\.com/$1 [R=301,L]

</IfModule>

This blog is currently hosted with BlueHost. For accounts with multiple domains, BlueHost places the add-on domains in subdirectories of the main domain. This can be confusing to maintain, so I moved all of the lumidant code to a subdirectory as well and then updated the .htaccess file to make this organization transparent to the end user.

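The last rule in the file adds a www to every non-www URL (except those under /blog). I could have placed it at the beginning of the file. However, a directory request without a trailing slash, such as lumidant.com/about, would then be redirected once to add the www and then processed again, triggering a second redirect to add the trailing slash. The trailing-slash rule already redirects straight to the www host, so putting it first lets a single redirect cover both fixes. When organizing the file, try to minimize the number of redirects. Each one costs an extra round trip in response time, adds load on the server, and can affect search engine optimization.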

URL rewriting can be tricky at first, especially if you’re not familiar with regular expressions. When you’re working with redirects, it helps to inspect the HTTP response headers (the status code and Location header of each response) to see which intermediate redirects are occurring.

Finally, if you’re not using Apache, there are alternatives to .htaccess. For example, I have used UrlRewriteFilter in the past for Java web apps.
