404 Error Pages and Redirects for SEOs

Introduction

A 404 error means “not found”. It is usually the page you get when you misspell a page name on a site, or when the page has been deleted or moved. The problem is that the standard 404 page is ugly and unhelpful.

Many people have figured out that a custom 404 page lets you present a much more helpful page to your visitors. Others have taken it a step further and made that custom page redirect to the home page, so that any links (and PR, i.e. PageRank) pointing to deleted or misspelled pages will be passed on to the website.

Sounds great, right? Well, there is a problem (there is almost always a problem with things that sound too good to be true…). If you use a redirect to pass PR from an error page to a normal page, the redirecting page will usually return a “200 OK” or a 302 redirect code rather than a proper 404. This confuses search engines and can result in a whole bunch of indexed URLs that all look to the search engine like duplicates of your home page (with a plain 200 OK there is no redirect code at all).

A related option is the 410 Gone status code. This usually isn’t necessary, but it can be useful if you are trying to remove all traces of a page you no longer want associated with your site (for example, one you were sued over). A 410 says that the page is missing on purpose, and that this is not an accident or a temporary problem.

In this case, a URL removal request to Google followed by a 410 on the page location itself should do it. You can also use robots.txt and the robots meta tag as a backup.

Returning a 200 OK (or a 302) for missing pages is bad for your site. Additionally, there are a LOT of indexed “error” pages in search engines (especially Yahoo) that should not be there.

The proper behaviour for an error page is to return a 404 error code. The best result for your visitors is an error page that is either helpful by itself or redirects to a helpful page. The best result from an SEO viewpoint is for any link popularity from broken links to be passed on to the page of your choice.

Naturally, the best result overall would be something that accomplishes all of the above. Unfortunately, this is not directly possible: as soon as a search engine receives the error code, it treats the URL as a dead page and will eventually remove it from its index.

PR and link weight are only passed on if a page does not return a 404. But your site logs will not report errors if missing pages respond with a 200, and your site will not verify (for example, if you use Google Sitemaps) unless it has a valid 404 page.

There are 4 possible scenarios with custom pages:

  • 404 – Responds with an error, but shows a custom page to help your visitors
  • 200 – If a page is missing, it’s replaced with the custom error page
  • 302 – If the page is missing, it’s replaced with a temporary redirect to a custom error page
  • 301 – Redirects errors to either a custom error page or some other page in the site (e.g. the site map, the home page or a best guess)
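To make the four scenarios concrete, here is a small Python sketch (Python is used purely for illustration) of what a server sends back for a missing URL under each one. The /404.htm path is a hypothetical error page location. Note that in the 302 and 301 cases the missing URL itself returns a redirect, and the page the visitor lands on is then fetched separately with its own 200 OK.

```python
from typing import NamedTuple, Optional


class MissingPageResult(NamedTuple):
    status: str              # status line sent for the missing URL itself
    location: Optional[str]  # Location header, only present for redirects
    served_page: str         # the page the visitor ends up looking at


def respond_to_missing(scenario: int, target: str = "/404.htm") -> MissingPageResult:
    """Sketch of the response to a missing URL under each scenario.

    target is a hypothetical custom error page (or, for 301, any other page).
    """
    if scenario == 404:
        # Error status with the custom page as the body: URL reported as missing.
        return MissingPageResult("404 Not Found", None, target)
    if scenario == 200:
        # Custom page as the body, but the URL looks like a normal, working page.
        return MissingPageResult("200 OK", None, target)
    if scenario == 302:
        # Temporary redirect; the target is then requested and returns 200 OK.
        return MissingPageResult("302 Found", target, target)
    if scenario == 301:
        # Permanent redirect to the error page or another page (e.g. "/").
        return MissingPageResult("301 Moved Permanently", target, target)
    raise ValueError(f"unknown scenario: {scenario}")


if __name__ == "__main__":
    for code in (404, 200, 302, 301):
        print(code, respond_to_missing(code))
```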

Each has benefits and drawbacks. You have to choose – “Red or Blue”:

Custom Error Page Types

404 Not Found response | 200 OK (or 302, or 301) response
Properly defines the result: a missing page. | Tricks the search engine into thinking all is well.
Validates. | Does not validate, but won’t break your site.
Shows up in logs so you can fix it. | Does not show up as an error, so it is harder to find.
Does not pass on PR or link weight. | Passes on PR to the final page.
No duplication issues. | Can result in a duplication penalty.

Custom Error Page Link Issue

One thing I’d like to make sure everyone is aware of: a custom error page can be served at any URL in your site. This means that if you put any links on that page to help the visitor find their way, you cannot make them document-relative (e.g. sitemap.htm), since you don’t know what URL they will be relative to.

You must either make them absolute (recommended) or set the base HREF with this code in the <head> of the page, and make sure all your links are relative to it:

<base href="http://www.yoursite.com/">
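The link problem is easy to see with Python's standard urllib.parse.urljoin, which resolves a link against a page URL the same way a browser does. In this sketch the requested URLs are hypothetical examples of where the same error page might be served, and resolve_link is just an illustrative helper name.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str, base_href: Optional[str] = None) -> str:
    """Resolve href as a browser would, honouring an optional <base href>."""
    return urljoin(base_href or page_url, href)


# The same error page served at a hypothetical deep URL:
deep = "http://www.yoursite.com/old/section/missing.htm"

# A document-relative link follows the requested URL, so it breaks:
print(resolve_link(deep, "sitemap.htm"))
# -> http://www.yoursite.com/old/section/sitemap.htm

# An absolute link always points to the same place:
print(resolve_link(deep, "http://www.yoursite.com/sitemap.htm"))
# -> http://www.yoursite.com/sitemap.htm

# With a base URL set, relative links resolve against the base instead:
print(resolve_link(deep, "sitemap.htm", "http://www.yoursite.com/"))
# -> http://www.yoursite.com/sitemap.htm
```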

Nifty Misuse of the Error Page

Sometimes you won’t have access to the .htaccess of a site, but you will have access to a custom error page. Let’s say you have a dynamic site on this host, but due to security restrictions (e.g. PHP “safe mode”) you can’t write pages to disk dynamically. Without .htaccess to rewrite URLs on the fly, or PHP permissions to write static pages, you can’t have a CMS with “SEO friendly” URLs. Or can you?

Normally, I’d suggest switching hosts in this case. Really. But let’s say you want to stay with them.

You can write a script on your custom error page that parses the requested URL (e.g. yourdomain.com/content/blue.htm) into the database query it actually stands for (e.g. yourdomain.com?content=blue), then puts the results of that query into the error page, thus “faking” .htaccess.

In reality, you are using the server’s error handling, just not in the way it was intended. Naturally, this technique is not standard and your mileage may vary depending on the server setup. It also results in a 200 OK. Make sure that you program in error capturing so that if someone genuinely types in the wrong URL, it still results in a 404.

Naturally, this also works with IIS and an ASP error page.
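Here is a minimal Python sketch of the idea (the article has PHP or ASP in mind; the logic carries over). It assumes a hypothetical /content/<name>.htm URL scheme and uses an in-memory dictionary to stand in for the database query; handle_error_request is an illustrative name, not part of any framework. It also includes the error capturing: anything that doesn't match real content still gets a 404.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical content store standing in for the database behind
# yourdomain.com?content=...
CONTENT: Dict[str, str] = {
    "blue": "<h1>All about blue</h1>",
    "red": "<h1>All about red</h1>",
}

# Assumed friendly URL scheme: /content/<name>.htm
FRIENDLY_URL = re.compile(r"^/content/([a-z0-9-]+)\.htm$")

ERROR_PAGE = '<h1>Page not found</h1><p><a href="/">Home page</a></p>'


def handle_error_request(path: str) -> Tuple[int, str]:
    """Called from the custom error page with the originally requested path.

    Returns (status code, HTML body): known content is served with 200 OK,
    everything else still gets a genuine 404.
    """
    match = FRIENDLY_URL.match(path)
    name: Optional[str] = match.group(1) if match else None
    if name is not None and name in CONTENT:
        # Equivalent of the query yourdomain.com?content=<name>
        return 200, CONTENT[name]
    # Error capturing: a mistyped URL must not be served as 200 OK.
    return 404, ERROR_PAGE
```

The important design choice is the last line: without it, every mistyped URL on the site would come back as a 200 OK page.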

Server Issues

Apache and IIS handle custom error pages differently. In my experience, custom error pages on IIS are more likely to be set up wrong than those on Apache, but both can have issues.

First things first: you need to issue the 404 error code at the server level for it to work consistently. Attempting to write it in at the page level will not work. If the page is dynamic and the error code is set on the server before any of the page content is sent, that will usually work. Once you are at the ISAPI level, it’s too late to send an error code.
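As an illustration of sending the status code before the page itself, here is a minimal sketch using Python's standard http.server module (not Apache or IIS). The PAGES dictionary and the error text are hypothetical. The handler decides on 200 or 404 first, sends that in the status line, and only then writes the helpful custom page as the body, so visitors get a useful page while crawlers still see a 404.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: URL path -> HTML.
PAGES = {
    "/": "<h1>Home</h1>",
    "/sitemap.htm": "<h1>Site map</h1>",
}

# Custom error page. Links are root-relative, so they work at any URL.
ERROR_PAGE = (
    "<h1>Sorry, that page could not be found</h1>"
    '<p>Try the <a href="/">home page</a> or the '
    '<a href="/sitemap.htm">site map</a>.</p>'
)


class CustomErrorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        # Decide the status first: it goes out in the status line,
        # before any of the page body is written.
        status = 200 if body is not None else 404
        payload = (body if body is not None else ERROR_PAGE).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        # Only now is the (custom, helpful) page content sent.
        self.wfile.write(payload)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), CustomErrorHandler).serve_forever()
```

Requesting a non-existent path such as /no-such-page.htm shows the friendly page in a browser, while a header viewer reports a 404.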

Important!

Normally, it’s a good idea to define pages in a server as absolute URLs or file paths. Not for custom error pages. The path MUST BE RELATIVE TO ROOT or it will return a 200 OK. This applies to all servers I’m aware of, including both Apache and IIS.

Of course, if you are trying to get a 200 OK status in an attempt to pass on PR, then you would use the full URL, not the relative one.

Apache Custom 404

This one is easy. Just go to your .htaccess file (or control panel) and type in the following:

ErrorDocument 404 /404.php

Change the name “404.php” to whatever the name for your custom error page is.

You might be tempted to type in:

ErrorDocument 404 http://www.mysite.com/404.php     *WRONG! Results in 200 OK*

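But it won’t work. When ErrorDocument points to a full URL, Apache sends the visitor a redirect to that URL instead of the original 404, and the error page itself then loads with a 200 OK. Once again, the path must be relative to root or it won’t respond with a 404 error code properly.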

IIS Custom 404

Dynamic Error Page

If you are running IIS and using .asp or .aspx custom error pages, then you can put:

Response.Status = "404 Not Found"

In code, this usually looks like:

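<%
' IIS passes the original request to the error page as "404;http://..."
Dim pageRequested
With Request
    ' Keep everything after the first ";" (the URL that was not found)
    pageRequested = Mid(.QueryString, InStr(.QueryString, ";") + 1)
End With
' Set the real status code before any page content is sent
Response.Status = "404 Not Found"
%>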

Put this at the very top of the page. Then, create the rest of the page to do and say what you want it to.

Static Error Page

There is nothing special you need to do for static error pages; just make sure you connect to them using the full file path (e.g. c:/www/404.htm).

Setting the Custom Error Page in IIS

Go to IIS Administration and choose the web that you want to set the custom error page for (each web may have its own). Right-click it, choose Properties and go to the “Custom Errors” tab.

[Screenshot: Custom 404 settings in IIS]

For dynamic error pages, make sure that you point to the custom page using the URL choice, not the File choice. If you use File, IIS will not process the page and will simply serve it as static content. Remember to use the RELATIVE PATH FROM HOME or it will return a 200 OK instead of the 404 Not Found.

If you are using a static page (e.g. .htm), use the File choice to connect to it, giving the physical drive location (e.g. c:/www/404.htm).

You can then test it with an HTTP header viewer. If the status line comes back as:

HTTP/1.1 200 OK

then it’s not working, but if it comes back as:

HTTP/1.1 404 Object Not Found

then it is.
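If you would rather script the header check, here is a sketch in Python using the standard http.client module. It deliberately does not follow redirects, so a 301 or 302 on a missing URL shows up as such instead of being hidden behind the final page’s 200 OK. fetch_status and diagnose are illustrative names, and the test URL in the usage lines is hypothetical: use any path you know does not exist on your site.

```python
import http.client
from typing import Tuple
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Request url once, WITHOUT following redirects; return (code, reason)."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target)
        resp = conn.getresponse()
        return resp.status, resp.reason
    finally:
        conn.close()


def diagnose(code: int) -> str:
    """Interpret the status returned for a URL that should NOT exist."""
    if code in (404, 410):
        return "working: missing pages return an error code"
    if code in (301, 302, 303, 307, 308):
        return "redirect: search engines see a redirect, not an error"
    if code == 200:
        return "not working: missing pages look like real pages (200 OK)"
    return f"unexpected status {code}"


if __name__ == "__main__":
    # Hypothetical URL: pick a path that certainly does not exist on your site.
    code, reason = fetch_status("http://www.mysite.com/no-such-page-12345.htm")
    print(code, reason, "->", diagnose(code))
```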

The Metarefresh Problem

In some cases, people will create a custom error page that displays the error, then “helpfully” uses a metarefresh to forward the visitor to the site map, home page or some other page.

The problem with this is that each search engine treats metarefreshes differently. Yahoo, for example, treats a metarefresh of 0 seconds as a 301, and anything longer as a 302. Most of the time, this works well. But in the case of a 404 error page with a metarefresh, what is it being treated as?

The other search engines vary widely in how they handle these. I believe that since the server responds to the request with a 404 status, a search engine would normally treat the page as a 404 and not look at the metarefresh, but I’m not certain that’s the case, since on other pages the metarefresh overrides the initial 200 OK to create the effect of a 302. There is no standardized way for search engines to deal with this.

Bottom line: avoid metarefreshes on 404 error pages if you are hoping for 404 behaviour (i.e. avoiding duplication issues). If the page used to exist but is now somewhere else, then that’s a legitimate redirect, not an error.

You could, however, use a JavaScript refresh/forward with no issues, since search engines do not execute JavaScript.
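To audit existing error pages for the metarefresh problem, the sketch below uses Python’s standard html.parser to find a metarefresh tag and read its delay. The yahoo_treatment function simply encodes the Yahoo behaviour described above (0 treated as a 301, anything larger as a 302); it is an illustration of that claim, not a Yahoo API, and other engines may differ.

```python
from html.parser import HTMLParser
from typing import List, Optional


class MetaRefreshFinder(HTMLParser):
    """Collects the content attribute of each <meta http-equiv="refresh">."""

    def __init__(self) -> None:
        super().__init__()
        self.refreshes: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names, but not their values.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "refresh":
            self.refreshes.append(attributes.get("content", ""))


def refresh_delay(html: str) -> Optional[float]:
    """Delay in seconds of the first metarefresh, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    if not finder.refreshes:
        return None
    # content looks like "0; url=/sitemap.htm": the delay comes before ";"
    delay_text = finder.refreshes[0].split(";", 1)[0].strip()
    try:
        return float(delay_text)
    except ValueError:
        return None  # malformed refresh: treat it as absent


def yahoo_treatment(delay: Optional[float]) -> str:
    """Per the article: Yahoo treats a 0 metarefresh as 301, longer as 302."""
    if delay is None:
        return "no metarefresh"
    return "301" if delay == 0 else "302"
```

Any error page for which yahoo_treatment returns something other than "no metarefresh" is a candidate for cleanup.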

Holy Grail: Best Practice for Capturing PR and Still Validating

In general, you want a custom error page to respond with a 404 Not Found. However, if you have a lot of broken incoming links to non-existent pages, you may be tempted to capture the PR for them by setting up a custom error page that responds with neither a 404 nor a full-fledged redirect.

The problem is that this can cause duplication issues and will break validation of your site. There is another option.

Set up a custom 404 error page that returns a proper 404 code, then watch your error logs. If you see visits to a bad URL, either create a page at that URL that 301-redirects to your home page (or some other page), or use .htaccess to 301 those specific URLs.

This way, genuine on-the-fly misspellings are sent to an error page, while existing broken links to your site are redirected with a 301, so their PR is passed on to your site. Win both ways 🙂
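The log-watching step can be partly automated. This Python sketch assumes your server writes Apache’s common log format; it counts URLs that returned 404 and emits mod_alias “Redirect 301” lines to paste into .htaccess. The min_hits threshold (treating a URL as a real broken link only after a few hits, rather than a one-off typo) and the target URL are assumptions to adjust, and the resulting list should be reviewed by hand before use.

```python
import re
from collections import Counter
from typing import Iterable, List

# Apache "common" log format, e.g.
# 1.2.3.4 - - [10/Oct/2004:13:55:36 -0700] "GET /old-page.htm HTTP/1.1" 404 209
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def broken_link_targets(lines: Iterable[str], min_hits: int = 3) -> List[str]:
    """Paths that returned 404 at least min_hits times (assumed threshold)."""
    hits: Counter = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("status") == "404":
            # Redirect matches the URL path only, so drop any query string.
            hits[m.group("path").split("?", 1)[0]] += 1
    return sorted(path for path, count in hits.items() if count >= min_hits)


def htaccess_redirects(paths: Iterable[str],
                       target: str = "http://www.yoursite.com/") -> str:
    """One 'Redirect 301 <path> <target>' line per broken path."""
    return "\n".join(f"Redirect 301 {path} {target}" for path in paths)
```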

Conclusion

It’s very common for people to use redirects while attempting to deal with error pages and broken links. Hopefully this has provided some guidance on how to deal with this properly.

End of the Redirects for SEOs Series



Unless otherwise noted, all articles written by Ian McAnerin, BASc, LLB. Copyright © 2002-2004 All Rights Reserved. Permission must be specifically granted in writing for use or reprinting anywhere but on this site, but we do allow it and don’t charge for it, other than a backlink. Contact Us for more information.