21 September 2011

Magento - remove trailing slash from urls

For SEO reasons, store owners will often want to standardise URL generation in their Magento store, and rightly so. If your store serves identical content at multiple URLs, you could be penalised for having duplicate content on your site, which will in turn hurt your search engine rankings.

To make sure you don't run into these problems with your Magento store, three changes are needed:

- Modify the method that creates URLs
- Fix some flawed Magento core logic
- Create an htaccess rewrite rule to remove manually added slashes

First, let's modify the getUrl() method to remove any trailing slashes in generated URLs. Copy the file app/code/core/Mage/Core/Block/Abstract.php to app/code/local/Mage/Core/Block/Abstract.php if it's not there already.

Open app/code/local/Mage/Core/Block/Abstract.php and find the getUrl() method. It should be a single line of code that returns the requested URL:
return $this->_getUrlModel()->getUrl($route, $params);
Replace that line with the following:
$return_url = $this->_getUrlModel()->getUrl($route, $params);
if ($return_url != $this->getBaseUrl() && substr($return_url, -1) == '/' && !Mage::getSingleton('admin/session')->isLoggedIn()):
    return substr($return_url, 0, -1);
else:
    return $return_url;
endif;
The above is pretty straightforward. First we store the URL that the method would normally return in the variable $return_url. We then check three things: the URL is not the homepage (in which case we do not want to remove the trailing slash), its last character is a trailing slash, and we are not currently logged in to admin. When all three of these conditions are satisfied, we remove the trailing slash from the end of the URL.
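The same three checks can be tried outside Magento. The following is a minimal sketch only: stripTrailingSlash() and its parameters are hypothetical names used for illustration, with the base URL and the admin login state passed in as plain arguments instead of being read from Magento.

```php
<?php
/**
 * Hypothetical helper, not part of Magento: removes a single trailing
 * slash from a generated URL unless the URL is the base URL (homepage)
 * or an admin user is logged in.
 */
function stripTrailingSlash($url, $baseUrl, $adminLoggedIn)
{
    if ($url !== $baseUrl && substr($url, -1) === '/' && !$adminLoggedIn) {
        // Drop only the final character, which we know is the slash.
        return substr($url, 0, -1);
    }

    // Homepage, no trailing slash, or admin session: leave untouched.
    return $url;
}

// Example with made-up URLs.
echo stripTrailingSlash('http://example.com/shoes/', 'http://example.com/', false), "\n";
```

With these made-up URLs the example prints http://example.com/shoes, while the base URL itself would be returned unchanged.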

In theory that should be all we need to do. However, what I can only assume is flawed logic in another Magento core file means this approach fails in some cases. To correct this, copy the file app/code/core/Mage/Core/Model/Url.php to app/code/local/Mage/Core/Model/Url.php if it's not there already.

Open app/code/local/Mage/Core/Model/Url.php and find the following:
if ($noSid !== true) {
    $this->_prepareSessionUrl($url);
}
A little further up the file the variable $noSid is initialised:
$noSid = null;
It is then set to boolean true or false if certain conditions are met. If the conditions are not met, $noSid remains null, the if statement is satisfied and the session id is appended to the URL (even though it should not be). In this case, what is actually returned is a session id of simply the letter U. With this on the end of the URL, the first change we made fails because the final character is no longer a slash.

Given the if statement above, it seems logical that we only want to append the session id to the URL if $noSid is identically equal to false, thus ruling out null as a pass condition. So go ahead and change the if statement from:
if ($noSid !== true) {
to
if ($noSid === false) {
With this change, the session id will no longer be incorrectly added onto the URL and the first change we made to remove the trailing slash will succeed.
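To see why null slips through the original check, here is a small sketch that compares the two conditions for each possible value of $noSid. The functions originalCheck() and strictCheck() are hypothetical stand-ins for the two versions of the if statement, not Magento code.

```php
<?php
// Hypothetical stand-in for the original condition in Url.php.
function originalCheck($noSid)
{
    return $noSid !== true;   // true for both null and false
}

// Hypothetical stand-in for the modified condition.
function strictCheck($noSid)
{
    return $noSid === false;  // true only for boolean false
}

// Show which values would cause the session id to be appended.
foreach (array(null, true, false) as $noSid) {
    printf(
        "%-5s original: %-3s strict: %s\n",
        var_export($noSid, true),
        originalCheck($noSid) ? 'yes' : 'no',
        strictCheck($noSid) ? 'yes' : 'no'
    );
}
```

Only the null case differs between the two checks, which is exactly the case described above.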

This should now remove all trailing slashes from generated URLs throughout the store, but a trailing slash can still be added manually to serve the same content. To cater for this scenario, we want to create a 301 redirect from every trailing-slash URL to the version of the page without the trailing slash. We can do this in the Magento .htaccess file in the install root.

Open .htaccess and find the following rule:
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
Immediately after this add the following lines:
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{REQUEST_URI} !^/downloader.*$
RewriteCond %{REQUEST_URI} ^(.+)/$
RewriteRule ^(.+)$ %1 [L,R=301]
The redirect works as follows. Magento requests existing data from the server with GET and sends new data to the server with POST (as is the standard). We never want to touch POST requests (if you do, you will find all kinds of problems, like not being able to add to the basket or even save anything in admin), so the first line restricts the rule to GET requests, which is how pages will always be requested. The second line ensures the rule is not applied to Magento Connect Manager, as this would break it.

The third line ensures the rule is only ever applied to URLs with a trailing slash, and also stores all of the request URL up to that trailing slash in a backreference (the section in the brackets). With the conditions of the first three lines satisfied, the rule is applied as a 301 redirect to the URL stored in the RewriteCond backreference (so the URL without the trailing slash). Note that %1 refers to the RewriteCond backreference; if we wanted to redirect to the backreference in the RewriteRule we would use $1 instead.

Integrating these changes will ensure your store never serves duplicate content caused by a trailing slash, and will set up a 301 redirect for any duplicate content already found by search engines.
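The decision made by the four lines can be approximated in PHP for experimenting with sample requests. This is a rough sketch, not how Apache evaluates the rules internally: trailingSlashRedirect() is a hypothetical function, and $requestUri is assumed to be the path only, without a query string.

```php
<?php
/**
 * Hypothetical simulation of the rewrite conditions above.
 * Returns the redirect target, or null when the request is left alone.
 */
function trailingSlashRedirect($method, $requestUri)
{
    // Condition 1: only GET requests are ever redirected.
    if ($method !== 'GET') {
        return null;
    }

    // Condition 2: leave Magento Connect Manager alone.
    if (preg_match('#^/downloader.*$#', $requestUri)) {
        return null;
    }

    // Condition 3: must end in a slash; the capture group plays
    // the role of the %1 backreference.
    if (preg_match('#^(.+)/$#', $requestUri, $matches)) {
        return $matches[1];
    }

    return null;
}

var_dump(trailingSlashRedirect('GET', '/shoes/'));
var_dump(trailingSlashRedirect('POST', '/checkout/cart/add/'));
var_dump(trailingSlashRedirect('GET', '/'));
```

Because the pattern requires at least one character before the final slash, a request for the homepage (/) is not redirected.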

29 comments:

  1. This works perfectly... Many thanks. Note to all, add rewrite code as follows:

    ############################################
    ## workaround for HTTP authorization
    ## in CGI environment

    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

    RewriteCond %{request_method} ^GET$
    RewriteCond %{REQUEST_URI} ^(.+)/$
    RewriteRule ^(.+)$ %1 [L,R=301]

    ############################################
    ## always send 404 on missing files in these folders

    RewriteCond %{REQUEST_URI} !^/(media|skin|js)/

  2. This works great except for markup tags. For example:

    {{store url='some-page'}}

    still returns a URL with a trailing slash. Any idea how to fix that?

    1. The solution for markup tags is to place the page outside the tag:

      {{store url=''}}some-page

      But, I've noticed another issue. The standard top links:

      My Account | My Wishlist | My Cart | Checkout | Log In

      still add a slash to the end of the URL. Ideas?

  3. These URLs are pulled from layout files and ultimately the database, so they don't pass through the same methods as other URLs. You will need to look at the relevant layout file and trace it back from there.

    1. Thanks for the quick reply and for your help. There aren't too many of them. I'll probably just add them under URL Rewrite Management in the backend.

  4. This works, but I cannot access the downloader page. Do you have any ideas?

    1. Would I be right thinking you had access before you made these changes?

  5. Same problem here. When I try to access to Magento Connect Manager it goes here: downloader/?return=http%2525252525252525253A%2525252525252525252F%2525252525252525252Fwww.tomevinos.com%2525252525252525252Findex.php%2525252525252525252Fadmin%2525252525252525252F

    1. Ok, well I would ask the same question to you also, was this working correctly before making these changes?

      Note that the only thing the above code does in terms of altering the URL is to remove a single trailing slash only if one is found, no other part of the URL is affected. Note also that this method was developed under Magento 1.4.1.1, so it may not work under other versions.

  6. Hello Hussey,

    Yes, It was working correctly before I did changes. My Magento version is 1.5 so maybe this is the problem.

    Thanks a lot and regards.

    1. I've updated the post now, please review the .htaccess section and add in the extra line to allow Magento Connect Manager to continue to work.

  7. Thanks Hussey,

    It is working perfectly. Thanks a lot.

    Regards

  8. GREAT solution. But something is still not working correctly on our Magento install.

    For example on the following page: http://www.yourmagentodomain.com/customer/account/login
    Here I am having different urls generated automatically from Magento - the "Forgot password?" URL is WITH a trailing slash.

    I did have a look at the theme files as Hussey did write on 22 February 2012 18:17.
    Within "login.phtml" I found this: <?php echo $this->getForgotPasswordUrl() ?>

    So Magento is somehow still creating URLs with trailing slashes automatically....

    Is just our install doing this?

    Greetings from Germany,
    Florian

    1. Hi Florian,

      I have traced the getForgotPasswordUrl() method through and it doesn't pass through Mage_Core_Block_Abstract, only the underlying class Mage_Core_Model_Url. You could potentially apply the same logic to strip off trailing slashes within the getUrl() method of this class, but I would not be keen to do this. As a fundamental model class that extends directly from Varien_Object, it is likely to be called in many more situations than just generating blocks for display.

      For any remaining URLs generated with slashes (this is the first one I have come across so far), I would recommend just stripping off the trailing slash in the template, as I think editing the way URLs are generated in Mage_Core_Model_Url could certainly cause issues.

  9. Hi Hussey,

    thanks for the explanation.

    I actually added the following line of code to the htaccess, preventing the URLs getting 301 redirected within the "Customer Account" area - since the URLs are NOINDEX anyways:

    RewriteCond %{REQUEST_URI} !^/customer.*$


  10. Hi Marc,

    It sounds like your base URL is http://www.shop.com/ rather than http://shop.com/ in admin. Magento will redirect to the base URL defined in admin if it differs from the address in the browser address bar. Look at System -> Configuration -> Web -> Unsecure/Secure and make sure the base URLs are correct there.

  11. Hi again Marc,

    I think what you are trying to do is better achieved using .htaccess rather than by reworking Magento's base url redirect. A simple rule such as the following should do what you need:

    RewriteCond %{HTTP_HOST} ^shop\.com$
    RewriteRule ^(.*)$ http://www.shop.com/$1 [R=301,L]

    However, I don't believe Magento's redirect to the base URL should ever cause issues on the SEO front. With the base URL being http://www.shop.com, a search engine should never find any http://shop.com URLs unless someone adds them to the site.

  12. Thanks Hussey. But, you also need to exempt checkout pages in .htaccess rewrites. ../checkout/cart might still work, but ../checkout/onepage won't work as it references the directory. The rewrite is:

    RewriteCond %{REQUEST_URI} !^/checkout.*$

    1. Hi Dustin,

      Thanks for the info, were you having problems with the checkout process also being rewritten? Can you let me know what they were and I'll do some testing and update the post where needed?

  13. Great article, Wanted to run this for our website. Tested on our dev server (using Xampp) and all great, went to implement on live but it seems our .htaccess is not being used? From what I can gather there is a .conf file on the server being used... any ideas how I would interpret the .htaccess code for the .conf? There is no HTTP AUTHORIZATION but there are some rewrites in a different format.

    1. When you say .htaccess is not being used, what do you mean exactly? Are there symptoms which lead you to believe that? If .htaccess was not being used at all, the site would not run, as it contains the rules which rewrite all requests to index.php.

      It sounds like you may have extra rewrite rules in your .htaccess, so it could be that the rules from this solution are not actually running, for instance if you are using the [L] flag in a previous rule.

    2. I presumed this was the case as the changes I made to it do not get applied and after doing some Googling I (maybe wrongly) gathered that the .htaccess could be overwritten by using a conf file on the server.

      There are no other rewrites in the .htaccess and nothing else using the [L] flag either. Going to a link on the site and entering a slash on the end does not redirect.

    3. You can certainly disallow certain things from being possible in .htaccess from the conf file. Check that you have the relevant AllowOverride rule in place for your Magento install directory. Most of the time 'AllowOverride All' is what you need.

      http://httpd.apache.org/docs/2.2/mod/core.html#allowoverride

    4. I had a look at the file and couldn't see that so I thought I'd ask our server hosts as they are very helpful. This is their reply:

      "As this is a magento server htaccess files will not work, you will need to create them in the nginx config."

      So it turns out I was half right in that the htaccess wasn't being used. I guess I'll go fiddle! Thanks for the help!

    5. Makes sense, hope you manage to get the rewrites sorted out for Nginx!

  14. Hi Hussey!
    Thanks very much for your guide!
    I am not a programmer and I can say your guide has been the best I've found so far.
    So, without any programming skills, I dared to use just the part about the .htaccess on my Magento 1.7.0.2 (production environment). It worked very well except for the cart part. In fact in the onepage checkout I notice it's missing the right column where there's the progress indicator.
    Besides that I am wondering whether I'll have any other problems if I don't apply the other changes you suggested. I'd rather prefer not to touch too much because I already have my magento not adding the trailing slash and I just wanted to prevent the use of it on the web by R=301 any possible " / " version to the "non- / " version of my urls.
    Please note, comparing other solutions on the matter with yours, and since your solution didn't work for me at first attempt, I also changed it a little in its final part (htaccess part) with the following:
    RewriteRule ^(.+)/$ http://www.mywebstite.com/$1 [R=301, L]

    Is that correct or is it a possible source of problems? And what about the strange behaviour of the onepage checkout and possible further problems (unforeseeable for me)?
    I hope my English is good enough! I hope you can help me, thanks very much in advance!

    1. If you just implement the .htaccess section of the above, the store will still generate URLs with trailing slashes, but they will be removed by the rewrite when a URL is actually visited.

  15. Thanks. It works great!!!
