SEO / Permalinks / Duplicate Content

johncanary
Regular
Posts: 116
Joined: Mon Aug 20, 2007 4:00 am
Location: Spain
Contact:

SEO / Permalinks / Duplicate Content

Post by johncanary »

Hi folks,

I just changed my permalink structure to be more SEO friendly, and I'll explain
how I keep old URLs working at the same time AND make sure that Google
and other search engines are happy, even happier than in the past.
You'll find a great tip from Google at the very end of this post.

However, let me start with the full story.

When I originally set up my S9Y blog, I chose the following
permalink structure for blog posts.

%year%-%month%/%id%-%title%

E.g.
http://blog.fcon21.biz/2009-01/230-Email-Marketing-Tips-Edition-16

As you can see, I did a couple of things in an unusual way.
  • I combined the year and month of the post in one virtual directory
    instead of using the more common .../2009/01/... notation.
  • I used no file extension, which is actually recommended by the W3C, and
    no trailing slash "/" either.
  • I also wanted the post id, e.g. "230", as part of the filename because this
    makes the URL more robust against truncation and mistyping. Serendipity is
    actually very good at this and tolerates a lot of garbage while still serving the correct
    blog post.
  • Additionally, I configured monthly archives that are called as, e.g., ...biz/2009-01/
  • And of course I introduced short URLs in the form of, e.g., ...biz/p230/ with or
    even without a trailing slash. Those come in handy when used in emails.

What about the date in the URL?

The date in the URL is supposed to provide information about the publishing date of the
blog post. That's basically a good idea but I wanted it for the wrong reason.

I thought search engine users would enjoy the additional information in the URL and
that it would help them decide to click on the link.
Well, they enjoy it too much and don't even click through to "older" posts. Web users are hunting for the latest, greatest information. Most of them, anyway; there are a few exceptions.

However, the idea in SEO is to get people to your site. Depending on the topic
of the particular article, the date when it was published can be fairly irrelevant
to whether it provides the solution the reader is looking for.

There is a lot of "evergreen" content on the net, and on my blog as well.
But using the date in the URL simply scares potential web visitors away.
I want people to read it. Therefore, the date has to go.
The URL is 8 characters shorter all of a sudden. (A side benefit.)
Bye-bye.

Now what about the blog post id?

That's definitely a very good parameter to use in the URL because it's a "quick" database
index, which is good for performance. And it helps to protect the URL from typos and truncation.

The above sample URL still works as
http://blog.fcon21.biz/2009-01/230-Emablah-blah-blah

You'll notice that I don't redirect the URL (what you see in the browser's address bar)
to the correct address. I only serve the blog post that matches the entry id 230.
It doesn't really matter for the user. The link works and it can be bookmarked. I'll talk
about the search engines in a bit.

What about SEO?

It's fairly safe to assume that the most experienced blogging SEO experts
are active in the WordPress world.

So let's learn from them. I'll keep it short.
  • The URL should contain keywords in the domain and the path.
  • Cryptic URL parameters, like index.php?author=A0076&post_id=234867&category=23&language=en-gb,
    are not so good. Easy to see why, isn't it?
  • The shorter the better.
  • Omitting the date has more human than SEO reasons, but in general the structures
    should not go too deep either because it can limit the set of keywords your page
    can rank for. E.g.
    .../animals/mammals/small/rabbit/rabbit-flowers-garden

    The flower and garden part of the post title has a hard time competing against
    the animal part in the path, i.e. the virtual directory structure. This might be
    a silly example, but it makes the point.
  • Testing and statistical analysis indicate that sites with one virtual directory before
    the post title rank better. There's a lot of debate about this subject and search engine
    algorithms are modified frequently. So in reality we are never too sure about it. I
    simply trust my sources (without disclosing them here.)

    That gives something like this:

    example.com/email-marketing/why-squeeze-page/

    With or without the trailing slash "/". Using the trailing slash gives the hint that
    the URL is complete and no letters have been lost. So it's good actually for
    human readability.

    Or you might have seen a lot of WordPress blog posts like this:

    example.com/why-squeeze-page/

    This leaves out the category and is shorter as well. Bingo!
    Bingo?
  • Let's not forget that a category like "email-marketing" in the previous example
    could influence the ranking in a positive or negative way. Therefore
    my sources suggest using numbers (digits) instead. They don't have a
    negative influence.

    Now I say, "Bingo"!

    That's the perfect place for the blog post id.

My new SEO improved permalink structure

%id%/%title%

Easy, effective, elegant, robust.

My original example
http://blog.fcon21.biz/2009-01/230-Email-Marketing-Tips-Edition-16

becomes
http://blog.fcon21.biz/230/Email-Marketing-Tips-Edition-16

and the associated short version
http://blog.fcon21.biz/230/

You'll notice all three URLs work, of course.
That's how it should be.

You want to avoid link rot and keep old URLs alive through redirection. I basically
achieved that goal with a modified .htaccess file and Apache mod_rewrite. I
simply added RewriteRule directives that map the old versions of the URL to the new ones.
Done!
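
As a rough illustration only (these are not the actual rules from the .htaccess file), the mapping from the old structure to the new one can be sketched in Python. The regular expression and the function name old_to_new are assumptions made for this example; a real setup would express the same pattern as a mod_rewrite rule.

```python
import re
from typing import Optional

# Old structure: %year%-%month%/%id%-%title%, e.g. /2009-01/230-Some-Title
# (pattern written for this sketch, not copied from a real .htaccess file)
OLD_PERMALINK = re.compile(r"^/\d{4}-\d{2}/(\d+)-([^/]+)/?$")


def old_to_new(path: str) -> Optional[str]:
    """Map an old-style entry path to the new %id%/%title% form.

    Returns None when the path is not an old-style entry URL,
    for example a monthly archive such as /2009-01/.
    """
    match = OLD_PERMALINK.match(path)
    if match is None:
        return None
    entry_id, title = match.groups()
    return f"/{entry_id}/{title}"


if __name__ == "__main__":
    print(old_to_new("/2009-01/230-Email-Marketing-Tips-Edition-16"))
```

The point of the pattern is that the year-month directory is simply thrown away, while the id and the title are carried over unchanged.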

Attention: When you save certain changes in the Serendipity (S9Y) configuration
in the administration panel, it will overwrite custom changes in the local .htaccess
file. That's why it is always a good idea to keep a date-stamped backup and
a README file with notes about particular modifications. I also made a tiny
hack in 1 or 2 Serendipity core files to make my life easier.


What about duplicate content?

Now I can access the same resource with many different URLs. That doesn't sound
too good for SEO purposes.

The most elegant way to deal with this issue is to do a "301 Permanent Redirect" to
the new permalink URL. In this case the displayed URL in the navigation bar of the Internet browser would change as well. And smart browsers could even update old bookmarks
automatically.

(I did not check on this, but I assume that this would require a change in the core files. Speaking
from my limited experience with version 1.1.2, it would need:

* A check whether the requested URL is identical to the permalink and, if not,
a 301 permanent redirect. This possibly requires an additional database call; I don't
know at which stage the permalink is read or constructed.)
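
To make that check concrete, here is a minimal sketch in Python of the decision it describes. It is not Serendipity code: the PERMALINKS dictionary is a hypothetical stand-in for whatever lookup the core performs, and the function name respond is made up for this example.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for the permalink lookup (entry id -> canonical path).
PERMALINKS = {230: "/230/Email-Marketing-Tips-Edition-16"}

# An entry id at the start of the path, optionally followed by anything.
ENTRY_ID = re.compile(r"^/(\d+)(?:/.*)?$")


def respond(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, redirect target) for a requested path.

    200: the request already uses the canonical permalink.
    301: the entry exists, but under a different (canonical) URL.
    404: no entry id in the path, or no such entry.
    """
    match = ENTRY_ID.match(path)
    if match is None:
        return 404, None
    canonical = PERMALINKS.get(int(match.group(1)))
    if canonical is None:
        return 404, None
    if path == canonical:
        return 200, None
    return 301, canonical


if __name__ == "__main__":
    for requested in ("/230/", "/230/Emablah-blah-blah", "/230/Email-Marketing-Tips-Edition-16"):
        print(requested, respond(requested))
```

Only the comparison between the requested path and the stored permalink decides between serving the page and redirecting, which is why the extra lookup is needed.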



There is a much easier solution. Thanks to Google.

They now allow you to specify the canonical form of the URL, which can be
achieved by a very easy change in the index.tpl template.

Simply add:

{if $entry_id}
    ...
    <link rel="canonical" href="{$entry.link}" />
{/if}

Therefore, I do not need to change the internal links between older posts.
Fabulous.

Here's the blog post from Google.

Webmaster Central Blog :: Specify Your Canonical


Yours
John W. Furst

Update: Some additional observations about what can happen. (Call for caution!)

In the P.S. of my blog post SEO Friendly Permalink Structure - Update

Re: SEO / Permalinks / Duplicate Content

Post by johncanary »

Great!

Just saw the following Event Plugin.

HTML Link Metatags

Inserts link rel="start|up|prev|next|canonical" metatags into your frontend for better navigation.

Author: Garvin Hicking; version: 1.4; (Spartacus)

That's of course the ultimate solution.
Yours John
: John's Google+ Profile
: John's E-Biz Booster Blog powered by Serendipity 1.7/PHP 5.3.14
onli
Regular
Posts: 2830
Joined: Tue Sep 09, 2008 10:04 pm
Contact:

Re: SEO / Permalinks / Duplicate Content

Post by onli »

Thanks for the detailed explanation.

Re: SEO / Permalinks / Duplicate Content

Post by johncanary »

Okay, here is the final finishing touch. At least for now.

In addition to the above I changed all URLs to
  • all lowercase and
  • also removed (or rather prohibited) the following characters
    . , ; ! %

    By default, characters like # & ? = which have a special meaning in a URL are dropped already.
  • Even the German umlauts ÄÖÜ are translated to the corresponding lowercase ASCII versions ae oe ue, which are suitable for a non-internationalized URL.
    (ß gives ss by the way.)

Those additional changes are possible by editing the serendipity_config_local.inc configuration file. Details about those changes can be found in my post in the discussion:
How to remove dot and comma from entry-url

Here is my post: How I get my lower case URLs and remove , . ! etc.
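
For illustration, the rules listed above can be sketched in Python. This is not the Serendipity configuration itself, which lives in the linked posts. The function name make_slug, the two lookup tables and the handling of whitespace (runs of spaces become a single hyphen) are assumptions made for this example.

```python
import re

# German umlauts and sharp s, mapped to their ASCII transliterations.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

# Characters removed from the title: the ones prohibited above
# plus the ones with a special meaning in a URL.
DROPPED = ".,;!%#&?="


def make_slug(title: str) -> str:
    """Turn an entry title into a lowercase, ASCII-friendly URL part."""
    slug = title.lower()  # also turns ÄÖÜ into äöü
    for char, ascii_form in UMLAUTS.items():
        slug = slug.replace(char, ascii_form)
    slug = "".join(ch for ch in slug if ch not in DROPPED)
    # Assumption: whitespace becomes a hyphen, repeated hyphens collapse.
    slug = re.sub(r"\s+", "-", slug.strip())
    return re.sub(r"-{2,}", "-", slug)


if __name__ == "__main__":
    print(make_slug("Email Marketing Tips, Edition 16!"))
    print(make_slug("Über Größe & Co."))
```

Lowercasing happens first so that the umlaut table only needs the lowercase forms.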


Pay attention to the following also:
* Make sure the old URLs (which are out on the Web and in your blog posts) are still
recognized (.htaccess)
* Changing the Permalinks in Admin/Configuration will overwrite your .htaccess file
* ... also all links in the database table permalinks will be updated. (change it to force the
db-update, then change it back and re-apply your modifications in the .htaccess file.)
Anson
Regular
Posts: 24
Joined: Thu Apr 16, 2009 7:05 am

Re: SEO / Permalinks / Duplicate Content

Post by Anson »

johncanary wrote:Changing the Permalinks in Admin/Configuration will overwrite your .htaccess file
I haven't tested it - and I use lighttpd instead of Apache so I don't use .htaccess files - but are you sure that's really a problem? I assumed the "# BEGIN s9y" and "# END s9y" lines were markers used by s9y to determine what to replace each time you modify the rules, and that edits before BEGIN or after END would be preserved.

Re: SEO / Permalinks / Duplicate Content

Post by johncanary »

Anson wrote:
johncanary wrote:"# BEGIN s9y" and "# END s9y" lines were markers used by s9y to determine what to replace each time
That might be true, but I am changing lines inside the block that S9Y updates. If I move or copy them outside of it, I'll get conflicting duplicates/variations(*) => not good.

(*) ... sort of

Re: SEO / Permalinks / Duplicate Content

Post by johncanary »

Latest update here, my S9Y friends.

Finally I have decided to do a
HTTP/1.0 301 Moved Permanently redirect

on all short versions of the URL I have in use and
also on accidentally truncated versions.

For simplicity and future compatibility I have
simply introduced a new, very brief index-301.php file
which queries the database for the full URL based on
the entry_id which I have in all entry URLs anyway.

And I have also introduced new short URL versions of my
blog posts in the root domain (without the blog.* subdomain).
I know I am crazy, but this comes in very handy for Twitter, for
example.

Check it out: http://fcon21.biz/150/
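
The index-301.php file itself is not shown in this thread. As a loose sketch of the idea in Python, the lookup could work as below. The table name permalinks, its two columns, the in-memory SQLite database and the function names are all assumptions for this example and do not reflect the real Serendipity schema or my script.

```python
import re
import sqlite3
from typing import Optional

BLOG_BASE = "http://blog.fcon21.biz"  # blog host as used in this thread

# Accepts /230/, /p230/, /post230 and truncated full URLs like /230/Ema
SHORT_URL = re.compile(r"^/(?:post|p)?(\d+)(?:/|$)")


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory table standing in for the blog database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE permalinks (entry_id INTEGER PRIMARY KEY, permalink TEXT)"
    )
    conn.execute(
        "INSERT INTO permalinks VALUES (?, ?)",
        (230, "230/Email-Marketing-Tips-Edition-16"),
    )
    return conn


def redirect_target(conn: sqlite3.Connection, path: str) -> Optional[str]:
    """Return the absolute URL for a 301 redirect, or None if nothing matches."""
    match = SHORT_URL.match(path)
    if match is None:
        return None
    row = conn.execute(
        "SELECT permalink FROM permalinks WHERE entry_id = ?",
        (int(match.group(1)),),
    ).fetchone()
    if row is None:
        return None
    return f"{BLOG_BASE}/{row[0]}"


if __name__ == "__main__":
    db = build_demo_db()
    print(redirect_target(db, "/p230/"))
```

Because the target is always built as an absolute URL on the blog host, the same lookup can serve short URLs requested on the root domain as well.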

By the way: What drove me crazy was a slight syntax inconsistency
in the Apache mod_rewrite RewriteRules, which are not fully backward
compatible with older versions of Apache.

I have posted the simple and very easy fix, though it was hard to
find (at least for me).
Maccsta
Regular
Posts: 77
Joined: Mon Feb 19, 2007 6:07 am
Location: Leeds, England

Re: SEO / Permalinks / Duplicate Content

Post by Maccsta »

Hey Jon this is a great thread! I've followed your steps for truncating blog post urls for better SEO.

However I'm stuck on keeping old URLs alive through redirection.

What coding did you use to achieve the results on your blog?

Re: SEO / Permalinks / Duplicate Content

Post by johncanary »

Maccsta wrote:Hey Jon this is a great thread! I've followed your steps for truncating blog post urls for better SEO.

However I'm stuck on keeping old URLs alive through redirection.

What coding did you use to achieve the results on your blog?
It's mostly done by
** Configuring the permalink as "%id%/%title%/"

** via .htaccess (that was tricky due to the bug in older Apache versions.)

** and an index script that queries the database in order to do an external 301 Moved Permanently redirect with the full canonical URL.

** and using the s9y meta plugin for getting <link rel="canonical" href="... /> into the header.

** and since I originally used /(post|p)?%id%?/ as shortcut I had Garvin help me out with a little trick modifying one of the core files.

I don't have time to do debugging
with you, but I can email you my
modified files. I'm sure you can learn
from it.

Email me at info[ät]fcon21.biz
Subject: S9Y board - SEO mod_rewrite

By the way: You might want to check this out.

http://www.aweber.com/b/1ARoF

Yours
John

PS.: Now I need to fix the Smarty index.tpl and
entries.tpl files in my theme to correct for the worst
SEO crimes committed.

Re: SEO / Permalinks / Duplicate Content

Post by Maccsta »

Hey Jon thanks very much for the reply.

- I've configured the permalinks as "%id%/%title%/"

- Set the URL Rewriting option in Serendipity to 'Apache mod_rewrite'

- Installed the canonical plugin.

Posts that are indexed by google are currently the old urls.

Is it really necessary to have them redirected for SEO purposes if I've installed the canonical plugin? Do I really need to redirect from the old URLs to the new ones?