Avoiding duplicate content

Random stuff about serendipity. Discussion, Questions, Paraphernalia.
jhermanns
Site Admin
Posts: 378
Joined: Tue Apr 01, 2003 11:28 pm
Location: Berlin, Germany
Contact:

Avoiding duplicate content

Post by jhermanns »

I just added the following lines to my template so that Google only indexes static pages and entries, but not the overview/archive/search pages. SEO people say this increases the likelihood of all your distinct pages being indexed (so that the overview pages, which duplicate content, don't use up the available "slots" for your domain).

Code: Select all

{* entries and static pages set $head_title; overview/archive/search pages do not *}
{if $head_title}
    <meta name="robots" content="index,follow">
{else}
    <meta name="robots" content="noindex,follow">
{/if}
But this should really be either part of s9y's core or a plugin - solving it at the template level is kind of a hack. Is there such a plugin? :)


P.S.
I did not use {if $is_single_entry} because that won't catch static pages.
johncanary
Regular
Posts: 116
Joined: Mon Aug 20, 2007 4:00 am
Location: Spain
Contact:

Re: Avoiding duplicate content

Post by johncanary »

There is no such thing as "available slots"!
And there is no "Duplicate Content Penalty" if you run your blog
* in a natural fashion,
* with original content,
* without stealing from other sites,
* without overloading it with advertisements.
The only thing that happens is that Google hides result sets it believes to be redundant for a specific search. Those pages are then "hidden" behind the "More Results" link.

Those category, monthly archive, and tag pages are just a different mix of the content. It can be those pages that get a higher ranking for a particular search term than a single entry page.
It is very simple:
* The more pages you allow to be indexed,
* the more pages will be indexed,
* and the more free traffic you will get.
Google knows how blogs work, and it knows what archives and the like are about.

Don't limit your potential.

You could use the ROBOTS.TXT file to achieve the same effect more easily.
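For example, a minimal robots.txt sketch (the paths are only an assumption for a default Serendipity install in the web root, not taken from this thread):

Code: Select all

User-agent: *
Disallow: /archives/
Disallow: /categories/
Disallow: /search/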

JohnCanary
Yours John
John's Google+ Profile
John's E-Biz Booster Blog powered by Serendipity 1.7/PHP 5.3.14
Don Chambers
Regular
Posts: 3652
Joined: Mon Feb 13, 2006 2:40 am
Location: Chicago, IL, USA
Contact:

Post by Don Chambers »

I look forward to any further input on this. I have an idea as to where this COULD be included in a plugin, but I will only do so if the concept has merit.
=Don=
jhermanns
Site Admin
Posts: 378
Joined: Tue Apr 01, 2003 11:28 pm
Location: Berlin, Germany
Contact:

Re: Avoiding duplicate content

Post by jhermanns »

johncanary wrote:There is no such thing as "available slots"!
And there is no "Duplicate Content Penalty" if you run your blog
* in a natural fashion,
* with original content,
* without stealing from other sites,
* without overloading it with advertisements.
The only thing that happens is that Google hides result sets it believes to be redundant for a specific search. Those pages are then "hidden" behind the "More Results" link.
I don't think I mentioned penalties. Anyhow, the noindex on the overview pages still makes sense to me: having each chunk of content indexed only once seems reasonable. The snippet above can be read as making sure the right page is indexed (and later offered as a search result) - ensuring that search engines recognize the original source of the information (as in "the entry", not some overview page), and that archive/list pages don't push the individual entry pages out of the result list.
johncanary wrote:Those category, monthly archive, and tag pages are just a different mix of the content. It can be those pages that get a higher ranking for a particular search term than a single entry page.
It is very simple:
* The more pages you allow to be indexed,
* the more pages will be indexed,
* and the more free traffic you will get.
Google knows how blogs work, and it knows what archives and the like are about.
Yeah, it should - but I don't see any benefit in returning archive pages as search results: all the content therein is also located on a single entry page (which may link to more interesting, related articles). Except perhaps when the search query spans two blog entries that only turn up together on an archive page - but I don't know.
johncanary wrote: Don't limit your potential.
You could use the ROBOTS.TXT file to achieve the same effect more easily.
JohnCanary
Not with less LOC though - especially when you think about static pages :-)
Don Chambers
Regular
Posts: 3652
Joined: Mon Feb 13, 2006 2:40 am
Location: Chicago, IL, USA
Contact:

Post by Don Chambers »

Don Chambers wrote:I look forward to any further input on this. I have an idea as to where this COULD be included in a plugin but will only do so if the concept has merit.
Scratch that. After reviewing the intent and code of the plugin I had in mind, I really do not think it is the place to incorporate this concept since it applies to so many different types of pages (entries, overviews, archives, static pages, and other plugin generated non-entry pages).
=Don=
johncanary
Regular
Posts: 116
Joined: Mon Aug 20, 2007 4:00 am
Location: Spain
Contact:

Re: Avoiding duplicate content

Post by johncanary »

jhermanns wrote:I don't think I mentioned penalties.
You are right, you didn't.

Having an overview page indexed simply gives you more chances that your blog turns up in some search results (SERPs) for some users. That's what I know.

A search engine cares most about giving the best-matching results to the user. It does not care so much about original content, especially if it is on the same site. Search engines cluster words and phrases, do statistical analysis, and so on.

What could be the benefit of having more pages (even overviews) indexed?

It simply increases the probability that a search engine user finds your blog. Instead of setting some pages to 'noindex', I would focus on
* providing a sitemap, which is very effective,
* getting as many inbound links from as many different, relevant sources as possible. For this it makes much sense to concentrate on the "entries".
I vote for having these three functions in the s9y core:
* announcements (pinging popular services)
* trackback control
* full pingback support (currently only in the development version)
That's the best for publicity. Publicity brings inbound links, and inbound links are the most effective SEO technique.
jhermanns wrote:
johncanary wrote:You could use the ROBOTS.TXT file to achieve the same effect more easily.
JohnCanary
Not with less LOC though - especially when you think about static pages :-)
LOC?

Yours
Johncanary
Yours John
John's Google+ Profile
John's E-Biz Booster Blog powered by Serendipity 1.7/PHP 5.3.14
jhermanns
Site Admin
Posts: 378
Joined: Tue Apr 01, 2003 11:28 pm
Location: Berlin, Germany
Contact:

Re: Avoiding duplicate content

Post by jhermanns »

johncanary wrote:Having an overview page indexed simply gives you more chances that your blog turns up in some search results (SERPs) for some users. That's what I know.
Or the overview page could "dominate" for some reason and push the article page out of the search results - and it is the article page that contains the links to related blog entries. That's what I tried to say :-)
johncanary wrote:Instead of setting some pages to 'noindex', I would focus on
* providing a sitemap, which is very effective,
* getting as many inbound links from as many different, relevant sources as possible. For this it makes much sense to concentrate on the "entries".
I vote for having these three functions in the s9y core:
* announcements (pinging popular services)
* trackback control
* full pingback support (currently only in the development version)
That's the best for publicity. Publicity brings inbound links, and inbound links are the most effective SEO technique.
How would an accessible site that is not very huge (unlike adobe.com) profit from submitting sitemaps? If you search for "adobe" on Google, for example, you can see the effect of submitting a sitemap for a large site. But for regular sites I don't really see the use...
johncanary wrote:
jhermanns wrote:
johncanary wrote:You could use the ROBOTS.TXT file to achieve the same effect more easily.
JohnCanary
Not with less LOC though - especially when you think about static pages :-)
LOC?
Lines of Code :-)
johncanary
Regular
Posts: 116
Joined: Mon Aug 20, 2007 4:00 am
Location: Spain
Contact:

Re: Avoiding duplicate content

Post by johncanary »

jhermanns wrote:How would an accessible site that is not very huge (unlike adobe.com) profit from submitting sitemaps? If you search for "adobe" on Google, for example, you can see the effect of submitting a sitemap for a large site. But for regular sites I don't really see the use...
Sitemaps reviewed:

PRO:
* XML sitemaps inform Google/Yahoo/MSN about updates on your site. Sitemaps are crawled more often than your entire site.
* When the sitemap is maintained by a content management system like Serendipity, it is updated automatically without any additional effort, and search engines are actively pinged upon each page update.
* It gets more pages of your site into the index more quickly.
CONTRA:
* Sitemaps are used by copyscrapers to find content to steal more easily.
* You need to set it up, but that doesn't really count.
That's my experience.
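For reference, a minimal XML sitemap following the sitemaps.org protocol looks like this (the URL and date are placeholders, not taken from this thread):

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/archives/1-first-entry.html</loc>
    <lastmod>2007-09-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>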

You are right that it might be too much hassle for smaller sites, or for sites that don't change very often. In such cases I also use plain text sitemap files for simplicity. But even those help to get pages indexed more easily.
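A plain text sitemap, by the way, is nothing more than a UTF-8 file with one fully qualified URL per line (the URLs below are placeholders):

Code: Select all

http://www.example.com/archives/1-first-entry.html
http://www.example.com/pages/about.html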

I don't announce the sitemaps in the robots.txt file, and I don't use the default filenames, to make it at least a bit more difficult for copyscrapers.
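For the record, announcing a sitemap would mean adding a line like this to robots.txt (the URL is a placeholder); leaving it out means search engines only learn about the sitemap when you submit or ping it yourself:

Code: Select all

Sitemap: http://www.example.com/sitemap.xml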

P.S.: If you want to find Adobe, search for "Click here" (at least on Google.com).
:shock:
Yours John
John's Google+ Profile
John's E-Biz Booster Blog powered by Serendipity 1.7/PHP 5.3.14
Don Chambers
Regular
Posts: 3652
Joined: Mon Feb 13, 2006 2:40 am
Location: Chicago, IL, USA
Contact:

Post by Don Chambers »

Jannis - how has this modification impacted your search results since you implemented the change?
=Don=
jhermanns
Site Admin
Posts: 378
Joined: Tue Apr 01, 2003 11:28 pm
Location: Berlin, Germany
Contact:

Post by jhermanns »

I have not yet checked the impact, but I have updated the code:

Code: Select all

    {* index single entries and plugin pages (e.g. staticpage), but exclude the
       freetag plugin, whose URLs on this blog all start with /daily/plugin/tag *}
    {if ($view == "entry" || $view == "plugin") && $smarty.server.REQUEST_URI|truncate:17:"" != "/daily/plugin/tag"}
    <meta name="robots" content="index,follow" />
    {else}
    <meta name="robots" content="noindex,follow" />
    {/if}
:-)
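If your Smarty setup allows PHP functions as modifiers (Smarty 2 does by default), substr would express "the first 17 characters" more directly than truncate - a sketch, untested:

Code: Select all

    {if ($view == "entry" || $view == "plugin") && $smarty.server.REQUEST_URI|substr:0:17 != "/daily/plugin/tag"}
    <meta name="robots" content="index,follow" />
    {else}
    <meta name="robots" content="noindex,follow" />
    {/if}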
Don Chambers
Regular
Posts: 3652
Joined: Mon Feb 13, 2006 2:40 am
Location: Chicago, IL, USA
Contact:

Post by Don Chambers »

Keep us posted if you experience any noticeable change in search results. I'm really curious.
=Don=
carl_galloway
Regular
Posts: 1331
Joined: Sun Dec 04, 2005 5:43 pm
Location: Andalucia, Spain
Contact:

Post by carl_galloway »

Hey Jannis,

Got a quick question: why are you truncating to 17 characters? Looking at your domain, 14 seems like a better number (i.e. http://jann.is) - care to explain more?

Carl
jhermanns
Site Admin
Posts: 378
Joined: Tue Apr 01, 2003 11:28 pm
Location: Berlin, Germany
Contact:

Post by jhermanns »

Hey Carl,

The $smarty.server.REQUEST_URI variable does not contain the protocol and hostname; what I am truncating is a string like /daily/plugin/tag/sometag.

And because I only want the single entries and the static pages to be indexed, I had to add this check: the $view == "plugin" condition is true not only for pages generated by the staticpage plugin, but also for pages created by the freetag plugin.

So I truncate the REQUEST_URI to its first 17 characters - the length of "/daily/plugin/tag". That adds "noindex" to any URL that belongs to the freetag plugin.
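To illustrate with two hypothetical URLs from this blog:

Code: Select all

    {* /daily/plugin/tag/linux -> first 17 chars = "/daily/plugin/tag" -> noindex,follow
       /daily/plugin/imprint   -> first 17 chars = "/daily/plugin/imp" -> index,follow  *}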
carl_galloway
Regular
Posts: 1331
Joined: Sun Dec 04, 2005 5:43 pm
Location: Andalucia, Spain
Contact:

Post by carl_galloway »

Awesome, thanks for the info :D
Michele2
Regular
Posts: 39
Joined: Mon Aug 06, 2007 11:19 pm

Post by Michele2 »

This is exactly what I was looking for - a way to keep Google from indexing search result pages. On my blog, which is about crafts and marketing them, I see no reason for Google to index search pages for the words "wisely", "remiss", "teensy", "sacrificed", and other equally useless terms. It seems ridiculous at best, spammy at worst.

Right now I think I have more search pages indexed than entries. :cry:

One major drawback I see is that the code also marks the homepage as noindex. I changed the code a bit to have the homepage set to index, follow...

Code: Select all

{if $head_title}
    <meta name="robots" content="index,follow">
{elseif $startpage}
    <meta name="robots" content="index,follow">
{else}
    <meta name="robots" content="noindex,follow">
{/if}
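Since the first two branches output the same tag, they could also be merged; a shorter equivalent:

Code: Select all

{if $head_title || $startpage}
    <meta name="robots" content="index,follow">
{else}
    <meta name="robots" content="noindex,follow">
{/if}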