Articles are not always unique, and sometimes, even when the content is original, the owner chooses to show it only to visitors and not to search engines, for various reasons. Search engine crawlers perform several functions when they visit a web page:
1. Indexing the content of the web page (capturing a copy of it, similar to a screenshot)
2. Following the links placed in the web page
3. Finally, putting the web page in the appropriate place in the search results pages
Many bloggers and webmasters make the mistake of assuming that robots.txt is the only way to prevent indexing. Robots.txt is a text file that tells search engine robots not to crawl the pages you list, but it doesn't prevent the URLs of those pages from being shown in the search results.
The settings for robots and indexing vary between search engines. For Google, there is Google Webmaster Tools, where you can modify the Google robots settings. There you will find options to prevent specific pages from being crawled, or to remove the entire website from the search engine index.
By using specific meta tags, you can tell search engines not to index a page, so that it is not shown in the search results, and not to follow the links on it. Here is the code that does both together:
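A minimal version of such a tag, assuming the standard robots meta tag and placement inside the <head> section of the page, might look like this:

```html
<!-- Assumed placement: inside the <head> of each page to be kept out of search results -->
<meta name="robots" content="noindex, nofollow">
```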
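The above code keeps the page out of the index, so it is not shown in the search results, and tells crawlers not to follow its links. The robots name addresses the tag to all search engine crawlers. The noindex value is the one you actually need, as it prevents the indexing and display of the page in search results, and the nofollow value prevents crawlers from following the links you placed in the page. Note that a crawler has to fetch the page in order to read this tag, so the page must not also be blocked in robots.txt.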
The problem with the above code is that you need to place it in every page individually, unless you use a platform like WordPress. For that reason, Google supports an HTTP response header that can cover multiple pages at once. On an Apache server, it can be set at the top of the .htaccess file:
Header set X-Robots-Tag "noindex, nofollow"
For the Yahoo search engine, there is a small difference. You need to apply the noarchive meta tag, so place the following code in the <head> section of your website:
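A sketch of that tag, assuming the generic robots name (a crawler-specific name could be used instead):

```html
<!-- noarchive asks the search engine not to keep a cached copy of the page -->
<meta name="robots" content="noarchive">
```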
But as we saw with Google, you need to add the noindex value here too, to prevent the page from being indexed.
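Combining the two values in a single tag, again assuming the generic robots name, could look like this:

```html
<!-- Values are comma-separated: no indexing and no cached copy -->
<meta name="robots" content="noindex, noarchive">
```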
Below is a table explaining how the
search engine crawlers are
influenced by using different tags -