
How to prevent search engines from indexing only the main page of the site
September 1, 2022
To keep only the main page out of search engine indexes while leaving all other pages indexable, you can use one of several approaches, depending on how the site is built.
1. Using the robots.txt file
If the main page has its own address (usually index.php, index.html, index.htm, main.html, and so on), and a request for a link like w-e-b.site/ redirects to that address, for example to w-e-b.site/index.htm, then you can use a robots.txt file with content like the following:
User-agent: *
Disallow: /index.php
Disallow: /index.html
Disallow: /index.htm
Disallow: /main.html
In fact, using an explicit name for the main page is the exception rather than the rule. So let's look at other options.
You can use the following approach:
- Deny site-wide access with the “Disallow” directive.
- Then allow the indexing of the entire site using the “Allow” directive, except for the main page.
Sample robots.txt file:
User-agent: *
Allow: /?p=
Disallow: /
It is safest to put the “Allow” directive before “Disallow”: major crawlers such as Googlebot apply the most specific (longest) matching rule regardless of order, but simpler parsers may stop at the first rule that matches. Here “Allow” permits every URL that begins with “/?p=”, and “Disallow: /” blocks everything else. The net effect is that indexing of the entire site (including the main page) is prohibited, except for pages with an address like “/?p=”.
Let's look at the result of checking two URLs:
- https://suay.ru/ (main page) – indexing is prohibited
- https://suay.ru/?p=790#6 (article page) – indexing allowed
In the screenshot, number 1 marks the contents of the robots.txt file, number 2 is the URL being checked, and number 3 is the result of the check.
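The same check can be approximated offline. The following is a minimal Python sketch of the longest-match evaluation that major crawlers use, not the code of any particular search engine. It assumes the Allow path is written with a leading slash ("/?p="), and the function name is_allowed is made up for this illustration.

```python
from urllib.parse import urlsplit

# Rules from the sample robots.txt, as (directive, path prefix) pairs.
# Assumption: the Allow path is written with a leading slash.
RULES = [("allow", "/?p="), ("disallow", "/")]


def is_allowed(url, rules=RULES):
    """Return True if the URL may be crawled under longest-match rules."""
    parts = urlsplit(url)
    # Rules are matched against the path plus query string; the #fragment is ignored.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    best_len, allowed = -1, True  # no matching rule means "allowed"
    for directive, prefix in rules:
        if not target.startswith(prefix):
            continue
        # The longer prefix wins; on equal length, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and directive == "allow"):
            best_len, allowed = len(prefix), directive == "allow"
    return allowed


print(is_allowed("https://suay.ru/"))          # False (main page)
print(is_allowed("https://suay.ru/?p=790#6"))  # True  (article page)
```

Because the longest prefix decides the outcome, this sketch gives the same answers if the two rules are listed in the opposite order.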
2. Using the robots meta tag
If your site consists of separate HTML files, add the robots meta tag to the HTML code of the main page file:
<meta name="robots" content="noindex,nofollow">
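To confirm that the tag is present and well formed, the page source can be parsed with Python's standard html.parser module. This is only a sketch: the class name RobotsMetaParser, the helper robots_directives, and the sample page string are invented for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical main page used only for this example.
page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print("noindex" in robots_directives(page))  # True
```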
3. With .htaccess and mod_rewrite
Using .htaccess and mod_rewrite, you can block access to a specific file as follows:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Google [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
RewriteRule ^(index\.php|index\.html?)$ - [F]
Please note that when you open a link like https://w-e-b.site/ (that is, without specifying the name of the main page), the web server still requests a specific file internally, for example index.php, index.htm, or index.html. Therefore, this method of blocking access (and, accordingly, indexing) works even if the main page of your site opens without a specific file name in the URL, as is usually the case. Matching crawlers receive a 403 Forbidden response for the main page.
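The decision made by these rules can be modeled in a few lines of Python. This is a rough simulation for reasoning about the rule, not a replacement for testing on a real server. The function name status_for is hypothetical, and the file pattern is written in anchored form with escaped dots, which is an assumption about the intended match.

```python
import re

# Mirrors the two RewriteCond lines: case-insensitive match ([NC]) joined by OR.
BOT_PATTERN = re.compile(r"Google|Yandex", re.IGNORECASE)
# Mirrors the RewriteRule pattern (assumed anchored, dots escaped).
INDEX_PATTERN = re.compile(r"^(index\.php|index\.html?)$")


def status_for(user_agent, resolved_file):
    """Return 403 if the rule would forbid the request, otherwise 200.

    resolved_file is the file the server ends up serving, so a request
    for "/" is represented here by the index file it resolves to.
    """
    if BOT_PATTERN.search(user_agent) and INDEX_PATTERN.search(resolved_file):
        return 403
    return 200


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(status_for(googlebot, "index.php"))                    # 403
print(status_for(googlebot, "about.html"))                   # 200
print(status_for("Mozilla/5.0 Firefox/115.0", "index.php"))  # 200
```

Ordinary visitors are unaffected because the user agent condition fails for them, so only the listed crawlers are refused the main page.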