Sitemap.xml files: what they are for, how to use them, and how to bypass “Too many URLs” error and size limits
October 20, 2022
Table of contents
- What are Sitemaps
- What are the restrictions for sitemap files
- How can you compress a sitemap file
- Can I use multiple sitemaps?
- What is the structure of sitemap files
- How to generate sitemap files
- How to Import a Sitemap into Google Search Console
- Sitemap.xml file status “Couldn't fetch”
- Is it necessary to use the sitemap.xml file?
- What to do if the sitemap contains an error. How to remove a sitemap file from Google Search Console
Sitemaps are XML-formatted files that list the URLs of your site's pages. You submit them to the Google search engine so that it can quickly discover and index those pages.
Each sitemap file is subject to two restrictions:
- The uncompressed file size must not exceed 50 MB
- One file can contain no more than 50,000 URLs
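The two limits above can be checked before a file is submitted. The following is a minimal sketch, assuming the sitemap has already been read into memory as bytes; the function name `check_sitemap_limits` is hypothetical and is not part of any Google tool.

```python
import xml.etree.ElementTree as ET

MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file
MAX_URLS = 50_000
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_limits(xml_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means the file is within limits."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes)} bytes, limit is {MAX_BYTES}")
    root = ET.fromstring(xml_bytes)
    # Count only <url> entries that are direct children of the root element.
    url_count = len(root.findall(f"{NS}url"))
    if url_count > MAX_URLS:
        problems.append(f"file has {url_count} URLs, limit is {MAX_URLS}")
    return problems
```

If the returned list is not empty, the file needs to be split into several sitemaps.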
Besides plain text with XML markup, a sitemap can be served as a gzip-compressed .gz file. Text compresses very well, so the file becomes much smaller: for example, my 25 MB file shrank to 500 KB.
To do this, compress the original sitemap.xml file into .gz format. In Google Search Console, specify the path to the compressed file, for example: https://site.net/sitemap.xml.gz
If your web browser downloads https://site.net/sitemap.xml.gz to your computer instead of displaying its content as it does for sitemap.xml, this is normal. Either way, Google Search Console will be able to process the file.
You can create multiple Sitemaps for each site or domain property and submit them all to Google Search Console. This is not only allowed but recommended by Google itself when a single sitemap would be too large.
If there are many Sitemap files, a complete list of them can be collected in a separate file called a “Sitemap Index File”. An example of the content of such a sitemap.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://site.net/sitemaps/sitemap_1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://site.net/sitemaps/sitemap_2.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://site.net/sitemaps/sitemap_3.xml</loc>
  </sitemap>
</sitemapindex>
After that, it is enough to submit this main index file to Google Search Console.
The sitemaps listed in the index file will be imported into Google Search Console automatically.
To see them, click on the name of the index file. You will see a list of imported Sitemaps.
You need to wait until these files are processed and their status changes to “Success”.
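Splitting a long URL list into several sitemaps plus an index file is one way to avoid the “Too many URLs” error. Below is a minimal sketch, assuming the URLs are already available as a Python list. The function names and the `sitemap_N.xml` naming scheme are hypothetical; the result is returned as a dictionary of file names and contents rather than written to disk.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit


def _build(root_tag: str, item_tag: str, locations) -> bytes:
    """Serialize a <urlset> or <sitemapindex> document containing only <loc> values."""
    root = ET.Element(root_tag, xmlns=SITEMAP_NS)
    for location in locations:
        # ElementTree escapes characters such as "&" in the URL automatically.
        ET.SubElement(ET.SubElement(root, item_tag), "loc").text = location
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def split_into_sitemaps(urls, base_url: str, size: int = MAX_URLS) -> dict[str, bytes]:
    """Return {file name: XML bytes} for the child sitemaps and the index file."""
    files = {}
    child_locations = []
    for number, start in enumerate(range(0, len(urls), size), start=1):
        name = f"sitemap_{number}.xml"  # hypothetical naming scheme
        files[name] = _build("urlset", "url", urls[start:start + size])
        child_locations.append(f"{base_url}/{name}")
    files["sitemap.xml"] = _build("sitemapindex", "sitemap", child_locations)
    return files
```

Here `base_url` is the public address of the directory where the child sitemaps will be uploaded, for example `https://site.net/sitemaps`. Only the index file then needs to be submitted.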
Sitemap files have the following structure:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://domain.site.net/?p=1</loc>
    <lastmod>2022-10-08T14:14:27+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://domain.site.net/?p=2</loc>
    <lastmod>2022-10-08T14:14:27+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://domain.site.net/?p=3</loc>
    <lastmod>2022-10-08T14:14:27+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Each entry consists of four elements:
- URL of the page (loc)
- Date of last modification (lastmod)
- Frequency of modification, e.g. monthly (changefreq)
- A priority (priority)
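The elements of each entry can be read back with any XML parser, which is handy for inspecting an existing sitemap. A minimal sketch using Python's standard library follows; the function name `read_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
FIELDS = ("loc", "lastmod", "changefreq", "priority")


def read_entries(xml_bytes: bytes) -> list[dict]:
    """Return one dict per <url> entry; elements absent from an entry map to None."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append(
            {field: url.findtext(f"sm:{field}", namespaces=NS) for field in FIELDS}
        )
    return entries
```

All values are returned as strings exactly as they appear in the file, so a priority of 0.8 comes back as `"0.8"`.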
If you are using WordPress, then the easiest way is to install a sitemap plugin.
If there is no sitemap plugin for your site engine, then it is quite easy to generate it yourself, since it is just a text file with XML markup.
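As a rough illustration of generating the file yourself, here is a minimal sketch in Python. It assumes you can obtain each page's URL and last modification time from your site engine; the function name `build_sitemap` and the default values for `changefreq` and `priority` are assumptions chosen to mirror the sample file.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages, changefreq: str = "monthly", priority: str = "0.8") -> bytes:
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified should be a timezone-aware datetime so that the
    output includes a UTC offset such as +00:00.
    """
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = url  # "&" etc. are escaped on output
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat(timespec="seconds")
        ET.SubElement(entry, "changefreq").text = changefreq
        ET.SubElement(entry, "priority").text = priority
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

The returned bytes can be written to sitemap.xml as they are, for example with `open("sitemap.xml", "wb").write(...)`.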
Go to Google Search Console, select the site you want to submit the Sitemap for, and enter the URL of the Sitemap.
At first, the status “Couldn't fetch” may be shown for the sitemap.xml file. This message can appear even if everything is fine with the file. You just need to wait a little.
In that case the message does not mean that there is a problem with the sitemap.xml file. Its turn to be analyzed simply has not come yet.
A little later, the status of the file will change to “Success”. At the same time, it will show how many URLs were discovered thanks to this file.
Later still, you can view the indexing report for the URLs from the sitemap.xml file.
In fact, I don't usually use a sitemap.xml file. I add articles to most sites manually and, in my opinion, the sitemap.xml file is not particularly needed, since pages on such sites are indexed very quickly.
But if you're unhappy with your site's indexing speed, or need to quickly report a large number of URLs to be indexed, then try using sitemap.xml files.
What to do if the sitemap contains an error. How to remove a sitemap file from Google Search Console
If Google processes the Sitemap and reports errors in it (for example, an incorrect date format or broken links), you do not have to wait for the next scheduled crawl.
You can delete the Sitemap from Google Search Console and add it again right away. After that, Google will check the Sitemap file again quite quickly (within a few minutes).
To remove a Sitemap file from Google Search Console, click on it. On the page that opens, find the button with three dots in the upper right corner. Click it and select “Remove sitemap”.
After that, the Sitemap file will be removed. Once you have corrected the errors in it, you can immediately re-add the Sitemap file with the same or a different URL.
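Some errors, such as an incorrect date format, can be caught locally before the Sitemap is re-added. The sketch below assumes that `lastmod` values should follow the W3C Datetime format used by the Sitemap protocol. It checks only the shape of the value, not whether the date itself is valid, and the function name `find_bad_lastmod` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Accepts YYYY, YYYY-MM, YYYY-MM-DD, or a full timestamp with a time zone
# (Z or +hh:mm / -hh:mm). Value ranges (e.g. month 13) are NOT checked.
W3C_DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def find_bad_lastmod(xml_bytes: bytes) -> list[tuple[str, str]]:
    """Return (loc, lastmod) pairs whose lastmod does not look like W3C Datetime."""
    root = ET.fromstring(xml_bytes)
    bad = []
    for url in root.findall("sm:url", NS):
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        # A missing <lastmod> is skipped; only present values are checked.
        if lastmod is not None and not W3C_DATETIME.match(lastmod.strip()):
            bad.append((url.findtext("sm:loc", default="", namespaces=NS), lastmod))
    return bad
```

An empty result means that no malformed dates were found by this check. It does not guarantee that Google will accept the file.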