Tag: webmaster notes

How to reboot a server in DigitalOcean

DigitalOcean offers a variety of cloud services for developers, but of all of them I use only the VPS (virtual private server), which DigitalOcean calls a Droplet.

At the same time, because of the sheer number of cloud products in the control panel, it is not immediately obvious how to perform even such a simple action in DigitalOcean as rebooting a VPS.

The server can be restarted by connecting to it via SSH and executing the command

reboot
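
For example, a minimal one-liner from your local machine (203.0.113.10 is a placeholder for your Droplet's IP address, and root is the assumed login user):

ssh root@203.0.113.10 reboot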

But if the server is frozen and SSH is not working, then you can force a reboot of the VPS from the DigitalOcean control panel.

How to restart a VPS in DigitalOcean

Go to the control panel and select Droplets from the menu, then select the server you want to restart.

Then select the “Power” tab.

On it you will see two possible actions:

  • Turn off Droplet – power off the VPS
  • Power cycle – reboot the VPS

That is, to restart the VPS, you need to press the “Power Cycle” button.

After that, wait until the reboot is completed.
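
The same power cycle can also be triggered from the command line. A minimal sketch, assuming you have DigitalOcean's doctl CLI installed and authenticated (the Droplet ID 123456 is a placeholder):

# Find the numeric ID of the Droplet you want to restart
doctl compute droplet list

# Force a power cycle (hard reset) of that Droplet
doctl compute droplet-action power-cycle 123456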

What is the difference between Turn off Droplet and Power cycle

“Turn off Droplet” roughly corresponds to cutting the power to your virtual private server: the server stops running, but the Droplet itself is preserved, along with its IPv4 and IPv6 addresses and other settings.

You can turn your server back on at any time.

“Turn off Droplet” is useful when you want to make the server unavailable, that is, to shut it down for whatever reason.

Please note that turning the server off with “Turn off Droplet” does not stop billing! You will still be charged for the VPS plan according to the selected server parameters.

If you no longer want to be charged for the server, you need to destroy it (go to the “Destroy” section and perform the corresponding actions to delete the server). A destroyed server cannot be restored!

“Power cycle” cuts the power to the server and then turns it back on, which corresponds to a VPS reboot.

In other words, you can restart the server either by turning it off with “Turn off Droplet” and then turning it back on manually, or by performing a Power cycle. However, these actions are not quite equivalent:

  • Turning the server off via “Turn off Droplet” first attempts to send a graceful shutdown command to the server; if that fails, the power is cut forcibly. You then turn the server back on manually. Overall this takes longer, but it is a bit safer.
  • “Power cycle” cuts the power and turns it back on without attempting a graceful shutdown. It takes less time (provided your VPS is healthy and able to boot on its own).

DigitalOcean promo code

If you want to get a DigitalOcean promo code for testing VPS (or other cloud features) for free, then use this link.

You will be given $200, which you can use to create a VPS, among other things.

Sitemap.xml files: what they are for, how to use them, and how to work around the “Too many URLs” error and size limits

Table of contents

  1. What are Sitemaps
  2. What are the restrictions for sitemap files
  3. How can you compress a sitemap file
  4. Can I use multiple sitemaps?
  5. What is the structure of sitemap files
  6. How to generate sitemap files
  7. How to Import a Sitemap into Google Search Console
  8. Sitemap.xml file status “Couldn't fetch”
  9. Is it necessary to use the sitemap.xml file?
  10. What to do if the sitemap contains an error. How to remove a sitemap file from Google Search Console

What are Sitemaps

Sitemaps are XML files that list the URLs of your site's pages; you submit them to the Google search engine so that it can discover and index those pages more quickly.

What are the restrictions for sitemap files

  1. The file size must not exceed 50 MB (uncompressed)
  2. One file can contain no more than 50,000 links

How can you compress a sitemap file

In addition to plain text with XML markup, the file can be compressed into a .gz archive. This reduces the file size dramatically, because text files compress very well. For example, my 25 MB file shrank to about 500 KB.

To do this, simply compress the original sitemap.xml file into .gz format. In Google Search Console, submit the path to the archive, for example: https://site.net/sitemap.xml.gz
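
On Linux, the compression can be done with gzip. A minimal sketch (the -k option keeps the original file and requires gzip 1.6 or newer):

# Create sitemap.xml.gz next to the original sitemap.xml
gzip -k sitemap.xml

# Compare the sizes of the original and the compressed file
ls -lh sitemap.xml sitemap.xml.gz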

If opening https://site.net/sitemap.xml.gz in a web browser downloads the file to your computer instead of displaying its contents (as happens with sitemap.xml), that is normal. Either way, Google Search Console will be able to process the file.

Can I use multiple sitemaps?

For each site or domain property, you can create multiple Sitemaps and submit them all to Google Search Console – this is not only allowed, but recommended by Google itself for sitemaps that would otherwise be too large.

If there are many Sitemap files, a complete list of them can be collected in a separate file called a “sitemap index file”. An example of the contents of such a sitemap.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<sitemap>
		<loc>https://site.net/sitemaps/sitemap_1.xml</loc>
	</sitemap>
	<sitemap>
		<loc>https://site.net/sitemaps/sitemap_2.xml</loc>
	</sitemap>
	<sitemap>
		<loc>https://site.net/sitemaps/sitemap_3.xml</loc>
	</sitemap>
</sitemapindex>

After that, it is enough to import this main file into Google Search Console.

The rest of the sitemaps listed in the index file will be picked up by Google Search Console automatically.

To see them, click on the file name. You will see a list of imported Sitemaps.

You need to wait until these files are processed and their status changes to “Success”.

What is the structure of sitemap files

Sitemap files have the following structure:

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<url>
		<loc>https://domain.site.net/?p=1</loc>
		<lastmod>2022-10-08T14:14:27+00:00</lastmod>
		<changefreq>monthly</changefreq>
		<priority>0.8</priority>
	</url>
	<url>
		<loc>https://domain.site.net/?p=2</loc>
		<lastmod>2022-10-08T14:14:27+00:00</lastmod>
		<changefreq>monthly</changefreq>
		<priority>0.8</priority>
	</url>
	<url>
		<loc>https://domain.site.net/?p=3</loc>
		<lastmod>2022-10-08T14:14:27+00:00</lastmod>
		<changefreq>monthly</changefreq>
		<priority>0.8</priority>
	</url>
</urlset>

Each entry consists of four elements:

  1. The URL (loc)
  2. The date of last modification (lastmod)
  3. The change frequency (changefreq), e.g. monthly
  4. The priority (priority)

How to generate sitemap files

If you are using WordPress, then the easiest way is to install a sitemap plugin.

If there is no sitemap plugin for your site engine, then it is quite easy to generate it yourself, since it is just a text file with XML markup.
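
For example, here is a minimal sketch of a shell script that builds sitemap.xml from a plain list of URLs (the input file urls.txt, with one absolute URL per line, is a hypothetical name; XML-special characters such as & in the URLs would still need to be escaped):

#!/usr/bin/env bash
# Build sitemap.xml from urls.txt (one URL per line)
{
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  while read -r url; do
    echo "  <url>"
    echo "    <loc>${url}</loc>"
    echo "    <lastmod>$(date -u +%Y-%m-%dT%H:%M:%S+00:00)</lastmod>"
    echo "  </url>"
  done < urls.txt
  echo '</urlset>'
} > sitemap.xml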

How to Import a Sitemap into Google Search Console

Go to Google Search Console, select the site you want to submit the Sitemap for, and enter the URL of the Sitemap in the “Sitemaps” section.

Sitemap.xml file status “Couldn't fetch”

At first, the sitemap.xml file may show the status “Couldn't fetch”. This status appears even if everything is fine with the file – you just need to wait a little.

The point is that this status does not mean there is a problem with the sitemap.xml file; Google simply has not gotten around to processing it yet.

A little later, the status of the file will change to “Success”, and the report will show how many URLs were discovered thanks to this file.

Later still, you can view the indexing report for the URLs submitted in the sitemap.xml file.

Is it necessary to use the sitemap.xml file?

In fact, I don't usually use a sitemap.xml file. I add articles to most sites manually and, in my opinion, the sitemap.xml file is not particularly needed, since pages on such sites are indexed very quickly.

But if you're unhappy with your site's indexing speed, or need to quickly report a large number of URLs to be indexed, then try using sitemap.xml files.

What to do if the sitemap contains an error. How to remove a sitemap file from Google Search Console

If, after the Sitemap has been processed, you find that it contains errors (for example, an incorrect date format or broken links), you do not have to wait for the next scheduled crawl.

You can delete a Sitemap from Google Search Console and add it again right away. After that, quite quickly (within a few minutes), Google will check the Sitemap file again.

To remove a Sitemap file from Google Search Console, click on it. On the page that opens, find the button with three horizontal dots in the upper right corner, click it, and select “Remove sitemap”.

After that, the Sitemap is removed, and once you have corrected the errors you can immediately re-add it under the same or a different URL.

How to prevent Tor users from viewing or commenting on a WordPress site

The Tor network is an important tool for anonymity, privacy, and censorship circumvention – one that some countries fight even at the state level.

But Tor is a public tool, so it is sometimes used for online trolling and bullying. This article will show you how to:

  • prevent Tor users from commenting on your WordPress site
  • prevent Tor users from registering and logging into the site
  • prevent Tor users from viewing the WordPress site

WordPress plugin to control allowed actions from the Tor network

VigilanTor is a free WordPress plugin that can block comments, browsing, and registration for Tor users.

This plugin automatically updates the list of IP addresses of the Tor network and, after configuration, automatically controls and blocks Tor users.

To install VigilanTor, go to WordPress Admin Panel → Plugins → Add New.

Search for “VigilanTor”, install and activate it.
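
If you prefer the command line, roughly the same can be done with WP-CLI. A sketch, assuming WP-CLI is installed and that the plugin slug in the WordPress.org directory is vigilantor (check the slug before running):

# Install and activate the plugin from the WordPress.org repository
wp plugin install vigilantor --activate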

Then go to Settings → VigilanTor Settings.

We will perform all subsequent actions on the plugin settings page.

How to disable commenting on a site from Tor

Enable two settings:

  • Block Tor users from commenting (prevents Tor users from commenting on your WordPress site)
  • Hide comment form from Tor users

Now Tor users will still be able to view your site, but when they try to leave a comment, they will receive a message:

Error: You appear to be commenting from a Tor IP address which is not allowed.

How to prevent Tor users from registering and logging into the site

To prevent Tor users from registering on a WordPress site, and to prevent already registered users from logging in from the Tor network, enable the following settings:

  • Block Tor users from registering
  • Flag users who signed up using Tor
  • Block Tor users from logging in (useful for preventing brute force attacks)

How to Block Tor Users from Viewing a WordPress Site

Enable the setting:

  • Block Tor users from all of WordPress

This setting will prevent any activity, including logging into the site, commenting, and browsing.

When trying to open a site in Tor, the user will receive a message:

Sorry, you cannot access this website using Tor.

How often does VigilanTor update the list of Tor IP addresses

The set of Tor IP addresses changes frequently: new ones are added and old ones are removed, so a downloaded list of Tor IPs becomes stale over time.

VigilanTor downloads the list of Tor IP addresses and keeps it up to date automatically.

By default, the update is performed every 10 minutes. You can increase this interval to 6 hours, or enable real-time updates.

How to prevent search engines from indexing only the main page of the site

To prevent search engines from indexing only the main page, while allowing indexing of all other pages, you can use several approaches, depending on the characteristics of a particular site.

1. Using the robots.txt file

If the main page has its own explicit address (usually index.php, index.html, index.htm, main.html and so on), and opening a link like w-e-b.site/ redirects to that address, for example to w-e-b.site/index.htm, then you can use a robots.txt file with content like the following:

User-agent: *
Disallow: /index.php
Disallow: /index.html
Disallow: /index.htm
Disallow: /main.html

In fact, using an explicit name for the main page is the exception rather than the rule. So let's look at other options.

You can use the following approach:

  1. Deny indexing of the entire site with the “Disallow” directive.
  2. Then allow indexing of the article pages (everything except the main page) with the “Allow” directive.

Sample robots.txt file:

User-agent: *
Allow: ?p=
Disallow: /

The “Allow” directive must always come before “Disallow”. Here “Allow” permits all pages whose URL contains “?p=”, while “Disallow” blocks all pages. The net result: indexing of the entire site (including the main page) is prohibited, except for pages whose address contains “?p=”.

Let's look at the result of checking two URLs:

  • https://suay.ru/ (main page) – indexing is prohibited
  • https://suay.ru/?p=790#6 (article page) – indexing allowed

In the screenshot, number 1 marks the contents of the robots.txt file, number 2 is the URL being checked, and number 3 is the result of the check.

2. Using the robots meta tag

If your site consists of separate static files, add the robots meta tag to the HTML code of the main page file:

<meta name="robots" content="noindex,nofollow">

3. With .htaccess and mod_rewrite

Using .htaccess and mod_rewrite, you can block access to a specific file as follows:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Google [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
RewriteRule (index.php)|(index.htm)|(index.html) - [F]

Please note that when a link like https://w-e-b.site/ is opened (that is, without specifying the name of the main page), a specific file is still requested on the web server side, for example index.php, index.htm or index.html. Therefore, this method of blocking access (and, accordingly, indexing) works even if the main page of your site opens without an explicit file name (index.php, index.html, index.htm, main.html, and so on), as is usually the case.
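
You can verify such a rule by requesting the main page with a search engine's User Agent and checking the response code. A minimal check (w-e-b.site is the same example domain as above):

# Pretend to be Googlebot; expect "403 Forbidden" if the rule works
curl -I -A 'Googlebot' 'https://w-e-b.site/index.php'

# An ordinary User Agent should still receive 200 (or the usual redirect)
curl -I 'https://w-e-b.site/index.php'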

iThemes Security locked out a user – how to login to WordPress admin when user is banned (SOLVED)

iThemes Security is a plugin for WordPress that makes it difficult for hackers to attack the site and collect information.

Among other features, iThemes Security protects against brute-forcing of paths (searching for “hidden” folders and files), as well as against brute-force attacks on user credentials.

Once set up, the iThemes Security plugin usually works fine and doesn't require much attention. But sometimes your own user account can end up locked because someone tried to guess its password.

The situation may arise in the following scenario:

1. You have enabled protection of accounts against brute-force password attacks

2. An attacker repeatedly tried to guess the password for your account

3. As a result, the account was locked

4. When you enter your username and password to get into the WordPress administration panel, you see a message that the account is locked out (banned):

YOU HAVE BEEN LOCKED OUT.
You have been locked out

You don't have to wait until the account is unlocked.

If you have access to the file system, then you can immediately log into the WordPress admin panel.

I do not know of a way to bypass the iThemes Security lockout itself; instead, the plan of action is as follows:

1. Disable iThemes Security

2. Login to the WordPress admin area

3. Enable iThemes Security

To disable any WordPress plugin, it is enough to make its folder disappear – and you do not even have to delete it, just rename it.

Open your hosting file manager and find the following path there: SITE/wp-content/plugins/

If you are using the command line, then the path to the plugin is: SITE/wp-content/plugins/better-wp-security

Find the better-wp-security folder and rename it to something like “-better-wp-security”.
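
From the command line it could look like this (a sketch; note that a name starting with a dash needs the ./ prefix so that mv does not treat it as an option – renaming to better-wp-security.disabled works just as well):

cd SITE/wp-content/plugins/
mv better-wp-security ./-better-wp-security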

Right after that, you can log into the WordPress admin panel.

Once you are logged into the WordPress admin panel, you can reactivate the iThemes Security plugin. To do this, rename the “-better-wp-security” folder to “better-wp-security”.

That's it! No additional iThemes Security configuration is required.

Checking the logs showed that the attack (brute-forcing user credentials) was carried out through the xmlrpc.php file.

The xmlrpc.php file provides features that most webmasters don't use but that are actively exploited by hackers. For this reason, you can safely block access to xmlrpc.php. If you do not know what this file is for, you most likely do not use it and can block access to it without any consequences.

You can disable XML-RPC with an .htaccess file or a plugin.

.htaccess is a configuration file that you can create and modify.

Just paste the following code into your .htaccess file at the root of your WordPress site (the solution uses mod_rewrite):

# Block requests for WordPress xmlrpc.php file
RewriteRule ^xmlrpc\.php - [NC,F]

Your server must support .htaccess files and mod_rewrite – most hosts do.
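
After adding the rule, you can check that it works from the command line (your-site.example is a placeholder for your domain; a 403 Forbidden response confirms that xmlrpc.php is blocked):

curl -I https://your-site.example/xmlrpc.php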

WordPress error “Another update is currently in progress” (SOLVED)

When updating a WordPress site, for example, when migrating to a new version of WordPress, you may encounter an error:

Another update is currently in progress.

This problem is fairly easy to fix. The good news is that the error is not fatal: it does not prevent users from browsing the site, and the webmaster can still go to the WordPress admin area to solve the problem.

Why does the error “Another update is currently in progress” occur?

You may see this message if the site has several administrators and more than one of you is trying to update WordPress at the same time. In this case, wait until the other webmaster finishes.

If you are the only administrator of the site, then the cause of this error may be a failed previous update, which was interrupted, for example, due to a broken connection.

How to fix “Another update is currently in progress” with a plugin

Since it is still possible to get into the WordPress admin panel, this error can be fixed with a plugin.

The plugin is called “Fix Another Update In Progress” and can be installed through the WordPress Admin Panel.

To do this, in the admin panel, go to “Plugins” → “Add New”.

Search for “Fix Another Update In Progress”, install and activate this plugin.

Then go to “Settings” → “Fix Another Update In Progress” and click the “Fix WordPress Update Lock” button.

After that, the problem should be fixed.

How to fix “Another update is currently in progress” in phpMyAdmin

If you don't want to install the plugin, then this error can be fixed by deleting one value from the database of the WordPress site. For ease of editing the database, you can use phpMyAdmin.

Start by finding the database of the site you want to fix.

Open the table named “wp_options” (the wp_ prefix may differ if your site uses a custom table prefix).

Find the row named “core_updater.lock”. To speed up the search, you can use the database search on the “option_name” column.

Click the “Delete” button.

After that, the problem will be solved.
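
If you have shell access, the same lock can be removed without phpMyAdmin. A sketch, assuming either WP-CLI is installed or you know the database credentials (db_user and my_db are placeholders, wp_options is the default table name):

# With WP-CLI, from the WordPress root directory
wp option delete core_updater.lock

# Or directly in MySQL/MariaDB
mysql -u db_user -p my_db -e "DELETE FROM wp_options WHERE option_name = 'core_updater.lock';"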

How to block access to my site from a specific bux site or any other site with negative traffic

There are situations when unwanted traffic comes from certain sites, for example from bux (paid-to-click) sites or simply from sites that you don't like. In some cases such traffic can be dealt with, but not always.

Bux sites quite often post tasks like “go to a search engine, enter such-and-such a query, then visit such-and-such a site” – there is little you can do about that, since such visits are hard to distinguish from ordinary traffic.

But if the visit comes directly from the bux site, or your page is shown there in an iframe, then it can be dealt with.

The same method also works if your site has been added to an aggregator, or a link to it has been placed on a site you don't like.

Suppose the offending site is https://site.click/. To block traffic from it, you can use the following:

RewriteCond %{HTTP_REFERER} https://site.click/ [NC]
RewriteRule .* - [R=404]

These lines need to be written to the .htaccess file. These are the rules for the mod_rewrite module, which is usually enabled in Apache.

In this case, everyone who came from the site https://site.click/ will be shown the message “404 page not found”. If desired, you can put any other response code instead of 404, for example, 403 (access denied), 500 (internal server error) or any other.

If you want to block access from multiple sites, use the [OR] flag, for example:

RewriteCond %{HTTP_REFERER} https://site.click/ [NC,OR]
RewriteCond %{HTTP_REFERER} anotherdomain\.com [NC,OR]
RewriteCond %{HTTP_REFERER} andanotherdomain\.com [NC,OR]
RewriteCond %{HTTP_REFERER} onemoredomain\.com [NC]
RewriteRule .* - [R=404]

Note that the [OR] flag must not be specified on the last RewriteCond line.

Instead of returning an error, you can redirect to any page of your site. For example, with the following rules, all visitors coming from https://site.click/ will be sent to the error.html page of your site:

RewriteCond %{HTTP_REFERER} https://site.click/ [NC]
RewriteRule .* error.html [R]

And the following rules redirect everyone who came from https://site.click/ to https://natribu.org/ru/:

RewriteCond %{HTTP_REFERER} https://site.click/ [NC]
RewriteRule .* https://natribu.org/ru/ [R]
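
To check that a referrer rule works, you can send a request with a forged Referer header and look at the response code (curl's -e/--referer option sets the header; your-site.example is a placeholder for your domain):

# Expect 404 (or your chosen code/redirect) because of the Referer
curl -I -e 'https://site.click/' 'https://your-site.example/'

# Without the Referer the page should be served normally
curl -I 'https://your-site.example/'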

How to protect my website from bots

In the article “How to block by Referer, User Agent, URL, query string, IP and their combinations in mod_rewrite” I showed how to block requests that match several parameters at once – on the one hand, this is effective against bots; on the other, it practically eliminates false positives, that is, cases where a regular user who has nothing to do with the bots gets blocked.

Blocking bots is not difficult; what is difficult is finding the patterns that give a bot request away. That article was supposed to have another part showing exactly how I assembled those patterns. I wrote it and took the screenshots, but in the end did not include it – not because I am greedy, but because I felt it strayed from the topic of an already difficult article and, frankly, would interest very few people.

But the day before yesterday bots attacked another of my sites, and when I decided to take action against them… I realized I had forgotten how I collected the data. So, to avoid reinventing the commands every time, I will keep them here – you might find them useful too.

How to know that a site has become a target for bots

The first sign is a sharp, unexplained increase in traffic. That was the reason to open the Yandex.Metrica statistics and check “Reports” → “Standard reports” → “Sources” → “Sources, summary”:

Yes, there is a sharp surge in direct visits, and today there are even more of them than traffic from search engines.

Let's look at Webvisor:

Short sessions from mobile devices, strange User Agents (including very old devices), a characteristic pattern of regions and ISPs. Yes, these are bots.

Identifying the IP addresses of the bots

Let's look at the command:

cat site.ru/logs/access_log | grep '"-"' | grep -E -i 'android|iPhone' | grep -i -E -v 'google|yandex|petalbot' | awk '{ print $1 }' | sort | uniq -c

In it:

  • cat site.ru/logs/access_log — read the web server log file
  • grep '"-"' — keep only requests with an empty referrer
  • grep -E -i 'android|iPhone' — keep only requests from mobile devices
  • grep -i -E -v 'google|yandex|petalbot' — drop requests from the known web crawlers (Google, Yandex, PetalBot)
  • awk '{ print $1 }' — keep only the IP address (the first field)
  • sort | uniq -c — sort, deduplicate, and show the count for each IP

The picture is quite clear: all requests come from the same subnet, 185.176.24.0/24.

But it is still morning and there is little data yet, so let's also check yesterday's web server log:

zcat site.ru/logs/access_log.1 | grep '"-"' | grep -E -i 'android|iPhone' | grep -i -E -v 'google|yandex|petalbot' | awk '{ print $1 }' | sort | uniq -c

Yes, all bots came from the 185.176.24.0/24 network.
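
As a side check, it is sometimes useful to see who owns the suspicious subnet (the exact field names in whois output vary between registries, so the grep pattern below is only a rough filter):

whois 185.176.24.1 | grep -E -i 'netname|org-name|country|route'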

In principle, you could simply block this entire subnet and be done with it. But it is better to keep collecting data – I will explain why below.

Let's see which pages the bots are requesting:

cat site.ru/logs/access_log | grep '"-"' | grep -E -i 'android|iPhone' | grep -i -E -v 'google|yandex|petalbot' | grep '185.176.24' | awk '{ print $7 }' | sort | uniq -c

zcat site.ru/logs/access_log.1 | grep '"-"' | grep -E -i 'android|iPhone' | grep -i -E -v 'google|yandex|petalbot' | grep '185.176.24' | awk '{ print $7 }' | sort | uniq -c

These commands have new parts:

  • grep '185.176.24' — filter for requests from the attacker's network
  • awk '{ print $7 }' — the requested page in my server logs is the seventh column

The bot requests exactly 30 pages.

We return to the article “How to block by Referer, User Agent, URL, query string, IP and their combinations in mod_rewrite” and block the bot.

But in my case, I can get by with blocking the subnet.

In Apache 2.4:

<RequireAll>
	Require all granted
	Require not ip 185.176.24
</RequireAll>

In Apache 2.2:

Deny from 185.176.24

Keep your finger on the pulse

This is not the first influx of bots I have had to fight, and keep in mind that the bot owner adjusts the bot settings in response to your actions. For example, the previous time it all started with the following pattern:

  • bots requested 5 specific pages
  • all bots had an Android user agent
  • came from a specific set of mobile operator networks
  • empty referrer

After I blocked based on these criteria, the bot owner changed the bots' behavior:

  • more URLs were added (now 8 pages)
  • iPhone was added to the User Agents
  • the number of subnets increased, but bots still came only from mobile operators

I blocked them too. After that, the bot owner added desktop user agents, but all the other patterns stayed the same, so I blocked those as well.

After that, the bot owner stopped changing the bots' behavior, and after some time (a week or two) the bots stopped trying to visit the site, so I removed the blocking rules.

For further analysis

A command for filtering requests from the specified subnet (185.176.24.0/24) that received a 200 response code (that is, were not blocked) – useful in case the bots change their User Agent:

cat site.ru/logs/access_log | grep '"-"' | grep -E -i 'android|iPhone' | grep -i -E -v 'google|yandex|petalbot' | grep '185.176.24' | grep ' 200 ' | tail -n 10

A variant of the IP-address listing command given at the beginning of this article, but counting only requests with a 200 response code (requests that we have already blocked are filtered out):

cat site.ru/logs/access_log | grep '"-"' | grep -E -i 'android|iPhone' | grep -i -E -v 'google|yandex|petalbot' | grep ' 200 ' | awk '{ print $1 }' | sort | uniq -c

A command for monitoring the latest bot-specific requests:

cat site.ru/logs/access_log | grep '"-"' | grep -E -i 'android|iPhone' | grep -i -E -v 'google|yandex|petalbot' | tail -n 10

How the influx of bots affects the site

This time I reacted fairly quickly – a day after the attack started. The previous time, though, the bots roamed my site for a couple of weeks before I got tired of them. Either way, it had no effect on the site's position in the search results.

How to block by Referer, User Agent, URL, query string, IP and their combinations in mod_rewrite

While fighting off the influx of bots described above, I had to refresh my knowledge of mod_rewrite. Below are examples of mod_rewrite rules that let you perform certain actions (such as blocking) on visitors who match many criteria at once – see the last example to appreciate how flexible and powerful mod_rewrite is.

See also: How to protect my website from bots

Denying access with an empty referrer (Referer)

The following rule will deny access to all requests in which the HTTP Referer header is not set (in Apache logs, "-" is written instead of the Referer line):

RewriteEngine	on
RewriteCond	%{HTTP_REFERER}	^$
RewriteRule	^.*	-	[F,L]

Blocking access by User Agent

When blocking bots by User Agent, it is not necessary to specify the full name – you can specify only part of the User Agent string to match. Special characters and spaces must be escaped.

For example, the following rule will block access for all users whose User Agent string contains “Android 10”:

RewriteEngine	on
RewriteCond	%{HTTP_USER_AGENT}	"Android\ 10"
RewriteRule	^.*	-	[F,L]

Examples of User Agents blocked by this rule:

  • Mozilla/5.0 (Linux; Android 10; SM-G970F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Mobile Safari/537.36
  • Mozilla/5.0 (Linux; Android 10; Redmi Note 7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Mobile Safari/537.36

How to block access by an exact User Agent match

If you need to block access to the site for a specific User Agent with an exact match of the string, use the If construct (this is not part of mod_rewrite, but do not forget about this option):

<If "%{HTTP_USER_AGENT} == 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)'">
	Require all denied
</If>

<If "%{HTTP_USER_AGENT} == 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0'">
	Require all denied
</If>

<If "%{HTTP_USER_AGENT} == 'Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Firefox/45.9.0'">
	Require all denied
</If>

The If directive is available since Apache 2.4.

Denying access to certain pages

The %{REQUEST_URI} variable contains everything that follows the hostname in the request (but not what comes after the question mark), so you can use it to filter requests by URL, file name, or parts of them. For example:

RewriteEngine	on
RewriteCond	%{REQUEST_URI}	"query-string"
RewriteRule	^.*	-	[F,L]

Even though some characters, including Cyrillic, appear URL-encoded in the Apache web server logs, you can use Cyrillic or other national-alphabet letters directly in these rules. For example, the following rule will block access to an article with the URL https://site.ru/how-to-find-which-file-from/:

RewriteEngine	on
RewriteCond	%{REQUEST_URI}	"how-to-find-which-file-from"
RewriteRule	^.*	-	[F,L]

If you wish, you can specify several URLs (or their parts) at once. Each search string must be enclosed in parentheses; the parenthesized strings must be separated by | (pipe), for example:

RewriteEngine	on
RewriteCond	%{REQUEST_URI}	"(windows-player)|(how-to-find-which-file-from)|(how much-RAM)|(how-to-open-folder-with)|(7-applications-for)"
RewriteRule	^.*	-	[F,L]

Since %{REQUEST_URI} does not include what comes after the question mark in the URL, use %{QUERY_STRING} to filter by the query string that follows the question mark.

How to filter by the query string following the question mark

The %{QUERY_STRING} variable contains the query string that follows the ? (question mark) of the current request to the server.

Note that the filtered value must be URL encoded. For example, the following rule:

RewriteCond %{QUERY_STRING} "p=5373&%D0%B7%D0%B0%D0%B1%D0%BB%D0%BE%D0%BA%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D1%82%D1%8C"
RewriteRule ^.* - [F,L]

blocks access to the page https://suay.ru/?p=5373&заблокировать, but will not deny access to the page https://suay.ru/?p=5373.

Denying access by IP address and range

With mod_rewrite, you can block individual IPs from accessing the site:

RewriteEngine	on
RewriteCond	"%{REMOTE_ADDR}"	"84.53.229.255"
RewriteRule	^.*	-	[F,L]

You can specify multiple IP addresses to block:

RewriteEngine	on
RewriteCond	"%{REMOTE_ADDR}"	"84.53.229.255" [OR]
RewriteCond	"%{REMOTE_ADDR}"	"123.45.67.89" [OR]
RewriteCond	"%{REMOTE_ADDR}"	"122.33.44.55"
RewriteRule	^.*	-	[F,L]

You can also use ranges, but remember that in this case, strings are treated as regular expressions, so the CIDR notation (for example, 94.25.168.0/21) is not supported.

Ranges must be specified as regular expressions – this can be done using character sets. For example, to block the following ranges

  • 94.25.168.0/21 (range 94.25.168.0 - 94.25.175.255)
  • 83.220.236.0/22 (range 83.220.236.0 - 83.220.239.255)
  • 31.173.80.0/21 (range 31.173.80.0 - 31.173.87.255)
  • 213.87.160.0/22 (range 213.87.160.0 - 213.87.163.255)
  • 178.176.72.0/22 (range 178.176.72.0 - 178.176.75.255)

the rule will work:

RewriteEngine	on
RewriteCond	"%{REMOTE_ADDR}"	"((94\.25\.1[6-7])|(83\.220\.23[6-9])|(31\.173\.8[0-7])|(213\.87\.16[0-3])|(178\.176\.7[2-5]))"
RewriteRule	^.*	-	[F,L]

Note that the range 94.25.168.0 - 94.25.175.255 cannot be written as 94.25.1[68-75]: it would be interpreted as the string “94.25.1” followed by a character class containing the character 6, the range 8-7, and the character 5. Because of the invalid 8-7 range, this entry will cause an error on the server.

Therefore, to cover 94.25.168.0 - 94.25.175.255, “94\.25\.1[6-7]” is used. Yes, this pattern does not convey the original range exactly – you can make the regular expression more precise at the cost of complexity. But in my case this is a temporary hotfix, so it will do.

Also note that the last octet (0-255) can simply be omitted, since a partial IP address is enough for the regular expression to match.

Combining access control rules

Task: block users who meet ALL of the following criteria at once:

1. Empty referrer

2. The user agent contains the string “Android 10”

3. The request was for a page whose URL contains any of the strings:

  • windows-player
  • how-to-find-which-file-from
  • how much-RAM
  • how-to-open-folder-with
  • 7-applications-for

4. The user has an IP address belonging to any of the ranges:

  • 94.25.168.0/21 (range 94.25.168.0 - 94.25.175.255)
  • 83.220.236.0/22 (range 83.220.236.0 - 83.220.239.255)
  • 31.173.80.0/21 (range 31.173.80.0 - 31.173.87.255)
  • 213.87.160.0/22 (range 213.87.160.0 - 213.87.163.255)
  • 178.176.72.0/22 (range 178.176.72.0 - 178.176.75.255)

The following set of rules implements this task:

RewriteEngine	on
RewriteCond	"%{REMOTE_ADDR}"	"((94.25.1[6-7])|(83.220.23[6-9])|(31.173.8[0-7])|(213.87.16[0-3])|(178.176.7[2-5]))"
RewriteCond	%{HTTP_REFERER}	^$
RewriteCond	%{HTTP_USER_AGENT}	"Android\ 10"
RewriteCond	%{REQUEST_URI}	"(windows-player)|(how-to-find-which-file-from)|(how much-RAM)|(how-to-open-folder-with)|(7-applications-for)"
RewriteRule	^.*	-	[F,L]

Please note that the conditions that are logically OR-ed have to be collected into one big regular expression within a single RewriteCond. You cannot use the [OR] flag on any of these conditions, otherwise it would break the logic of the entire rule set.

By the way, I overcame the bots.

Redirect to HTTPS not working in WordPress

This is not an obvious problem: for some pages the redirect to HTTPS works, and for others it does not. I ran into it on WordPress quite by accident, so if you are a webmaster with WordPress sites, I recommend checking yours too.

Redirecting from HTTP to HTTPS is quite simple: you need to add the following lines to the .htaccess file:

RewriteEngine on
RewriteCond %{HTTPS} !on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI}

This is how I redirect to HTTPS on most of my sites.

To test the redirect to HTTPS, it is better to look at the raw HTTP response headers, because web browsers tend to open the site over HTTPS even when you explicitly specify http:// in the URL – at least I have noticed this with pages that had already been opened over HTTPS.

In Linux, the response HTTP headers can be viewed with a command of the form (it will show both the headers and the response body):

curl -v 'URL'

And this command will show only headers:

curl -I 'URL'

If you run Windows, then you can use an online service to display HTTP headers.

We enter the site address http://site.ru/

Received HTTP redirect code:

HTTP/1.1 302 Found

We were redirected to the HTTPS version:

Location: https://site.ru/

Is everything working as it should?

We continue to check. We enter the site address http://site.ru/page-on-site

And… we get code 200, that is, the page is served at the HTTP address as-is, without a redirect to HTTPS.

This behavior can be observed on sites with pretty (sometimes called SEO-friendly) page URLs. In WordPress, this is configured under Control Panel → Settings → Permalinks. Examples:

 Day and name	https://suay.site/2021/05/21/sample-post/
 Month and name	https://suay.site/2021/05/sample-post/
 Numeric	https://suay.site/archives/123
 Post name	https://suay.site/sample-post/

The point is that in order for any of these options to work, WordPress adds the following lines to the .htaccess file:

# BEGIN WordPress
# The directives (lines) between `BEGIN WordPress` and `END WordPress`
# are dynamically generated and should only be modified via WordPress filters.
# Any changes made manually between these markers will be overwritten.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

These lines contain mod_rewrite conditions and a rule with the [L] flag, which stops further processing of mod_rewrite rules. As a result, the HTTP to HTTPS redirect rule never gets its turn.

That is, the redirect lines must be placed before the fragment that is generated by WordPress. Let's try:

Found
The document has moved here.

Additionally, a 302 Found error was encountered while trying to use an ErrorDocument to handle the request.

The situation has changed but has not improved.

It is necessary to add the [L] flag to the rewrite rule, and place these rules in the .htaccess file before the fragment from WordPress:

RewriteEngine on
RewriteCond %{HTTPS} !on
RewriteCond %{REQUEST_URI} !^/.well-known/
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L]

# BEGIN WordPress
# The directives (lines) between `BEGIN WordPress` and `END WordPress`
# are dynamically generated and should only be modified via WordPress filters.
# Any changes made manually between these markers will be overwritten.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

After that, everything will work exactly as you expect: all URLs starting with http://, both the front page and individual posts, will be redirected to https://.

By default, the code will be “302 Moved Temporarily”. If you wish, you can select the code “301 Moved Permanently”:

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
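
Finally, it is worth repeating the header check to confirm the fix (site.ru and the page path are the same examples as above):

# With the R=301 variant, both should now return 301 with a Location: https://... header
curl -I 'http://site.ru/'
curl -I 'http://site.ru/page-on-site'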