
Regular Expressions in Writer (LibreOffice): A Complete Guide

Table of contents

1. How does search using regular expressions differ from regular search? Why use regular expressions for search and replace

2. Features of regular expression search in Writer (LibreOffice)

3. How to enable regular expression search in Writer (LibreOffice)

4. Writer (LibreOffice) regular expression syntax

4.1 Literal meaning of symbols

4.2 Any character

4.3 Character ranges

4.4 Specifying the number of repetitions for one character

4.5 Grouping of characters

4.6 Specifying the beginning and end of a line

4.7 Any characters other than those specified

4.8 Alternative choice

4.9 Case-sensitive regular expression search

5. Advanced regular expression syntax in Writer (LibreOffice)

5.1 Backreferences

5.2 Example of using backreferences in the “Find” field

5.3 POSIX character classes: [:alpha:] [:digit:] and so on

5.4 How to use POSIX character classes with ranges

5.5 Tabs, new lines, paragraphs \t \n $

6. LibreOffice regular expression examples

7. How to search and replace across multiple paragraphs

8. Difference between Writer (LibreOffice) regular expressions and MS Word wildcards


1. How does search using regular expressions differ from regular search? Why use regular expressions for search and replace

Searching using regular expressions is primarily characterized by the fact that you can specify not the exact string, but its pattern as the search string. For example, you can specify the pattern “four numbers in a row” or “three identical letters in a row” or “numbers at the end of a word not separated by a space from the letters” and much more.

Regular expression syntax allows you to specify special characters that may not be available for normal searching, such as soft line breaks, tabs, the beginning and end of a line, and so on.

Finally, regular expression search allows you to specify additional search conditions, for example:

  • search at the beginning of the line
  • search at the end of the line

All this (pattern search, search for special characters, search with additional conditions) can be combined in one regular expression, thereby achieving both search flexibility (we can find a string even without knowing its exact contents) and at the same time search accuracy (we can find a line that comes after a word of six letters or before a word with no more than ten letters).

Another function of regular expressions is the ability to reuse the string found by a pattern in the replacement. That is, you can not only search and replace data, but also modify it according to a pattern.

2. Features of regular expression search in Writer (LibreOffice)

If you are already familiar with regular expressions (in programming languages), then in general Writer (LibreOffice) regular expressions are very similar to all other implementations.

But there is one important difference: Writer (LibreOffice) regular expression searches are always performed within a single paragraph. There are characters that indicate “beginning of line” and “end of line” that you can use in a pattern, but the pattern will always be checked within a single paragraph.

There is a character that means “end of paragraph” and a combination of characters that means “empty paragraph”, but you can't combine them with other regular expression syntax.

However, it is possible to bypass this limitation and below we will show you exactly how to do this.

And there is another very important difference in searching using regular expressions in Writer (LibreOffice): by default, the search is performed in a case-insensitive manner. This applies to the parts of the pattern that are interpreted literally (letters) as well as to character ranges. That is, you can specify a range of “capital letters only”, but if you do not check the “Match case” checkbox, then small letters will also be considered to match this range.

A third point, which sometimes surprises newcomers: after a character range in square brackets, you do not have to specify a number of repetitions. As in most regular expression implementations, a range without a repetition count matches exactly one character from that range.

3. How to enable regular expression search in Writer (LibreOffice)

To search text using regular expressions, select “Edit” → “Find and Replace...” from the menu.

Note: Even if you just want to search without replacing, you need to go to the “Find and Replace...” section. This section allows you to both search with replacement and search without replacement.

Check the “Regular expressions” box.

Now everything that you enter into the “Find” field will be regarded as a regular expression.

Note: Anything entered in the “Replace” field will also be interpreted as a regular expression. But some characters in the “Find” and “Replace” fields will be treated differently.

4. Writer (LibreOffice) regular expression syntax

4.1 Literal meaning of symbols

In regular expressions, characters can be interpreted in two ways:

  • literally
  • with a special meaning

Letters, numbers, spaces, and some punctuation marks are interpreted literally (except when they are used to indicate character sets or character ranges). In short, all symbols except meta symbols are treated literally.

The following meta characters have special meaning:

.
*
+
?
^
$
|
\
[
]
{
}
(
)
- (in case of specifying ranges)

If you want a special character to be treated literally, put a backslash in front of it. For example, to search for a literal “?” (question mark) in the text, use:

\?

Another example is searching for the literal “.” (dot):

\.

4.2 Any character

. (dot) means “any one character”

That is, in the regular expression “.” (dot) stands for any character that appears once.

For example, the following regular expression pattern (two dots are used, which means “any two characters”):

special..

will find words like

especially
nonspecialist
overspecialisation
specialisation
specialise
etc.

Let's look at a few more examples. The regular expression

o.o.o

means the letter “o” followed by any character, then “o” again, then any character again, and then “o” once more. It will find the following words:

anthropology
autonomous
bonobo
chloroform
chocoholic
chromosome

You can also find words containing four letters “o” separated by single characters, using this pattern:

o.o.o.o

which will find the following words:

homologous
monotonous

Next pattern:

a.a.a

will find words with three letters “a”:

abracadabra
adamant
Adana
Ahmadabad
Alabama

It is not necessary to use the same letters – make up expressions to suit your tasks.

For example, to find words in which the letter “z” appears (whether at the beginning of the word or in the middle), followed by any three characters, then the letter “k” and one more character, you can use the following search pattern:

z...k.

This regular expression will find words like:

Brzezinski
mazourka
zincked
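The dot patterns above can be tried outside Writer. The following is a minimal sketch in Python, assuming that Python's re module treats the dot the same way for these simple patterns; it is not LibreOffice's own engine. The helper name words_matching and the extra word “banana” are made up for illustration.

```python
import re

# Words taken from the examples above, plus "banana" (added for illustration).
words = ["anthropology", "bonobo", "Alabama", "Brzezinski", "zincked", "banana"]

def words_matching(pattern, candidates):
    """Return the candidates that contain a match for `pattern` anywhere."""
    # re.IGNORECASE imitates Writer's default case-insensitive search.
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in candidates if rx.search(w)]

three_o = words_matching(r"o.o.o", words)
three_a = words_matching(r"a.a.a", words)
z_k = words_matching(r"z...k.", words)

print(three_o)  # ['anthropology', 'bonobo']
print(three_a)  # ['Alabama', 'banana']
print(z_k)      # ['Brzezinski', 'zincked']
```

Note that the pattern may match anywhere inside a word, which is why “Brzezinski” is found through its second “z”.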

4.3 Character ranges

[...] means any single occurrence of any of the characters enclosed in square brackets.

For example, the regular expression

[abc123]

matches the characters “a”, “b”, “c”, “1”, “2” and “3”.

The following regular expression:

[a-e]

matches a single occurrence of any character from “a” through “e”, inclusive (the first character of the range must be the one with the lower Unicode code point).

The following regular expression:

[a-eh-x]

matches any single occurrence of characters in the ranges “a” to “e” and “h” to “x”.

Let's look at how to use character sets in regular expressions. You can specify a set of characters in square brackets, any one of which can occupy that position.

For example, the search pattern

c[ao]mpa[igny]

will match in both of these words:

campaign
company

You can list any number of characters inside the brackets:

t[vprgaoe]n

The previous pattern will match words like:

absoluteness
abstention
acceptance
accountancy

Character ranges can be specified using a hyphen, for example:

[0-9]

In this case, any single digit from 0 to 9 will be found.

You can also specify a range of letters:

[A-Z]

But you need to be careful with letter ranges, since intuitive expectations may not exactly match the results. First of all, remember that if the “Match case” option is not enabled, then character ranges that contain letters in one case will look for letters in both cases.

Non-English character ranges may not always produce the expected results. This is not a bug – it is determined by the rules for the sequence and sorting of letters for different locales. We will not go deep into this topic, just remember this feature of ranges. To avoid this problem, you can simply list all the characters:

[ABCDEFGHIJKLMNOPQRSTUVWXYZ]
[abcdefghijklmnopqrstuvwxyz]
[АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ]
[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]

But remember that case-insensitive search is enabled by default. Even if you explicitly specify only uppercase letters, such as ABCDEFGHIJKLMNOPQRSTUVWXYZ, lowercase letters will also be found by default.

Character ranges can be combined with each other:

[A-Za-z0-9]

Note: As with regular expressions in most programming languages, a range in LibreOffice does not have to be followed by a repetition count; by default it matches exactly one character from the range. You can also specify how many characters from the range should occur; this is discussed below.
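Two points from this section, that a range matches one character at a time and that case sensitivity changes what a letter range finds, can be illustrated with a short Python sketch. It assumes that Python's re module is a fair stand-in for these simple ranges; the re.IGNORECASE flag plays the role of an unchecked “Match case” box, and the sample text is invented.

```python
import re

text = "Room B12"

# A bare range consumes exactly one character per match.
single = re.findall(r"[0-9]", text)
# With a repetition count (here "+"), the range may repeat.
run = re.findall(r"[0-9]+", text)

# Case-insensitive search (Writer's default): [A-Z] also finds lowercase letters.
insensitive = re.findall(r"[A-Z]", text, re.IGNORECASE)
# Case-sensitive search ("Match case" checked): only uppercase letters are found.
sensitive = re.findall(r"[A-Z]", text)

print(single)       # ['1', '2']
print(run)          # ['12']
print(insensitive)  # ['R', 'o', 'o', 'm', 'B']
print(sensitive)    # ['R', 'B']
```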

4.4 Specifying the number of repetitions for one character

A range of characters means that any character from that range must occur, but only one of them and only once. You can specify the number of times a symbol should appear. There are several ways to specify the number of characters.

1) Range of number of characters

To indicate a quantity range, use curly braces with numbers:

{m,n}

This syntax means that the previous character must occur m to n times.

For example, the following rule means that the letter “o” must appear 3 to 5 times in a row:

o{3,5}

Example of use inside a regular expression:

overco{3,5}me

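This regular expression will look for a string that consists of:

  • the literal string “overc”
  • followed by the letter “o” 3 to 5 times
  • followed by the literal string “me”

Note that the repetition count applies only to the “o” immediately before it, not to the whole of “overco”.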

The following syntax options are also valid:

{n}

means that the character must be repeated exactly n times.

{,n}

means that the character must be repeated no more than n times (zero to n times).

{m,}

means that the symbol must be repeated at least m times (from m to infinity times).

You can combine specifying character sets with specifying ranges of how many times they should be repeated, for example:

t[vprgaoe]{3,}n

This pattern means:

  • a literal “t”
  • followed by three or more of any of the listed characters: “vprgaoe”
  • followed by a literal “n”

The following words match this pattern:

Actaeon
adulterant
antagonise
contravene
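The repetition counts above can be checked with a small Python sketch. It assumes Python's re module handles {n}, {m,} and {m,n} the same way as Writer for these patterns; the test strings “overcooome” and “overcoome” and the extra word “abstention” are added for illustration.

```python
import re

# {3,5} applies only to the "o" right before it: "overc" + 3..5 "o" + "me".
three_o = re.fullmatch(r"overco{3,5}me", "overcooome") is not None
two_o = re.fullmatch(r"overco{3,5}me", "overcoome") is not None

# A character set combined with "three or more" repetitions.
rx = re.compile(r"t[vprgaoe]{3,}n", re.IGNORECASE)
words = ["Actaeon", "adulterant", "antagonise", "contravene", "abstention"]
hits = {w: rx.search(w).group(0) for w in words if rx.search(w)}

print(three_o, two_o)  # True False
print(hits)
# {'Actaeon': 'taeon', 'adulterant': 'teran',
#  'antagonise': 'tagon', 'contravene': 'traven'}
```

“abstention” is not found because only one character from the set stands between “t” and “n”.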

2) Meta characters indicating the number of characters

The following meta characters are available to indicate the number of repetitions of the previous character:

+ (plus sign) means the number of repetitions is 1 or more.

For example, “AX.+4” will find “AXx4”, but will not find “AX4”.

Please note that if a pattern can match a longer line and a shorter one in the same paragraph, the longer one will always be selected.

Example search paragraph:

wwAXx4mmmmmmmmm4

In this paragraph, the pattern “AX.+4” matches the line:

AXx4

Also, the pattern “AX.+4” matches the following string:

AXx4mmmmmmmmm4

So, in LibreOffice the longer line will always be highlighted, that is, in this case it is the second option.

Note: If you are familiar with the terms “greedy” and “lazy” quantifiers, the quantifiers described here are greedy in LibreOffice regular expressions.
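The “longest match” behaviour described above can be reproduced in Python, whose + quantifier is also greedy. This is only a sketch for comparison: the lazy form +? is Python syntax shown for contrast, and nothing here is a statement about which lazy forms Writer accepts.

```python
import re

paragraph = "wwAXx4mmmmmmmmm4"

# Greedy: ".+" takes as much as it can, so the match runs to the last "4".
greedy = re.search(r"AX.+4", paragraph).group(0)

# Lazy (Python syntax, for contrast): ".+?" stops at the first possible "4".
lazy = re.search(r"AX.+?4", paragraph).group(0)

print(greedy)  # AXx4mmmmmmmmm4
print(lazy)    # AXx4
```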

? (question mark) means zero or one repetition of the character that precedes it.

For example, “Texts?” matches “Text” and “Texts”; and the pattern “x(ab|c)?y” will match “xy”, “xaby” or “xcy”.

* (asterisk) means zero or more repetitions of the character that precedes it.

For example, “Ab*c” will match “Ac”, “Abc”, “Abbc”, “Abbbc” and so on.

4.5 Grouping of characters

The above shows how to specify the number of repetitions for one character. But what if you need to specify the number of repetitions for a sequence (string) of characters?

To do this, grouping of characters is used. Place the characters or string for which you want to specify the number of repetitions in parentheses, for example:

(ab){3}

This pattern means repeating the string “ab” three times. That is, this pattern will correspond to the following line:

ababab

You can also use character grouping with the already familiar “*”, “+” and “?” operators. For example, the regular expression

a(bc)?d

will find the following lines:

ad
abcd

And the regular expression

M(iss){2}ippi

will find the string

Mississippi

Note 1: If you need to find a string containing parentheses, then escape them in a regular expression pattern:

8-\([0-9]{3}\)-[0-9]{7}

The previous regular expression will find phone numbers, for example:

8-(905)-1437628

Note 2: Grouping of characters is also used in regular expressions for other purposes: 1) to indicate an alternative; 2) for backreferences. These ways of using character grouping will be discussed below.
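The grouping patterns from this section can be tried in Python as a rough check, assuming the re module groups and escapes parentheses the same way for these cases. The sample strings “abd” and “Call 8-(905)-1437628 today” are invented for illustration.

```python
import re

# A group repeated exactly three times.
group_repeat = re.fullmatch(r"(ab){3}", "ababab") is not None

# An optional group: "bc" is either present as a whole or absent.
optional_group = [s for s in ["ad", "abcd", "abd"]
                  if re.fullmatch(r"a(bc)?d", s)]

# Two repetitions of "iss" inside a word.
river = re.fullmatch(r"M(iss){2}ippi", "Mississippi") is not None

# Escaped parentheses are matched literally.
phone = re.search(r"8-\([0-9]{3}\)-[0-9]{7}",
                  "Call 8-(905)-1437628 today").group(0)

print(group_repeat)    # True
print(optional_group)  # ['ad', 'abcd']
print(river)           # True
print(phone)           # 8-(905)-1437628
```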

4.6 Specifying the beginning and end of a line

^ indicates the beginning of a paragraph or cell.

Special objects, such as empty fields or character-anchored frames, are ignored at the beginning of a paragraph. Example: “^Peter” matches the word “Peter” only if it is the first word of the paragraph.

$ means the end of a paragraph or cell.

Special objects, such as empty fields or character-anchored frames, are ignored at the end of a paragraph. Example: “Peter$” matches only when the word “Peter” is the last word of the paragraph. Note that the pattern does not match if “Peter” is followed by a period or any other character before the end of the paragraph.

The single $ symbol used alone corresponds to the end of a paragraph. This way you can search for and replace paragraph breaks.

You can use the following regular expression to find empty paragraphs:

^$
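Because Writer checks a pattern one paragraph at a time, a simple way to imitate the anchors in Python is to keep each paragraph as a separate string. This is a sketch under that assumption; the sample paragraphs are invented.

```python
import re

# Each list item stands for one paragraph of the document.
paragraphs = ["Peter went home", "", "I saw Peter", "I saw Peter."]

starts_with_peter = [p for p in paragraphs if re.search(r"^Peter", p)]
ends_with_peter = [p for p in paragraphs if re.search(r"Peter$", p)]
# Indexes of empty paragraphs, found with ^$.
empty = [i for i, p in enumerate(paragraphs) if re.search(r"^$", p)]

print(starts_with_peter)  # ['Peter went home']
print(ends_with_peter)    # ['I saw Peter']
print(empty)              # [1]
```

The last paragraph is not found by “Peter$” because a period follows the word.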

4.7 Any characters other than those specified

[^...] matches any single character (including tab, space, and line break characters) that is not among the listed characters or ranges.

For example, the regular expression pattern

[^a-syz]

matches any character that is neither in the range “a” through “s” nor one of the characters “y” and “z”.

The following regular expression will find individual words separated by a space:

[^ ]+[ ]

The meaning of the symbols is as follows:

  • [^ ]+ – means any character except a space, repeated one or more times
  • [ ] – one space
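Both negated patterns from this section can be tried in a short Python sketch, assuming the re module treats [^...] the same way for these cases. The sample strings are invented.

```python
import re

# Every character that is not in a-s and is not y or z.
others = re.findall(r"[^a-syz]", "stuvwxyz")

# Words followed by a space; the last word has no trailing space.
words = re.findall(r"[^ ]+[ ]", "one two three")

print(others)  # ['t', 'u', 'v', 'w', 'x']
print(words)   # ['one ', 'two ']
```

The last word, “three”, is not found because no space follows it.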

4.8 Alternative choice

| (pipe) is an infix operator that separates alternatives.

Matches the term preceding “|” or the term after “|”. For example, “this|that” matches both occurrences of “this” and “that”.

The “|” operator divides the regular expression into two parts, each of which is an alternative. The scope of the alternatives can be limited by parentheses. For example, the following regular expression

What is (this|that) there\?

will match the lines:

What is this there?
What is that there?

If no parentheses were used, such as in the following regular expression:

What is this|that there\?

Then this pattern will match the lines:

What is this
that there?

An alternative can also be empty, which makes the other alternative optional, for example:

a(|b)c

The above regular expression will match the lines:

abc
ac
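The effect of parentheses on alternatives can be seen in a Python sketch, assuming the re module handles “|” the same way for these patterns. The test string “abbc” is added for illustration.

```python
import re

line = "What is that there?"

# With parentheses, only "this" and "that" are alternatives.
grouped = re.search(r"What is (this|that) there\?", line).group(0)

# Without parentheses, the whole left and right halves are alternatives:
# "What is this" OR "that there?".
ungrouped = re.search(r"What is this|that there\?", line).group(0)

# An empty alternative makes "b" optional.
empty_alt = [s for s in ["abc", "ac", "abbc"] if re.fullmatch(r"a(|b)c", s)]

print(grouped)    # What is that there?
print(ungrouped)  # that there?
print(empty_alt)  # ['abc', 'ac']
```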

4.9 Case-sensitive regular expression search

By default, Writer (LibreOffice) searches, including regular expressions, are case insensitive.

Note: This is in contrast to most other regular expression search implementations, which have case-sensitive search enabled by default.

To make Regular Expression searches in Writer (LibreOffice) case sensitive, go to “Edit” → “Find and Replace” and check the “Match case” checkbox.

5. Advanced regular expression syntax in Writer (LibreOffice)

5.1 Backreferences

A backreference refers to the part of the found text that was matched by a group in the pattern. It can be used in the replacement and/or later in the search pattern itself.

To create a backreference, place part of the pattern in parentheses.

To access a backreference in the “Find” field, use a construction like “\n” (for example, “\1”, “\2”, “\3”).

To refer to a backreference in the “Replace” field, use a construction of the form “$n” (for example, “$1”, “$2”, “$3”).

For backreferences there are also the following meta characters:

$0 inserts all found text.
& also inserts all found text.

Example of using backreferences in Writer (LibreOffice)

Let's consider an example: the text contains dates of the form DD.MM.YYYY and they need to be replaced with dates of the form MM/DD/YY. For example, the date “08.06.1983” should be replaced with “06/08/83”.

This can be done using the following regular expression in the “Find” field:

([0-9]{2})\.([0-9]{2})\.[0-9]{2}([0-9]{2})

In the “Replace” field you need to enter the following:

$2/$1/$3

Explanation of regular expression elements:

  • ([0-9]{2}) – means any two digits. This part of the regular expression is placed in parentheses, so the found substring will be assigned to the first backreference.
  • \. – means literal dot symbol
  • ([0-9]{2}) – this part is identical to the first part. But since the parentheses are used a second time, the found substring will be assigned to the second backreference.
  • \. – again means a literal dot symbol
  • [0-9]{2} – means any two digits, but parentheses are not used. That is, this part of the search result will not be included in backreferences.
  • ([0-9]{2}) – this part is identical to the first and second parts. But since the parentheses are used a third time, the found substring will be assigned to the third backreference.

Explanation of the elements of the “Replace” field:

  • $2 – second backreference
  • / – literal slash character
  • $1 – first backreference
  • $3 – third backreference
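The date conversion can be reproduced in Python as a sanity check. One difference is an assumption to keep in mind: Python's re.sub writes group references in the replacement as \1, \2, \3, while Writer's “Replace” field uses $1, $2, $3 as described above. The sample sentence is invented.

```python
import re

# Same pattern as in the "Find" field above.
find = r"([0-9]{2})\.([0-9]{2})\.[0-9]{2}([0-9]{2})"

# Python spelling of Writer's "$2/$1/$3".
replace = r"\2/\1/\3"

result = re.sub(find, replace, "Signed on 08.06.1983.")
print(result)  # Signed on 06/08/83.
```

The first two digits of the year are matched but not captured, so they are dropped from the result.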

5.2 Example of using backreferences in the “Find” field

The following regular expression will find any three identical consecutive letters or numbers:

([A-Za-z0-9]{1})\1\1

Explanation of parts of the regular expression:

  • ([A-Za-z0-9]{1}) – means any one letter or number. The parentheses mean that the letter or number found will be placed in the first backreference
  • \1 – means the first backreference. That is, the same letter or number that is found by the first part of the regular expression
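A Python sketch of the same search is shown below, assuming the re module resolves \1 the same way. The sample strings are invented, and the search here is case-sensitive, unlike Writer's default.

```python
import re

# One letter or digit, then the same character two more times.
rx = re.compile(r"([A-Za-z0-9])\1\1")

samples = ["bookkeeper", "aaa", "1000", "Hmmm"]
found = {s: rx.search(s).group(0) for s in samples if rx.search(s)}

print(found)  # {'aaa': 'aaa', '1000': '000', 'Hmmm': 'mmm'}
```

“bookkeeper” is not found: it contains doubled letters, but no character appears three times in a row.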

5.3 POSIX character classes: [:alpha:] [:digit:] and so on

Regular expressions can use POSIX character classes of the form [:class_name:], each of which matches any one character of that class. For example, [:digit:] matches any of the digits 0123456789.

The available POSIX bracket expressions are listed below. Note that the exact definition of each depends on the locale: for example, in another language, other characters might be considered “alphabetic letters” in [:alpha:]. The values given here generally apply to English-speaking regions (and do not account for Unicode issues).

[:alpha:] Any alphabetic character.
[:digit:] Any digit.
[:alnum:] Represents an alphanumeric character ([:alpha:] and [:digit:]).
[:space:] Represents the space character (but not other space characters).
[:print:] Represents a printable character.
[:cntrl:] Represents a non-printing character.
[:lower:] Represents a lowercase character when the “Match case” option is selected in the options.
[:upper:] Represents an uppercase character when the “Match case” option is selected in the options.

5.4 How to use POSIX character classes with ranges

Because POSIX ranges and character classes use square brackets, it can be confusing if you want to use POSIX character classes within a range.

You can put a POSIX character class along with square brackets inside the range brackets.

The following example searches for all [:upper:] characters (uppercase characters) as well as the letter “a”.

[[:upper:]a]

5.5 Tabs, new lines, paragraphs \t \n $

The “\t” character pair has a special meaning – it represents a tab character.

For example:

\tred

will match a tab character followed by the word “red”.

In Writer, the character pair “\n” in the “Find” field means a line break inserted with Shift+Enter. This is not the same as the end of a paragraph, which is very different from many other regular expression implementations. This is partly because regular expressions in other programs typically operate on plain text, whereas LibreOffice's regular expressions divide text by paragraph marks.

Please note that in the “Replace” field, “\n” stands for a paragraph break.

The $ symbol by itself (when used without other symbols) means the end of a paragraph.

6. LibreOffice regular expression examples

Blank paragraph:

^$

  • ^ indicates that the match must be at the beginning of the paragraph,
  • $ specifies that a paragraph mark or end of cell should follow the matched line.

First character of paragraph:

^.

  • ^ specifies that the match must be at the start of a paragraph,
  • . specifies any single character.

The following regular expression matches “e” by itself or “e” followed by one digit:

e([:digit:])?

  • e specifies the character "e",
  • [:digit:] specifies any decimal digit,
  • ? specifies zero or one occurrences of [:digit:].

The following regular expression matches a paragraph or cell containing exactly one digit:

^([:digit:])$

  • ^ specifies that the match must be at the start of a paragraph,
  • [:digit:] specifies any decimal digit,
  • $ specifies that a paragraph mark or the end of a cell must follow the matched string.

The following regular expression matches a paragraph or cell containing only a three-digit number:

^[:digit:]{3}$

  • ^ specifies that the match must be at the start of a paragraph,
  • [:digit:] specifies any decimal digit,
  • {3} specifies that [:digit:] must occur three times,
  • $ specifies that a paragraph mark or the end of a cell must follow the matched string.

The following regular expression matches the words “constitution” and “construction”, but not the word “constitutional”:

\bconst(itu|ruc)tion\b

  • \b specifies that the match must begin at a word boundary,
  • const specifies the characters “const”,
  • ( starts the group,
  • itu specifies the characters “itu”,
  • | specifies the alternative,
  • ruc specifies the characters “ruc”,
  • ) ends the group,
  • tion specifies the characters “tion”,
  • \b specifies that the match must end at a word boundary.
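The last example can be checked in Python, assuming that \b behaves the same way as a word boundary for plain English words. The sample sentence is invented.

```python
import re

rx = re.compile(r"\bconst(itu|ruc)tion\b", re.IGNORECASE)

text = "The constitution, a construction, and constitutional law."
hits = [m.group(0) for m in rx.finditer(text)]

print(hits)  # ['constitution', 'construction']
```

“constitutional” is skipped because the letters “al” follow “tion”, so there is no word boundary at that point.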

7. How to search and replace across multiple paragraphs

LibreOffice searches strictly within single paragraphs and this behavior cannot be changed by settings and options.

However, it is possible to partially bypass this limitation.

The procedure is as follows:

  1. Find and replace all paragraph breaks with a unique character or string. This combines the text into a single paragraph.
  2. Perform the search or replace using a regular expression in which the paragraph break is represented by the unique character or string chosen in step 1.
  3. If necessary, convert the unique character or string back into paragraph breaks.

Note: Before acting on the text, save the original text or file in case something goes wrong.

So, let's say we have the following text:

Group 1
Emily Johnson
Benjamin Parker

Chloe Williams
Lucas Thompson
Olivia Davis
Ethan Roberts


Group 2
Isabella White
Jacob Wilson
Sophia Mitchell
Alexander Harris


Group 3
Ava Miller
Liam Anderson
Mia Martinez
Noah Clark
Harper Scott

James Rodriguez
Charlotte Young
Logan Wright

Amelia Brown
Mason Green

Task: replace three consecutive paragraph breaks with a horizontal bar “---------------------------------”.

That is, in those places in the text where the “Enter” key was pressed three times while typing, you need to insert the specified horizontal line. In the places where the “Enter” key was pressed once or twice, you do not need to do anything.

Let's go step by step through searching and replacing across multiple paragraphs in LibreOffice.

1. Go to “Edit” → “Find and Replace”.

In the “Find” field, enter:

$

In the “Replace” field, enter the unique marker; this example uses the “╋” character, which does not occur in the text.

Check the “Regular expressions” box and click the “Replace All” button.

2. Go to “Edit” → “Find and Replace”.

In the “Find” field, enter:

╋ ╋ ╋

In the “Replace” field, enter:

\n---------------------------------\n

Check the “Regular expressions” box and click the “Replace All” button.

3. Go to “Edit” → “Find and Replace”.

In the “Find” field, enter the marker used in step 1.

In the “Replace” field, enter:

\n

Check the “Regular expressions” box and click the “Replace All” button.

The following result will be obtained:

So, where there were three consecutive paragraph breaks, a horizontal bar was inserted, and the remaining paragraph breaks were left unchanged, exactly as the task required.
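The three steps can also be modelled in Python to see why the trick works. This is only a sketch of the idea, not of Writer itself: the document is represented as a list of paragraph strings, the function name insert_bars is hypothetical, the marker is written without surrounding spaces, and the bar length of 33 hyphens is an assumption.

```python
MARKER = "╋"      # assumed not to occur anywhere in the text
BAR = "-" * 33    # horizontal bar; the exact length is an assumption

def insert_bars(paragraphs):
    """Replace every run of three paragraph breaks with a bar paragraph."""
    # Step 1: turn every paragraph break into the marker (one long paragraph).
    joined = MARKER.join(paragraphs)
    # Step 2: three markers in a row mean "Enter was pressed three times";
    # replace them with break + bar + break.
    joined = joined.replace(MARKER * 3, MARKER + BAR + MARKER)
    # Step 3: turn the remaining markers back into paragraph breaks.
    return joined.split(MARKER)

doc = ["Group 1", "Emily Johnson", "", "Chloe Williams", "", "",
       "Group 2", "Jacob Wilson"]
result = insert_bars(doc)
for paragraph in result:
    print(paragraph)
```

In the result, the single empty paragraph after “Emily Johnson” (Enter pressed twice) is kept, while the two empty paragraphs before “Group 2” (Enter pressed three times) are replaced by the bar.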

8. Difference between Writer (LibreOffice) regular expressions and MS Word wildcards

If you compare regular expression search in Writer (LibreOffice) and MS Word, there is a significant difference, primarily because the implementation of regular expression search in MS Word (called Wildcard search) differs from programming languages.

