Fast Filtering with ripgrep

ripgrep is a recent entry in a long line of grep-like tools. It has gained popularity and has become a de facto dependency of several popular Neovim plugins. For those unfamiliar with it, grep is a command-line utility that has become the standard tool for matching lines in text files against regular expressions.

ripgrep is most commonly credited with being much faster than grep and other similar tools, which it achieves in two ways. First, ripgrep really is fast. Its GitHub repository includes benchmarks which, although benchmarks do not always reflect actual use cases, indicate that ripgrep is generally about 10x faster than other tools. Second, by default ripgrep reduces the number of files it has to search by respecting .gitignore files and by automatically skipping hidden files and directories, which can significantly reduce the workload in large code bases.
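To see this default filtering in action, here is a minimal sketch. It assumes rg is installed and on the PATH, and it builds a throwaway directory whose file names are made up purely for illustration. It relies on ripgrep's --files flag, which lists the files that would be searched without actually searching them, and on --hidden, which turns off the hidden-file filter.

```shell
# Hypothetical scratch directory; the file names are illustrative only.
demo=$(mktemp -d)
printf 'apple\n' > "$demo/visible.txt"
printf 'apple\n' > "$demo/.hidden.txt"

# --files lists the files ripgrep would search instead of searching them.
# With default settings the hidden file is left out.
default_files=$(rg --files "$demo" | sort)

# --hidden asks ripgrep to consider hidden files and directories as well.
all_files=$(rg --files --hidden "$demo" | sort)

printf 'default:\n%s\n\nwith --hidden:\n%s\n' "$default_files" "$all_files"
rm -rf "$demo"
```

The output is piped through sort only because ripgrep does not promise a particular order when it lists several files.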

Call Signature

First, although the tool is called ripgrep, the command you type on the command line is shortened to simply rg.

When ripgrep is called directly from the command-line it follows the call signature:

rg [OPTIONS] PATTERN [PATH ...]

which changes slightly when it is called from within a pipeline, due to the input coming from stdin:

command | rg [OPTIONS] PATTERN

The primary difference between the two is where the input comes from: the former searches the files or directories named by PATH (or the current directory when no PATH is given), while the latter searches the output of command. We will see each of these in more detail shortly.

The PATH Argument

The PATH argument defines a file or a directory to search, where directories will be searched recursively. Paths passed on the command line take precedence over other rules, such as globs and .gitignore files.
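The precedence rule can be checked with a small experiment. The sketch below assumes rg is installed, and uses a made-up scratch directory with an .ignore file, which ripgrep reads with the same syntax as .gitignore but without needing a git repository. The -l (--files-with-matches) flag prints only the names of matching files.

```shell
# Hypothetical scratch directory; the file names are illustrative only.
demo=$(mktemp -d)
printf 'apple\n' > "$demo/kept.txt"
printf 'apple\n' > "$demo/skipped.txt"

# An .ignore file uses .gitignore syntax and works outside git repositories.
printf 'skipped.txt\n' > "$demo/.ignore"

# Searching the directory recurses into it and honours the ignore rule.
from_dir=$(rg -l apple "$demo")

# Naming the ignored file explicitly overrides the ignore rule.
from_file=$(rg -l apple "$demo/skipped.txt")

printf 'directory search: %s\nexplicit file:    %s\n' "$from_dir" "$from_file"
rm -rf "$demo"
```

The directory search should report only kept.txt, while the explicit path should report skipped.txt even though the ignore rule still applies to it.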

Let's see how this works, using the fruits.txt file we have used in previous examples:

apple
banana
watermelon
grape
strawberry

As a first step, let's use the pattern ., which matches any single character and therefore every non-empty line, and apply that pattern to our input file. Following our call signature:

ninja$: rg . fruits.txt
1:apple
2:banana
3:watermelon
4:grape
5:strawberry
ninja$:

As expected, ripgrep returned each line in our file. Now, let's see how we can use patterns to filter lines.

The PATTERN Argument

The PATTERN argument defines the regular expression to search for. ripgrep's regular expression syntax is discussed in the regex syntax section.

To see how this works, let's search this file for a simple pattern: any line that contains an a followed by either n or p. There are several ways we can build this pattern, but let's use a simple character class to do so. (Don't worry if you don't yet understand character classes; we will discuss them in detail in the next section.) The pattern is wrapped in single quotes so that the shell passes it to ripgrep unchanged:

ninja$: rg 'a[pn]' fruits.txt
1:apple
2:banana
4:grape
ninja$:

Note that the output contains only the lines that matched our pattern and omits those that did not. This demonstrates the basic function of ripgrep.

ripgrep can also search for multiple patterns at the same time. This can be achieved in a few ways, but one of the most direct is the -e/--regexp option, which can be repeated to specify several patterns in a single ripgrep invocation. A line is printed if it matches at least one of them.

To demonstrate, let's add a second pattern that selects only lines containing 2 or more consecutive rs:

ninja$: rg -e 'a[pn]' -e 'r{2,}' fruits.txt
1:apple
2:banana
4:grape
5:strawberry
ninja$:
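One of the other ways to search for several patterns at once is to join them with | into a single regular expression. The following sketch, which assumes rg is installed and recreates fruits.txt in a scratch directory, compares the two approaches. Because the output is captured rather than printed to a terminal, ripgrep leaves out the line numbers.

```shell
# Recreate the sample file in a hypothetical scratch directory.
demo=$(mktemp -d)
printf 'apple\nbanana\nwatermelon\ngrape\nstrawberry\n' > "$demo/fruits.txt"

# Two patterns given separately with -e ...
with_e=$(rg -e 'a[pn]' -e 'r{2,}' "$demo/fruits.txt")

# ... and the same two patterns joined with | into one expression.
with_alt=$(rg 'a[pn]|r{2,}' "$demo/fruits.txt")

printf '%s\n' "$with_e"
if [ "$with_e" = "$with_alt" ]; then
    echo "both forms selected the same lines"
fi
rm -rf "$demo"
```

Repeating -e tends to be easier to read and edit when the individual patterns are long, while alternation keeps everything in one expression.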

At this point we are filtering out just a single line, so let's see one more example where we achieve the same result, but a bit more directly. In the next example, let's use a pattern that matches only lines that do not start with the letter w:

ninja$: rg '^[^w]' fruits.txt
1:apple
2:banana
4:grape
5:strawberry
ninja$:
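A related way to express "everything except lines starting with w" is ripgrep's -v/--invert-match option, which prints the lines that do not match the pattern. The sketch below assumes rg is installed and recreates fruits.txt in a scratch directory so that the two approaches can be compared.

```shell
# Recreate the sample file in a hypothetical scratch directory.
demo=$(mktemp -d)
printf 'apple\nbanana\nwatermelon\ngrape\nstrawberry\n' > "$demo/fruits.txt"

# Negated character class: the first character must be something other than w.
by_class=$(rg '^[^w]' "$demo/fruits.txt")

# Inverted match: print every line that does NOT start with w.
by_invert=$(rg -v '^w' "$demo/fruits.txt")

printf 'negated class:\n%s\n\ninverted match:\n%s\n' "$by_class" "$by_invert"
rm -rf "$demo"
```

The two are not identical in general: the negated class needs at least one character to match, so it skips empty lines, whereas the inverted match prints them. For fruits.txt, which has no empty lines, both select the same four lines.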

We have now seen how to call ripgrep, and we have seen a few simple regular expressions that allow us to select only the lines of interest from the input file. In the next chapter we will learn more about the rules for constructing regular expression patterns themselves.

Pattern Files

Before we leave the topic of defining the pattern to search for, there is one more topic we would like to discuss. While we often use ripgrep for quick, one-off searches, there are some searches that we run periodically to perform specific tasks. Typing the same patterns on the command line each time is repetitive and error-prone, especially when working with more complicated patterns. In many cases we can write a shell script, but ripgrep offers another option that can be very useful in these situations. ripgrep provides a -f/--file option that tells it to read patterns from the specified file(s), allowing us to effectively name the patterns that we use often and then pass them to ripgrep by name.

Let's take a look at how this works. First, we define our pattern and save it to a file. Pattern files can contain one or more patterns, with each pattern defined on a separate line.

Warning

Be careful not to leave any blank lines in a pattern file, as ripgrep treats a blank line as an empty pattern, which matches all input. This is generally not what is intended.

The pattern file option can be passed multiple times, in which case ripgrep will search the input for all patterns defined in all specified pattern files, and any input that matches any pattern will be printed to the output. Finally, the pattern file can also be specified as -, in which case ripgrep will read patterns from stdin. This allows patterns to be generated dynamically, which opens up some interesting possibilities.

Back to our example. Although pattern files are most useful when working with complicated patterns, to keep our example simple we want to select any lines that contain two adjacent letters drawn from a, b, and p:

ninja$: cat pattern.txt
[apb]{2}
ninja$:

Next, let's take a look at the file we want to search:

ninja$: cat fruits.txt
apple
banana
watermelon
grape
strawberry
ninja$:

Now, let's execute the command and check the results:

ninja$: rg --file=pattern.txt fruits.txt
1:apple
2:banana
4:grape
ninja$:

As expected, ripgrep filtered the input using the pattern defined in the pattern file.
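The two variations mentioned earlier, passing the option more than once and reading patterns from stdin, can be sketched as follows. This assumes rg is installed; the second pattern file (doubled-r.txt) and the patterns fed through stdin are made up for illustration.

```shell
# Recreate the sample files in a hypothetical scratch directory.
demo=$(mktemp -d)
printf 'apple\nbanana\nwatermelon\ngrape\nstrawberry\n' > "$demo/fruits.txt"
printf '[apb]{2}\n' > "$demo/pattern.txt"
printf 'r{2,}\n' > "$demo/doubled-r.txt"   # illustrative second pattern file

# -f given twice: a line is printed if it matches a pattern from either file.
both=$(rg -f "$demo/pattern.txt" -f "$demo/doubled-r.txt" "$demo/fruits.txt")

# -f - reads the patterns from stdin; here printf generates them on the fly.
from_stdin=$(printf 'melon\nberry\n' | rg -f - "$demo/fruits.txt")

printf 'two pattern files:\n%s\n\npatterns from stdin:\n%s\n' "$both" "$from_stdin"
rm -rf "$demo"
```

In the second command stdin carries the patterns, so the text to search has to come from a PATH argument, here fruits.txt.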

Piping Input

Before we leave this chapter, let's look a bit deeper into piping output from commands into ripgrep. This time, let's use cat to concatenate two files, then pipe them to ripgrep to filter them. To start, let's dump the concatenated files to the console to see the input that will be sent to ripgrep:

ninja$: cat animals.txt fruits.txt
bear
chicken
duck
apple
banana
watermelon
grape
strawberry
ninja$:

Now, let's apply a simple pattern to filter the lines. This time, let's see which lines contain an e preceded by one of b, l, or p:

ninja$: cat animals.txt fruits.txt | rg '[blp]e'
bear
apple
grape
strawberry
ninja$:

Notice that, unlike the previous examples with rg, there are no line numbers. By default, ripgrep shows line numbers when it searches files from the console, but does not add them when its input comes from stdin. Line numbers can still be enabled with the --line-number option; they just aren't on by default in this case because we piped the input from cat to ripgrep.
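To make this concrete, here is a minimal sketch of the same pipeline with --line-number added. It assumes rg is installed and recreates both sample files in a scratch directory.

```shell
# Recreate the sample files in a hypothetical scratch directory.
demo=$(mktemp -d)
printf 'bear\nchicken\nduck\n' > "$demo/animals.txt"
printf 'apple\nbanana\nwatermelon\ngrape\nstrawberry\n' > "$demo/fruits.txt"

# --line-number turns line numbers back on when the input comes from stdin.
numbered=$(cat "$demo/animals.txt" "$demo/fruits.txt" | rg --line-number '[blp]e')

printf '%s\n' "$numbered"
# 1:bear
# 4:apple
# 7:grape
# 8:strawberry
rm -rf "$demo"
```

The numbers count lines in the concatenated stream rather than in the original files, which is why apple is reported on line 4 instead of line 1.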