Now that we have learned the basics of working with ripgrep, let's look a bit more deeply at how to use ripgrep effectively. Each time ripgrep is called, it goes through a few basic steps:
- Select files to be searched
- Apply the specified pattern(s) to each selected input
- Format and return each line of output
In this chapter we look at step #1 more closely.
Smart Filtering
By default, ripgrep applies its "smart filtering" algorithm, which collects inputs from each specified file and by recursively searching through each specified directory. During this process, ripgrep:
- ignores any files and directories matched by your .gitignore/.ignore file(s),
- ignores any hidden files and directories, and
- ignores any binary files.
These are sensible defaults that are close to what we want in many cases, but sometimes we need different behavior, and ripgrep provides a variety of options for changing it.
The first option is -u/--unrestricted, which reduces the level of default, or "smart", filtering. This option can be repeated up to three times, and each repetition disables one more of the three filters above:
- Applying this option once disables filtering according to your .gitignore/.ignore file(s),
- applying it a second time also disables the hidden file filter, and finally
- applying it a third time also disables the binary file filter.
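To see the first two levels in action, here is a minimal sketch using a throwaway directory. It assumes a POSIX shell with rg on the PATH; the file names are hypothetical, -l asks rg to list only the names of matching files, and we use a .ignore file rather than .gitignore because ripgrep only honours .gitignore files inside a git repository.

```shell
# Hypothetical throwaway files for experimenting
cd "$(mktemp -d)"
printf 'needle\n' > visible.txt
printf 'needle\n' > .hidden.txt     # hidden: the name starts with a dot
printf 'needle\n' > ignored.txt
printf 'ignored.txt\n' > .ignore    # ask ripgrep to ignore ignored.txt

rg -l needle .       # visible.txt only
rg -l -u needle .    # adds ignored.txt (ignore files no longer respected)
rg -l -uu needle .   # adds .hidden.txt (hidden filter disabled as well)
```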
This option is convenient, but it can be a bit of a blunt instrument. For example, it requires that the .gitignore/.ignore filter be disabled in order to search hidden files, which is often not what we want. As one might expect, ripgrep also allows each filter to be enabled and disabled directly.
Ignoring .gitignore/.ignore
The --ignore and --no-ignore flags can be used to directly control whether or not inputs are filtered according to .gitignore/.ignore. ripgrep actually provides even greater control over how it selects and treats .gitignore/.ignore files, which we will get into a bit later.
It is also helpful to note that when both .gitignore and .ignore are present and respected (that is, not disabled with --no-ignore), the rules in .gitignore are read first, then .ignore is read. This means that rules in .ignore take precedence over those defined in .gitignore.
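The precedence rule can be checked with a hypothetical pair of rules: .gitignore ignores every .log file, while .ignore re-includes one of them with a ! rule. This sketch assumes both git and rg are installed; the git init is there because ripgrep only honours .gitignore files inside a git repository.

```shell
# Hypothetical throwaway repository for experimenting
cd "$(mktemp -d)"
git init -q                     # .gitignore is only honoured inside a git repository
printf '*.log\n' > .gitignore   # ignore every .log file...
printf '!keep.log\n' > .ignore  # ...but re-include keep.log
printf 'needle\n' > keep.log
printf 'needle\n' > drop.log

rg -l needle .   # keep.log only: the .ignore rule wins over .gitignore
```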
Hidden Files and Directories
The --hidden and --no-hidden flags can be used to directly control how ripgrep treats hidden files and directories.
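Unlike -uu, --hidden brings hidden files into the search without switching off the ignore-file filter. A minimal sketch, assuming rg is on the PATH and using hypothetical file names:

```shell
# Hypothetical throwaway files for experimenting
cd "$(mktemp -d)"
printf 'needle\n' > visible.txt
printf 'needle\n' > .hidden.txt
printf 'needle\n' > ignored.txt
printf 'ignored.txt\n' > .ignore    # still respected when only --hidden is given

rg -l --hidden needle .   # visible.txt and .hidden.txt; ignored.txt stays filtered out
```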
Binary Files
The --binary and --no-binary flags can be used to directly control how ripgrep treats binary files. However, there is a bit of nuance here since, after all, every file is technically a "binary" file.
By default, ripgrep deems a file to be "binary" if it encounters a NUL byte while reading it. If it does, ripgrep stops searching that file; any matches it had already found in the file are thrown away and a warning is printed to the console. However, when the --binary flag is used, ripgrep keeps searching after it encounters a NUL byte until it either finds a match, at which point it prints a warning to the console in place of the match and stops, or reaches the end of the file.
ripgrep's --text and --no-text flags can also be used to turn NUL byte handling off (and back on), so that binary files are searched as though they were text files. This can be a bit dangerous: if there is a match, binary contents may be printed to the console, and any escape codes among them can cause problems in your terminal emulator.
In general, we recommend just using ripgrep's default setting unless you have a specific reason not to do so.
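The three behaviours are easy to compare with a small, hypothetical file that has a NUL byte on the line before the text we search for. This sketch assumes a POSIX shell whose printf understands the \0 escape, and rg on the PATH; -a is the short form of --text.

```shell
# Hypothetical throwaway files for experimenting
cd "$(mktemp -d)"
printf 'needle in text\n' > notes.txt
printf 'header\0\nneedle in binary\n' > blob.bin   # NUL byte on the first line

rg -l needle .          # notes.txt only: blob.bin is silently skipped as binary
rg needle . --binary    # blob.bin now yields a warning instead of its matching line
rg -a needle blob.bin   # --text: prints the "needle in binary" line as plain text
```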
Traversing Directories
ripgrep gives us a few additional tools for controlling how it traverses directories while collecting input files.
To start, when ripgrep searches a directory, it looks at every entry directly inside it, and whenever it finds a sub-directory it searches that too. We can describe this in terms of depth: the path the search starts from is at depth "0", the files and sub-directories directly inside it are at depth "1", the contents of those sub-directories are at depth "2", and so on, until ripgrep reaches the "bottom" of the tree. As you might imagine, this searching can take time, so in some cases it makes sense to limit how "deep" ripgrep will descend into the directory tree.
The -d/--max-depth option provides exactly this functionality, allowing us to set the maximum depth to which ripgrep should descend in order to avoid unnecessary directory traversal. When using this option, a depth of "0" indicates that ripgrep should only search the specified paths themselves (so a directory given as a path is not descended into at all), a depth of "1" indicates that it should search only the direct children of each specified directory, without descending into its sub-directories, and numbers greater than 1 allow ripgrep to descend correspondingly more levels before stopping.
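Here is a sketch of the depth limit on a hypothetical three-level tree, assuming rg is on the PATH. Starting from ., top.txt sits at depth 1, a/mid.txt at depth 2 and a/b/deep.txt at depth 3. rg exits with a non-zero status when nothing matches, hence the || echo on the first command.

```shell
# Hypothetical throwaway tree for experimenting
cd "$(mktemp -d)"
mkdir -p a/b
printf 'needle\n' > top.txt
printf 'needle\n' > a/mid.txt
printf 'needle\n' > a/b/deep.txt

rg -l --max-depth 0 needle . || echo '(no matches)'   # "." itself is never entered
rg -l --max-depth 1 needle .   # top.txt
rg -l --max-depth 2 needle .   # top.txt and a/mid.txt
rg -l needle .                 # all three files
```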
Directory traversal can also sometimes lead to paths that exist on other file systems, which can significantly slow a search down, for example due to network latency. The --one-file-system option tells ripgrep to only search for inputs on the file system from which the search began, and simply ignore any paths that exist on other file systems. One interesting thing about this option is that it still allows a single search to be given paths that exist on different file systems, but it limits the directory traversal from each path to that path's own file system. If needed, this option can be disabled using the complementary --no-one-file-system option.
Filtering Files by File Type
ripgrep also provides several options that filter the files it encounters, giving us greater control over the inputs.
One of the most common options is the --type option, which defines the file types that should be searched, and has the call signature:
rg --type <filetype> <pattern>
where filetype defines the file type to search, such as md, markdown, or txt, and pattern follows the same regular expression conventions we have reviewed in previous chapters.
This option can be repeated several times to combine multiple file types in the same search.
There are also times when we want to do the opposite: search for a pattern in any files except for one or two file types. ripgrep has us covered there too:
rg --type-not <filetype> <pattern>
As with --type, this option can be repeated to exclude several file types.
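A minimal sketch with three hypothetical files, assuming rg is on the PATH; md, txt and py are built-in type names in the ripgrep versions we are aware of, and rg --type-list will confirm them on yours.

```shell
# Hypothetical throwaway files for experimenting
cd "$(mktemp -d)"
printf 'needle\n' > notes.md
printf 'needle\n' > notes.txt
printf 'needle\n' > script.py

rg -l --type md needle .              # notes.md
rg -l --type md --type txt needle .   # notes.md and notes.txt
rg -l --type-not py needle .          # notes.md and notes.txt
```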
Both --type and --type-not take the <filetype> parameter, which expects the name of a supported file type. You can list all of the file types that ripgrep supports by calling:
rg --type-list
which prints every supported file type; the list can be quite long. If you know what you are looking for, you can pipe this output back into ripgrep to filter it. For example:
rg --type-list | rg markdown
will take the complete list of supported file types and filter it down to only those lines that include markdown. Pretty cool.
Although we won't go into detail in this chapter, we should also note that ripgrep provides options for adding and removing file types, so that you can work with custom file types or remove file types that you might not want ripgrep to search. If you want to learn more about these, check the ripgrep help screen for --type-add and --type-clear.
Filtering Files by File Size
It stands to reason that large files can take a long time to search. Similarly, there are also times when we know that our target files don't exceed a certain size. In both cases, ripgrep's --max-filesize option, which skips any file larger than the given size, can help focus the search on the right files.
This option has the call signature:
rg --max-filesize <num><suffix>? <pattern>
where num is a number, and suffix is an optional K, M, or G, corresponding to kilobytes, megabytes, and gigabytes, respectively. When no suffix is provided, num is treated as bytes.
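The sketch below creates a hypothetical small file and a roughly 2 KB one, then varies the limit. It assumes a POSIX shell with head -c and tr available, and rg on the PATH; the sizes are chosen so that the result does not depend on whether a kilobyte is counted as 1000 or 1024 bytes.

```shell
# Hypothetical throwaway files for experimenting
cd "$(mktemp -d)"
printf 'needle\n' > small.txt
# big.txt: the same line plus 2048 bytes of 'x' padding (about 2 KB in total)
{ printf 'needle\n'; head -c 2048 /dev/zero | tr '\000' 'x'; } > big.txt

rg -l --max-filesize 1K needle .   # small.txt only; big.txt is too large
rg -l --max-filesize 4K needle .   # small.txt and big.txt
```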
Following Symbolic Links
By default, ripgrep does not follow symbolic links while traversing directories. Following them can be enabled using the --follow option or, if it is already enabled, disabled again with the --no-follow option.
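A minimal sketch, assuming a Unix-like system where ln -s is available and rg is on the PATH; the directory names are hypothetical. The search starts in search/, whose only entry is a symbolic link to a directory elsewhere.

```shell
# Hypothetical throwaway tree for experimenting
cd "$(mktemp -d)"
mkdir real search
printf 'needle\n' > real/data.txt
ln -s ../real search/link   # search/link points at the real directory

rg -l needle search || echo '(no matches)'   # the link is not followed
rg -l --follow needle search                 # search/link/data.txt
```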
Filtering Files and Directories by Name
Last but not least, filtering files and directories by name is the most specific kind of filtering, so ripgrep gives it the highest precedence. In other words, filtering by name always takes precedence over the other methods of filtering out files and directories, which can be a very useful feature in some situations.
ripgrep filters file and directory names using "globs", which are similar to regular expressions in that they are a means of pattern matching, but they focus on matching file and directory names and use a syntax specific to that purpose.
Specify globs using the --glob option, quoting each glob so that the shell does not expand it before ripgrep sees it, as in:
rg print --glob '*.py'
Notice that the glob option appears in the
Glob Syntax
ripgrep follows .gitignore-style glob syntax, which has a few characteristics:
We can invert a glob by prefixing it with a !, meaning that any matching files and directories should be excluded from the input. Note that this can take some getting used to, because when several globs match the same path, the one given later on the command line takes precedence. For example, a file excluded by an earlier inverted glob is included again if it matches a later, non-inverted one, and vice versa.
A / marks a directory separator, which may occur at the beginning, in the middle, or at the end of a glob. If the separator occurs at the beginning and/or in the middle of the pattern, then the pattern is considered to be relative to the current working directory. Otherwise, the pattern can match at any level below the current working directory, and can match both files and directories. On the other hand, if the separator occurs at the end of the pattern, then the pattern only matches directories.
An asterisk * matches anything except a slash (/), while a ? matches any single character except the slash /. Globs also have limited support for simple character classes, meaning that a range such as [a-zA-Z] can be used to match a single character within the specified range.
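A few of these rules in action, as a minimal sketch with hypothetical file names, assuming rg is on the PATH:

```shell
# Hypothetical throwaway files for experimenting
cd "$(mktemp -d)"
mkdir sub
for f in a1.txt b2.txt z9.txt sub/a3.txt; do printf 'needle\n' > "$f"; done

rg -l needle . --glob '?1.txt'      # a1.txt: ? matches exactly one character
rg -l needle . --glob '[a-b]*.txt'  # a1.txt, b2.txt and sub/a3.txt (no /, so any level)
rg -l needle . --glob 'sub/*'       # sub/a3.txt: a / in the middle anchors the glob
```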
Finally, two consecutive asterisks (**) in patterns provide some special features:
First, a glob that starts with **/pattern matches pattern at any level of the directory tree.
Second, a glob that ends with a trailing pattern/** matches everything below the directory matched by pattern.
Third, two consecutive asterisks in the middle of a pattern, such as left/**/right, match zero or more directories: the left side must match first, and the right side can then match at any depth of sub-directories below it, including directly below it.
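The sketch below combines inverted globs and the three forms of ** on a small hypothetical project, assuming rg is on the PATH. In the second command, tests/test_main.py matches both globs, and since the later glob wins, it is excluded.

```shell
# Hypothetical throwaway project for experimenting
cd "$(mktemp -d)"
mkdir -p src/util tests
printf 'print("x")\n' > main.py
printf 'print("x")\n' > src/util/helpers.py
printf 'print("x")\n' > tests/test_main.py
printf 'print("x")\n' > notes.txt

rg -l print . --glob '*.py'                     # all three .py files
rg -l print . --glob '*.py' --glob '!tests/**'  # main.py and src/util/helpers.py
rg -l print . --glob '**/util/*.py'             # src/util/helpers.py (leading **)
rg -l print . --glob 'src/**'                   # src/util/helpers.py (trailing **)
rg -l print . --glob 'src/**/helpers.py'        # src/util/helpers.py (** in the middle)
rg -l print . --glob '!*.py'                    # notes.txt
```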
Case Sensitivity
ripgrep provides two ways of controlling the case sensitivity of globs. First, the --iglob option is equivalent to the --glob option, except that the glob is matched case-insensitively.
A second option is more explicit, but also more verbose: specifying the --glob-case-insensitive option makes every glob given with --glob case-insensitive, effectively treating --glob as --iglob, while specifying the --no-glob-case-insensitive option restores the default case-sensitive matching.
Which option to use is more or less a matter of personal taste and style.
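Either way, the effect is the same, as this minimal sketch with two hypothetical files shows (it assumes rg is on the PATH):

```shell
# Hypothetical throwaway files for experimenting
cd "$(mktemp -d)"
printf 'needle\n' > README.MD
printf 'needle\n' > notes.md

rg -l needle . --glob '*.md'                          # notes.md only
rg -l needle . --iglob '*.md'                         # README.MD and notes.md
rg -l needle . --glob-case-insensitive --glob '*.md'  # same result as --iglob
```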
A Bit More about .gitignore
Before we close out this chapter, let's get back to one of the very first topics we covered: .gitignore files. By default, ripgrep ignores any files and directories that are listed in the .gitignore file, and we saw at the top of this chapter that we can use the --ignore and --no-ignore options to enable and disable that feature. That is a fairly blunt operation, however, as .gitignore files often contain a complex list of patterns covering files and directories specific to each programming language and framework that might be used in a project. ripgrep includes some additional flexibility that can come in handy from time to time.
First, we should repeat that the rules defined by globs have the highest precedence and overrule any conflicting rules that might be defined in any ignore file. There are many cases where monkeying around with ignore files seems to be the right thing to do, but in reality globs can provide a more direct path to the desired result.
However, for the sake of completeness, ripgrep allows an additional ignore file to be specified with the --ignore-file option, whose rules are combined with those found in .gitignore and .ignore. This can provide a flexible way to change the rules that are in effect during a search.
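A minimal sketch, assuming rg is on the PATH; extra.ignore is a hypothetical file name, and it can be called anything because it is passed explicitly on the command line.

```shell
# Hypothetical throwaway files for experimenting
cd "$(mktemp -d)"
printf 'needle\n' > keep.txt
printf 'needle\n' > scratch.txt
printf 'scratch.txt\n' > extra.ignore   # .gitignore-format rules

rg -l needle .                             # keep.txt and scratch.txt
rg -l needle . --ignore-file extra.ignore  # keep.txt only
```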
ripgrep offers a handful of other options for fine-tuning how ignore files are interpreted and used. While we don't describe them in detail here, if they sound interesting you can find more information about them in the ripgrep documentation.