The uniq command provides a variety of methods for removing duplicate lines from its input, and has the following call signature:
uniq [options] [input [output]]
When no input is specified, uniq takes its input from stdin. Otherwise, the named file is read and the lines of text it contains are used as input. Likewise, uniq sends its output to stdout by default, but output can also be sent to a file by specifying its path as the output parameter.
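Here is a minimal sketch of the three call forms. It assumes GNU coreutils uniq and a small hypothetical file named colors.txt. The file name and its contents are illustrative only.

```shell
# Hypothetical sample file, created only for this sketch.
printf 'red\nred\ngreen\n' > colors.txt

# No input argument: uniq reads stdin and writes to stdout.
uniq < colors.txt

# Input given: uniq reads the named file and still writes to stdout.
uniq colors.txt

# Input and output given: the result goes to deduped.txt instead of stdout.
uniq colors.txt deduped.txt
cat deduped.txt
```

Each form produces the same two lines, red and green. Only the source of the input and the destination of the output change.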
Let's take a quick look at uniq's basic behavior with a few simple examples. Suppose we have the following file, which we will use throughout this section.
Now, suppose we want to look at some options for eliminating the repeated lines. As a first example, we call the uniq command and pass the path to this file as an argument:
The uniq command successfully reduced the adjacent lines containing "red" to a single line, but did not filter out the later lines containing "red". By default, uniq only removes a line when it duplicates the line immediately before it, which is not always what we are looking for.
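The default behavior can be reproduced with a hypothetical stand-in file, colors.txt. Its contents are invented for this sketch but follow the pattern described above: an adjacent pair of "red" lines at the top, and another "red" further down. The sketch assumes GNU coreutils uniq.

```shell
# Hypothetical stand-in input: "red" twice at the top, then again later on.
printf 'red\nred\ngreen\nred\nblue\ngreen\n' > colors.txt

# Only the adjacent "red" pair is collapsed. The later "red" and the
# second "green" survive because they are not next to an identical line.
uniq colors.txt
# red
# green
# red
# blue
# green
```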
Achieving Truly Unique Output
We can achieve truly unique lines by first sorting the input so that all duplicate lines are adjacent to each other:
Now that the duplicate lines are adjacent, piping that output to the uniq command removes the duplicates, which yields the desired result.
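Here is a sketch of the sort-then-uniq pipeline. It again uses a hypothetical colors.txt with invented contents and assumes GNU coreutils.

```shell
# Hypothetical sample input with both adjacent and non-adjacent duplicates.
printf 'red\nred\ngreen\nred\nblue\ngreen\n' > colors.txt

# Sorting groups identical lines together...
sort colors.txt
# blue
# green
# green
# red
# red
# red

# ...so uniq can now collapse every group to a single line.
sort colors.txt | uniq
# blue
# green
# red
```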
Basic Options
Now that we have seen the basic behavior, let's take a look at a few of the basic options that are available:
| Option | Long Option | Description |
|---|---|---|
| -c | --count | prefix lines with the number of occurrences of each line |
| -d | --repeated | only print duplicated lines, once for each group |
| -D | | print each duplicate line |
| -i | --ignore-case | ignore case when comparing lines |
| -u | --unique | only print unique lines |
| | --help | print the help page |
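The -u and -i options from the table are not demonstrated in the examples below, so here is a brief sketch of both. It assumes GNU coreutils uniq and uses invented sample data, including the hypothetical file colors.txt.

```shell
# Hypothetical sample input.
printf 'red\nred\ngreen\nred\nblue\ngreen\n' > colors.txt

# -u / --unique: keep only lines that are not repeated. After sorting,
# only "blue" (which occurs once) survives.
sort colors.txt | uniq -u
# blue

# -i / --ignore-case: "Red", "red" and "RED" compare as equal, and the
# first line of the group is the one printed.
printf 'Red\nred\nRED\ngreen\n' | uniq -i
# Red
# green
```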
Now, let's see a few of these in action.
The -d/--repeated and -D options
The first option we will look at is the -d option, which inverts uniq's function: rather than filtering out duplicate lines, it filters out the unique lines. Going back to our original file, if we execute uniq with the -d option, the output contains only "red", indicating the duplicate lines at the top of the file:
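Here is a sketch of -d on a hypothetical colors.txt whose contents are invented to match the pattern described above. It assumes GNU coreutils uniq.

```shell
# Hypothetical sample input: only the "red" pair at the top is adjacent.
printf 'red\nred\ngreen\nred\nblue\ngreen\n' > colors.txt

# -d / --repeated: print one line per group of adjacent duplicates.
uniq -d colors.txt
# red
```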
The -D option performs a similar function, except it prints each duplicate line, rather than just a single line representing the entire group of duplicate lines:
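Here is a sketch of -D on the same kind of hypothetical colors.txt (invented contents). It assumes GNU coreutils uniq, where -D is available as a short option.

```shell
# Hypothetical sample input: only the "red" pair at the top is adjacent.
printf 'red\nred\ngreen\nred\nblue\ngreen\n' > colors.txt

# -D: print every line of each adjacent duplicate group, not just one.
uniq -D colors.txt
# red
# red
```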
The -c/--count option
Rather than filtering the input, uniq can also report how many times each line occurs in a row. The same option can be applied to the sorted input to get a complete count of duplicates in the input file.
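Here is a sketch of -c on unsorted and sorted input. It uses a hypothetical colors.txt with invented contents and assumes GNU coreutils. The count padding shown is GNU's, and other uniq implementations may pad differently.

```shell
# Hypothetical sample input.
printf 'red\nred\ngreen\nred\nblue\ngreen\n' > colors.txt

# Unsorted: counts only cover runs of adjacent identical lines.
uniq -c colors.txt
#       2 red
#       1 green
#       1 red
#       1 blue
#       1 green

# Sorted: each count now covers every occurrence in the file.
sort colors.txt | uniq -c
#       1 blue
#       2 green
#       3 red
```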
Combining Options
uniq's options can also be combined by passing several of them at once. For example, combining the -c and -d options gives a quick summary of the duplicate lines in the document, along with how many times each occurs. Let's first try this with the original file, using the long option format for this example:
As we saw in the earlier examples, this doesn't give the complete count of the duplicated lines. To get that, we need to sort the file first and then apply both options:
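Here is a sketch of --count combined with --repeated, before and after sorting. It uses a hypothetical colors.txt with invented contents and assumes GNU coreutils, whose count padding is shown.

```shell
# Hypothetical sample input.
printf 'red\nred\ngreen\nred\nblue\ngreen\n' > colors.txt

# Unsorted: only the adjacent "red" pair is reported.
uniq --count --repeated colors.txt
#       2 red

# Sorted: every line that appears more than once is reported with its total.
sort colors.txt | uniq --count --repeated
#       2 green
#       3 red
```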
uniq has a few more options that we didn't cover here, which may be useful in some cases. Take a look at uniq's help page (uniq --help) for more information.