The uniq Command

The uniq command provides a variety of methods for removing duplicate lines from its input, and has the following call signature:

uniq {options} {input {output}}

When no input is specified, uniq reads from stdin; otherwise it reads the lines of text in the named file. Likewise, uniq writes to stdout by default, but the output can be sent to a file by specifying its path as the output parameter.
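
The three ways of supplying input and output described above can be sketched as follows. The sample data and the file names sample.txt and deduped.txt are made up for illustration, and everything happens in a throwaway directory.

```shell
# Work in a temporary directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
printf 'a\na\nb\n' > "$workdir/sample.txt"   # hypothetical sample data

# 1. No input argument: uniq reads from stdin and writes to stdout.
from_stdin=$(uniq < "$workdir/sample.txt")

# 2. Input argument only: uniq reads the named file and writes to stdout.
from_file=$(uniq "$workdir/sample.txt")

# 3. Input and output arguments: the result is written to the second path.
uniq "$workdir/sample.txt" "$workdir/deduped.txt"

printf '%s\n' "$from_stdin"
cat "$workdir/deduped.txt"
```

All three forms should produce the same two lines; they differ only in where the data comes from and where it goes.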

Let's take a quick look at uniq's basic behavior with a few simple examples. Suppose we have the following file, which we will use throughout this section.

ninja$: cat names.txt
red
red
green
red
blue
red
ninja$:

Now, suppose we want to look at some options for eliminating the repeated lines. As a first example, we call the uniq command and pass the path to this file as an argument:

ninja$: uniq names.txt
red
green
red
blue
red
ninja$:

The uniq command successfully reduced the adjacent lines containing "red" to a single line, but did not filter out the remaining lines containing "red". By default, uniq only collapses duplicate lines that are adjacent to each other, which is not always what we are looking for.

Achieving Truly-Unique Output

We can achieve truly-unique lines by first sorting the input so that all duplicate lines are adjacent to each other:

ninja$: sort names.txt
blue
green
red
red
red
red
ninja$:

Now that the duplicate lines are adjacent, we can pipe that output to the uniq command to remove the duplicates:

ninja$: sort names.txt | uniq
blue
green
red
ninja$:

which yields the desired result.
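
For this particular task, sort's own -u option should give the same result in a single step. The sketch below recreates the sample file in a temporary directory (an assumption made only to keep the example self-contained) and compares the two approaches.

```shell
# Recreate the sample file in a throwaway directory.
workdir=$(mktemp -d)
printf 'red\nred\ngreen\nred\nblue\nred\n' > "$workdir/names.txt"

# Sort first, then let uniq collapse the now-adjacent duplicates.
via_pipe=$(sort "$workdir/names.txt" | uniq)

# Let sort drop the duplicates itself.
via_sort_u=$(sort -u "$workdir/names.txt")

printf '%s\n' "$via_sort_u"
```

The pipeline form remains the one to reach for when uniq's other options, such as -c or -d, are needed.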

Basic Options

Now that we have seen the basic behavior, let's take a look at a few of the basic options that are available:

Option  Long Option    Description
-c      --count        prefix lines with the number of occurrences of each line
-d      --repeated     only print duplicated lines, once for each group
-D                     print each duplicate line
-i      --ignore-case  ignore case when comparing lines
-u      --unique       only print unique lines
        --help         print the help page

Now, let's see a few of these in action.

The -d/--repeated and -D options

The first option we will look at is the -d option, which inverts uniq's function: rather than filtering out duplicate lines, it filters out the unique lines. Going back to our original file, if we execute uniq with the -d option the output contains only "red", representing the group of duplicate lines at the top of the file:

ninja$: uniq -d names.txt
red
ninja$:

The -D option performs a similar function, except it prints each duplicate line, rather than just a single line representing the entire group of duplicate lines:

ninja$: uniq -D names.txt
red
red
ninja$:
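
The -u option from the table above is the counterpart to -d: it keeps only the lines that are not part of an adjacent group of duplicates. A minimal sketch, recreating names.txt in a temporary directory so that it runs on its own:

```shell
# Recreate the sample file in a throwaway directory.
workdir=$(mktemp -d)
printf 'red\nred\ngreen\nred\nblue\nred\n' > "$workdir/names.txt"

# On the unsorted file, only the first two "red" lines form a group,
# so the later, isolated "red" lines still count as unique.
unsorted_unique=$(uniq -u "$workdir/names.txt")

# After sorting, all four "red" lines are adjacent and are dropped together.
sorted_unique=$(sort "$workdir/names.txt" | uniq -u)

printf '%s\n' "$unsorted_unique"
printf '%s\n' "$sorted_unique"
```

As with the default behavior, sorting first is what turns "unique among neighbors" into "unique in the whole file".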

The -c/--count option

Rather than filtering the input, uniq can also report how many times each line is repeated in a row:

ninja$: uniq -c names.txt
2 red
1 green
1 red
1 blue
1 red
ninja$:

which can also be applied to the sorted input in order to get a complete count of duplicates in the input file:

ninja$: sort names.txt | uniq -c
1 blue
1 green
4 red
ninja$:
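
A common follow-up is to rank the lines by how often they occur. One way to sketch this is to feed the counts into a second sort, where -n compares the leading counts numerically and -r puts the largest first. The temporary directory is only there to make the example self-contained.

```shell
# Recreate the sample file in a throwaway directory.
workdir=$(mktemp -d)
printf 'red\nred\ngreen\nred\nblue\nred\n' > "$workdir/names.txt"

# Count each distinct line, then order the counts from highest to lowest.
ranked=$(sort "$workdir/names.txt" | uniq -c | sort -rn)

# The most frequent line comes first; awk strips the padding around the count.
top=$(printf '%s\n' "$ranked" | awk 'NR == 1 { print $1, $2 }')
printf '%s\n' "$top"
```

Lines with equal counts may appear in either order, so only the top entry is relied on here.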

Combining Options

uniq's options can also be combined. For example, combining the -c and -d options gives us a quick summary of the duplicate lines in the document as well as how many of each there are. Let's first try this with the original file, using the long option format for this example:

ninja$: uniq --repeated --count names.txt
2 red
ninja$:

As we saw in the other examples, this doesn't tell us the complete count of the duplicated lines. To get that, we need to sort the file first and then apply both options:

ninja$: sort names.txt | uniq --repeated --count
4 red
ninja$:
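
The -i option from the table combines with -c in the same way. The sketch below uses made-up mixed-case data rather than names.txt, and sorts with -f (fold case) so that lines differing only in case end up adjacent. Which spelling of a group gets printed depends on the sort order, so only the counts are relied on here.

```shell
# Hypothetical mixed-case input; uniq -i treats Red, red and RED as equal.
counts=$(printf 'Red\nblue\nred\nRED\n' | sort -f | uniq -i -c)

# Keep just the count and a lower-cased copy of the line for display.
summary=$(printf '%s\n' "$counts" | awk '{ print $1, tolower($2) }')
printf '%s\n' "$summary"
```

Without -f on the sort, the differently cased lines may not end up next to each other, and uniq -i would then report them as separate groups.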

uniq has a few more options that we didn't cover here, which may be useful in some cases. Take a look at uniq's help page for more information.