GNU Coreutils - The uniq Command

The uniq command provides a variety of methods for removing duplicate lines from its input, and has the following call signature:

uniq {options} {input {output}}

When no input is specified, uniq reads from stdin; otherwise it reads the lines of text contained in the named input file. Likewise, uniq writes to stdout by default, but the output can be sent to a file by giving its path as the output parameter.
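The three calling forms described above can be sketched as follows. This is a minimal illustration, not part of the original walkthrough: the scratch directory and the file names in.txt and out.txt are made up for the example.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
printf 'a\na\nb\n' > "$workdir/in.txt"

# 1. No operands: read from stdin, write to stdout.
from_stdin=$(printf 'a\na\nb\n' | uniq)

# 2. One operand: read the named file, write to stdout.
from_file=$(uniq "$workdir/in.txt")

# 3. Two operands: read the first file, write the result to the second.
uniq "$workdir/in.txt" "$workdir/out.txt"
to_file=$(cat "$workdir/out.txt")

# All three variables now hold the same two lines: a, b
printf '%s\n' "$from_stdin"

rm -rf "$workdir"
```

Note that the two-operand form overwrites the output file if it already exists, so a scratch location is a safer place to experiment.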

Let's take a quick look at uniq's basic behavior with a few simple examples. Suppose we have the following file, which we will use throughout this section.

ninja$:·cat·names.txt

red

red

green

red

blue

red

ninja$:··

Now, suppose we want to look at some options for eliminating the repeated lines. As a first example, we call the uniq command and pass the path to this file as an argument:

ninja$:·uniq·names.txt

red

green

red

blue

red

ninja$:··

The uniq command successfully reduced the adjacent lines containing "red" to a single line, but did not filter out the remaining lines containing "red". By default, uniq only removes a line when it duplicates the line immediately before it, which is not always what we are looking for.

Achieving Truly-Unique Output

We can achieve truly-unique lines by first sorting the input so that all duplicate lines are adjacent to each other:

ninja$:·sort·names.txt

blue

green

red

red

red

red

ninja$:··

Now that the duplicate lines are adjacent, we can pipe that output to the uniq command to remove the duplicates:

ninja$:·sort·names.txt·|·uniq

blue

green

red

ninja$:··

which yields the desired result.
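The difference between the two approaches can be reproduced with a short script. This is a sketch that assumes names.txt holds the six lines red, red, green, red, blue, red, and it recreates the file in a scratch directory. The sort -u shorthand at the end is not part of the walkthrough above; it is a standard sort option that gives the same result as the pipeline when no other uniq options are needed.

```shell
# Recreate the sample file (contents assumed) in a scratch directory.
workdir=$(mktemp -d)
printf 'red\nred\ngreen\nred\nblue\nred\n' > "$workdir/names.txt"

# uniq alone only collapses duplicates that are adjacent.
adjacent_only=$(uniq "$workdir/names.txt")

# Sorting first makes every duplicate adjacent, so uniq removes them all.
fully_unique=$(sort "$workdir/names.txt" | uniq)

# sort -u is a shorthand for the same pipeline.
shorthand=$(sort -u "$workdir/names.txt")

# Prints: blue, green, red (one per line)
printf '%s\n' "$fully_unique"

rm -rf "$workdir"
```

Keep in mind that sorting discards the original line order, so this approach fits cases where the order does not matter.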

Basic Options

Now that we have seen the basic behavior, let's take a look at a few of the basic options that are available:

Option	Long Option	Description
-c	--count	prefix lines with the number of occurrences of each line
-d	--repeated	only print duplicated lines, once for each group
-D	--all-repeated	print every line of each duplicated group
-i	--ignore-case	ignore case when comparing lines
-u	--unique	only print lines that are not repeated
	--help	print the help page

Now, let's see a few of these in action.

The -d/--repeated and -D options

The first option we will look at is -d, which inverts uniq's function: rather than filtering out duplicate lines, it filters out the unique lines. Going back to our original file, if we execute uniq with the -d option the output contains only "red", which represents the group of adjacent duplicate lines at the top of the file:

ninja$:·uniq·-d·names.txt

red

ninja$:··

The -D option performs a similar function, except it prints each duplicate line, rather than just a single line representing the entire group of duplicate lines:

ninja$:·uniq·-D·names.txt

red

red

ninja$:··
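To compare the two options side by side, here is a small sketch that again assumes the sample file contains red, red, green, red, blue, red. Note that -D is a GNU extension rather than a POSIX option, so it may not be available in every uniq implementation.

```shell
# Recreate the sample file (contents assumed) in a scratch directory.
workdir=$(mktemp -d)
printf 'red\nred\ngreen\nred\nblue\nred\n' > "$workdir/names.txt"

# -d prints one line per group of adjacent duplicates.
once_per_group=$(uniq -d "$workdir/names.txt")

# -D prints every line that belongs to such a group.
every_line=$(uniq -D "$workdir/names.txt")

# After sorting, all four "red" lines form a single group.
every_line_sorted=$(sort "$workdir/names.txt" | uniq -D)

printf -- '-d: %s\n' "$once_per_group"
printf -- '-D: %s line(s)\n' "$(printf '%s\n' "$every_line" | wc -l | tr -d ' ')"
printf -- 'sorted -D: %s line(s)\n' "$(printf '%s\n' "$every_line_sorted" | wc -l | tr -d ' ')"

rm -rf "$workdir"
```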

The -c/--count option

Rather than filtering the input, uniq can also report how many times each line occurs within its run of adjacent duplicates:

ninja$:·uniq·-c·names.txt

2·red

1·green

1·red

1·blue

1·red

ninja$:··

which can also be applied to the sorted input in order to get a complete count of duplicates in the input file:

ninja$:·sort·names.txt·|·uniq·-c

1·blue

1·green

4·red

ninja$:··
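A common follow-up, sketched here as an illustration rather than as part of the walkthrough above, is to sort the counted output numerically so that the most frequent lines come first. The sample file contents are assumed to be red, red, green, red, blue, red; the -r and -n flags are standard sort options.

```shell
# Recreate the sample file (contents assumed) in a scratch directory.
workdir=$(mktemp -d)
printf 'red\nred\ngreen\nred\nblue\nred\n' > "$workdir/names.txt"

# Count each distinct line, then order the counts from highest to lowest.
ranked=$(sort "$workdir/names.txt" | uniq -c | sort -rn)

# The first line of the ranking is the most frequent entry ("red", 4 times).
printf '%s\n' "$ranked"

rm -rf "$workdir"
```

The counts are left-padded with spaces, so any script that consumes this output should split on whitespace rather than rely on a fixed column.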

Combining Options

uniq's options can also be combined. For example, combining the -c and -d options gives us a quick summary of which lines are duplicated and how many times each one occurs. Let's first try this with the original file, using the long option format:

ninja$:·uniq·--repeated·--count·names.txt

2·red

ninja$:··

As we saw in the other examples, this doesn't tell us the complete count of the duplicated lines. To get that, we need to sort the file first and then apply both options:

ninja$:·sort·names.txt·|·uniq·--repeated·--count

4·red

ninja$:··
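The -u and -i options from the table are not demonstrated in the examples so far. The following is a minimal sketch of both, assuming the same sample file contents (red, red, green, red, blue, red) plus a small mixed-case input made up for the -i case.

```shell
# Recreate the sample file (contents assumed) in a scratch directory.
workdir=$(mktemp -d)
printf 'red\nred\ngreen\nred\nblue\nred\n' > "$workdir/names.txt"

# -u keeps only the lines that have no adjacent duplicate; after sorting,
# that means lines that occur exactly once in the file: blue, green
never_repeated=$(sort "$workdir/names.txt" | uniq -u)

# -i ignores case when comparing, and the first line of each run is kept:
# Red, blue
case_folded=$(printf 'Red\nred\nRED\nblue\n' | uniq -i)

printf '%s\n' "$never_repeated"
printf '%s\n' "$case_folded"

rm -rf "$workdir"
```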

uniq has a few more options that we didn't cover here, which may be useful in some cases. Take a look at uniq's help page for more information.