The uniq command in Linux: removing duplicate lines

Introduction

In system administration and text processing, the uniq command is an essential tool for detecting and removing repeated lines, typically in sorted input. Its operation is simple, but its real power shows when it is combined with other utilities such as sort, grep, or awk. This article explores how uniq works, its most useful options, and practical examples you can apply right away in your daily workflow.

What is uniq and when to use it?

The name uniq comes from "unique". It reads its input line by line and, whenever consecutive lines are identical, prints only one copy. Because it compares each line only with the one before it, duplicates that are not next to each other go undetected, so the input normally has to be sorted first. It is very useful for logs, configuration files, user lists, or any situation where you need to clean up repeated data.
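A quick way to see the adjacency rule in action is the sketch below. It assumes a POSIX shell with sort and uniq available. It writes a small, hypothetical file unsorted.txt into a scratch directory, and the names are only sample data.

```shell
#!/bin/sh
# Work in a scratch directory so the sample file does not clutter the current one.
cd "$(mktemp -d)"

# Hypothetical sample: duplicates exist, but none of them are adjacent.
printf 'Luis\nAna\nLuis\nMaría\nAna\n' > unsorted.txt

# uniq alone only merges ADJACENT identical lines, so all 5 lines survive.
uniq unsorted.txt

# Sorting first places identical lines next to each other,
# so uniq can collapse them: Ana, Luis, María.
sort unsorted.txt | uniq
```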

Basic Syntax

The simplest way to invoke uniq is:

uniq [options] [input [output]]

If no input file is given (or it is given as -), uniq reads from standard input, so it can easily be chained with pipes (|). If no output file is given, it writes to standard output. For example:

sort file.txt | uniq

This sorts the content and then removes adjacent repetitions.
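The following sketch illustrates three ways to feed uniq and collect its output: a file argument, standard input, and an optional second file operand for the output, which POSIX uniq accepts. The file names file.txt and deduped.txt are placeholders.

```shell
#!/bin/sh
cd "$(mktemp -d)"

# Placeholder input with non-adjacent repeats.
printf 'b\na\nb\na\n' > file.txt

# Input from a file argument: nothing is adjacent, so all 4 lines are printed.
uniq file.txt

# Input from standard input (a pipe), after sorting: prints "a" and "b".
sort file.txt | uniq

# "-" names standard input explicitly; the second operand is the output file.
sort file.txt | uniq - deduped.txt
cat deduped.txt
```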

Most Used Options

  • -c: prefixes each line with the number of times it appears.
  • -d: shows only lines that are duplicated (one copy of each set).
  • -u: shows only lines that are not repeated (unique).
  • -i: ignores case when comparing.
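  • -f N: skips the first N fields when comparing. A field is a run of blanks followed by non-blanks, so this suits whitespace-separated columns (uniq has no option to choose another delimiter).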
  • -s N: skips the first N characters of each line.
  • -w N: compares no more than the first N characters of each line (a GNU extension, not part of POSIX).
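The field and character options are easiest to grasp on small inputs. The sketch below assumes GNU coreutils uniq, because -w is not part of POSIX. The files events.txt, items.txt and levels.txt are hypothetical sample data. In every case uniq keeps the first line of each group it treats as equal.

```shell
#!/bin/sh
# Assumes GNU coreutils uniq (-w is not in POSIX).
cd "$(mktemp -d)"

# Hypothetical log excerpt: a timestamp field, then the message.
printf '10:01 disk full\n10:05 disk full\n10:07 login ok\n' > events.txt
# -f 1: skip the first blank-separated field (the timestamp) when comparing.
uniq -f 1 events.txt
# → 10:01 disk full
#   10:07 login ok

# Hypothetical records with a 3-character prefix.
printf 'ID1 apple\nID2 apple\nID3 pear\n' > items.txt
# -s 3: ignore the first 3 characters of each line when comparing.
uniq -s 3 items.txt
# → ID1 apple
#   ID3 pear

# -w 5: compare only the first 5 characters ("error" vs "warn:").
printf 'error: disk\nerror: net\nwarn: cpu\n' > levels.txt
uniq -w 5 levels.txt
# → error: disk
#   warn: cpu
```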

Practical Examples

1. Removing Simple Duplicates

Suppose we have a file names.txt with the following list (already sorted):

Ana
Ana
Luis
Luis
Luis
María

We run:

uniq names.txt

Result:

Ana
Luis
María

2. Counting Occurrences

To know how many times each name appears:

uniq -c names.txt

Output:

      2 Ana
      3 Luis
      1 María

3. Show Only Duplicates

If we are interested only in the names that are repeated:

uniq -d names.txt

Result:

Ana
Luis
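The -u option is the complement of -d. The sketch below recreates the sorted names.txt from the examples above and runs -u on it. It also shows a portable way to count only the repeated lines, filtering uniq -c output with awk instead of relying on -c and -d being combinable (the POSIX synopsis lists them as alternatives).

```shell
#!/bin/sh
cd "$(mktemp -d)"

# The sorted names.txt used in the examples above.
printf 'Ana\nAna\nLuis\nLuis\nLuis\nMaría\n' > names.txt

# -u prints only lines that are not repeated.
uniq -u names.txt
# → María

# Count every group, then keep only groups whose count ($1) exceeds 1.
# This prints the count lines for Ana (2) and Luis (3).
uniq -c names.txt | awk '$1 > 1'
```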

4. Ignoring Case

Now suppose names.txt contains case variations:

ana
ANA
Luis
luis
María

First we sort case-insensitively with sort -f, so that variants of the same name end up adjacent, and then apply uniq -i:

sort -f names.txt | uniq -i

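We get a single entry for each name regardless of case. uniq -i prints the first line of each case-insensitive group, so which spelling is kept depends on the order produced by sort.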
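You can try this with the sketch below. It uses a hypothetical file names_mixed.txt, a separate name so that the sorted names.txt is left alone. Because sort -f treats ana and ANA as equal, the variant that comes first, and therefore survives uniq -i, can vary with the locale. Only the number of groups and their sizes are guaranteed.

```shell
#!/bin/sh
cd "$(mktemp -d)"

# Hypothetical file with case variations of the same names.
printf 'ana\nANA\nLuis\nluis\nMaría\n' > names_mixed.txt

# Without -i, "ANA" and "ana" are different lines, so all 5 lines remain.
sort -f names_mixed.txt | uniq

# With -i, each case-insensitive group collapses to one line: 3 lines in total.
sort -f names_mixed.txt | uniq -i

# Adding -c shows how many variants were merged into each group: 2, 2 and 1.
sort -f names_mixed.txt | uniq -ic
```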

Combining uniq with Other Commands

The true potential of uniq shows up in pipelines. Some common patterns:

  • sort file | uniq -c | sort -nr: counts how often each line occurs and lists the most frequent first.
  • grep "error" log.txt | sort | uniq -c: counts how many times each distinct matching line appears (lines that differ only in a timestamp are counted separately).
  • cut -d: -f1 /etc/passwd | sort | uniq -d: detects duplicate users in the password file (useful for security audits).
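The pipelines above can be tried on small, hypothetical files. In this sketch, log.txt and passwd.sample are made-up stand-ins for a real log and for /etc/passwd. sort -nr puts the highest counts first, and head -n limits the report to the top entries.

```shell
#!/bin/sh
cd "$(mktemp -d)"

# Made-up log lines (a real log would usually include timestamps).
printf 'error: disk full\nok\nerror: timeout\nerror: disk full\nok\nerror: disk full\n' > log.txt

# Frequency of each distinct error line, most frequent first, top 5 only.
# Prints a count of 3 for "error: disk full", then 1 for "error: timeout".
grep "error" log.txt | sort | uniq -c | sort -nr | head -n 5

# Made-up passwd excerpt in which the user name "alice" appears twice.
printf 'root:x:0:0::/root:/bin/sh\nalice:x:1000:1000::/home/alice:/bin/sh\nalice:x:1001:1001::/home/alice2:/bin/sh\n' > passwd.sample

# First colon-separated field = user name; -d reports names seen more than once.
cut -d: -f1 passwd.sample | sort | uniq -d
# → alice
```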

Tips and Tricks

  • Always sort before using uniq unless you are sure the file is already sorted.
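  • Use uniq -z when entries are separated by NUL bytes instead of newlines (useful with find -print0, sorted with sort -z).
  • Use awk to build a custom uniq when you need to compare only certain fields or keep the input in its original order.
  • In scripts, check the exit status: uniq returns 0 on success (whether or not it removed any lines) and a nonzero value on errors, such as an unreadable input file or a failed write.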
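The last two tips can be sketched as follows. The awk idiom !seen[$0]++ is a well-known alternative to uniq, not part of it. It prints a line only the first time it is seen, so it needs no sorting and keeps the original order. The array name seen is an arbitrary choice, and the files are hypothetical sample data. The final lines show that uniq reports success even when it removes duplicates and fails only on errors such as a missing file.

```shell
#!/bin/sh
cd "$(mktemp -d)"

# Hypothetical unsorted input.
printf 'Luis\nAna\nLuis\nMaría\nAna\n' > unsorted.txt

# Keep the first occurrence of each whole line ($0), preserving input order.
awk '!seen[$0]++' unsorted.txt
# → Luis, Ana, María (one per line)

# Same idea keyed on the second whitespace-separated field only.
printf '10:01 disk\n10:02 net\n10:03 disk\n' > tags.txt
awk '!seen[$2]++' tags.txt
# → 10:01 disk
#   10:02 net

# Exit status 0 on success, even though a duplicate line was removed here...
if printf 'a\na\n' | uniq > /dev/null; then echo "uniq: ok"; fi
# ...and nonzero when something goes wrong, e.g. the input file is missing.
if ! uniq missing-file.txt > /dev/null 2>&1; then echo "uniq: failed" >&2; fi
```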

Conclusion

The uniq command is a small but powerful tool in the arsenal of any Linux system administrator or command-line enthusiast. Its simplicity hides great versatility, especially when combined with sorting and filtering tools. Mastering its options will let you inspect logs, clean up lists, and run quick frequency analyses. The next time you face a file with repeated lines, remember that uniq, paired with sort, is ready to clean it up.

This work by Francesc Roig (francesc@vivaldi.net) is licensed under a Creative Commons Attribution 4.0 International License.