The shuf command in Linux: shuffle lines randomly

Introduction

In the world of the Linux command line, there are numerous utilities designed to manipulate text quickly and efficiently. Among them, shuf stands out for its ability to shuffle input lines randomly, a task that is useful in testing, generating sample data, creating random lists, and many other scenarios. This article explores the shuf command in depth, its syntax, its most common options, and some practical examples you can apply immediately in your workflow.

What exactly does shuf do?

shuf belongs to the GNU coreutils package, and its main function is to read lines from a file or from standard input and write them out in random order. Each run normally produces a different permutation, unless shuf is given the same random bytes through its --random-source option. The tool does not modify the original file; it simply writes the result to standard output, allowing you to redirect it to another file or chain it with other commands via pipes.

Basic syntax

The most direct way to use shuf is to specify the input file:

shuf nombre_del_archivo.txt

If no file is specified, shuf reads from standard input, allowing you to combine it with other commands:

cat lista.txt | shuf

The result will be the same lines but in an unpredictable order each time you invoke it.

Most used options

-n NUM or --head-count=NUM: outputs at most NUM lines. This is equivalent to taking a random sample of size NUM without replacement.
-e or --echo: treats each command-line argument as an input line. For example, shuf -e manzana pera naranja will produce a permutation of those three words.
-i LO-HI or --input-range=LO-HI: acts as if the input were the numbers from LO to HI inclusive, one per line, and shuffles them. Very useful for creating draws or selecting random indices.
-r or --repeat: allows the output to contain duplicate lines, that is, sampling with replacement. Without this option, each input line appears at most once in the result. It is normally combined with -n, because without a count the output continues indefinitely.
-z or --zero-terminated: changes the line delimiter to a null character, which makes handling file names that contain spaces or newlines easier.
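
The options above can be tried without any input file. The following is a minimal sketch, assuming GNU shuf is on the PATH; the variable names and the numeric ranges are arbitrary choices for illustration:

```shell
# -e: shuffle the command-line arguments themselves
fruits=$(shuf -e manzana pera naranja)

# -i with -n: pick 3 distinct numbers from 1..10
picks=$(shuf -i 1-10 -n 3)

# -r with -n: 5 draws with replacement from 1..6, so values may repeat
rolls=$(shuf -r -i 1-6 -n 5)

printf '%s\n' "$fruits" "$picks" "$rolls"
```

The -n in the last command matters: it is what makes the repeated draw stop after five lines.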

Practical examples

Shuffling a log file

Suppose you have a file log.txt with thousands of entries and you want to obtain a random sample of 100 lines for quick inspection:

shuf -n 100 log.txt

This command prints 100 lines chosen uniformly at random from the file, without repetition and in random order. Such a sample is convenient for a quick look at the content, although rare entries may not appear in it.

Generating lottery numbers

To simulate a draw of six numbers between 1 and 49, you can use the input range:

shuf -i 1-49 -n 6

Each execution prints six distinct numbers, because without the --repeat option no value is drawn twice. Separate runs are independent of each other, so the same combination can in principle come up again.
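
Draws are usually presented in ascending order on a single line. A small sketch of that refinement, assuming the standard sort and paste utilities; the variable name draw is only illustrative:

```shell
# Draw six distinct numbers from 1..49, sort them numerically,
# and join them into one space-separated line
draw=$(shuf -i 1-49 -n 6 | sort -n | paste -sd' ' -)
echo "$draw"
```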

Creating a random word list

If you have a dictionary in one‑word‑per‑line format called diccionario.txt and you want to obtain ten random words:

shuf -n 10 diccionario.txt

This technique is frequently used in generating memorable passwords or creating test data for applications.
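
As a rough sketch of the passphrase idea, the selected words can be joined with a separator. The word list below is a tiny hypothetical stand-in created only for the example; with a real dictionary you would pass its path instead. The --random-source=/dev/urandom option asks shuf to take its random bytes directly from the kernel.

```shell
# Hypothetical sample dictionary; in practice use your own word list
dict=$(mktemp)
printf '%s\n' casa perro luna arbol rio nube piedra fuego > "$dict"

# Pick four distinct words and join them with hyphens
passphrase=$(shuf --random-source=/dev/urandom -n 4 "$dict" | paste -sd- -)
echo "$passphrase"

rm -f "$dict"
```

The strength of such a passphrase depends mainly on the size of the word list and the number of words drawn, so an eight-word list like this one is suitable only for demonstration.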

Combining shuf with other commands

The power of shuf is amplified when used in pipelines. For example, to pick five processes at random from among the twenty that consume the most memory:

ps -eo pid,ppid,cmd,%mem --sort=-%mem --no-headers | head -n 20 | shuf -n 5

First, all processes are listed sorted by memory usage, without the header line. head then keeps the top twenty, and shuf selects five of them at random. Without the head step, shuf would sample from every process on the system, and the sort would have no effect on the result.

Handling file names with spaces

When file names may contain spaces or special characters, it is advisable to use the null delimiter:

find . -type f -print0 | shuf -z -n 5 | xargs -0 ls -lh

This command finds all regular files under the current directory, passes their names to shuf as a null-delimited stream, selects five at random, and then displays their details with ls.

Alternatives and complements

Although shuf is the most direct tool for shuffling lines, there are other options that may be useful depending on the context. The command sort -R also produces a random order, but it works by sorting on a random hash of each key, so identical lines always end up next to each other; the result is a true random permutation only when all lines are distinct. Another alternative is to use awk with the rand() function to assign a random number to each line and then sort by that value. These techniques can be valid when working in environments where shuf is not available or when you need finer control over the randomness seed.
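
The awk approach can be sketched as follows. This is one possible formulation rather than a canonical recipe, and the five sample words are placeholders for real input:

```shell
# Decorate-sort-undecorate: prefix each line with a random key, sort on the
# key, then drop it. LC_ALL=C keeps the decimal point consistent between awk
# and sort. srand() seeds from the clock, so two runs within the same second
# can produce the same order.
shuffled=$(printf '%s\n' uno dos tres cuatro cinco |
  LC_ALL=C awk 'BEGIN { srand() } { printf "%.17f\t%s\n", rand(), $0 }' |
  LC_ALL=C sort -k1,1n |
  cut -f2-)
printf '%s\n' "$shuffled"
```

Passing an explicit number to srand(), for example srand(42), would make the order repeatable, which is the kind of seed control that shuf does not offer directly.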

Setting a seed for reproducibility

In some scenarios, such as debugging scripts or generating reports that must be identical across runs, it is useful to make the shuffle reproducible. shuf has no seed option as such; instead, the --random-source=FILE option tells it to read its random bytes from FILE. Pointing the option at a device such as /dev/urandom changes only where the randomness comes from, while pointing it at a regular file with fixed content is what makes the output repeatable. For example, running shuf --random-source=./semilla.bin -n 10 datos.txt makes shuf consume bytes from semilla.bin, producing the same result as long as the content of semilla.bin and the input do not change. The file must contain enough bytes for the requested shuffle; otherwise shuf stops with an error.
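
A minimal sketch of a reproducible run with the --random-source option of GNU shuf. The file names reuse those from the text and are created in a temporary directory just for the demonstration; the 4 KiB size is an assumption chosen to be comfortably more than a 20-line shuffle consumes:

```shell
workdir=$(mktemp -d)

# Stand-in input data and a file of fixed random bytes
seq 1 20 > "$workdir/datos.txt"
head -c 4096 /dev/urandom > "$workdir/semilla.bin"

# Two runs that read the same bytes from the same file
first=$(shuf --random-source="$workdir/semilla.bin" -n 10 "$workdir/datos.txt")
second=$(shuf --random-source="$workdir/semilla.bin" -n 10 "$workdir/datos.txt")

# Same bytes in, same selection and order out
[ "$first" = "$second" ] && echo "reproducible"

rm -rf "$workdir"
```

To get the same output on another machine or on a later day, keep the byte file alongside the script instead of regenerating it.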

Performance tips

To produce a full permutation, shuf has to load the entire input into memory. With very large files (several gigabytes), RAM consumption can therefore become high. If you only need a sample, use -n: recent coreutils versions then apply reservoir sampling, so memory use depends on the size of the sample rather than the size of the input. If you need to shuffle everything, one workaround is to split the file into smaller chunks with split, apply shuf to each fragment, and concatenate the results with cat, keeping in mind that this is only an approximation because lines never move from one chunk to another. The one-liner perl -MList::Util=shuffle -e 'print shuffle <>' is a handy substitute where shuf is not installed, but it also reads every line into memory, so it does not solve the memory problem.
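
When a sample is enough, the input can simply be streamed through shuf -n. The sketch below assumes GNU coreutils 8.22 or later, where reservoir sampling is applied to large inputs; seq stands in for a large file or a long-running producer:

```shell
# Stream one million lines through shuf and keep only five of them
sample=$(seq 1 1000000 | shuf -n 5)
printf '%s\n' "$sample"
```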

Best practices when using shuf in scripts

When integrating shuf into shell scripts, consider the following points:

Always check the exit status of shuf to detect errors, especially when reading from files that might be missing or inaccessible.
Use absolute paths or reliable relative paths to avoid surprises when the script is executed from different directories.
If you need reproducible results for testing, pass the same fixed file to --random-source on every run.
Be mindful of temporary files; if you create intermediate files, ensure they are cleaned up in a trap or at the end of the script.
When dealing with user‑supplied input, validate that the data does not contain unexpected null bytes if you are not using the -z option.
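
Several of these points can be combined in a small helper. The following is a sketch under stated assumptions rather than a definitive pattern: sample_lines is a hypothetical name, and the body is wrapped in parentheses so that it runs in a subshell and the EXIT trap fires as soon as the function finishes.

```shell
# sample_lines FILE COUNT (hypothetical helper): print COUNT random lines of
# FILE, or fail with a message and a non-zero status
sample_lines() (
  tmp=$(mktemp) || exit 1
  # Remove the temporary file however the subshell ends
  trap 'rm -f "$tmp"' EXIT

  # Write to a temporary file first so nothing is printed if shuf fails
  if ! shuf -n "$2" -- "$1" > "$tmp"; then
    echo "error: could not sample from $1" >&2
    exit 1
  fi
  cat "$tmp"
)

input=$(mktemp)
seq 1 50 > "$input"                  # stand-in for real data
sample=$(sample_lines "$input" 5) || echo "sampling failed" >&2
rm -f "$input"
```

Because the helper returns the exit status of shuf's failure path, callers can react with the usual || or if constructs, as the last lines show.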

Conclusion

The shuf command is a simple yet powerful tool for any Linux user who needs to randomize text lines. Its clear syntax, combined with useful options such as -n, -e, -i, and --repeat, makes it an indispensable ally for testing, data generation, draws, and task automation. By understanding how it works and its limitations, you can integrate it effectively into scripts and daily workflows, saving time and getting unpredictable results when you need them.
