The cut command in Linux: extracting columns of text

Introduction

In the world of system administration and data processing, having tools that allow quick text manipulation is essential. One of the most useful and simple Linux commands is cut, whose main function is to extract columns or fields from a text stream based on delimiters or character positions. Although it may seem limited at first glance, its combination with pipes and other commands makes it a key piece for scripts and log analysis.

Basic Syntax

The general form of the command is:

cut OPTION... [FILE]...

If no file is specified, cut reads from standard input, making it ideal for use in pipelines. Exactly one of -b, -c, or -f must be given to say which part of each line to extract, and -d controls how fields are separated.
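As a minimal sketch of reading from standard input, the snippet below pipes two invented sample lines (colon-separated, in the style of /etc/passwd) into cut without naming a file. The variable name first_fields is only illustrative.

```shell
# No file argument is given, so cut reads whatever arrives on standard input.
first_fields=$(printf 'alice:x:1000\nbob:x:1001\n' | cut -d ':' -f 1)

# Print the extracted first field of each line.
printf '%s\n' "$first_fields"
```

The same pattern works with any command that writes text to its output, since cut only sees the stream on the left side of the pipe.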

Most Used Options

-f LIST: selects the listed fields (for example, -f 2 for the second field). A list can combine individual fields separated by commas (-f 1,3,5) and ranges (-f 2-4). Lines that do not contain the delimiter are printed unchanged unless -s is also given.
-d DELIM: sets the delimiter that separates fields. By default, cut uses a tab, but with this option you can specify any single character, such as a comma (-d ',') or a semicolon (-d ';').
-b LIST: extracts bytes according to the LIST (useful for fixed-width data made of single-byte characters).
-c LIST: extracts characters by position. It is meant to count characters instead of bytes, but some implementations (many versions of GNU coreutils among them) treat -c exactly like -b, so results can differ on multibyte text such as UTF-8.
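The options above can be tried on short strings without creating any file. This is a small sketch using invented ASCII-only sample values, so -b and -c give the same result here; the variable names are hypothetical.

```shell
line='a,b,c,d,e'

# Individual fields separated by commas.
picked=$(printf '%s\n' "$line" | cut -d ',' -f 1,3,5)

# A range of fields.
ranged=$(printf '%s\n' "$line" | cut -d ',' -f 2-4)

# Positions 2 to 4 by character, and positions 1 and 6 by byte.
chars=$(printf '%s\n' 'abcdef' | cut -c 2-4)
bytes=$(printf '%s\n' 'abcdef' | cut -b 1,6)

printf '%s\n' "$picked" "$ranged" "$chars" "$bytes"
```

Note that the selected fields are printed joined by the same delimiter that was used to split the input.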

Practical Examples

Imagine a CSV file named datos.csv with the following content:

nombre,edad,ciudad
Juan,30,Madrid
Ana,25,Barcelona
Luis,28,Sevilla

To get only the city column, we use:

cut -d ',' -f 3 datos.csv

This returns:

ciudad
Madrid
Barcelona
Sevilla

If we want to drop the header line and keep only the values, we can pipe the output through tail:

cut -d ',' -f 3 datos.csv | tail -n +2

Another common case is to extract the first and third fields:

cut -d ',' -f 1,3 datos.csv

Result:

nombre,ciudad
Juan,Madrid
Ana,Barcelona
Luis,Sevilla

cut only accepts a single character as delimiter, so columns separated by runs of several spaces need some preparation. A common approach is to squeeze each run into one space with tr -s ' ' first, and then apply cut with -d ' '.
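A short sketch of that approach, using an invented sample row padded with several spaces (the variable names row and city are only illustrative):

```shell
row='Juan     30   Madrid'

# tr -s ' ' squeezes each run of spaces into a single space,
# so cut can then treat one space as the field delimiter.
city=$(printf '%s\n' "$row" | tr -s ' ' | cut -d ' ' -f 3)

printf '%s\n' "$city"
```

If a line starts with spaces, the squeezed line still begins with one space, so the first field would be empty and the numbering shifts by one.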

When working with log files where information is in fixed positions, the -c option is very useful. Suppose each line has a 19-character timestamp followed by a message; to get only the message:

cut -c 20- archivo.log

This shows from character 20 to the end of each line.
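To see the positions in action without a log file, here is a sketch with one invented log line. It assumes a 19-character timestamp, a single space at position 20, and the message starting at position 21; with that layout, -c 20- would keep the separating space, so the example uses -c 21- for the message.

```shell
entry='2024-01-15 10:30:00 Service started'

# Characters 1 to 19 hold the timestamp in this assumed layout.
timestamp=$(printf '%s\n' "$entry" | cut -c 1-19)

# Position 20 is the separating space, so the message starts at 21.
message=$(printf '%s\n' "$entry" | cut -c 21-)

printf '%s\n' "$timestamp" "$message"
```

Before relying on fixed positions, it is worth checking a few real lines of the log, since a different timestamp format shifts every column.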

Tips and Tricks

Remember that cut only accepts a single-character delimiter, not a regular expression; if you need something more complex, use a tool such as awk instead.
Use single quotes around the delimiter to prevent the shell from interpreting special characters.
When working with ranges, -f 2- means from field 2 to the last, while -f -2 indicates from the start up to field 2.
To quickly view the structure of a file, try cut -f 1-5 -d ',' archivo.csv | head.
In scripts, store the result in a variable: COL2=$(cut -d ';' -f 2 entrada.txt). The variable will hold one value per input line, separated by newlines.
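Several of these tips can be checked with a quick sketch. The sample records are invented and the variable names are hypothetical; the last command shows awk splitting on a regular expression, which cut cannot do.

```shell
record='a;b;c;d'

# Open-ended ranges: from field 2 to the end, and from the start to field 2.
from_second=$(printf '%s\n' "$record" | cut -d ';' -f 2-)
up_to_second=$(printf '%s\n' "$record" | cut -d ';' -f -2)

# Capturing a column from several lines gives one value per line.
col2=$(printf 'x;1\ny;2\n' | cut -d ';' -f 2)

# awk accepts a regular expression as separator: here, any run of digits.
between_digits=$(printf 'a1b22c\n' | awk -F '[0-9]+' '{ print $2 }')

printf '%s\n' "$from_second" "$up_to_second" "$col2" "$between_digits"
```

Quoting the variable when printing it, as in "$col2", keeps the newlines between the captured values.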

Conclusion

The cut command is a lightweight but powerful tool for extracting text columns in Linux. Its simplicity makes it ideal for quick data processing tasks, while its ability to combine with other shell utilities makes it an indispensable component in any administration toolbox. Mastering its options and knowing when to use it will allow you to save time and write cleaner, more efficient scripts.
