Searching compressed files

If you need to search a set of log files in /var/log, some of which have been compressed with gzip as part of the logrotate procedure, it can be a pain to decompress them just to check them for a specific string, particularly when you also want to include the current log, which isn’t compressed:

$ gzip -d log.1.gz log.2.gz log.3.gz
$ grep pattern log log.1 log.2 log.3

It turns out to be a little more elegant to use gzip’s -d and -c switches together to decompress the files to standard output, leaving the compressed files on disk untouched, and to concatenate any uncompressed files you also want to search using cat:

$ gzip -dc log.*.gz | cat - log | grep pattern

This and similar operations on compressed files are common enough that GNU/Linux systems ship short scripts for them, typically in /bin or /usr/bin depending on the distribution. These provide analogues to existing tools that work with files whether or not they are compressed. In this case, the zgrep tool is of the most use to us:

$ zgrep pattern log*

Note that this search also includes the uncompressed log file, which is searched normally. The tools are designed for possibly compressed files, which makes them particularly well suited to searching and manipulating logs in mixed compression states. Most of them are reasonably simple shell scripts that wrap gzip and the corresponding standard tool.
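
To see that the two approaches agree, here is a minimal sketch that builds a small set of mixed logs in a temporary directory and counts matches both ways. The file contents and the variable names (tmpdir, pipeline_hits, zgrep_hits) are made up for illustration, and it assumes the gzip package's zgrep is installed:

```shell
# Create a throwaway directory with one current log and two rotated logs.
# All file contents below are invented for this example.
tmpdir=$(mktemp -d)
printf 'ok\nerror: disk full\n' > "$tmpdir/log"
printf 'error: timeout\nok\n'   > "$tmpdir/log.1"
printf 'ok\nerror: refused\n'   > "$tmpdir/log.2"

# Compress only the rotated logs, as logrotate would.
gzip "$tmpdir/log.1" "$tmpdir/log.2"

# Approach 1: decompress to standard output, append the current log, grep.
pipeline_hits=$(gzip -dc "$tmpdir"/log.*.gz | cat - "$tmpdir/log" | grep -c 'error')

# Approach 2: zgrep reads compressed and uncompressed files alike.
# With several files, each match is printed on its own line.
zgrep_hits=$(zgrep 'error' "$tmpdir"/log* | wc -l | tr -d ' ')

echo "pipeline: $pipeline_hits, zgrep: $zgrep_hits"
```

Both counts should come out the same, since each of the three files contains exactly one matching line. The zgrep form also tells you which file each match came from, which the pipeline loses.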

The complete list of tools, most of which do the same thing as their equivalents without the z prefix, can be gleaned with a quick whatis call:

$ pwd
/bin
$ whatis z*
zcat (1)   - compress or expand files
zcmp (1)   - compare compressed files
zdiff (1)  - compare compressed files
zegrep (1) - search possibly compressed files for a regular expression
zfgrep (1) - search possibly compressed files for a regular expression
zforce (1) - force a '.gz' extension on all gzip files
zgrep (1)  - search possibly compressed files for a regular expression
zless (1)  - file perusal filter for crt viewing of compressed text
zmore (1)  - file perusal filter for crt viewing of compressed text
znew (1)   - recompress .Z files to .gz files
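
As a rough illustration of two other tools from that list, the sketch below uses zcat to read a compressed file and zdiff to compare two compressed files without unpacking them first. The file names and contents are hypothetical, and it assumes the GNU gzip versions of these scripts:

```shell
# Two small files that differ on their second line (invented contents).
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n'  > "$tmpdir/a.txt"
printf 'alpha\ngamma\n' > "$tmpdir/b.txt"
gzip "$tmpdir/a.txt" "$tmpdir/b.txt"

# zcat writes the decompressed content to standard output,
# so it can feed any ordinary text tool.
first_line=$(zcat "$tmpdir/a.txt.gz" | head -n 1)

# zdiff decompresses both files on the fly and runs diff on the result;
# like diff, it exits non-zero when the contents differ.
if zdiff "$tmpdir/a.txt.gz" "$tmpdir/b.txt.gz" > /dev/null; then
    differ=no
else
    differ=yes
fi

echo "first line: $first_line, files differ: $differ"
```

The compressed files stay on disk unchanged throughout; only the output streams are decompressed.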

If you are dealing with files compressed with bzip2, the analogous tools instead begin with “bz”, such as bzgrep and bzcat.
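
A minimal sketch of the bzip2 equivalent follows, assuming bzip2 and its bzgrep script are installed; the guard is there because they are not present on every system. The file contents and the bz_hits variable are made up for illustration:

```shell
# Skip gracefully if the bzip2 tools are missing on this machine.
bz_hits=skipped
if command -v bzip2 > /dev/null && command -v bzgrep > /dev/null; then
    tmpdir=$(mktemp -d)
    printf 'error: timeout\nok\n' > "$tmpdir/log.1"
    printf 'ok\nerror: refused\n' > "$tmpdir/log.2"

    # Compress both rotated logs with bzip2, giving log.1.bz2 and log.2.bz2.
    bzip2 "$tmpdir/log.1" "$tmpdir/log.2"

    # bzgrep is used the same way as zgrep, but for bzip2-compressed files.
    bz_hits=$(bzgrep 'error' "$tmpdir"/log.*.bz2 | wc -l | tr -d ' ')
fi

echo "bzgrep matches: $bz_hits"
```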