Situation
Tar files are compressed archives. You’ll encounter them frequently while using a Linux distribution like Ubuntu or even while using the terminal on macOS.
Solution
What Does .tar.gz and .tar.bz2 Mean?
Files that have a .tar.gz or a .tar.bz2 extension are compressed archive files. A file with just a .tar extension is uncompressed, but those will be very rare.
The .tar portion of the file extension stands for tape archive, and is the reason that both of these file types are called tar files. Tar files date all the way back to 1979, when the tar command was created to allow system administrators to archive files onto tape. More than forty years later we are still using the tar command to extract tar files onto our hard drives. Someone somewhere is probably still using tar with tape.
The .gz or .bz2 extension suffix indicates that the archive has been compressed, using either the gzip or bzip2 compression algorithm. The tar command works happily with both types of file, so it doesn’t matter which compression method was used, and tar should be available everywhere you have a Bash shell. You just need to use the appropriate tar command line options.
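To make the two layers concrete, here is a minimal sketch that builds a tiny gzip-compressed archive in a scratch directory and then peels the layers apart by hand: gzip removes the compression and tar reads the archive inside. The Sample Songs directory and the file names are made up for illustration, and the sketch assumes the tar, gzip and mktemp commands are installed.

```shell
# Build a throwaway archive in a scratch directory (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/Sample Songs"
echo "C G Am F" > "$workdir/src/Sample Songs/demo.txt"
tar -czf "$workdir/sample_songs.tar.gz" -C "$workdir/src" "Sample Songs"

# gzip -dc decompresses to standard output;
# tar -tf - lists the archive it reads from standard input.
listing=$(gzip -dc "$workdir/sample_songs.tar.gz" | tar -tf -)
echo "$listing"

# Remove the scratch directory when you are done: rm -rf "$workdir"
```

Passing the -z option to tar does the same work in a single step, which is what the commands in the rest of this article rely on.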
It’s worth noting that everything in this article also works on the Windows Subsystem for Linux, which allows you to install the Bash shell inside of Windows 10 or Windows 11, although there are other ways to open tar.gz files on Windows as well.
Extracting Files from Tar Files
Let’s say you’ve downloaded two files of sheet music. One file is called ukulele_songs.tar.gz, the other is called guitar_songs.tar.bz2. These files are in the Downloads directory.
Let’s extract the ukulele songs:
tar -xvzf ukulele_songs.tar.gz
As the files are extracted, they are listed in the terminal window.
The command line options we used are:
- -x: Extract, retrieve the files from the tar file.
- -v: Verbose, list the files as they are being extracted.
- -z: Gzip, use gzip to decompress the tar file.
- -f: File, the name of the tar file we want tar to work with. This option must be followed by the name of the tar file.
List the files in the directory with ls and you’ll see that a directory has been created called Ukulele Songs. The extracted files are in that directory. Where did this directory come from? It was contained in the tar file, and was extracted along with the files.
Now let’s extract the guitar songs. To do this we’ll use almost exactly the same command as before, but with one important difference. The .bz2 extension suffix tells us the archive has been compressed with bzip2. Instead of using the -z (gzip) option, we will use the -j (bzip2) option.
tar -xvjf guitar_songs.tar.bz2
Once again, the files are listed to the terminal as they are extracted. To be clear, the command line options we used with tar for the .tar.bz2 file were:
- -x: Extract, retrieve the files from the tar file.
- -v: Verbose, list the files as they are being extracted.
- -j: Bzip2, use bzip2 to decompress the tar file.
- -f: File, name of the tar file we want tar to work with.
If we list the files in the Downloads directory we will see that another directory called Guitar Songs has been created.
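Since the only difference between the two commands is the decompression option, the choice can be made from the file extension. The following is a rough sketch of that idea, not a standard tool: extract_archive is a hypothetical helper name, and the Sample Songs archive is created only so the example can run on its own. The demonstration uses gzip, so bzip2 does not need to be installed to try it.

```shell
# Hypothetical helper: choose the tar decompression option from the extension.
extract_archive() {
    archive=$1
    case "$archive" in
        *.tar.gz|*.tgz)   tar -xzf "$archive" ;;  # gzip
        *.tar.bz2|*.tbz2) tar -xjf "$archive" ;;  # bzip2
        *.tar)            tar -xf "$archive" ;;   # uncompressed
        *) echo "unsupported archive: $archive" >&2; return 1 ;;
    esac
}

# Demonstration in a scratch directory (all names are made up).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/Sample Songs" "$workdir/out"
echo "C G Am F" > "$workdir/src/Sample Songs/demo.txt"
tar -czf "$workdir/sample_songs.tar.gz" -C "$workdir/src" "Sample Songs"

# Run the helper from inside the output directory, in a subshell.
( cd "$workdir/out" && extract_archive "$workdir/sample_songs.tar.gz" )
ls "$workdir/out/Sample Songs"

# Remove the scratch directory when you are done: rm -rf "$workdir"
```

The helper leaves out -v to keep the output quiet; add it back if you want the file names listed as they are extracted.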
Choosing Where to Extract the Files To
If we want to extract the files to a location other than the current directory, we can specify a target directory using the -C (change to directory) option.
tar -xvjf guitar_songs.tar.bz2 -C ~/Documents/Songs/
Looking in our Documents/Songs directory we’ll see the Guitar Songs directory has been created.
Note that the target directory must already exist; tar will not create it if it is not present. If you need to create a directory and have tar extract the files into it all in one command, you can do that as follows:
mkdir -p ~/Documents/Songs/Downloaded && tar -xvjf guitar_songs.tar.bz2 -C ~/Documents/Songs/Downloaded/
The -p (parents) option causes mkdir to create any parent directories that are required, ensuring the target directory is created.
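The sketch below shows both halves of that behavior in a scratch directory: the first extraction is attempted before the target exists and is expected to fail, and the second succeeds once mkdir -p has created the path. The archive, the Sample Songs directory and the target path are invented for the example, and gzip is used instead of bzip2 only so the sketch runs without bzip2 installed.

```shell
# Build a throwaway archive (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/Sample Songs"
echo "C G Am F" > "$workdir/src/Sample Songs/demo.txt"
tar -czf "$workdir/sample_songs.tar.gz" -C "$workdir/src" "Sample Songs"

target="$workdir/Documents/Songs/Downloaded"

# First attempt: the target does not exist yet, so tar is expected to fail.
if tar -xzf "$workdir/sample_songs.tar.gz" -C "$target" 2>/dev/null; then
    first_try="ok"
else
    first_try="failed"
fi
echo "first attempt: $first_try"

# Second attempt: create the directory and any missing parents, then extract.
mkdir -p "$target" && tar -xzf "$workdir/sample_songs.tar.gz" -C "$target"
ls "$target/Sample Songs"

# Remove the scratch directory when you are done: rm -rf "$workdir"
```

Because the two commands are joined with &&, tar only runs if mkdir succeeded.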
Looking Inside Tar Files Before Extracting Them
So far we’ve just taken a leap of faith and extracted the files sight unseen. You might like to look before you leap. You can review the contents of a tar file before you extract it by using the -t (list) option. It is usually convenient to pipe the output through the less command.
tar -tf ukulele_songs.tar.gz | less
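Notice that we didn’t use the -z option to list the files. When it reads an archive from a file, a modern GNU tar detects the compression format by itself, so -z is optional when listing a .tar.gz file (and, for the same reason, when extracting one, although spelling it out does no harm). Likewise, we don’t need the -j option to list the files in a .tar.bz2 file.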
Scrolling through the output we can see that everything in the tar file is held within a directory called Ukulele Songs, and within that directory, there are files and other directories.
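If the listing is long, one quick check is whether everything really does sit under a single top-level directory. A minimal sketch, using a made-up Sample Songs archive built in a scratch directory: it keeps only the first path component of each entry and removes duplicates.

```shell
# Build a throwaway archive with two subdirectories (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/Sample Songs/Ramones" "$workdir/src/Sample Songs/Possibles"
echo "placeholder" > "$workdir/src/Sample Songs/Ramones/song_a.txt"
echo "placeholder" > "$workdir/src/Sample Songs/Possibles/song_b.txt"
tar -czf "$workdir/sample_songs.tar.gz" -C "$workdir/src" "Sample Songs"

# List the entries, keep the text before the first slash, and drop duplicates.
top_level=$(tar -tzf "$workdir/sample_songs.tar.gz" | cut -d/ -f1 | sort -u)
echo "$top_level"

# Remove the scratch directory when you are done: rm -rf "$workdir"
```

A single line of output means the archive unpacks into one directory; several lines mean it will scatter entries into whatever directory you extract it in.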
We can see that the Ukulele Songs directory contains directories called Random Songs, Ramones and Possibles. To extract all the files from a directory within a tar file use the following command. Note that the path is wrapped in quotation marks because there are spaces in the path.
tar -xvzf ukulele_songs.tar.gz "Ukulele Songs/Ramones/"
To extract a single file, provide the path and the name of the file.
tar -xvzf ukulele_songs.tar.gz "Ukulele Songs/023 - My Babe.odt"
You can extract a selection of files by using wildcards, where * represents any string of characters and ? represents any single character. Using wildcards requires the use of the --wildcards option.
tar -xvz --wildcards -f ukulele_songs.tar.gz "Ukulele Songs/Possibles/B*"
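To see the two wildcard characters side by side, here is a small sketch that assumes GNU tar (the --wildcards option may not behave the same way in other tar implementations). The song names and directories are invented, and the archive is created in a scratch directory so the example is self-contained.

```shell
# Build a throwaway archive with three files (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/Sample Songs/Possibles" "$workdir/star" "$workdir/question"
for name in "Bad Moon" "Blue Sky" "Creep"; do
    echo "placeholder" > "$workdir/src/Sample Songs/Possibles/$name.txt"
done
tar -czf "$workdir/sample_songs.tar.gz" -C "$workdir/src" "Sample Songs"

# B* matches every entry in Possibles whose name starts with B.
tar -xz --wildcards -f "$workdir/sample_songs.tar.gz" -C "$workdir/star" \
    "Sample Songs/Possibles/B*"

# B?d* matches names with B, then any one character, then d.
tar -xz --wildcards -f "$workdir/sample_songs.tar.gz" -C "$workdir/question" \
    "Sample Songs/Possibles/B?d*"

ls "$workdir/star/Sample Songs/Possibles"
ls "$workdir/question/Sample Songs/Possibles"

# Remove the scratch directory when you are done: rm -rf "$workdir"
```

The patterns are quoted so that the shell passes them to tar unchanged instead of trying to expand them against files in the current directory.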
Extracting Files Without Extracting Directories
If you don’t want the directory structure in the tar file to be recreated on your hard drive, use the --strip-components option. The --strip-components option requires a numerical parameter. The number represents how many levels of directories to ignore. Files from the ignored directories are still extracted, but the directory structure is not replicated on your hard drive.
If we specify --strip-components=1 with our example tar file, the Ukulele Songs top-most directory within the tar file is not created on the hard drive. The files and directories that would have been extracted to that directory are extracted in the target directory.
tar -xvzf ukulele_songs.tar.gz --strip-components=1
There are only two levels of directory nesting within our example tar file. So if we use --strip-components=2, all the files are extracted in the target directory, and no other directories are created.
tar -xvzf ukulele_songs.tar.gz --strip-components=2
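The effect is easiest to see by extracting the same archive twice with different values. This is a minimal sketch with an invented two-level archive (Sample Songs/Ramones/song.txt) built in a scratch directory; the strip1 and strip2 directory names are only labels for the two runs.

```shell
# Build a throwaway archive with two levels of nesting (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/Sample Songs/Ramones" "$workdir/strip1" "$workdir/strip2"
echo "placeholder" > "$workdir/src/Sample Songs/Ramones/song.txt"
tar -czf "$workdir/sample_songs.tar.gz" -C "$workdir/src" "Sample Songs"

# Drop one leading directory: the file lands in strip1/Ramones/song.txt.
tar -xzf "$workdir/sample_songs.tar.gz" -C "$workdir/strip1" --strip-components=1

# Drop two leading directories: the file lands in strip2/song.txt.
tar -xzf "$workdir/sample_songs.tar.gz" -C "$workdir/strip2" --strip-components=2

# Show where the extracted files ended up.
find "$workdir/strip1" "$workdir/strip2" -type f

# Remove the scratch directory when you are done: rm -rf "$workdir"
```

Flattening directories this way can make files with the same name from different directories collide, so it is worth listing the archive first.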
If you look at the Linux man page you’ll see that tar has got to be a good candidate for the title of “command having the most command line options.” Thankfully, to allow us to extract files from .tar.gz and .tar.bz2 files with a good degree of granular control, we only need to remember a handful of these options.