Command line compression for Windows
Introduction
Unix-style CLI compression
WinRAR goes command-line
The opensource connection: 7-Zip
CLI tools exit codes
Short Dictionary
*** INTRODUCTION ***
Back in the day when archiving meant putting a bunch of files together into a single bigger chunk in order to make the transfer to tape faster, tar was a common tool. It still is, today, in the UNIX world, typically used in conjunction with two standard compression utilities: gzip and bzip2. The resulting compressed file is sometimes called a “tarball”, although there’s no actual “tar” involved, just zeros and ones.
In the Windows world, as on other modern desktop operating systems (Mac OS X, or Linux with a desktop environment such as KDE or GNOME), compression tasks are generally done from within a GUI-based program. Such software has been covered in previous articles about WinRAR, 7-Zip and WinZip. However, if you feel left out or totally uncool for not being able to use command line tools to archive your stuff, read on and learn how to use strange-sounding programs such as tar, gzip and bzip2 without installing a UNIX derivative (Linux, the various BSD flavors and so on). The well-known compression programs get their turn too: WinRAR and 7-Zip both have command line interfaces, and you’ll get a crash course on those as well.
*** Tarring, gzipping and bzipping stuff like an uber-geek ***
Tar is a kind of grandfather: it’s been around for ages and it’s seen a lot. It’s a bit tired, but wise, and it can do the job if you know how to “talk” to it. You can’t teach it new tricks, but the ones it already knows, albeit somewhat old, make it very flexible and fit for many situations. When coupled with auxiliary programs, e.g. gzip/bzip2 for compression, it becomes even more powerful.
You’ve probably seen some funny-looking files with double extensions, like NAME.tar.gz or NAME.tar.bz2, and thought “what’s up with *that*?”. Get ready, we’re about to shed some light on the subject (sunglasses optional)…
Two extensions mean that two programs were involved in the creation of that file. Take .tar.gz, for example: the original data was first “tarred” into NAME.tar and then “gzipped” into NAME.tar.gz. Tar’s job is to look at all the files and directories it has to process and concatenate their contents, along with other information such as paths, permissions and so on, into a single file with a .tar extension. The .tar file is then sent (“piped”, in geek language) to gzip for compression, which yields the .tar.gz thing. The “tar” program is smart, though, and lets you do both in a single step with a command such as “tar czf docs.tar.gz *.doc” (which compresses all .doc files into a single docs.tar.gz archive; note that the archive name must come right after the “f” flag). Know this, too: the two extensions are sometimes condensed into shorter forms such as “.tgz” and “.tbz2”.
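To make the two-step recipe concrete, here is a minimal sketch that uses Python’s standard tarfile and gzip modules as stand-ins for the tar and gzip programs (the function names are made up for this illustration). The first function tars and then gzips in two separate steps; the second does both at once, roughly the way “tar czf” does.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def tar_then_gzip(files, tar_path, tgz_path):
    """Two separate steps: bundle the files into a .tar, then gzip that .tar."""
    # Step 1 ("tar"): concatenate the files, plus names, sizes and permissions,
    # into one uncompressed archive. arcname stores only the bare file name.
    with tarfile.open(tar_path, "w") as tar:
        for f in files:
            tar.add(f, arcname=Path(f).name)
    # Step 2 ("gzip"): compress the whole .tar file as one byte stream.
    with open(tar_path, "rb") as src, gzip.open(tgz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def tar_gz_one_step(files, tgz_path):
    """One step, in the spirit of "tar czf docs.tar.gz *.doc"."""
    # Mode "w:gz" makes tarfile gzip the archive while it is being written.
    with tarfile.open(tgz_path, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=Path(f).name)
```

For example, `tar_gz_one_step(sorted(str(p) for p in Path(".").glob("*.doc")), "docs.tar.gz")` would pack every .doc file in the current directory. Either result is an ordinary .tar.gz that tar-aware tools should be able to open.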
Now, let’s see how we can do this on Windows, assuming you are running Windows 2000 or later (XP, 2003). First of all, we need to access the command prompt (hopefully, you’ve at least *heard* of it before plunging into this article). To do that, go to Start->Run, type “cmd”, press Enter, then stand back and be amazed: the Windows command line (no, not good for h4x0ring). Next, we need the actual programs to work with, but how? Tar, gzip and bzip2 are UNIX programs, right? How could they run on Windows? Well, here’s one of the situations where open source pays off. Having access to a program’s source code makes it possible to “port” it to other platforms (i.e. modify the source so that it compiles on other operating systems and/or compilers).
All three utilities I’ve been babbling about have Windows ports. Gzip for Windows can be obtained from its official homepage, and the same applies to bzip2. There are actually two tar ports: one is the traditional GNU Tar and the other is BsdTar. The latter is said to be faster, and it can compress/decompress files without the external gzip/bzip2 executables (“binaries” in geek-talk).
Once you have obtained and installed the programs, and added the folders containing their .exe files to your PATH environment variable (so that Windows can find them from any directory), you can start archiving right away.
To make a single .tar file with all the .doc files in the current directory, use the command “tar -cv *.doc -f AllMyDocs.tar”. The “c” and “v” flags stand for “create archive” and “be verbose”, which basically means “don’t be selfish, tell me what you’re adding to the archive”, while “-f” is followed by the name of the archive file to write.
To also compress the resulting archive with gzip, you would normally add the “z” flag like so: “tar -czv *.doc -f AllMyDocs.tar.gz”. However, this does not work; it fails with the error “tar: Cannot fork: Function not implemented tar: Error is not recoverable: exiting now”. Apparently, the win32 port of GNU Tar is incomplete: it cannot “fork” child processes, which in plain English means it cannot create the copy of itself that would then run the separate gzip program. Fear not, even if there is no fork in GNU Tar/win32’s world, there is a spoon, and here’s where BsdTar enters the stage: just use something like “bsdtar -cvzf allmydocs.tar.gz *.doc” and you’re all set.
The same applies to bzip2 compression; the flag to use is “j”, and the command becomes “bsdtar -cvjf allmydocs.tar.bz2 *.doc”.
If you’re a real geekboy, you can still use GNU Tar with gzip/bzip2: just “pipe” the data stream produced by “tar” into these programs instead of writing it to a file. Tell tar to write the archive to standard output with “-f -”, then connect it to the compressor with the pipe character “|”, for example: “tar -cf - *.doc | gzip > allmydocs.tar.gz” or “tar -cf - *.doc | bzip2 > allmydocs.tar.bz2”.
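Piping can be sketched the same way. Below, Python’s tarfile writes an uncompressed tar stream directly into a gzip (or bzip2) compressor object, so no intermediate .tar file ever touches the disk. This illustrates the idea behind “tar -cf - *.doc | gzip > allmydocs.tar.gz”; it is not a replacement for it, and the function name tar_pipe and its method parameter are hypothetical.

```python
import bz2
import gzip
import tarfile
from pathlib import Path


def tar_pipe(files, out_path, method="gz"):
    """Stream a tar archive straight into a compressor, like
    "tar -cf - *.doc | gzip > out" (method="gz") or "| bzip2" (method="bz2").
    No intermediate .tar file is written."""
    with open(out_path, "wb") as raw:
        # The compressor end of the "pipe": it wraps the output file.
        if method == "gz":
            comp = gzip.GzipFile(fileobj=raw, mode="wb")
        elif method == "bz2":
            comp = bz2.BZ2File(raw, "wb")
        else:
            raise ValueError(f"unknown method: {method}")
        # The tar end: mode "w|" writes a plain, forward-only tar stream,
        # much like tar writing to standard output. The tar stream is closed
        # first (writing its end-of-archive blocks), then the compressor.
        with comp, tarfile.open(fileobj=comp, mode="w|") as tar:
            for f in files:
                tar.add(f, arcname=Path(f).name)
```

The design mirrors the command line: tar knows nothing about compression, and the compressor knows nothing about tar; they only share a byte stream.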
Take a peek into the tarballs
Because of the “fork” issue, listing the contents of the archives we created is easier with “bsdtar”, but it can be done with GNU “tar” as well, using some “pipe magic”.
To list the docs in your allmydocs.tar.gz archive using bsdtar, type “bsdtar -tf allmydocs.tar.gz”. The same command works with bzipped files too: “bsdtar -tf allmydocs.tar.bz2”.
Using tar is a bit more complicated, but it can’t be too hard for a good student such as yourself (you must be one, if you’re still with me): “gzip -dc allmydocs.tar.gz | tar -tf -” or “bzip2 -dc allmydocs.tar.bz2 | tar -tf -”. Here “-d” means decompress, “-c” sends the result to standard output, and “-f -” tells tar to read the archive from standard input. If you’re bothered by the warnings gzip/bzip2 display when decompressing data to standard output, just add another flag (-q, suppress noncritical error messages) and you’ll be fine: “gzip -dcq allmydocs.tar.gz | tar -tf -” or “bzip2 -dcq allmydocs.tar.bz2 | tar -tf -”.
I found the handy “-q” flag by invoking the gzip/bzip2 help with “gzip -h” and “bzip2 -h”. Try it and notice the various options these programs provide.
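Both listing styles, the auto-detecting bsdtar way and the decompress-then-pipe GNU Tar way, map neatly onto Python’s tarfile module. The sketch below assumes ordinary .tar.gz/.tar.bz2 files; the helper names are invented for this example.

```python
import gzip
import tarfile


def list_tarball(path):
    """Like "bsdtar -tf path": mode "r:*" lets tarfile detect on its own
    whether the archive is plain, gzipped or bzipped."""
    with tarfile.open(path, "r:*") as tar:
        return tar.getnames()


def list_gzipped_via_stream(path):
    """Like "gzip -dc path | tar -tf -": decompress first, then read the
    resulting plain tar data as a forward-only stream (mode "r|")."""
    with gzip.open(path, "rb") as stream:
        with tarfile.open(fileobj=stream, mode="r|") as tar:
            # Stream mode can only be read front to back, so iterate once.
            return [member.name for member in tar]
```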
I want my files back! How do I get them out?
The “tar” flag for extraction is “-x” (big surprise, I know). So, a command for extracting all your aforementioned docs would be “bsdtar -xzvf allmydocs.tar.gz”. Since “bsdtar” is such a smart-ass, you can omit the flag that tells it which compression program was used to make the file (“-z” or “-j”); that’s why “bsdtar -xvf allmydocs.tar.gz” works just as well.
Again, using GNU Tar is a bit trickier, since it involves “piping”: “gzip -dc allmydocs.tar.gz | tar -xvf -” or “bzip2 -dc allmydocs.tar.bz2 | tar -xvf -”.
Both commands will extract the contents of the archive, including path information (directories).
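For completeness, here is extraction in the same Python-as-illustration style: tarfile’s “r:*” mode auto-detects the compression, much like bsdtar, and extractall() recreates the stored directories. The hasattr check is there because extraction filters only exist in newer Python releases; extract_tarball is a hypothetical helper name.

```python
import tarfile
from pathlib import Path


def extract_tarball(path, dest="."):
    """Like "bsdtar -xf path": detect the compression automatically and
    extract every member, recreating its stored directory path under dest."""
    with tarfile.open(path, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons offer extraction filters; "data" rejects members
            # with absolute paths or ".." components that would escape dest.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return Path(dest)
```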
Ok, that’s just peachy, but what about more advanced stuff?
It’s time for RAR and 7-Zip to show us what they’ve “got”. Besides basic archive operations (create, extract, list contents), these two are capable of lots of advanced stuff, for example (not every feature is available in both programs): add a recovery record, repair an archive, convert an archive to SFX (self-extracting), encrypt files, freshen an archive (update modified files), send the compressed file by email, set the dictionary size for compression, set the number of execution threads (useful on multiprocessor, dual-core or HyperThreading machines), set a password and so on.
*** RAR. Win RAR ***
WinRAR’s superspy CLI (Command Line Interface) version is named, simply, rar (actually rar.exe, but we’re going to call it just “rar” from now on). The easiest way to use it is to add “C:\Program Files\WinRAR” to your PATH environment variable, the same way you can make tar and its friends callable from any directory. Once you’ve done that, type the command “rar” and press Enter to see a rather lengthy list of commands and switches (flags); we’ll review the most important ones here.
Unlike tar & co., rar’s main commands (create archive, extract and so on) are not treated like switches (flags) but are used “as-is” – a single letter immediately after the command invocation (e.g. “rar a ..” to create a new archive).
Basic commands in RAR
a – Add files to archive
Used to create a new archive or add files to an existing one
Example: “rar a allmydocs.rar *.doc” creates a RAR archive that contains all the doc files in the current directory.
c – Add archive comment
Used to add a text comment to an existing archive
Example: “rar c allmydocs.rar” allows you to interactively enter some text, which will be added to the archive as a comment.
An interesting variant of this command is cw, which writes the comment to a file of your choice: “rar cw allmydocs.rar doc-comments.txt” reads the comment stored in your archive and writes it to the .txt file.
e – Extract files to current directory
Used to quickly extract files from an existing archive. Optionally, it can extract only certain files (specified on the command line).
Extract all files example: “rar e allmydocs.rar” would uncompress all the archive contents into the current directory
Extract certain file example: “rar e allmydocs.rar must-read.doc” would get only must-read.doc out of the archive and put it in the working directory.
f – Freshen files in archive
Update archive by refreshing files that were modified since they were added to the archive or last “freshened”.
Example: “rar f allmydocs.rar” would check the files stored in the archive against their counterparts on disk and replace any archived file whose disk copy has been modified since it was packed.
l – List archive
Used to display the list of files contained in the archive, along with other details. There are 3 modes for this command, each with its own set of info columns.
Default mode (l)
Shows the file name, real size (in bytes), packed size, compression ratio, date and time, attributes, the CRC (Cyclic Redundancy Check) checksum, the compression method and the version.
Example: “rar l allmydocs.rar”
Technical mode (lt)
Additionally displays the host OS (the operating system used to create the archive), whether or not the archive is solid, and whether the format is “old” or not.
Example: “rar lt allmydocs.rar”
Bare mode (lb)
Displays only the file names
Example: “rar lb allmydocs.rar”
Slightly more advanced stuff
d – Delete files or folders from archive
Used to delete one or more files inside an existing archive
Example: “rar d allmydocs.rar mustdelete.doc” would delete a single doc file from the archive
Another example: “rar d allmydocs.rar old_docs” would delete the entire “old_docs” folder (that is, including the contents)
t – Test archive files
Used to test that an archive is not damaged. Works by performing a fake extraction of the archive contents (extracted data is not actually written to disk).
Example: “rar t allmydocs.rar”
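Python’s standard library cannot read RAR files, so the sketch below shows the same “fake extraction” idea on a ZIP archive instead: zipfile’s testzip() decompresses every member in memory and checks it against the stored CRC, without writing anything to disk. verify_archive is a name made up for this example; for .rar files, rar’s own t command is the tool to use.

```python
import zipfile


def verify_archive(path):
    """'Fake extraction' integrity check on a ZIP file.

    testzip() reads and decompresses every member in memory and compares
    it with the CRC stored in the archive; nothing is written to disk.
    Returns None if all members are fine, otherwise the name of the
    first damaged member."""
    with zipfile.ZipFile(path) as zf:
        return zf.testzip()
```

A return value of None plays the role of rar’s “All OK” message; a file name points at the first member whose data no longer matches its checksum.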
x – Extract files with full path
Used to obtain some files out of an archive, or to uncompress an entire archive. The files are extracted with “full paths”, which means the directory structure is recreated to match the one at the time of compression.
Simple mode:
Extracts all files in a RAR archive
Example: “rar x allmydocs.rar”
Advanced mode:
Lets you skip extraction of files you don’t need, either by typing their names on the command line or by providing an exclusion list file.
Example (1): “rar x -xuseless.doc allmydocs.rar” (skips useless.doc while uncompressing the archive)
Example (2): “rar x -x@ allmydocs.rar” (then type the file names you want to skip)
Example (3): “rar x -x@skip.txt allmydocs.rar” (all files listed in skip.txt will be skipped)
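If you exclude the same files regularly, a small script can write the list file for you. This sketch assumes a plain-text list file with one file name per line (ASCII names avoid any code-page surprises) and that rar.exe is on your PATH; both helper names are hypothetical.

```python
from pathlib import Path


def write_list_file(path, names):
    """Write one file name per line (assumed list-file layout for -x@)."""
    # encoding="ascii" fails loudly on non-ASCII names instead of guessing.
    Path(path).write_text("\n".join(names) + "\n", encoding="ascii")


def rar_extract_excluding(archive, list_file):
    """Build the argument list for "rar x -x@<list_file> <archive>".

    Pass the result to subprocess.run(...) once rar.exe is on your PATH."""
    return ["rar", "x", f"-x@{list_file}", str(archive)]
```

Usage: `write_list_file("skip.txt", ["useless.doc", "draft.doc"])`, then `subprocess.run(rar_extract_excluding("allmydocs.rar", "skip.txt"))`.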
Even more advanced stuff (good to impress on the first date)
s – Convert to SFX / remove SFX module
The default form of this command adds an SFX module to the archive of your choice, transforming it into a self-extracting “.exe” archive, perfect if you want to distribute the archive but suspect that some of the recipients do not have a compression program capable of handling RAR files. The default.sfx module is used unless you specify another one on the command line, immediately after the s command.
Example: “rar s allmydocs.rar”.
The “evil twin” of this command is s-, which removes SFX modules.
It will not overwrite the self-extracting (.exe) archive, but will create a new one with the usual “.rar” extension.
Example: “rar s- allmydocs.exe” creates a new archive named “allmydocs.rar”
r – Repair archive
Attempts to repair a damaged archive using any available recovery records. If recovery records are not available, rar will only reconstruct the archive (i.e. the file list), in order to be able to recover undamaged files. If recovery records are in fact available, rar can be much more efficient and rebuild the entire archive.
The repaired archive is saved to rebuilt.original_name.rar (no recovery record) or fixed.original_name.rar (recovery record(s) found).
Example: “rar r allmydocs.rar”
The rc variant of this command is designed to work with multi-volume archives and can reconstruct missing volumes.
The rr variant can be used to add a recovery record to an existing archive; for example “rar rr allmydocs.rar”.
The rv variant does the same thing, but for multi-volume archives; the recovery records are external files with the “.rev” extension.
u – Update files in archive
Adds files that are not yet in the archive and freshens the ones that already are (see the f command). If you created an archive with a command such as “rar a allmydocs.rar *.doc”, updating it with “rar u allmydocs.rar *.doc” will look for new .doc files and add them to allmydocs.rar, plus it will check which files inside the archive have been modified in the “real world”, the filesystem. Repeat the file mask: rar does not remember it, and if you leave it out, all files (*.*) in the current directory are assumed.
Example: “rar u allmydocs.rar *.doc”
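The difference between freshen (f) and update (u) boils down to a simple rule, sketched below as a pure function over name-to-modification-time maps. This is a simplified model for illustration, assuming that “modified” means “newer modification time on disk”; RAR’s own comparison may take more into account, and plan_changes is a hypothetical name.

```python
def plan_changes(archived, on_disk, command):
    """Decide what an "f" (freshen) or "u" (update) run would do.

    archived / on_disk map file names to modification times (e.g. seconds).
    Both commands refresh files that are already archived and newer on disk;
    only "u" also adds files that exist on disk but not yet in the archive.
    Archived files that vanished from disk are left alone by both."""
    if command not in ("f", "u"):
        raise ValueError("command must be 'f' or 'u'")
    refresh = sorted(name for name, mtime in on_disk.items()
                     if name in archived and mtime > archived[name])
    add = sorted(name for name in on_disk if name not in archived) if command == "u" else []
    return refresh, add
```

For example, with a.doc edited and c.doc newly created on disk, freshen refreshes only a.doc, while update refreshes a.doc and also adds c.doc.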
*** Bored of RARing things up? Then try some zipping. 7-zipping, to be more precise ***
7-Zip’s command-line version is called 7z.exe, but after you get to know each other (i.e. after you add 7-Zip’s directory to your PATH environment variable), you may simply call it “7z”. Try that now to get the list of commands and switches.
Simple operations (prerequisites for achieving coolness state)
a – Add files to archive
Used to create a new archive or add files to an existing one. Can be combined with the “-r” switch which affects directory recursion.
Example (1): “7z a testarc.7z *.*” creates a new archive containing all files in the current directory. Subdirectories are ignored.
Example (2): “7z a -r testarc.7z *.*” creates a new archive containing all files in the current directory, including the subdirectories.
For both examples, if the archive already exists, 7-Zip will check whether new files should be added and update it accordingly.
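What “-r” changes can be illustrated with Python’s pathlib: glob() looks only at the given directory, while rglob() also walks every subdirectory. This models only the idea, not 7-Zip’s own wildcard engine; pathlib follows its own rules (for instance, “*.*” there only matches names containing a dot, unlike on Windows), so the hypothetical helper below defaults to “*”.

```python
from pathlib import Path


def files_to_add(root, pattern="*", recurse=False):
    """Which files a mask would pick up, without and with recursion."""
    root = Path(root)
    # glob(): current directory only; rglob(): subdirectories too (like -r).
    matches = root.rglob(pattern) if recurse else root.glob(pattern)
    # Keep regular files only, reported relative to root with "/" separators.
    return sorted(p.relative_to(root).as_posix() for p in matches if p.is_file())
```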
e - Extract files from archive (without using directory names)
Used to quickly uncompress an archive’s contents into the current directory or the one specified with “-o”. Caution: it does not use the stored path information (all files are extracted directly into the current/output directory). Can also be combined with the “-r” and “-x” switches.
Simple mode:
Extract all files and place them in the current directory
Example: “7z e testarc.7z”
Advanced mode:
Lets you change the destination directory or exclude some files
Example (1): “7z e -ooutput_dir testarc.7z” extracts the archive’s contents into the “output_dir” directory (which can be a fully qualified path including a drive letter)
Example (2): “7z e -r -x!*.txt testarc.7z” extracts all files inside the archive, except .txt files
l - Lists contents of archive
Used to list the contents of an archive (filenames, sizes, dates, attributes).
Example: “7z l testarc.7z”
u – Update files in archive
Used to add new files to an archive or update the files inside it to match changes in the original files. Similar to RAR’s update (u) command. Can be combined with the “-r” switch.
Example (1): “7z u testarc.7z *.*” will update all files in the archive with respect to the files in the current directory; files missing from the archive will be added.
Example (2): “7z u -r testarc.7z *.doc” will work only with .doc files in the current directory and any subdirectories
A bit more advanced stuff
t – Test archive
Used to test the integrity of an archive or to check specific files inside it. Can be combined with the “-r” and “-x” switches.
Example (1): “7z t testarchive.7z” will test the archive, including all the files inside it
Example (2): “7z t -r testarchive.7z *.txt” will test all .txt files inside the archive
d – Delete files from archive
Used to delete some files from an archive. Can be combined with the “-r” and “-x” switches for more power
Example (1): “7z d testarchive.7z *.doc” – deletes all .doc files from the archive
Example (2): “7z d -r -x!important*.doc testarchive.7z *.doc” – deletes all .doc files from the archive, except documents whose file names begin with “important”
x - Extract with full paths
Used to extract files from an archive, using the stored full paths, to the current directory or the directory specified with “-o”. Can also be combined with the “-r” and “-x” switches.
Simple mode:
Extracts all files and places them in the current directory
Example (1): “7z x testarc.7z”
Advanced mode:
Lets you change the output directory or skip some files
Example (1): “7z x -r -ouncompressed testarc.7z” – extracts all files to the “uncompressed” directory
Example (2): “7z x -r -x!*.doc testarc.7z” – extracts all files except .doc files
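The e versus x distinction is easy to see in code. Since 7-Zip also handles ZIP files, the sketch below uses Python’s zipfile on a .zip archive: with keep_paths=False it drops the folder part of every member name, as “7z e” does, and with keep_paths=True it recreates the stored folders, as “7z x” does. extract_zip and keep_paths are hypothetical names.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_zip(zip_path, dest, keep_paths=True):
    """keep_paths=True mimics "7z x" (recreate the stored folders);
    keep_paths=False mimics "7z e" (put every file straight into dest)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        if keep_paths:
            zf.extractall(dest)
            return
        for info in zf.infolist():
            if info.is_dir():
                continue  # folders are not recreated in "e" mode
            # ZIP member names always use "/", so PurePosixPath strips the folders.
            target = dest / PurePosixPath(info.filename).name
            # Caution: same-named files from different folders overwrite each
            # other here, whereas 7-Zip normally asks before overwriting.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
```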
Create SFX archives with the “-sfx” switch
Example (1): “7z a -sfx testarc.exe *.*” – creates an SFX (self-extracting .exe) archive containing all files in the current directory
Example (2): “7z a -sfx7z.sfx testarc.exe *.*” – creates an SFX archive, using the 7z.sfx module, which displays a confirmation/options dialog when executed
Well, that’s about it regarding the usage guides. There are some even more advanced commands, options and switches, but you can find them for yourself in the programs’ documentation, after you get the hang of it by trying the stuff presented in this article.
The remainder of the guide includes a short piece about CLI tools exit codes and a term dictionary. Good luck!
*** EXIT CODES ***
Most CLI utilities are kind enough to report whether the operation asked of them was successful or not. They do this when their execution ends, by returning a so-called “exit code” (or “return code”). These are numbers with special meanings, which sometimes differ from program to program. Exit codes are usually invisible to the normal user, but they are important when CLI utilities are invoked from shell scripts, because they let the script determine whether each step succeeded or ran into errors. In a Windows batch file, for example, the exit code of the last program is available in %ERRORLEVEL%.
Below is the list of return codes for WinRAR and 7-Zip, a useful resource for programmers and advanced users.
WinRAR (RAR)
0 - Successful operation.
1 - Warning. Non-fatal error(s) occurred.
2 - A fatal error occurred.
3 - CRC error occurred when unpacking.
4 - Attempt to modify a locked archive.
5 - Write error.
6 - File open error.
7 - Wrong command line option.
8 - Not enough memory.
9 - File create error.
255 - User break.
7-Zip
0 - No error
1 - Warning (non-fatal error(s)). For example, some files were locked by another application during compression, so they were not compressed.
2 - Fatal error
7 - Command line error
8 - Not enough memory for operation
255 - User stopped the process
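Here is a minimal sketch of how a script might act on those codes, written in Python so it runs the same way everywhere. The tables simply restate the two lists above; run_archiver and describe_exit_code are hypothetical names, and the sketch assumes the archiver (rar or 7z) is on your PATH and that the command is passed as an argument list.

```python
import subprocess

# Exit codes as listed above.
RAR_EXIT_CODES = {
    0: "Successful operation.",
    1: "Warning. Non-fatal error(s) occurred.",
    2: "A fatal error occurred.",
    3: "CRC error occurred when unpacking.",
    4: "Attempt to modify a locked archive.",
    5: "Write error.",
    6: "File open error.",
    7: "Wrong command line option.",
    8: "Not enough memory.",
    9: "File create error.",
    255: "User break.",
}

SEVENZIP_EXIT_CODES = {
    0: "No error",
    1: "Warning (non-fatal error(s))",
    2: "Fatal error",
    7: "Command line error",
    8: "Not enough memory for operation",
    255: "User stopped the process",
}


def describe_exit_code(code, table):
    """Translate a numeric exit code using one of the tables above."""
    return table.get(code, f"Unknown exit code {code}")


def run_archiver(cmd, table):
    """Run an archiver given as an argument list, e.g. ["7z", "t", "testarc.7z"],
    and return (exit code, description)."""
    result = subprocess.run(cmd)
    return result.returncode, describe_exit_code(result.returncode, table)
```

For example, `run_archiver(["7z", "t", "testarc.7z"], SEVENZIP_EXIT_CODES)` would return `(0, "No error")` for a healthy archive. In a plain batch file, the rough equivalent is a line such as “if errorlevel 2 goto failed”, since “if errorlevel N” is true for any exit code of N or higher.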
*** DICTIONARY ***
* Tar - traditional file archiver, originally used with tape devices. It does not compress; it only concatenates files into a single “blob”
* Gzip – open source compression program (GNU)
* Bzip2 – another open source compression program, provides better compression rates at the expense of longer compression times
* Tarball – a package obtained by archiving a group of files with “tar” and then passing the result to “gzip” or “bzip2”
* Pipe – the “|” character, used to feed one program’s output directly into another program’s input within a single command line
* Switch – a way of modifying a program’s/command’s default behavior, usually specified on the command line with one or two dash characters (e.g. -h or --help)