This article was written in 1998. Many things have changed since
then...
Introduction
Controlling a computer by typing commands to a so-called
command line interpreter is still most people's favorite way
of working, despite the capabilities of modern window systems. When
you know the names of the commands and their options, working from the
command line is much less complicated and usually faster than complex
series of mouse movements and button clicks.
The way commands and options are specified depends on how the
commands are interpreted, and who is handling the options. Sometimes
this is the command line interpreter, but quite often the program that
is run by the command has to handle the options itself.
Under most moderns command shells, including the popular Unix and
Windows shells, a command line consists of the name of the program to
be executed, followed by zero or more options and arguments. There are
two conventions on how options look like and should be interpreted:
option letters and option words.
In the case of option letters, options consist of a single dash
followed by one or more characters, usually letters, each being
interpreted individually. For example, ‘-abc ’ means the
same as ‘-a -b -c ’. When options take values it is
usually possible to bundle the values as well. For example,
‘-aw80L24x ’ means the same as ‘-a -w 80 -L 24
-x ’.
In the case of option words, options consist of a double dash
followed by a single option word. When an option takes a value, the
value follows the option word or can be appended to the option word
using an equals sign. Using this convention, the previous example
could read ‘--all --width=80 --length 24 --extend ’. With
option words, it is much easier to remember options and their
meanings.
In either case, options precede other program arguments and the
recognition of options stops as soon as a non-option argument is
encountered. A double dash on itself explicitly stops option
recognition.
Often combinations are allowed, for example, a program can accept
‘-a ’ being the same as ‘--all ’. Some
programs accept option words with just a single dash (and will not use
option letters). Sometimes options and non-option arguments may be
mixed.
You've probably written programs that handle command line options
like ‘-h ’ for height, ‘-w’ for width, ‘-v’ for verbose,
and so on. Some might be optional, some might be case-insensitive,
some might not expect an argument afterward. With Perl, parsing
options is not very hard to do, but after writing eight subroutines
for eight programs, you might wonder whether there's a better way.
There is -- in fact, there are several ways.
The simple way -- ‘perl -s ’
The Perl interpreter itself supports the single-character style of
options. The Perl script is free to interpret the command line
arguments the way it likes. Perl uses a special command line option
‘-s ’ to facilitate the option handling for scripts.
Assuming you start Perl as follows: perl -s script.pl -foo -bar
myfile.dat
Perl will remove anything that look like options
(‘-foo ’ and ‘-bar ’) from the command line
and set corresponding variables ($foo and
$bar ) to a true value. Note that the options are
words but preceded with a single dash. When a command line argument is
encountered that is not an option Perl will not look any further.
Although this method is very limited it is quite useful to get
started.
The easy way -- Getopt::Std
Perl comes standard with two modules that assist programs in
handling command line options: Getopt::Std and
Getopt::Long .
Module Getopt::Std provides two subroutines,
getopt and getopts . These routines have in
common that they use a single dash to identify option letters and they
stop processing options when the first non-option is detected.
Subroutine getopt takes one mandatory argument, a
string containing the option letters that take values. For example,
when you call
getopt ('lw');
your program will accept ‘-l24 -w 80 ’ and set the
variable $opt_l to 24 and $opt_w to 80. Note
that the value can be bundled with the option letter but it need not.
Other option letters are also accepted (and can be bundled with other
letters), for example ‘-ab ’ will set each of the
variables $opt_a and $opt_b to the value
1 . When it is not desired to have (global) variables
defined, getopt can be passed a reference to a hash as an
optional second argument. Hash keys will be x (where
x is the option letter) and the key value will be set to the
option value or 1 if the option did not take a value.
Subroutine getops allows a little bit more control
over the options. Its argument is a string containing the option
letters of all options that are recognized. If an option
takes a value, the option letter in the string is followed by a colon.
For example, using
getops ('abl:w:');
will make your program take options ‘a ’ and
‘b ’ without a value, and ‘l ’ and
‘w ’ with a value. Bundling is allowed. Other command line
arguments that start with a dash but are not one of these will cause
an error message to be printed. As with getopt , a hash
reference can be passed as an optional second argument.
The functionality provided by Getopt::Std is much
better than ‘perl -s ’, but still limited.
The advanced way -- Getopt::Long
Module Getopt::Long defines subroutine
GetOptions that takes care of advanced handling of
command line options.
GetOptions makes it possible to have ultimate control
over the handling of command line options. It provides support for:
- single-letter options, with bundling;
- option words, using a single dash, double dash or plus (using a
plus sign was an intermediate standard used by the GNU project);
- a mix of the above, in which case the long options must start with a
double dash.
Other important features include:
- options can take (mandatory or optional) values;
- option values can be strings or numbers;
- full control over where the option value will be delivered;
- full checking of options and values.
Standard operation: option words
In its standard configuration, GetOptions will handle
option words, matching them in a case-insensitive way. Options may be
abbreviated to uniqueness. Options and other command line arguments
may be mixed, in which case the all options will be processed first
and the other arguments will remain in @ARGV .
The following call to GetOptions will allow a single
option, ‘foo ’. When this option is specified on the
command line the variable $doit will be set to value
1 :
GetOptions ('foo' => \$doit);
In this call, 'foo' is the option control
string, and \$doit the option destination.
Multiple pairs of control strings and destinations may be passed.
GetOptions will return a true result if
processing was successful and a false result when errors were
detected. Besides a false result, GetOptions
will issue a descriptive error message using warn .
The option word may optionally be followed by aliases,
alternative option words that refer to the same option, for example:
GetOptions ('foo|bar' => \$doit);
If you want to specify that an option takes a value, for example a
string, append ‘=s ’ to the option control string:
GetOptions ('foo=s' => \$thevalue);
When you use a colon instead of the equals, the option takes a value
only when one is present:
GetOptions ('foo:s' => \$thevalue, 'bar' => \$doit);
Calling this program with arguments ‘-foo bar blech ’
will deliver value 'bar' in $thevalue but
when called with ‘-foo -bar blech ’ $thevalue
will be set to an empty string (and $bar will be set to
1 ).
Besides strings, options can take numeric values; you can use ‘=i’
or ‘:i ’ for integer values and ‘=f ’ or
‘:f ’ for floating point values.
Using single-letter options and bundling
To use single-letter options is trivial, but to allow them to be
bundled GetOptions needs to be configured first. Module
Getopt::Long has a subroutine Configure that
can be called with a list of strings, each describing a configuration
characteristic. For the bundling of single-letter options, you should
use:
Getopt::Long::Configure ('bundling');
Now GetOptions will happily accept single-letter options
and bundle them:
GetOptions ('a' => \$all, 'l=i' => \$length, 'w=i' => \$width);
This will allow command line arguments of the form ‘-a -l 24 -w
80 ’ but also ‘-al24w80 ’. You can mix these with
option words:
GetOptions ('a|all' => \$all, 'l|length=i' => \$length,
'w|width=i' => \$width);
However, for the option words, a double dash is required:
‘--length 24 ’ is acceptible, but ‘-length
24 ’ is not. The latter will cause the leading ‘l ’
to be interpreted as option letter ‘l ’, and then complain
that ‘ength ’ is not a valid integer value.
For maximum confusion,
Getopt::Long::Configure ('bundling_override');
will allow option words with a single dash, where the words take
precedence over bundled single-letter options. For example:
GetOptions ('a' => \$a, 'v' => \$v, 'x' => \$x, 'vax' => \$vax);
will treat ‘-axv ’ as ‘-a -x -v ’ but ‘-vax ’ as a single option word.
Advanced destinations
You do not need to specified the option destination. If no
destination is specified, GetOptions will define
variables $opt_ xxx where xxx is the
name of the option, just like getopt and
getopts . GetOptions will also accept a reference to a
hash as its first argument and deliver the option values there, again
just like getopt and getopts .
If you do specify the option destination, it does not necessarily
need to be a scalar. If you specify a reference to an array, option
values are pushed into this array:
GetOptions ('foo=i' => \@values);
Calling this program with arguments ‘-foo 1 -foo 2 -foo
3 ’ will result in @values having the value
(1,2,3) provided it was initially empty.
Also, the option destination can be a reference to a hash. In this
case, option values can have the form
‘key= value’. The value will be
stored in the hash with the given key.
Finally, the destination can be a reference to a subroutine. This
subroutine will be called when the option is handled. It gets two
arguments passed: the name of the option and the value.
A special option control string ‘<> ’ can be used in
this case to connect a subroutine to handle non-option arguments. This
subroutine will be called with the name of the non-option argument.
For example:
GetOptions ('x=i' => \$x, '<>' => \&doit);
When you execute this program with command line arguments ‘-x
1 foo -x 2 bar ’ this will call subroutine ‘doit ’
with argument 'foo' (and $x equal to
1 ), and then call ‘doit ’ with argument
'bar' (and $x equal to 2).
Other configuration characteristics
GetOptions supports several other configuration
characteristics. You can switch off the default behavior to match
option words in a case-insensitive way with:
Getopt::Long::Configure ('no_ignore_case');
To inhibit automatic abbreviations for option words, use
'no_auto_abbrev' . To stop detecting options after the
first non-option command line argument, use
'require_order' . For a complete list see the
Getopt::Long documentation.
Help texts
People often ask me why GetOptions does not provide
facilities for help messages regarding command line options. There are
two reasons why I have not implemented these.
The first reason is that although command line options have a
fairly uniform appearance, help messages have not. Whatever format of
messages would be supported it would please some and displease lots of
others. It would also clobber the calls to GetOptions ,
requiring long lists of parameters to get all the information passed
through.
The second reason is that Perl allows a program to contain its own
documentation, in so-called Plain Old Documentation (POD)
format, and modules exist that extract this information to supply help
messages. The following subroutine uses module Pod::Usage
for this purpose, it also shows how Pod::Usage can be
demand loaded:
sub options () {
my $help = 0; # handled locally
my $ident = 0; # handled locally
my $man = 0; # handled locally
# Process options.
if ( @ARGV > 0 ) {
GetOptions('verbose' => \$verbose,
'trace' => \$trace,
'help|?' => \$help,
'manual' => \$man,
'debug' => \$debug)
or pod2usage(2);
}
if ( $man or $help ) {
# Load Pod::Usage only if needed.
require "Pod/Usage.pm";
import Pod::Usage;
pod2usage(1) if $help;
pod2usage(VERBOSE => 2) if $man;
}
}
The latest version of Getopt::Long (This article
describes version 2.17) can be found on CPAN in directory
authors/Johan_Vromans . This kit also contains a script
template that uses Getopt::Long with
Pod::Usage .
Other option handling modules
A few other option handling modules can be found on CPAN. From
directory
modules/by-category/12_Option_Argument_Parameter_Processing
the following modules can be downloaded:
Getopt::Mixed (file Getopt-Mixed-1.008.tar.gz )
This module provides handling option words and option letters. It
was developed a couple of years ago, when Getopt::Std
only handled option letters and Getopt::Long only handled
option words. It is very much obsolete now.
Getopt::Regex (file Getopt-Regex-0.02.tar.gz )
An option handler that uses regular expressions to identify the
options, and closures to deliver the option values.
Getopt::EvaP (file Getopt-EvaP-2.3.1.tar.gz )
This module uses a table-driven option handler that provides most
of the features of Getopt::Long but also includes first
level help messages.
Getopt::Tabular (file Getopt-Tabular-0.2.tar.gz )
Another table-driven option handler loosely inspired by Tcl/Tk.
Powerful, but very complex to set up.
|