To submit a job, you first
need to create a command file for that job. This is a text
file containing information about the job, such as the name of the program
to run, its arguments, its input/output files, and some options about how
condor should behave with respect to this job.
There are many possible
options and tags for this file; they are described in detail in the manual
on the condor website (http://cs.wisc.edu/condor).
A typical template looks like this:
###############################
# Condor command file example
universe = standard
executable = myprogram
arguments = arg1 arg2
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output = outfile.dat
error = errfile.dat
log = logfile.dat
Initialdir = /home/control/condorfiles/
queue
Some explanation: the
universe parameter is normally "standard" if you are running a program
that has been relinked using condor_compile. This enables advanced
features such as checkpointing and remote system calls. If you cannot
relink the program, use the value "vanilla" for universe
instead. If the program is a Java program, use "java".
The executable
parameter is the name of the program. If it is a Java program, also include
the .class extension.
The arguments parameter holds
the arguments you would normally pass to your program on the command line.
You can omit this parameter if you don't need arguments. For a Java
program, you must also put the name of the main class (without the .class
extension) as the first argument.
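As a rough sketch of the Java case described above, the fragment below
assumes a hypothetical compiled class file Hello.class whose main class is
Hello. The class name and the output file names are placeholders chosen for
illustration; they are not part of the original template.

```
# Java universe sketch (Hello.class and all file names are hypothetical)
universe = java
executable = Hello.class
# The first argument is the main class name; the program's own arguments follow
arguments = Hello arg1 arg2
output = hello.out
error = hello.err
log = hello.log
queue
```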
Should_transfer_files
tells condor whether or not the job's files should be
transferred to the executing machine before execution.
When_to_transfer_output, set to ON_EXIT, tells condor to transfer the
output files back when the job exits.
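For a vanilla-universe job, the input files to send along can be listed
explicitly. The following is a minimal sketch, assuming two hypothetical
input files, input1.dat and input2.dat, located in the directory you submit
from. The transfer_input_files command is a standard submit command, but it
does not appear in the template above, so treat this as an illustration
rather than a required setting.

```
# Vanilla universe sketch with explicit input files (file names are hypothetical)
universe = vanilla
executable = myprogram
arguments = arg1 arg2
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# Comma-separated list of files copied to the executing machine before the job starts
transfer_input_files = input1.dat, input2.dat
output = outfile.dat
error = errfile.dat
log = logfile.dat
queue
```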
Output specifies the
name of the file where you will find everything your program writes to the
standard output. Error is the file corresponding to the standard
error.
Log is a file in
which condor will write info about how the job is doing.
The location of the three
files above is relative to the directory from which you submit the job,
unless you specify the Initialdir parameter, in which case the files are
located relative to the directory it specifies.
The final queue
clause tells condor that the job definition is over, and that it can be
queued for execution in the pool.
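The queue clause can also take a count, which submits several copies of the
same job at once. The sketch below is only an illustration and goes slightly
beyond the template above: it queues three jobs and uses the $(Process)
macro, which takes the values 0, 1 and 2 here, so that each job gets its own
argument and its own output files. The file names are hypothetical.

```
# Sketch: three jobs from one command file (file names are hypothetical)
universe = vanilla
executable = myprogram
# $(Process) expands to 0, 1 and 2 for the three queued jobs
arguments = $(Process)
output = outfile.$(Process).dat
error = errfile.$(Process).dat
log = logfile.dat
queue 3
```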
To send the job to the
pool, after you create the command file (and save it to, e.g.,
mycmdfile.condor), you should run the following command:
condor_submit mycmdfile.condor
To see how the machines in
the pool are doing, run the condor_status command. You will see all
the machines (more than one entry for those with multiple CPUs), along with
their status. The status "Owner" means the machine is not available
for condor (someone is using the console, or is remotely logged in). The
status "Unclaimed" means that the machine is not being used, and can accept
a job. The status "Claimed" means that the machine is currently matched
to a job and is running it.
Note that the information about machines which you can see through
condor_status is not instantaneously updated; for example, some minutes may
elapse before you can see a machine as "Claimed" after it actually
started a job. So the condor_status output is useful, but must not be
taken as the absolute truth.