User Tutorial

Installation & preparation

If you have an account on an existing UI machine running the latest EMI software then you can try out this tutorial there. If not, you can install the client tools on your own UI as follows:
Configure EMI repository (see http://emisoft.web.cern.ch/emisoft/index.html).
 
Install the EMI client package using the command:
 
$ yum install emi-ui
 
Get credentials: you must have a grid certificate from your national Certification Authority or TERENA TCS in order to use this tutorial. Your credentials should be in jks or pkcs12 format for UNICORE, and in pem format for ARC, dCache and gLite. You can convert between these formats as follows:
 
Create private key pem file from pkcs12 file:
 
$ openssl pkcs12 -nocerts -in cert.p12 -out userkey.pem
Enter Import Password: (insert your certificate password)
MAC verified OK
Enter PEM pass phrase: (choose a PEM pass phrase - this will be the pass phrase used to access your key in future)
Verifying - Enter PEM pass phrase: (re-enter your PEM pass phrase)
 
Create user certificate pem file from pkcs12 file:
 
$ openssl pkcs12 -clcerts -nokeys -in cert.p12 -out usercert.pem
Enter Import Password: (insert your certificate password)
MAC verified OK
 
Create a pkcs12 file from the private key and user certificate pem files, and the CA certificate file (you can usually download this from your Certification Authority's webpage):
 
$ openssl pkcs12 -export -out cert.p12 -inkey userkey.pem -in usercert.pem -certfile CAcert.crt
Enter pass phrase for userkey.pem: (insert your PEM pass phrase)
Enter Export Password: (insert your certificate password)
Verifying - Enter Export Password: (reinsert your certificate password)
 
For more information on installing the EMI UI or other EMI components please see the document Generic Installation & Configuration for EMI 1.
 

Authentication

UNICORE

Preparing preferences file

An example preferences file can be found in the <UCC_HOME>/conf directory (<UCC_HOME> is the directory where the UNICORE Commandline Client was unpacked). The file should specify the keystore (in jks or pkcs12 format), the registry URL and, optionally, the password. A keystore is a file which contains the user's certificate and the certificates of the trusted CAs. Trusted certificates may also be kept in separate jks files.
 
storetype=<jks or pkcs12>
keystore=<user keystore>
password=<user password>
registry=<user registry URL>
#optional: configure separate truststore (must be JKS)
truststore=<user truststore>
truststorePassword=<user password>
 
By default UCC looks for the preferences file in the <user_home>/.ucc/ directory, so the file should be copied to that folder.
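 
For illustration, a filled-in preferences file might look like the following (all values are placeholders; use the keystore, password and registry URL provided by your site):
 
storetype=jks
keystore=/home/user/.ucc/keystore.jks
password=my_keystore_password
registry=https://unicore-host.example.org:8080/SITE/services/Registry?res=default_registry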
 

Connecting to the Grid

To connect to the Grid, run the connect command. To get help, use the ucc command with the -h option:
 
$ ucc connect
$ ucc -h
 
The user's password can be stored in the preferences file; otherwise the user is asked for it on every call. To avoid that, an interactive mode can be started with ucc shell. In that mode the user does not have to prefix commands with ucc, and the UCC shell also offers command completion.
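 
A sketch of such an interactive session (the prompt and the exit command shown here are illustrative):
 
$ ucc shell
ucc> connect
ucc> list-sites
ucc> exit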
 

ARC

You need a certificate and a key file in order to use the Grid. The default paths for them are ~/.globus/usercert.pem and ~/.globus/userkey.pem. (The permissions of the key file must be set to 400 so that only the owner can read it.) If you want to keep them in a different place, you have to specify the paths in the ~/.arc/client.conf file, like this:
 
keypath=/home/user/.cert/userkey.pem
certificatepath=/home/user/.cert/usercert.pem
 
Your key file is protected with a passphrase. It is inconvenient to type this passphrase every time you issue a command; moreover, other Grid services acting on your behalf do not know it. In order to work on the Grid, you have to create a proxy certificate, which has a limited lifetime and is not protected by a passphrase. To create the proxy, use the arcproxy command, like this:
 
$ arcproxy
Your identity: /DC=***/O=***/CN=***
Enter pass phrase for /home/user/.cert/userkey.pem:
...++++++
.++++++
Proxy generation succeeded
Your proxy is valid until: 2011-05-24 03:08:35
 
The validity time of the proxy certificate is 12 hours by default; it can be overridden, for example to 1 hour, with arcproxy --constraint=validityPeriod=1H, or arcproxy -c validityPeriod=1H in the shorter form.
 
The arcproxy --info command can show the current proxy's validity and location.
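 
For convenience, the two commands mentioned above as they would be typed:
 
$ arcproxy -c validityPeriod=1H
$ arcproxy --info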
 
If you belong to a Virtual Organisation (VO), arcproxy can also create a proxy with the special VOMS extension which certifies that you are indeed a member of this VO and can access its resources:
 
$ arcproxy --voms atlas
Your identity:  /DC=***/O=***/CN=***
Enter pass phrase for /home/user/.cert/userkey.pem:
.......................................................................++++++
.................++++++
Contacting VOMS server (named atlas): voms.cern.ch on port: 15001
Proxy generation succeeded
Your proxy is valid until: 2011-05-24 03:08:35
 
In order to use this feature, enter your VO contact string in the file ~/.voms/vomses (ask your VO managers for details).
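 
Each line of the vomses file has the following form (the values are placeholders; your VO managers will provide the real ones):
 
"<VO name>" "<VOMS server host>" "<port>" "<VOMS server host certificate DN>" "<VO name>"
 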
(Please refer to the ARC UI Manual for further options of arcproxy!)
 

gLite

The gLite UI tools use the same certificate and key file as ARC. By default these are expected in the .globus directory in your home directory:
 
/home/user/.globus/userkey.pem
/home/user/.globus/usercert.pem
 
Create your VOMS proxy (replace <VO> with the name of your VO):
 
$ voms-proxy-init --voms <VO>
 
and check that it is valid:
 
$ voms-proxy-info -all
subject   : /C=***/O=***/OU=Personal Certificate/L=***/CN=***/CN=proxy
issuer    : /C=***/O=***/OU=Personal Certificate/L=***/CN=***
identity  : /C=***/O=***/OU=Personal Certificate/L=***/CN=***
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u539
timeleft  : 11:59:51
key usage : Digital Signature, Key Encipherment, Data Encipherment
=== VO testers.eu-emi.eu extension information ===
VO        : testers.eu-emi.eu
subject   : /C=***/O=***/OU=Personal Certificate/L=***/CN=***
issuer    : /C=***/O=***/OU=***/L=***/CN=***
attribute : /***/Role=NULL/Capability=NULL
timeleft  : 11:59:51
uri       : ***
 
As gLite and ARC use the same proxy file you can usually use a proxy created with arcproxy to run gLite jobs and vice-versa.
 

Browsing resources

UNICORE

To check the names of the available Target Systems and storages, and to list applications, the user can use the list-sites, list-storages and list-applications commands:
 
$ ucc list-sites
$ ucc list-storages
$ ucc list-applications
 

ARC

The arcinfo [cluster ...] command prints information about the available resources. With the --long argument it prints more information. You can specify the URLs of the clusters as arguments of the command. You must specify default services (which define the entry points to the Grid) or aliases (which can be used instead of the long form of individual site URLs) in your ~/.arc/client.conf file, like this:
 
[common]
defaultservices=index:ARC0:ldap://index1.nordugrid.org:2135/Mds-Vo-name=NorduGrid,o=grid
index:ARC0:ldap://index2.nordugrid.org:2135/Mds-Vo-name=NorduGrid,o=grid
index:ARC0:ldap://index3.nordugrid.org:2135/Mds-Vo-name=NorduGrid,o=grid
index:ARC0:ldap://index4.nordugrid.org:2135/Mds-Vo-name=NorduGrid,o=grid
computing:ARC0:ldap://ce1.grid.upjs.sk:2135/Mds-Vo-name=local,o=Grid
computing:ARC1:https://pgs03.grid.upjs.sk:50000/arex
...
[alias]
arc0=computing:ARC0:ldap://grid.tsl.uu.se:2135/nordugrid-cluster-name=grid.tsl.uu.se,Mds-Vo-name=local,o=grid
 
(Please refer to the ARC UI Manual for the format of the configuration file!)
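 
With such a configuration in place the available resources can then be queried, for example (arc0 is the alias defined above; a full cluster URL can be given instead):
 
$ arcinfo
$ arcinfo --long arc0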
 

gLite

With the lcg-infosites command we can gather the available resources for our VO. First we see which Computing Elements (CEs) are available:
 
$ lcg-infosites --vo <VO> ce
#   CPU    Free Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------------
     12      12          0            0       0 cert-09.cnaf.infn.it:8443/cream-lsf-demo
      0       0          0            0       0 cream-37.pd.infn.it:8443/cream-lsf-cert
      0       0          0            0       0 cream-37.pd.infn.it:8443/cream-lsf-creamtest1
      0       0          2            0       2 cream-37.pd.infn.it:8443/cream-lsf-creamtest2
      8       8          0            0       0 lxbra2308.cern.ch:8443/cream-pbs-testersemi
 
Now we query the information system to see which Storage Elements (SEs) are available:
 
$ lcg-infosites --vo <VO> se
 Avail Space(kB)  Used Space(kB)  Type  SE
------------------------------------------
         7908181         1010947  SRM   cork.desy.de
       101168616         6153137  SRM   lxbra1910.cern.ch
        99630252         7691501  SRM   lxbra2502.cern.ch
        10511159          215773  SRM   lxbra2506v1.cern.ch
 

Submission of jobs

UNICORE

Job description

The UNICORE Commandline Client uses a JSON-based job description format which allows users to specify the application or executable they want to run, its arguments, environment settings and files to transfer. The example job description below can be copied to a file named date.u.
 
# simple job: run Date
{
   ApplicationName: Date,
   ApplicationVersion: 1.0,
}
 

Running job

To run the job, use the ucc run command:
 
$ ucc run date.u -v
 
In this case the standard output went to a file such as 1bc1bb08-7737-4fb1-854e-5d89ba18d7f0.stdout. The -v option turns on verbose mode. The -b option can also be used to get short output file names (without the hash of the job).
 
The -a option runs the job asynchronously: the input files are staged in and the job is submitted, but the results are downloaded later using the get-output command. The status of a specific job can be queried with get-status. As an argument one can use either the job file produced by the run -a command or the End Point Reference (EPR) obtained from list-jobs:
 
$ ucc run -a date.u -v -b
$ ucc list-jobs
$ ucc get-status job
$ ucc get-output job
 

Running job on a set of files

To run a set of UNICORE jobs, the user can put the job descriptions in one directory (e.g. indir/) and use the batch command. The -i argument indicates the source directory (with the .u files) and the -o argument the directory for the output files:
 
$ ucc batch -i indir -o outdir
 

ARC

The arcsub command communicates with the information systems, does brokering, translates job descriptions, moves input files and submits jobs to the clusters. When your defaultservices are properly configured, arcsub will automatically select the best Grid site for you. If for some reason you prefer to submit jobs to a specific site, you can specify it with the -c arc0 argument (here arc0 is the alias described in the configuration example, but a site's IP address or hostname can be used as well).
 
Native ARC job description is written in XRSL format (JSDL is also possible, as well as gLite JDL). Here is a simple XRSL job description:
 
$ cat myjob.xrsl
&(executable="/bin/echo")
(arguments="Hello World")
(stdout="hello.txt")
 
Submit this job to the Grid simply as
 
$ arcsub myjob.xrsl
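 
If you prefer to submit to a specific site instead of letting arcsub choose one (as described above), the alias from the configuration example can be used:
 
$ arcsub -c arc0 myjob.xrsl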
 
The command will print a long URL, which is the ID of the job - this must be used later to query the job's information, get the results, kill the job, renew its proxy if it expires while the job is running, and do other operations.
 
The arcstat command shows the status of the job (replace JOBID with the ID of your job):
 
$ arcstat JOBID
 
The arccat command prints the standard output or error of the job:
 
$ arccat JOBID
 
The arcget command downloads the results of a finished job and removes the job from the Grid:
 
$ arcget JOBID
 
The --all argument applies these commands to all of your active jobs:
 
$ arcstat --all
$ arccat --all
$ arcget --all
 
You can store selected job IDs in a file and use it as input as well:
 
$ arcstat -i myjobs.txt
 
A very useful command is arcsync: when you move to a different computer, you can synchronise your list of Grid jobs by simply typing:
 
$ arcsync
 

gLite

Job submission requests are expressed in JDL (Job Description Language). Below is a very simple but usable example, which just runs "uname -a" on the executing node:
 
$ cat uname.jdl
Type = "Job";
JobType = "normal";
Executable = "/bin/uname";
StdOutput = "uname.out";
StdError = "uname.err";
OutputSandbox = {"uname.out","uname.err"};
Arguments = "-a";
requirements = other.GlueCEStateStatus == "Production";
rank = -other.GlueCEStateEstimatedResponseTime;
RetryCount = 0;
 
We now submit the job to the Workload Management System (WMS) which will find a suitable resource on which our job can run:
 
$ glite-wms-job-submit -a uname.jdl
 
Connecting to the service https://lxbra2303.cern.ch:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://lxbra2303.cern.ch:9000/F0KY_m0DBH5wpXzLt59q5A
==========================================================================
 
On success, the submission command returns a job identifier, which we use to monitor the job status and, once the job is done, to retrieve the output:
 
$ glite-wms-job-status https://lxbra2303.cern.ch:9000/F0KY_m0DBH5wpXzLt59q5A
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job : https://lxbra2303.cern.ch:9000/F0KY_m0DBH5wpXzLt59q5A
Current Status:     Done (Success)
Logged Reason(s):
    - job completed
    - Job Terminated Successfully
Exit code:          0
Status Reason:      Job Terminated Successfully
Destination:        lxbra2308.cern.ch:8443/cream-pbs-testersemi
Submitted:          Sat Jul  9 13:32:09 2011 CEST
==========================================================================
$ glite-wms-job-output https://lxbra2303.cern.ch:9000/F0KY_m0DBH5wpXzLt59q5A
Connecting to the service https://lxbra2303.cern.ch:7443/glite_wms_wmproxy_server
================================================================================
         JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://lxbra2303.cern.ch:9000/F0KY_m0DBH5wpXzLt59q5A
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/budapest40_F0KY_m0DBH5wpXzLt59q5A
================================================================================
$
$ ls /tmp/jobOutput/budapest40_F0KY_m0DBH5wpXzLt59q5A/
uname.err  uname.out
$ ls -l /tmp/jobOutput/budapest40_F0KY_m0DBH5wpXzLt59q5A/
total 4
-rw-r--r-- 1 budapest40 users   0 Jul  9 13:35 uname.err
-rw-r--r-- 1 budapest40 users 116 Jul  9 13:35 uname.out
$
$ cat /tmp/jobOutput/budapest40_F0KY_m0DBH5wpXzLt59q5A/uname.out
Linux lxbra2506v6.cern.ch 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 13:35:45 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
$
 

Submitting Multiple Jobs

There are several ways to submit multiple jobs to a gLite Grid. The simplest is to submit a directory of separate JDL files as a single job collection.
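 
In the example below, jdls/ is assumed to be a directory containing one JDL file per job, for instance:
 
$ ls jdls/
job1.jdl  job2.jdl  job3.jdl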
 
$ glite-wms-job-submit -a --collection jdls/
Connecting to the service https://lxbra2303.cern.ch:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://lxbra2303.cern.ch:9000/y7eIsv-bHDpuNjE8v2Y-yw
==========================================================================
 
You can then view the status and get the output of all jobs via the single job id which is returned by glite-wms-job-submit.
 

Data management

UNICORE

Import/Export files

To import files from the local computer to the job directory (or to export files from the job directory to the local computer), the user should indicate the source and target files in the .u file:
 
{
    Imports: [
        { From: "/path/fileName", To: "remoteFileName" },
    ],
    Exports: [
        { From: "remoteFileName", To: "/path/localFileName" },
    ]
}
 
Storage
 
To import files from a UNICORE storage (or to export files to a UNICORE storage), the storage address in the job description file should have the format u6://TargetSystemName/StorageName/fileName:
 
{
    Imports: [
        { From: "u6://TargetSystemName/StorageName/fileName", To: "remoteFileName" },
    ],
    Exports: [
        { From: "remoteFileName", To: "u6://TargetSystemName/StorageName/fileName" },
    ]
}
 
Files can also be uploaded to and downloaded from a storage using the ucc put-file and get-file commands; the ls command lists the files in a directory:
 
$ ucc put-file -s fileName -t u6://TargetSystemName/StorageName/fileName
$ ucc get-file -s u6://TargetSystemName/StorageName/fileName -t newFileName
$ ucc ls u6://TargetSystemName/StorageName/fileName
 

Resources

In the Resources section of the .u file the user can specify the resources required to run the job on the remote system. The section may look as follows:
 
 Resources: {
    Memory: 128000000,
    Nodes: 1,
    CPUs: 8 ,
 }
 

Example

In this example the localScript.sh file is a script which writes the word "Hello" to the file named "newFile":
 
echo "Hello" >> newFile
 
Create the localScript.sh file on your local computer. To upload it to a storage, first check the Target System name (in this example: EMI-UNICOREX) and the available storages (in this example: Home). After uploading the file, the storage can be listed:
 
$ ucc list-sites
$ ucc list-storages
$ ucc put-file -s localScript.sh -t u6://EMI-UNICOREX/Home/storageScript.sh
$ ucc ls u6://EMI-UNICOREX/Home
 
In the example, the script was stored as storageScript.sh, so a file with that name should appear in the listing produced by the ucc ls command.
 
The bash.u job description below imports storageScript.sh from the Home storage into the job directory and renames it to remoteScript.sh. The script is then run, creating newFile. After that, newFile is downloaded to the user's local computer as localNewFile; it is also exported to the UNICORE Home storage as storageNewFile.
 
{
    ApplicationName: "Bash shell",
    Environment: [
        "SOURCE=remoteScript.sh",
    ],
    Imports: [
        { From: "u6://EMI-UNICOREX/Home/storageScript.sh", To: "remoteScript.sh" }
    ],
    Exports: [
        { From: "newFile", To: "localNewFile" },
        { From: "newFile", To: "u6://EMI-UNICOREX/Home/storageNewFile" }
    ],
    Resources: {
        CPUs: 1,
    }
}
 
The bash.u job can now be run and the files in the storage listed. On the user's local computer a file named localNewFile should have been created.
 
$ ucc run bash.u -v
$ ucc ls u6://EMI-UNICOREX/Home
 

ARC

If your job needs input data, or produces output data, you do not need to copy these files by hand, as ARC will take care of all data movement. You only have to specify inputfiles and outputfiles in the job description (see the XRSL manual for details).
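 
A minimal sketch of such a job description (the file names and the gsiftp URL are placeholders; see the XRSL manual for the exact attribute syntax):
 
&(executable="runme.sh")
(inputFiles=("runme.sh" "")
            ("input.dat" "gsiftp://se.myplace.org/input.dat"))
(outputFiles=("result.dat" "gsiftp://se.myplace.org/result.dat"))
(stdout="stdout.txt")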
 
In addition, ARC provides command line tools for basic work with any storage elements: list, copy and remove files. Most common protocols are supported by ARC: gsiftp, http, ftp, as well as meta-protocols like srm, lfc, rls (see XRSL manual for details).
 
To list files in an SRM storage (e.g. dCache), do:
 
$ arcls srm://srm.myplace.org
 
To copy files use:
 
$ arccp http://www.mystuff.org/file1 gsiftp://se.myplace.org/file1
 
To remove files, use:
 
$ arcrm gsiftp://se.myplace.org/file1
 
Any combination of supported protocols can be used; authorisation on Grid storages is performed on the basis of your Grid proxy.
 
An interesting functionality of arcls or arccp is that they can be used even to check files created by your Grid jobs: you can use arcls to list the current working directory of the job or you can use arccp to copy a temporary result file from the execution site to your local machine, even while the job is running:
 
$ arcls JOBID
$ arccp JOBID/filename localname
 
This is however not recommended; especially avoid using arcrm on your job, unless you really know what you are doing.
 

gLite

Create a local file, and then store it on an available SE:
 
$ echo "This a sample file" > example.txt
$ cat example.txt
This a sample file
$ lcg-cr -d  lxbra1910.cern.ch file:$PWD/example.txt
 GSIFTP: default set up URL mode
GSIFTP: dest: set up FTP mode. DCAU disabled. Streams =  1, Tcp BS = 0
guid:e2edabff-3fa7-4853-b44a-9cab256befdb
 
The file has been stored on the SE lxbra1910.cern.ch and the lcg-cr command returns a Grid Unique Identifier (guid) for our file. The file has also been automatically registered in the File Catalog and assigned a Logical File Name (lfn); with the -l option we could specify the lfn ourselves. The File Catalog provides an easier way to identify and browse our files using these Logical File Names.
To see our file we can use the File Catalog command lfc-ls to list files; in this case we limit the listing to files created today.
 
$ lfc-ls /grid/$MYVO/generated/2011-07-11
file-99018d3a-138c-4344-82a0-48a2ad10c27b
 
Note that the identifier returned here is the lfn, not the guid. If we want to see the guid we can use the lcg-lg command:
 
$ lcg-lg lfn:/grid/$MYVO/generated/2011-07-11/file-99018d3a-138c-4344-82a0-48a2ad10c27b
guid:e2edabff-3fa7-4853-b44a-9cab256befdb
 
To copy the file from the SE to the UI we can use the lcg-cp command as follows:
 
$ lcg-cp guid:e2edabff-3fa7-4853-b44a-9cab256befdb file.txt
 
or we could use the lfn
 
$ lcg-cp lfn:/grid/$MYVO/generated/2011-07-11/file-99018d3a-138c-4344-82a0-48a2ad10c27b file.txt
 
Of course, the lfn is more useful if you set it to a sensible value when creating your file with the -l option. Try creating another file with a logical file name containing your user id, for example lfn:/grid/$MYVO/generated/2011-07-11/USERNAMEXX.txt.
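 
For example, reusing the SE from above (replace USERNAMEXX with your own identifier):
 
$ lcg-cr -d lxbra1910.cern.ch -l lfn:/grid/$MYVO/generated/2011-07-11/USERNAMEXX.txt file:$PWD/example.txt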
 
We can now delete the registered file using the GUID; if we check for file existence after deletion, we obviously don't find it.
 
$ lcg-del -a guid:e2edabff-3fa7-4853-b44a-9cab256befdb
$
$ lfc-ls /grid/$MYVO/generated/2011-07-11
 

dCache

dCache is another EMI product which manages access to disk and tape storage. We will look briefly at some of the file access methods supported by dCache.
 

SRM

 

browsing files:
 
$ srmls -2 srm://sligo.desy.de:8443/pnfs/desy.de/data/testers.eu-emi.eu/
 
writing file to SE:
 
$ srmcp -2 file://////etc/group  srm://sligo.desy.de:8443/pnfs/desy.de/data/testers.eu-emi.eu/group_DDMMYY_[A-Za-z]
$ srmls -2 srm://sligo.desy.de:8443/pnfs/desy.de/data/testers.eu-emi.eu/
 
writing file back from SE
 
$ srmcp -2 srm://sligo.desy.de:8443/pnfs/desy.de/data/testers.eu-emi.eu/group_DDMMYY_[A-Za-z] file://///tmp/groups_080711A.back
 
deleting a file:
 
$ srmrm -2 srm://sligo.desy.de:8443/pnfs/desy.de/data/testers.eu-emi.eu/group_DDMMYY_[A-Za-z]
 

dCap

 

writing file to SE
 
$ dccp /etc/group dcap://xen-ep-emi-tb-se-3.desy.de:22125/pnfs/desy.de/data/testers.eu-emi.eu/group_DDMMYY_[A-Za-z]
$ srmls -2 srm://xen-ep-emi-tb-se-3.desy.de:8443/pnfs/desy.de/data/testers.eu-emi.eu/
 
writing file back from SE
$ dccp dcap://xen-ep-emi-tb-se-3.desy.de:22125/pnfs/desy.de/data/testers.eu-emi.eu/group_DDMMYY_[A-Za-z] /tmp/group_DDMMYY_[A-Za-z].back
 

webDAV

 

Browse files from command line:
 
$ cadaver http://sligo.desy.de:2880
> ls
> bye
 
GUI clients: nautilus, firefox add-on TrailMix (now proprietary), OS-based file browsers that support webDAV.
 
Write files:
 
$ curl -v -T /etc/group http://sligo.desy.de:2880/pnfs/desy.de/data/testers.eu-emi.eu/testFileCURL_DDMMYY_[A-Za-z]
 
Look for the file through srmls or cadaver.
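 
If the WebDAV door permits reads with your credentials, the file written above can also be fetched back with curl (a sketch; reuse the exact path from your upload):
 
$ curl -o /tmp/testFileCURL.back http://sligo.desy.de:2880/pnfs/desy.de/data/testers.eu-emi.eu/testFileCURL_DDMMYY_[A-Za-z]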