Metadata are everywhere around us - and they are expressed in many incompatible formats. Metadata are the Tower of Babel of the current world. Let's concentrate on Soaplab2's contribution to the world chaos.
Soaplab2 uses two major types of metadata:
The latter one (metadata describing individual services) is the main topic of this document.
Each service has to be described. At run time (when Soaplab2 services are serving the world) this description is also expressed in the XML format - but a service provider can (and usually does) use a simpler, more "human-readable" (and more easily editable) format - the ACD format (more about its origin a bit later). Soaplab2 converts the ACD files to its native XML files at build and deployment time.
Here is an example of a simple ACD file for the classic HelloWorld application:
appl: HelloWorld [
    documentation: "Classic greeting from the beginning of the UNIX epoch"
    groups: "Classic, Simple"
    nonemboss: "Y"
    executable: "echo"
]

string: greeting [
    additional: "Y"
    parameter: "Y"
    default: "Hello World"
    comment: "defaults"
]

outfile: output [
    additional: "Y"
    default: "stdout"
]

The example above defines a service HelloWorld
The XML file representing the same metadata (and created by the ACD to XML converter from the file above) is much less human-readable and you do not want/need to see it (for developers: its DTD is also available).
The original purpose of Soaplab was to wrap command-line programs as Web services. That's why in many places in this documentation we relate ACD files to command-line options. But as Soaplab evolved it became clear that ACD files could describe not only command-line tools but other types of programs as well. Soaplab2 can make Web services around regular web pages (see the sub-project Gowlab) or on top of other Web services (for example, the wrappers around the Web services provided by EBI are distributed together with Soaplab2 as plug-ins).
It would therefore be more precise to talk about "inputs", "parameters" and/or "outputs" instead of "command-line parameters and/or options". Please keep this in mind when reading this document. By an application we mean here a command-line program, or anything else that can be wrapped as a Web service, needs some input data and can produce some resulting data.
Now, we can finally start with the syntactical rules...
Each application is defined by one ACD file. The ACD file names are used as (part of) the names of the Soaplab2 services. ACD files use the extension '.acd'.
ACD files can have comments. They start with "#" and continue to the end of the line. Empty lines are ignored.
Each application has zero or more parameters. In an ACD file, each of them is defined by a single token followed by either a colon ':' (preferred) or an equal sign '=' which in turn is followed by a second token. The definition of the parameter is delimited by a pair of square brackets '[ ]', which can span multiple lines:
token: token [ definition ]

Tokens representing keywords can be abbreviated up to the point where they are not ambiguous. For example, default: can be abbreviated to def:.
The first definition in any ACD file must start with token application (usually abbreviated to appl) followed by a token representing the application name, and by attributes describing the whole application (not just an individual parameter):
appl: appname [ application attributes ]

Then follow the definitions of the application's parameters. The first token in these definitions is a data type. The data type is a keyword and must be from the pre-defined set of data types. The second token is the name by which this parameter is going to be known. The definition contains parameter-specific attributes (different data types may allow different sets of attributes):
datatype: parameter_name [ parameter attributes ]

Both application and parameter attributes consist of a name (a keyword from a pre-defined set of names), followed by a colon and a value. Values can be delimited by single or double quotes. A value representing a boolean value should be either Y or N.
infile: library [
    additional: "Y"
    information: "A custom PostScript library file"
    comment: "tagsepar -"
]

If there is no real value (meaning: nothing separated by a space) the resulting Soaplab2 option is considered to be boolean with the value true (and the quotes are not needed), as in this example:
comment: bindata

An ACD file may have any number of such comment attributes. If you are a developer creating your own plug-in, you can use the comment attributes for passing any options or any additional metadata. Of course, if you do so, only selected clients will be able to use such additional information - but Soaplab2 still propagates unknown options to the XML metadata.
appl: dot [ ... ]

When the wrapped resource is not a command-line program (but, for example, a web page) the application name is not used (but must be present).
Here is a list of recognized application attributes (note that those known only to Soaplab2 are already shown as part of the comment attribute):
The group name, together with the ACD base file name, creates the Soaplab2 service name (using a dot as a separator and making everything lower-case). If a tool belongs to several groups, one Soaplab2 service is created per group (even though they are identical in their behaviour). For example:

appl: HelloWorld [
    documentation: "Classic greeting from the beginning of the UNIX epoch"
    groups: "Classic, Simple"
    ...
]

If the lines above are from a file helloworld.acd, two Soaplab2 services will be created:

classic.helloworld
simple.helloworld
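The naming rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Soaplab2's own code: the function name service_names is hypothetical, and splitting the groups value on commas is an assumption based on the HelloWorld example.

```python
from pathlib import Path


def service_names(acd_file, groups):
    """Derive service names from an ACD file name and its 'groups' value.

    Assumption: groups are comma-separated, as in the HelloWorld example.
    """
    base = Path(acd_file).stem.lower()  # base file name without '.acd'
    names = []
    for group in groups.split(","):
        group = group.strip().lower()
        if group:
            names.append(f"{group}.{base}")
    return names


print(service_names("helloworld.acd", "Classic, Simple"))
# prints ['classic.helloworld', 'simple.helloworld']
```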
It is not recommended to put here the full path of the executable - it would make the ACD file non-portable. The path can be added later, as a run-time configuration property (look for the property addtopath.dir in the configuration guide for details).
An example was given above for the helloworld application, where the executable was the Unix program echo.
comment: "class org.soaplab.ebi.WSJobFactory"
comment: "launcher get"
comment: "launcher post"
Look for individual attribute details in the parameter attribute section.
datatype: parameter_name [ parameter attributes ]

This section describes the parameter attributes that can be used for any data type (data type is here just another name for a parameter type).
Any parameter without this attribute will appear (unless an attribute template is specified) on the command line as a tagged value:
-<parameter-name> <parameter-value>

With this attribute it will appear as a simple value without any qualifier (i.e. the parameter name will not appear on the resulting command line).
The general attributes above are the most used ones. There may, however, be situations where their combinations are unclear. This is caused by the EMBOSS legacy: EMBOSS has several ways to define whether a parameter is optional or not. Therefore, the possible (useful) combinations of these attributes are explained below by examples.
-<parameter-name> <parameter-value>

where the space between parameter name and parameter value can be changed to something else by the attribute tagsepar, or
If none of these cases is good enough, the attribute template can be used. It defines a string with special tokens that will be replaced by parameter name and parameter value. This attribute is similar in function to the application attribute method. The difference is that here the template is just for one parameter, while the method defines a template string for the whole command-line. Details of both are covered below.
-greeting Hello

If this attribute is specified, the parameter name no longer starts with a minus sign. For example, having this in an ACD file:

string: greeting [
    comment: "tagsepar *"
    ...
]

and a sent value "Hello" leads to the command line:

greeting*Hello

For historical (legacy) syntactic reasons, use a slightly different syntax when the tagsepar should be an equal sign (which is probably the most common case after the space). This ACD file:

string: greeting [
    comment: "tagsepar = ="
    ...
]

leads to the command line:

greeting=Hello

You can also use the same syntax for other separators. The first example above can also be written as:

string: greeting [
    comment: "tagsepar = *"
    ...
]

Because the same tag separator is usually used for all parameters, tagsepar may also be used as an application attribute, in which case it is propagated to all parameters. If you then need some parameters to return to the default tag separator (a space), use a tagsepar without any value there. Here is an example with all possibilities:

appl: Tagsepar [
    documentation: "Testing TAGSEPAR attribute"
    groups: "Testing"
    nonemboss: "Y"
    executable: "echo"
    comment: defaults
    comment: "tagsepar = ="
]

# --- this parameter gets TAGSEPAR from the application level
string: param1 [
    default: 1
]

# --- this parameter has its own TAGSEPAR
string: param2 [
    comment: "tagsepar = -"
    default: 2
]

# --- this parameter wants to have the original (default) TAGSEPAR
string: param3 [
    default: 3
    comment: tagsepar
]

outfile: output [
    additional: "Y"
    default: "stdout"
]
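The tagsepar forms described here can be illustrated with a small Python sketch. It is not Soaplab2's implementation: the function names parse_tagsepar and tagged_value are hypothetical, the form "tagsepar =" is deliberately rejected because the text does not say what it means, and it is an assumption that a parameter returning to the default separator gets its leading minus sign back.

```python
def parse_tagsepar(comment):
    """Return the separator named by a 'tagsepar' comment, or None for the default.

    Covers the forms 'tagsepar *', 'tagsepar = *' and a bare 'tagsepar'.
    """
    tokens = comment.split()
    if not tokens or tokens[0] != "tagsepar":
        raise ValueError(f"not a tagsepar comment: {comment!r}")
    if len(tokens) == 1:
        return None  # bare 'tagsepar': back to the default (a space)
    if len(tokens) == 2 and tokens[1] != "=":
        return tokens[1]
    if len(tokens) == 3 and tokens[1] == "=":
        return tokens[2]
    raise ValueError(f"form not covered by this sketch: {comment!r}")


def tagged_value(name, value, separator=None):
    """Build the command-line elements for one tagged parameter."""
    if separator is None:
        # Default: '-name' and the value as two separate elements (assumed).
        return ["-" + name, value]
    # With a tagsepar the leading minus sign is dropped.
    return [name + separator + value]


# The three parameters of the Tagsepar example, with their default values.
app_level = parse_tagsepar("tagsepar = =")
print(tagged_value("param1", "1", app_level))
print(tagged_value("param2", "2", parse_tagsepar("tagsepar = -")))
print(tagged_value("param3", "3", parse_tagsepar("tagsepar")))
```

Under these assumptions the three calls yield param1=1, param2-2 and the two elements -param3 and 3.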
This attribute is ignored for data types boolean, infile, filelist and outfile.
string: g [ ]

The user/client must send a value for this parameter under the name g. Assuming that she sent a value "hello", the command line will look like this:

-g hello

This works fine, but it is hard for the client to remember the purpose of a non-descriptive parameter name g. To make it easier for the client, one can use the attribute qualifier. Once present, it is used on the command line instead of the parameter name (the parameter name is still used to identify the client value). The example above can be re-written like this:

string: greeting [ qualifier: g ]

The user now sends her "hello" under the better name greeting, but the command line is created as before, using the qualifier g.
By default, parameters are not hidden, except for the data type outfile, which is always hidden (this means that clients do not specify the name of the output file that will be created on the command line - it is Soaplab2's task to do so).
This attribute is often used for Gowlab services - where HTML forms may have some "hidden" variables.
boolean: bool_env [
    information: "b4: A boolean that becomes an environment"
    qualifier: "b4"
    comment: envar
]

string: str_env [
    information: "str: A string that becomes an environment ENVVAR"
    additional: "Y"
    qualifier: ENVVAR
    default: "Ciao mundi"
    comment: envar
    comment: "defaults"
]

This attribute is ignored for data types infile, filelist and outfile.
string: param [ standard: "Y" ]
string: param [ parameter: "Y" ]
string: param [ ]

Note that the following construct (somewhat counter-intuitively) DOES NOT mean that the param is mandatory:

string: param [ additional: "N" ]

Both constructs a) and b) become NOT mandatory if the definitions include a default value:

string: param [ standard: "Y" default: "a default value" ]

or

string: param [ parameter: "Y" default: "a default value" ]

Or, if the same definition also has additional Y.
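The combinations above can be condensed into one predicate. The following Python sketch is an illustration only, not Soaplab2's code: the function name is_mandatory and the dictionary representation of attributes are hypothetical, and it assumes that a parameter with neither standard nor parameter set to Y is optional.

```python
def is_mandatory(attrs):
    """Return True if the client must send a value for this parameter.

    attrs maps ACD attribute names to their string values.
    """
    def yes(name):
        return attrs.get(name, "N").upper() == "Y"

    # Only 'standard' or 'parameter' can make a parameter mandatory ...
    marked = yes("standard") or yes("parameter")
    # ... and a default value or 'additional: Y' makes it optional again.
    optional = "default" in attrs or yes("additional")
    return marked and not optional


print(is_mandatory({"standard": "Y"}))
print(is_mandatory({"additional": "N"}))
print(is_mandatory({"parameter": "Y", "default": "a default value"}))
```

The three calls correspond to a mandatory parameter, the misleading additional: "N" case, and a parameter made optional by its default value.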
The first purpose of this attribute is that its presence makes the parameter optional (as explained above).
The second (and the main) purpose, of course, is to tell what value will be used if the user does not send any value for a particular parameter. The question is, however, should such default value be used (e.g. should it be put on the command-line), or not? The answer is: it depends.
Soaplab2 has the following rules for that (for the situation that no value was sent by the user for a particular parameter):
Every application and each of its parameters (especially its inputs and outputs, but also other parameters) can have its meaning explicitly specified by ontology terms. Simply speaking, an ontology is a controlled vocabulary with terms pre-defined and agreed on by domain experts. In addition to such a simple vocabulary, an ontology can have (and usually has) defined relationships between its terms (saying, for example, which terms are more general and which are more specific). The ontology terms do not influence how the application is invoked or what it does, but they can be useful, for example, for discovering which application to use for your particular data types.
In an ACD file, the ontology terms can be specified by one or more relations attributes. Such an attribute can be used either as an application attribute or as a parameter attribute. There are two possible formats for the value of this attribute (the second one is recommended):
relations: "/edam/operation/0001813 Sequence retrieval"
relations: "EDAM:0001813 operation Sequence retrieval"

Both examples above say that this application (or this parameter) has semantics defined by an ontology term identifier EDAM:0001813 and that this term is defined in an ontology namespace operation. Additionally, more or less for convenience (because the same data could be obtained directly from the ontology itself by using the ontology term identifier), there is also an ontology term name Sequence retrieval.
You may use any ontology, of course. But one strong candidate is the EDAM ontology, which is closely related to EMBOSS; all EMBOSS applications (ACD files) are therefore already annotated with relations from this ontology.
The usual way is to build the command line from the individual parameters, each of which can be created either as a tagged value:

-<parameter-name> <parameter-value>

or as a non-tagged value (when the parameter attribute parameter is set to true):

<parameter-value>

The first form can be further modified by the tagsepar attribute, resulting, for example, in this:

<parameter-name>=<parameter-value>

If this is not enough, a parameter can have an attribute template that defines how the parameter should be built. The template value is a string that contains special tokens (e.g. "$$" or "&&"). The tokens will be substituted by the real parameter value (sent by a client) or by the parameter name (the latter substitution is less crucial because you can include the parameter name directly yourself; but it helps with the maintenance of ACD files if you have the parameter name in only one place).
If a parameter does not have any value (a client sent nothing, and there is no default value), nothing is built for this parameter. In other words, the template string is not used at all (even if it has some constant text there).
Let's have a service that has one input named greeting, and a client sends there the value "I wish you luck, mate!".
Without templates and with the following ACD parameter:

string: greeting [ ]

we are getting a command line with exactly two elements (it is shown in the format used by the admin tool for exploring parameters, mentioned above):

Command line
------------
( 1) -greeting
( 2) I wish you luck, mate!

You can see that Soaplab2 took care that the whole sent value is not split on white-space.
However, if you use a template, you have to say whether or not the value should be split on white-space. Let's now have the same parameter with a template:

string: greeting [ template: "-greeting $$" ]

The token "$$" is replaced by the client value. The result is now, however, different - the command line suddenly gets six elements:

Command line
------------
( 1) -greeting
( 2) I
( 3) wish
( 4) you
( 5) luck,
( 6) mate!

In order to get the same result as without a template, you need to use a slightly different token (a double-quoted $$):

string: greeting [ template: '-greeting $"$"' ]

Now, we are getting the same command line with just two elements:

Command line
------------
( 1) -greeting
( 2) I wish you luck, mate!
A template as an application attribute is specified as:
appl name [ ... comment: "method <string>" ... ]

And a template as a parameter attribute is specified as:

data-type name [ ... template: "<string>" ... ]

Some tokens can be abbreviated ("$$", "&&", etc.). This abbreviation cannot be used for the application attribute (because the abbreviated form does not tell which parameter is the token meant for).
Token | Will be replaced by | Explanation |
---|---|---|
Parameters' values | ||
$id | value of parameter 'id' | |
${id} | value of parameter 'id' | used when the construct is followed immediately by a letter (like in ${id}A) |
$"id" | value of parameter 'id' | value will not be separated into several pieces even if it contains white-spaces |
${"id"} | value of parameter 'id' | ditto as above |
$$ | value of the current element | a simpler notation for $id; cannot be used in application attribute |
$"$" | value of the current element | a simpler notation for $"id"; cannot be used in application attribute |
Parameters' names | ||
&id | name of parameter 'id' | |
&{id} | name of parameter 'id' | the same as &id but this protects from being mixed with the subsequent letters |
&& | name of the current parameter | a simpler notation for &id; cannot be used in application attribute |
appl: MedlineSRS [
    documentation: "Get MEDLINE citation (in XML)"
    groups: "Testing"
    nonemboss: "Y"
    supplier: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
    comment: "method -e+[MEDLINE:'$pmid']+-ascii"
]

string: pmid [
    parameter: "Y"
]

outfile: result [ ]

The example is from Gowlab: it fetches a publication from the bibliographic database Medline via SRS at EBI.
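To make the substitution concrete, here is a minimal Python sketch that expands the $id, ${id}, &id and &{id} tokens in a method string. It is not Soaplab2's implementation: the function name expand_method is hypothetical, the quoted forms such as $"id" are left out, the PMID value is made up, and two behaviours are assumptions - an unknown parameter expands to an empty string, and the "name" of a parameter is taken to be the identifier itself.

```python
import re

# '$' or '&', followed by either '{id}' or a bare 'id'.
_TOKEN = re.compile(r"([$&])(?:\{(\w+)\}|(\w+))")


def expand_method(method, values):
    """Expand value ($) and name (&) tokens in an application-level template."""
    def replace(match):
        kind = match.group(1)
        param = match.group(2) or match.group(3)
        if kind == "&":
            return param  # assumption: the name is the identifier itself
        return values.get(param, "")  # assumption: unknown -> empty string

    return _TOKEN.sub(replace, method)


print(expand_method("-e+[MEDLINE:'$pmid']+-ascii", {"pmid": "12345"}))
# prints -e+[MEDLINE:'12345']+-ascii
```

The braces matter when a letter follows the token: with values {"id": "x"}, the string "${id}A" expands to "xA", whereas "$idA" is read as a reference to a parameter named idA.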
And here is a short example with a template as a parameter attribute:
string: greeting [
    additional: "Y"
    default: "Hello World"
    comment: "defaults"
    template: "Soaplab2 sends you regards: $$"
]

The resulting command line will be:

Program and parameters:
/home/senger/soaplab2/run/echo
Soaplab2
sends
you
regards:
Hello
World
--- end of parameters

The same example with a slightly changed template (in order not to split on white-space):

string: greeting [
    additional: "Y"
    default: "Hello World"
    comment: "defaults"
    template: '"Soaplab2 sends you regards:" $"$"'
]

Now the resulting command line is:

Program and parameters:
/home/senger/soaplab2/run/echo
Soaplab2 sends you regards:
Hello World
--- end of parameters
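The difference between $$ and $"$" is easier to see in code. The following Python sketch mimics the behaviour described in this section for parameter-level templates; it is an illustration under stated assumptions, not Soaplab2's implementation. The function name expand_template is hypothetical, only the abbreviated tokens $$, $"$" and && are handled, and treating a double-quoted constant as one element is inferred from the last example.

```python
import re


def expand_template(template, name, value):
    """Expand a parameter template into a list of command-line elements."""
    if not value:
        # No client value and no default: nothing is built at all.
        return []
    elements = []
    # Split the template into double-quoted constants and other chunks.
    for chunk in re.findall(r'"[^"]*"|\S+', template):
        if len(chunk) >= 2 and chunk[0] == '"' and chunk[-1] == '"':
            # Quoted constant text stays one element.
            elements.append(chunk[1:-1].replace("&&", name))
        elif '$"$"' in chunk:
            # Quoted value token: the value is kept in one piece.
            elements.append(chunk.replace("&&", name).replace('$"$"', value))
        elif "$$" in chunk:
            # Plain value token: the value is split on white-space.
            expanded = chunk.replace("&&", name).replace("$$", value)
            elements.extend(expanded.split())
        else:
            elements.append(chunk.replace("&&", name))
    return elements


value = "I wish you luck, mate!"
print(len(expand_template("-greeting $$", "greeting", value)))
# prints 6
print(len(expand_template('-greeting $"$"', "greeting", value)))
# prints 2
```

The element counts match the two greeting examples: six elements with the plain token and two with the double-quoted one.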
In an ACD file, every input and output is described in a separate construct that starts with a data type:
datatype: parameter_name [ parameter attributes ]

There are not many data types (EMBOSS applications have many more of them - but we are not creating ACD files for native EMBOSS applications because they already come with their ACD files).
Specific attributes for this data type are:
If this attribute is not specified then both direct and reference inputs are accepted:

infile: input [ ]

In the example above, a client sees (and can use) two input names (both are created from this one ACD parameter):

<parameter-name>_direct_data
<parameter-name>_url

The parameter name is extended by fixed suffixes. The suffix _direct_data means that a user is sending the data directly, the suffix _url means that a user is sending a reference to the data. The reference should be a URL from which the input data can be fetched. The usual protocols (http, ftp) are supported.
For EMBOSS, the reference data (on the client side) has the suffix _usa instead of _url (see what USA in EMBOSS means).
If this attribute has the value direct, only direct data can be sent by a client. In this case, the input name is identical to the parameter name (input in our example); no suffixes are used:

infile: input [ comment: "data direct" ]

Similarly, if this attribute has the value filename, only reference data can be sent by a client. Again, the input name is identical to the parameter name (input in our example); no suffixes are used:

infile: input [ comment: "data filename" ]
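From a client's point of view, the rules above decide which input names exist for one infile parameter. The Python sketch below restates them; it is only an illustration, and the function name input_names and its arguments are hypothetical.

```python
def input_names(param, data=None, emboss=False):
    """Return the input names a client sees for one infile parameter.

    data is the value of the 'data' comment ('direct' or 'filename'),
    or None when the attribute is not specified.
    """
    if data in ("direct", "filename"):
        # Only one kind of input is accepted: no suffix is added.
        return [param]
    if data is not None:
        raise ValueError(f"unknown data value: {data!r}")
    # Both kinds are accepted: the name gets fixed suffixes.
    reference_suffix = "_usa" if emboss else "_url"
    return [param + "_direct_data", param + reference_suffix]


print(input_names("input"))
# prints ['input_direct_data', 'input_url']
print(input_names("input", data="direct"))
# prints ['input']
```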
infile: input [ additional: "Y" default: "this is my default" comment: default_for_direct ]
#!/usr/bin/perl -w
#
# It copies a file (-i) (if given) to STDOUT.
# Then it adds to STDOUT contents of files given by names
# by the -l option.
#
# Usage: copy-files.pl -i <input-file> -l <list-file>
#
# ---------------------------------------------
use strict;
use warnings;

use File::Copy;
use File::Basename;
use Getopt::Std;

my %opts;
getopt ('il', \%opts);

exit 0 unless $opts{i} or $opts{l};
copy ($opts{i}, \*STDOUT) if $opts{i};
if ($opts{l}) {
    my $dir = dirname ($opts{l});
    $ARGV[0] = $opts{l};
    while (<>) {
        chomp;
        copy ($_, \*STDOUT) or
            copy ("$dir/$_", \*STDOUT) or
            warn "Copy of $_ failed: $!\n";
    }
}

And here is a full ACD for the "application" above:
appl: Files [
    documentation: "Copying and merging files to standard output"
    groups: "Testing"
    nonemboss: "Y"
    executable: "copy-files.pl"
]

infile: input [
    additional: "Y"
    qualifier: "i"
]

filelist: list [
    additional: "Y"
    qualifier: "l"
]

outfile: output [
    default: "stdout"
]

In the filelist data type, all input data are considered direct data. If you need a list of reference data, you would need to write your own input adaptor (see the comment "input_adaptor <class-name>" above, and look for inspiration in the existing input adaptor org.soaplab.services.adaptor.InputManyFiles - the one actually used to implement the functionality of the filelist data type).
A specific attribute for this data type is:
Here is an example showing the input data types:
string: text [
    additional: "Y"
    default: "this is a default"
    prompt: "An optional string with a default value"
]

string: text_no_default [
    standard: "Y"
    prompt: "A mandatory string input"
]

integer: number_int [
    additional: "Y"
    default: 42
]

float: number_float [
    additional: "Y"
    precision: 2
    default: 30.12
]

boolean: bool_false [
    additional: "Y"
    default: false
]

boolean: bool_true [
    additional: "Y"
    default: true
]

boolean: bool_no_default [
    additional: "Y"
]

You can try a service created from the above - the service name (in the default Soaplab2 distribution) is testing.inputtypes.
-menu F
-format jpg,svg
The specific attributes for this data type are:
Do not confuse it with the attributes delimiter and codedelimiter - they both define how to write values in an ACD file, not what the resulting parameter will look like.
Each individual value can be a bit more complex: it can consist of two values, separated by the attribute codedelimiter (the default is a colon). The first value is the one that matters - the one that is accepted from a client and that appears in the resulting parameter. The second value is just a more human-readable text for the first value (which may be useful for some clients' GUIs).
appl: Lists [
    documentation: "How to use lists"
    groups: "Testing"
    nonemboss: "Y"
    executable: "echo"
    comment: defaults
]

list: format [
    additional: "Y"
    default: "png"
    values: "canon; dot; fig; gd; gif; hpgl; imap; jpg; mif; mp; pcl; pic; plain; png; ps; svg"
    prompt: "Graphical format"
    comment: "separator |"
]

list: menu [
    default: "V"
    minimum: "1"
    maximum: "1"
    values: "F--fungi,I--insect,P--plant,V--vertebrate,O--other,C--Custom"
    delimiter: ","
    codedelimiter: "--"
    prompt: "Transcription Factor Class"
    information: "Select class"
]

outfile: output [
    additional: "Y"
    default: "stdout"
]

If the service (created from the ACD above) is called (using the command-line client) as:

build/run/run-cmdline-client -name testing.lists -w -r -format_canon -format_fig menu F

the resulting command line will look like:

-format canon|fig -menu F
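The roles of delimiter, codedelimiter and separator can be summarised in a short Python sketch. It is an illustration only, not Soaplab2's code: the function names parse_values and list_argument are hypothetical, the default delimiter ";" is inferred from the format list above, and the default separator "," is inferred from the earlier "-format jpg,svg" example.

```python
def parse_values(values, delimiter=";", codedelimiter=":"):
    """Split an ACD 'values' string into (code, label) pairs.

    The code is what a client sends; the label is only for display.
    """
    pairs = []
    for item in values.split(delimiter):
        item = item.strip()
        if not item:
            continue
        code, _, label = item.partition(codedelimiter)
        code = code.strip()
        pairs.append((code, label.strip() or code))
    return pairs


def list_argument(name, selected, separator=","):
    """Build the command-line elements for the selected list values."""
    if not selected:
        return []
    return ["-" + name, separator.join(selected)]


print(parse_values("F--fungi,I--insect", delimiter=",", codedelimiter="--"))
# prints [('F', 'fungi'), ('I', 'insect')]
print(list_argument("format", ["canon", "fig"], separator="|"))
# prints ['-format', 'canon|fig']
```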
For example, here is an ACD file with all possible streams defined. The application (a Perl script) is distributed with Soaplab2 in file run/all-streams.pl.
appl: Streams [
    documentation: "Filtering stdin into stdout and stderr streams"
    groups: "Testing"
    nonemboss: "Y"
    executable: "all-streams.pl"
]

infile: input [
    additional: "Y"
    default: "stdin"
]

outfile: std_output [
    additional: "Y"
    default: "stdout"
]

outfile: std_errors [
    additional: "Y"
    default: "stderr"
]
You can influence where these "URL results" will be
served from by setting several run-time configuration properties (look
for properties
Once you specify an output type, you should make sure that the underlying application produces such a type. Soaplab2 itself treats data as binary data if the ACD file says so - but it does not create an array of data (except for graphical EMBOSS programs that produce images on several pages). If you want arrays, you need to create your own plug-in. An example of such a plug-in (doing nothing useful) is the class org.soaplab.samples.OutputTypesJob. And here is a complete ACD file for it:
appl: AllOutputTypes [
    documentation: "Showing how a plugin can create all kinds of outputs"
    groups: "Plugins,Testing"
    nonemboss: "Y"
    comment: "class org.soaplab.samples.OutputTypesJobFactory"
]

infile: input [
    standard: "Y"
    help: "This input will be copied to several outputs. <p> For some of outputs, it will be even replicated (how many times, it depends on the parameter <em>count</em>)."
]

integer: count [
    additional: "Y"
    default: "3"
    prompt: "How many times to replicate input in the array outputs"
    comment: defaults
]

outfile: simple_text_output [
]

outfile: simple_binary_output [
    comment: bindata
]

outfile: array_text_output [
    comment: "output_type String[]"
]

outfile: array_binary_output [
    comment: bindata
    comment: "output_type byte[][]"
]
appl: Results [
    documentation: "Testing an output adaptor"
    groups: "Testing"
    nonemboss: "Y"
    executable: "echo"
]

string: param [
    parameter: "Y"
    default: "this is a result"
    comment: defaults
]

outfile: output [
    additional: "Y"
    default: "stdout"
    comment: "output_adaptor org.soaplab.services.testing.TestingDataAdaptor"
]
For example, a parameter attribute extension for an output may depend on the value (sent by a client in the run-time) of an input parameter format. It would be nice to specify in the ACD file:
outfile: output [ extension: ${format} ]
If you are a client developer, you may wish to know also how the clients can access service metadata and how to benefit from them.
In any case, because this chapter is about building, it is recommended to look first in the build guide to learn what the Ant tool is, how to use it, and what the build-time properties are.
If you are wondering where are the ACD files for EMBOSS, they are not part of the Soaplab2 distribution but part of the EMBOSS itself. More about using them is in the EMBOSS notes.
Give your ACD files names without spaces and other strange characters (because the file name also becomes part of the resulting Web service name).
Or, create your ACD files in your own directory and use
They are in the converter's configuration file src/etc/config/generator.config.template. Every time the converter is invoked, this file is copied into metadata/old.generator/al.Cfg.pl. While copying, Ant substitutes some data there from the following build-time properties:
This property can be used together with the build-time
property
The same configuration file also contains the name of the directory where the XML files will be generated. There is, however, no property to change it. The target directory is metadata/generated.
ant gen

By the way, the task gen is also called when you install Soaplab2 (from the task install).
To convert only ACD files from some of those directories, use individual tasks:
ant gensowa
ant gengowlab
ant genebi
ant gentest

All the gen tasks create metadata in the directory metadata/generated. For each category there will be a sub-directory.
The gen... tasks can be customized by setting some build-time properties:
ant "-Dsa=helloworld dot sleep" gensowa

Be aware that the converter also creates an application list. If property
The
The property
ant -Dsl=MyApps.xml gen
ant -Dsl=MySowaApps.xml gensowa
ant -Dsowa.sl=MySowaApps.xml gensowa

With the general gen task, use the full (non-abbreviated) properties:

ant -Dsowa.sl=MySApps.xml -Dgowlab.sl=MyGApps.xml -Debi.sl=MyEApps.xml gen

Default names are:
Typical example: Create your ACD files in a directory myacd (or copy there some existing ACDs from the Soaplab2 distribution - as I did for this documentation). Then run:
ant -Dacd.dir=myacd -Dsl=MyApps.xml gensowa

You may see a similar report to this one:

_gen:
     [copy] Copying 1 file to /home/senger/soaplab2/metadata/old.generator
     [echo] /home/senger/soaplab2/metadata/old.generator/acd2xml -d -l MyApps.xml -p . -r myacd dot helloworld
  [acd2xml] Processing dot...
  [acd2xml] using myacd/dot.acd
  [acd2xml] (generated into module graphics)
  [acd2xml] Created: /home/senger/soaplab2/metadata/generated/graphics/dot_al.xml
  [acd2xml] Processing helloworld...
  [acd2xml] using myacd/helloworld.acd
  [acd2xml] (generated into module classic)
  [acd2xml] Created: /home/senger/soaplab2/metadata/generated/classic/helloworld_al.xml
  [acd2xml] Created: /home/senger/soaplab2/metadata/generated/MyApps.xml

BUILD SUCCESSFUL
Total time: 4 seconds
Once the ACD to XML converter has created metadata files, you have to
put the names of the application list files (not the names of
the individual service metadata files) in the run-time configuration
file as one or more property
base.dir = /home/senger/soaplab2
metadata.dir = ${base.dir}/metadata/generated
applist = ${metadata.dir}/OtherApplications.xml
applist = ${metadata.dir}/GowlabApplications.xml
applist = ${metadata.dir}/EBIApplications.xml
applist = ${metadata.dir}/EMBOSSApplications.xml