Metadata Guide

Metadata are information about Soaplab2 services, such as the service name, what inputs a service can consume, what types of results it can produce, and much more. Metadata are the cornerstone of Soaplab2: a Soaplab2 service provider does not need to program anything, but she needs to know how to describe her services by using metadata.

Metadata are everywhere around us - and they are expressed in many incompatible formats. Metadata are the Tower of Babel of the current world. Let's concentrate on Soaplab2's contribution to the world chaos.

Soaplab2 uses two major types of metadata:

Metadata containing a list of available Soaplab2 services and how they are grouped together,
and metadata containing details about each Soaplab2 service.

The former is a very simple XML file (its DTD is in SoaplabList.dtd) that a casual service provider does not need to know much (if anything) about, because this metadata file is created automatically when services are deployed. Therefore, its contents are sketched only in the developers guide. A service provider just needs to know which such lists to include in the Soaplab2 configuration - that part is described in the configuration guide (look there for the property applist).

The latter (metadata describing individual services) is the main topic of this document.

Each service has to be described. At run time (when Soaplab2 services are serving the world) such a description is also expressed in XML - but a service provider can (and usually does) use a simpler, more "human-readable" (and more easily editable) format - the ACD format (more about its origin a bit later). Soaplab2 converts the ACD files to its native XML files at build and deployment time.

At the moment, this conversion is the only remaining part of Soaplab2 that is not pure Java: the ACD to XML converter is still written in Perl. This does not matter much on Linux machines but may be a disadvantage under Windows operating systems. A pure Java converter is, however, coming, hopefully sooner rather than later.

ACD metadata files
  Getting started with the ACD syntax
  Application attributes
  General attributes for all data types
  How to say that a parameter is mandatory
  How to use default values
  How to specify semantics
  Building the command-line or request
  Data types and their specific attributes
  Data types for input real data (like files)
  Data types for input arguments
  Data types for outputs
  (Some) known issues
Building metadata
  Where to create metadata
  How to convert ACD to XML files
  How to tell Soaplab about XML files

ACD metadata files

ACD files describe the parameters and behavior of Soaplab2 services, using a format introduced by EMBOSS. The format is flexible enough to be used for non-EMBOSS applications as well. Such files can be created relatively easily by manual editing.

The term "created relatively easily" is a relative term, of course. There is an ongoing effort to add an interactive editor for ACD files to Soaplab2. But it is not a high priority, at least not at the moment. Any takers?

Another advantage of using the ACD format is that all EMBOSS tools are already described by ACD files (the EMBOSS distribution includes all of them). Therefore, deploying EMBOSS applications as Web services is a matter of a few clicks or commands - after all, that was the main motivation behind Soaplab from its beginning.

Here is an example of a simple ACD file for the classic HelloWorld application:

appl: HelloWorld [
  documentation: "Classic greeting from the beginning of the UNIX epoch"
  groups: "Classic, Simple"
  nonemboss: "Y"
  executable: "echo"
]

string: greeting  [
  additional: "Y"
  parameter: "Y"
  default: "Hello World"
  comment: "defaults"
]

outfile: output  [
  additional: "Y"
  default: "stdout"
]

The example above defines a service HelloWorld

that is visible to the world as a Web service classic.helloworld or simple.helloworld (each application can be qualified as part of several different groups),
that is represented by a command-line program echo (which will run fine on any Linux machine but may have problems under Windows, where there is usually no such program),
that has one optional (additional) parameter named greeting whose default value ("Hello World") is always echoed by the command-line program (the service user can change the value by sending a different greeting value), and
whose output (the greeting itself) is produced on the standard output (of the echo program). Of course, Web services do not have a "standard output" - it is the task of Soaplab to catch the output from echo and deliver it to the client machine as a result.

The XML file representing the same metadata (and created by the ACD to XML converter from the file above) is much less human-readable and you do not want/need to see it (for developers: its DTD is also available).

Getting started with the ACD syntax

The original documentation of the ACD syntax is, obviously, available from EMBOSS. But keep two things in mind:

EMBOSS uses many more parameters and attributes than are needed for non-EMBOSS programs (for example, it recognizes an important, but EMBOSS-specific, parameter type sequence). Soaplab2 understands and uses these parameters - but only for EMBOSS applications. And because it is unlikely that you would need to create ACD files for EMBOSS yourself, you do not need to know these parameters. Therefore, if you read the original EMBOSS documentation on the ACD syntax, keep in mind that you can use only a subset of the available parameters (the subset is described later in this document).
On the other hand, non-EMBOSS applications need attributes that were never introduced to EMBOSS programs (for example, the attribute make_url for creating a reference to the service output, or the attribute method for creating more complex command lines). These new attributes are (usually) added to the ACD files as "comments" - this way the original EMBOSS parser and validator of ACD files ignores them without complaining. The EMBOSS documentation obviously does not include these new attributes - you can find them only in this document.

The original purpose of Soaplab was to wrap command-line programs as Web services. That is why, in many places in this documentation, we relate ACD files to command-line options. But as Soaplab evolved it became clear that ACD files could describe not only command-line tools but other types of programs as well. Soaplab2 can make Web services around regular web pages (see the sub-project Gowlab) or on top of other Web services (for example, the wrappers around the Web services provided by EBI are distributed together with Soaplab2 as plug-ins).

It would, therefore, be more precise to talk about "inputs", "parameters" and/or "outputs" instead of "command-line parameters and/or options". Please keep this in mind when reading this document. By an application, we mean a command-line program, or anything else that can be wrapped as a Web service, that needs some input data and can produce some resulting data.

Now, we can finally start with the syntactical rules...

Each application is defined by one ACD file. The ACD file names are used as (part of) the names of the Soaplab2 services. ACD files use the extension '.acd'.

ACD files can have comments. They start with "#" and continue to the end of the line. Empty lines are ignored.

Each application has zero or more parameters. In an ACD file, each of them is defined by a single token followed by either a colon ':' (preferred) or an equal sign '=' which in turn is followed by a second token. The definition of the parameter is delimited by a pair of square brackets '[ ]', which can span multiple lines:

token: token [

   definition

]

Tokens representing keywords can be abbreviated up to the point where they are not ambiguous. For example, default: can be abbreviated to def:.
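
The abbreviation rule can be pictured as a unique-prefix lookup. The following Python lines are only a sketch of that idea, not the parser used by EMBOSS or Soaplab2; the function name resolve_keyword and the short keyword list are hypothetical, and the assumption that an exact match always wins is ours.

```python
def resolve_keyword(token, keywords):
    """Return the single keyword that `token` abbreviates, or raise ValueError."""
    if token in keywords:              # assumption: an exact match always wins
        return token
    matches = [k for k in keywords if k.startswith(token)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown keyword: {token!r}")
    raise ValueError(f"ambiguous keyword {token!r}: could be {sorted(matches)}")


# A hypothetical, incomplete keyword set, used only for this illustration.
KEYWORDS = ["application", "additional", "default", "documentation", "parameter"]

print(resolve_keyword("def", KEYWORDS))   # default
print(resolve_keyword("appl", KEYWORDS))  # application
```

With this keyword set, a single letter such as d is rejected because it could mean either default or documentation.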

The first definition in any ACD file must start with token application (usually abbreviated to appl) followed by a token representing the application name, and by attributes describing the whole application (not just an individual parameter):

appl: appname [ 

    application attributes

]

Then follow the definitions of the application's parameters. The first token in these definitions is a data type. The data type is a keyword and must be from the pre-defined set of data types. The second token is the name by which this parameter is going to be known. The definition contains parameter-specific attributes (different data types may have different sets of allowed attributes):

datatype: parameter_name [ 

   parameter attributes

  ]

Both application and parameter attributes consist of a name (a keyword from a pre-defined set of names), followed by a colon and a value. Values can be delimited by single or double quotes. A value representing a boolean value should be either Y or N.

The ACD syntax also allows grouping parameter definitions into several sections (input parameters, output parameters, advanced parameters and perhaps others). Soaplab2 ignores sections - so they are not shown in the examples in this document. However, if you look into the available ACD files for the EMBOSS programs you will notice them.

As already mentioned, Soaplab2 uses the attribute comment to add its own (non-EMBOSS) attributes (options). The syntax for the values of such attributes is a name followed by a space, followed by its real value (both must be quoted together). For example, here is an attribute (an option) tagsepar with its value - (minus):

infile: library  [
  additional: "Y"
  information: "A custom  PostScript  library file"
  comment: "tagsepar -"
]

If there is no real value (meaning: nothing separated by a space) the resulting Soaplab2 option is considered to be boolean with the value true (and the quotes are not needed), as in this example:

comment: bindata

An ACD file may have any number of such comment attributes. If you are a developer creating your own plug-in, you can use the comment attributes for passing any options or any additional metadata. Of course, if you do so, only selected clients will be able to use such additional information - but Soaplab2 still propagates unknown options to the XML metadata.
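
The name/value convention of the comment attribute can be sketched in a few lines of Python. This is an illustration only, not the code of the ACD to XML converter; the function name parse_comment_option is hypothetical, and the sketch covers only the basic form (a name, optionally followed by a space and a value).

```python
def parse_comment_option(text):
    """Split the value of a `comment` attribute into (option name, option value)."""
    parts = text.strip().split(None, 1)   # split on the first run of white-space
    if not parts:
        raise ValueError("empty comment attribute")
    name = parts[0]
    # No real value after the name: the option is a boolean with the value true.
    value = parts[1] if len(parts) == 2 else True
    return name, value


print(parse_comment_option("tagsepar -"))  # ('tagsepar', '-')
print(parse_comment_option("bindata"))     # ('bindata', True)
```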

Application attributes

The first token of an ACD file must be application (or abbreviated, usually as appl), followed by the application name. The application name is used as the name of the executable command-line program - unless there is an attribute executable. For example, here the command-line program name is dot:

appl: dot [
   ...
]

When the wrapped resource is not a command-line program (but, for example, a web page) the application name is not used (but must be present).

Here is a list of recognized application attributes (note that those known only to Soaplab2 are already shown as part of the comment attribute):

documentation

A string describing briefly the function of the program. In some cases only the first part (up to the first dot) is used.

groups

This attribute allows grouping programs together based on their functionality. Its value contains one or more group names. When an application belongs to more than one group, the group names must be separated by either a comma or a semicolon.

The group name, together with the ACD base file name, creates the Soaplab2 service name (using a dot as a separator and making everything lower-case). If a tool belongs to several groups, several Soaplab2 services are created (even though they are identical in their behaviour). For example:

appl: HelloWorld [
  documentation: "Classic greeting from the beginning of the UNIX epoch"
  groups: "Classic, Simple"
  ...
]

If the lines above are from a file helloworld.acd, two Soaplab services will be created:

classic.helloworld
simple.helloworld

Even though these two services have different names, they both have the identical XML metadata (even in two separate files in two different directories).
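
The naming rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Soaplab2 generator: the function name service_names is hypothetical, and the sketch does not try to handle group names containing spaces or other special characters.

```python
import re
from pathlib import PurePath


def service_names(acd_file, groups):
    """Combine each group name with the ACD base file name, all lower-case."""
    base = PurePath(acd_file).stem.lower()      # "helloworld.acd" -> "helloworld"
    names = []
    for group in re.split(r"[,;]", groups):     # comma or semicolon separated
        group = group.strip().lower()
        if group:
            names.append(f"{group}.{base}")
    return names


print(service_names("helloworld.acd", "Classic, Simple"))
# ['classic.helloworld', 'simple.helloworld']
```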

relations

This attribute is conceptually similar to the groups attribute. It allows describing application semantics - more about it in how to specify semantics.

supplier

It identifies, in free text, who is supplying this application. For Gowlab it has a special (and important) purpose: it contains the URL of the web resource that is being wrapped (it is a mandatory attribute for Gowlab).

version

The version of this application, as free text.

executable

A name of an executable program (or a script) representing this application. If missing, the application name (as specified in the appl line) is used instead. If the name of the executable has non-alphabetic characters, it must be specified here because the application name does not allow them.

Putting the full path of the executable here is not recommended - it would make the ACD file non-portable. The path can be added later, as a run-time configuration property (look for the property addtopath.dir in the configuration guide for details).

An example was given above for the HelloWorld application, where the executable was the Unix program echo.

nonemboss

A boolean attribute that should always be used and should always have the value Y. By default, all ACD files are assumed to describe EMBOSS applications. But Soaplab (more precisely, the generator of the metadata XML files) does slightly different things for EMBOSS and non-EMBOSS programs - so it needs to know what kind of application is being described. This is quite an important attribute.

comment: "method <template>"

It defines how to create the whole command-line if the default appearance (tagged parameters with the qualifiers prefixed by minus) is not sufficient. It refers to individual parameters by constructs ${parameter-name}. The method can be one of the most complex ACD attributes. It is explained (together with the similar parameter attribute, and with examples) in a separate section below.

comment: "help <URL-with-a-help-page>"

A URL pointing to a web page explaining the whole application. Soaplab2 does not use this attribute but it is included in the metadata because it can enhance various Soaplab2 clients (for example, the Taverna client uses it).

comment: "class <class-name>"

Any Soaplab2 plug-in can specify here a class name that implements interface org.soaplab.services.JobFactory and thus Soaplab2 can load this particular plug-in. Read more about this way of extending Soaplab2 in the developers guide. For example, the plug-in wrapping EBI Web services (this plug-in is part of the Soaplab2 distribution) has in the ACD files:

comment: "class org.soaplab.ebi.WSJobFactory"

comment: "launcher <parameter>"

For regular Soaplab2 services, the launcher attribute is usually not needed.

Historically, in Soaplab1 it was used to define a Perl (or shell) script that mediated between command-line tools and the Java Soaplab server, especially when they were invoked in a particular environment, such as a queuing system. In Soaplab2, with its pure Java implementation, it is no longer needed (or supported).

It may be, however, used by particular Soaplab2 plug-ins:

Gowlab defines here whether its remote web resource (web page) should be accessed by HTTP GET or POST method (the 'get' is default, thus it can be omitted):
```
comment: "launcher get"
comment: "launcher post"
```
The plug-in wrapping the EBI Web services uses this attribute to specify a class name implementing the org.soaplab.services.Job interface. This means that this particular plug-in uses both the attribute class (for the job factory) and the attribute launcher (for individual job classes). See examples in the src/etc/acd/ebi/ directory.

comment: "tagsepar <value>"

comment: "defaults"

Some parameter attributes may be also used as application attributes. If so, the value specified here is used for all parameters unless there is the same parameter attribute used, as well. It is like having a default value for parameter attributes valid for the whole ACD file.

Look for individual attribute details in the parameter attribute section.

General attributes for all data types

You may recall that each parameter in an ACD file is defined by this construct:

datatype: parameter_name [ 

   parameter attributes

  ]

This section describes the parameter attributes that can be used for any data type (data type is here just another name for a parameter type).

Well, syntactically, these attributes can be used for any data type, indeed. But for some of them, especially for the "outfile" one, they may not be that useful or used.

standard

A boolean attribute. Value Y indicates that this is a mandatory parameter.

additional

A boolean attribute. Value Y indicates that this is an optional parameter.

parameter

A boolean attribute. Value Y indicates that this parameter appears on the command line without its name as a qualifier. Also implies that its value is required.

Any parameter without this attribute will appear (unless an attribute template is specified) on the command line as a tagged value:

-<parameter-name> <parameter-value>

With this attribute it will appear as a simple value without any qualifier (i.e. the parameter name will not appear on the resulting command line).

default

It defines the default value for this parameter. Even though it sounds straightforward, it may not be. Therefore, check also the attribute defaults (below) and the small section on how to use default values.

The above general attributes are the most used ones. There may, however, be situations where their combinations are unclear. This is caused by the EMBOSS legacy: EMBOSS has several ways to define whether a parameter is optional or not. Therefore, the possible (useful) combinations of these attributes are explained by examples below.

prompt

Defines briefly what this parameter means.

information

More detailed information about this parameter.

help

Very detailed information, preferably containing a URL pointing to a complex description.

knowntype

A term, usually an ontology term, defining the semantics of the data described by this parameter. It may be used by component-based software to find out how individual programs can be bound together (what data flows between them are valid).

This attribute may not be that useful (and used) as it seems. Mainly because some applications (especially EMBOSS) can consume data (and produce results) of several different types. Caveat lector...

template

comment: method

Normally a parameter (its value) is put on the command line in one of the following ways:

As a tagged value:
```
-<parameter-name> <parameter-value>
```
where the space between parameter name and parameter value can be changed to something else by the attribute tagsepar, or
As a value without parameter name (and without any minus sign). This happens when attribute parameter: Y is used.

If none of these cases is good enough, the attribute template can be used. It defines a string with special tokens that will be replaced by parameter name and parameter value. This attribute is similar in function to the application attribute method. The difference is that here the template is just for one parameter, while the method defines a template string for the whole command-line. Details of both are covered below.

tagsepar

A character separating parameter name and parameter value in the resulting command-line parameter. Its default value is a space - which leads to this result (having a parameter greeting with value "Hello"):

-greeting Hello

If this attribute is specified, the parameter name no longer starts with the minus sign. For example, having this in an ACD file:

string: greeting  [
   comment: "tagsepar *"
   ...
   ]

and with a sent value "Hello" leads to the command line:

greeting*Hello

For historical (legacy) syntactic reasons, use a slightly different syntax when the tagsepar should be an equal sign (which is probably the most common case after the space). This ACD file:

string: greeting  [
   comment: "tagsepar = ="
   ...
   ]

leads to the command line:

greeting=Hello

You can also use the same syntax for other separators. The first example above can be written also as:

string: greeting  [
   comment: "tagsepar = *"
   ...
   ]

Because the same tag separator is usually used for all parameters, the tagsepar may also be used as an application attribute - and is then propagated to all parameters. If, in such a case, some parameters need to return to the default tag separator (a space), use a tagsepar without any value for them. Here is an example with all possibilities:

appl: Tagsepar [
  documentation: "Testing TAGSEPAR attribute"
  groups: "Testing"
  nonemboss: "Y"
  executable: "echo"
  comment: defaults
  comment: "tagsepar = ="
]

# --- this parameter gets TAGSEPAR from the application level
string: param1 [
  default: 1
]

# --- this parameter has its own TAGSEPAR
string: param2 [
  comment: "tagsepar = -"
  default: 2
]

# --- this parameter wants to have the original (default) TAGSEPAR
string: param3 [
  default: 3
  comment: tagsepar
]

outfile: output [
  additional: "Y"
  default: "stdout"
]
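
How the tag separator is inherited and overridden in the example above can be sketched in Python. This is only an illustration of the rules as described in this section, not Soaplab2 code. The function names are hypothetical, options are represented as plain dictionaries (an option without a value is stored as True), and the sketch assumes that a name joined to its value by a custom separator forms a single command-line argument.

```python
def normalise_tagsepar(raw):
    """Turn a raw tagsepar option value into a separator; None means the default space."""
    if raw is True:               # `comment: tagsepar` without a value: back to the default
        return None
    if raw.startswith("= "):      # legacy form, e.g. "= =" or "= *"
        return raw[2:]
    return raw                    # plain form, e.g. "*" or "-"


def effective_tagsepar(app_options, param_options):
    """A parameter-level tagsepar wins over the application-level one."""
    if "tagsepar" in param_options:
        return normalise_tagsepar(param_options["tagsepar"])
    if "tagsepar" in app_options:
        return normalise_tagsepar(app_options["tagsepar"])
    return None


def tagged(name, value, tagsepar):
    """Build the argument(s) for one tagged parameter."""
    if tagsepar is None:
        return [f"-{name}", value]           # default: minus sign, name and value apart
    return [f"{name}{tagsepar}{value}"]      # custom separator: no minus sign


app = {"defaults": True, "tagsepar": "= ="}
params = [("param1", "1", {}),
          ("param2", "2", {"tagsepar": "= -"}),
          ("param3", "3", {"tagsepar": True})]

args = []
for name, value, options in params:
    args += tagged(name, value, effective_tagsepar(app, options))
print(args)  # ['param1=1', 'param2-2', '-param3', '3']
```

The printed list is what the rules in this section imply for the Tagsepar example when no values are sent; it has not been checked against a running Soaplab2 server.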

comment: defaults

A boolean option. If defined it says "put on the command line the value of this parameter even if the value is a default one". By default, the default values are not put on the command-line. See also below, how it is related to the attribute default.

This attribute is ignored for data types boolean, infile, filelist and outfile.

qualifier

The default way to create a command-line argument from an ACD parameter is to use the parameter name as the name on the command-line. But remember that the parameter name is also used by users/clients to send the parameter value. For example:

string: g  [
]

The user/client must send a value for this parameter under the name g. Assuming that she sent a value "hello", the command line will look like this:

-g hello

This works fine but it is hard for the client to remember the purpose of a non-descriptive parameter name g. In order to make it easier for the client, one can use the attribute qualifier. Once present, it is used on the command line instead of the parameter name (and the parameter name is still used to identify the client value). The example above can be re-written like this:

string: greeting  [
   qualifier: g
]

The user sends now her "hello" under the better name greeting but the command line is created as before, using the qualifier g.
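
The split between the name used by clients and the name used on the command line can be sketched as follows. It is a simplified illustration, not Soaplab2 code: the function name command_line and the dictionary layout of the definitions are hypothetical, and only tagged parameters with the default separator are handled.

```python
def command_line(definitions, client_inputs):
    """Build tagged arguments; the qualifier, if present, replaces the parameter name."""
    args = []
    for definition in definitions:
        name = definition["name"]            # the name the client uses
        if name not in client_inputs:
            continue                         # nothing sent: nothing built in this sketch
        tag = definition.get("qualifier", name)
        args += [f"-{tag}", client_inputs[name]]
    return args


definitions = [{"name": "greeting", "qualifier": "g"}]
print(command_line(definitions, {"greeting": "hello"}))  # ['-g', 'hello']
```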

comment: "display false"

Some parameters may be hidden from the users (but still used when creating a command line). This attribute hides the parameter (the client does not get its name at all).

By default, parameters are not hidden. The exception is the data type outfile, which is always hidden (meaning that the clients do not specify the name of the output file that will be created on the command line - it is Soaplab2's task to do so).

This attribute is often used for Gowlab services - where HTML forms may have some "hidden" variables.

comment: envar

A boolean attribute. It indicates that the value of this parameter should not be put on the command line but instead an environment variable should be created before the underlying application is executed. For example:

boolean: bool_env  [
   information: "b4: A boolean that becomes an environment"
   qualifier: "b4"
   comment: envar
]
string: str_env  [
   information: "str: A string that becomes an environment ENVVAR"
   additional: "Y"
   qualifier: ENVVAR
   default: "Ciao mundi"
   comment: envar
   comment: "defaults"
]

This attribute is ignored for data types infile, filelist and outfile.
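
The idea of diverting a value from the command line to the environment can be sketched in Python. This is an illustration only, not Soaplab2 code. The function name environment_for and the dictionary layout are hypothetical, and the sketch assumes that the qualifier (if present) names the environment variable, as the information text of the str_env example suggests; boolean parameters are left out because their value format is not described here.

```python
def environment_for(definitions, client_inputs):
    """Collect the values of `envar` parameters into a dict of environment variables."""
    env = {}
    for definition in definitions:
        if not definition.get("envar"):
            continue                              # a regular command-line parameter
        name = definition["name"]
        value = client_inputs.get(name)
        if value is None and definition.get("defaults"):
            value = definition.get("default")     # `defaults` forces the default value
        if value is not None:
            env[definition.get("qualifier", name)] = value
    return env


definitions = [{"name": "str_env", "qualifier": "ENVVAR", "default": "Ciao mundi",
                "envar": True, "defaults": True}]
print(environment_for(definitions, {}))                      # {'ENVVAR': 'Ciao mundi'}
print(environment_for(definitions, {"str_env": "Salve"}))    # {'ENVVAR': 'Salve'}
```

In a Python wrapper, such a dictionary could be merged with os.environ and passed as the env argument of subprocess.run when starting the underlying program.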

relations

This attribute allows describing parameter semantics - more about it in how to specify semantics.

How to say that a parameter is mandatory

a) This means that the param is mandatory. It will appear on the command-line with its tag/qualifier, e.g. as -param value:
```
string: param [
   standard: "Y"
]
```
b) This also means that the param is mandatory. It appears on the command-line without any tags, e.g. as value:
```
string: param [
   parameter: "Y"
]
```

You can change whether they are mandatory by swapping Y to N in any of the above cases. If the attributes are missing completely, it is the same as if they had the value N. Therefore, this construct means the param is not mandatory:

string: param [
]

Note that the following construct (a bit against common sense) DOES NOT mean that the param is mandatory:

string: param [
   additional: "N"
]

Both constructs a) and b) become NOT mandatory if the definitions include a default value:

string: param [
   standard: "Y"
   default: "a default value"
]

string: param [
   parameter: "Y"
   default: "a default value"
]

The same happens if the definition also has additional: "Y".
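
The combinations above can be condensed into one small decision function. The following Python lines are a sketch of the rules as stated in this section, not the logic of the Soaplab2 metadata generator; the function names and the dictionary representation of attributes are hypothetical.

```python
def is_yes(attributes, key):
    """A missing attribute counts the same as the value N."""
    return str(attributes.get(key, "N")).upper() == "Y"


def is_mandatory(attributes):
    """Mandatory: standard or parameter is Y, and nothing makes the parameter optional."""
    wants_value = is_yes(attributes, "standard") or is_yes(attributes, "parameter")
    made_optional = "default" in attributes or is_yes(attributes, "additional")
    return wants_value and not made_optional


print(is_mandatory({"standard": "Y"}))                     # True
print(is_mandatory({"parameter": "Y"}))                    # True
print(is_mandatory({}))                                    # False
print(is_mandatory({"additional": "N"}))                   # False
print(is_mandatory({"standard": "Y", "default": "abc"}))   # False
```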

In EMBOSS, the parameters in ACD files are divided into sections - and the parameters being in the "required" section may have some implication on whether they are considered mandatory or not. But you will never create an ACD file for an EMBOSS application, will you?

How to use default values

The ACD/XML files can provide default value(s) for service parameters. They are defined by the default attribute.

The first purpose of this attribute is that its presence makes the parameter optional (as explained above).

The second (and the main) purpose, of course, is to tell what value will be used if the user does not send any value for a particular parameter. The question is, however, should such default value be used (e.g. should it be put on the command-line), or not? The answer is: it depends.

Soaplab2 has the following rules for that (for the situation that no value was sent by the user for a particular parameter):

For the input parameters (those representing input files, both with direct data and with data references), the default (if it exists) is always used.
For others, unless there is an attribute defaults, it is never used (it is up to the service/application itself to know what default value should be used). This is the most frequent case. For these cases, the default value is defined in the ACD/XML files more or less as an indication to the users (clients) but it is not really used by Soaplab2.
If a parameter has an attribute defaults (set to true), the default value is really used (e.g. it appears physically on the command-line). This is the case where the application does not have any real default (like the program echo in the HelloWorld example at the beginning of this document) - but a Soaplab2 service provider wants to give some default value. Or a case, where the application has a different default value than the Soaplab2 service provider wants to use (e.g. the graphical program dot - see its ACD file src/etc/acd/sowa/dot.acd).
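
The three rules can be sketched as a single function. This is an illustration of the rules as listed above, not Soaplab2 code; the function name value_for_request and its arguments are hypothetical, and whether a parameter counts as input data is passed in explicitly rather than derived from its data type.

```python
def value_for_request(sent, default=None, is_input_data=False, defaults=False):
    """Return the value that goes into the request, or None if nothing is put there."""
    if sent is not None:
        return sent                  # a value sent by the user always wins
    if default is None:
        return None                  # nothing sent and no default defined
    if is_input_data or defaults:
        return default               # input data, or the `defaults` option is set
    return None                      # the default is only an indication for clients


print(value_for_request(None, default="Hello World", defaults=True))  # Hello World
print(value_for_request(None, default="Hello World"))                 # None
print(value_for_request("Ciao", default="Hello World"))               # Ciao
```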

How to specify semantics

Every application and each of its parameters (especially its inputs and outputs, but also other parameters) can have its meaning explicitly specified by ontology terms. Simply speaking, an ontology is a controlled vocabulary with terms pre-defined and agreed on by domain experts. In addition to such a simple vocabulary, an ontology can have (and usually has) defined relationships between its terms (saying, for example, which terms are more general and which are more specific). The ontology terms do not influence how the application is invoked and what it does but can be useful, for example, for discovering which application to use for your particular data types.

In an ACD file, the ontology terms can be specified by one or more relations attribute(s). Such an attribute can be used either as an application attribute or as a parameter attribute. There are two possible formats for the value of this attribute (the second one is, however, recommended):

  relations: "/edam/operation/0001813 Sequence retrieval"

  relations: "EDAM:0001813 operation Sequence retrieval"

Both examples above say that this application (or this parameter) has semantics defined by an ontology term identifier EDAM:0001813 and that this term is defined in an ontology namespace operation. Additionally, more or less for convenience (because the same data could be obtained directly from the ontology itself by using the ontology term identifier), there is also an ontology term name Sequence retrieval.
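
To make the equivalence of the two formats concrete, here is a small Python sketch that reads both into the same three pieces. It is an illustration only, not Soaplab2 code. The function name parse_relation is hypothetical, and the sketch assumes that the path-like first format maps to an identifier by upper-casing the ontology name and joining it to the number with a colon, which is inferred from the two examples above.

```python
def parse_relation(value):
    """Return the term identifier, namespace and term name of a relations value."""
    head, _, rest = value.strip().partition(" ")
    if head.startswith("/"):
        # Path-like format: /<ontology>/<namespace>/<number> <term name>
        ontology, namespace, number = head.strip("/").split("/")
        term_id = f"{ontology.upper()}:{number}"
        name = rest.strip()
    else:
        # Recommended format: <identifier> <namespace> <term name>
        term_id = head
        namespace, _, name = rest.strip().partition(" ")
        name = name.strip()
    return {"id": term_id, "namespace": namespace, "name": name}


first = parse_relation("/edam/operation/0001813 Sequence retrieval")
second = parse_relation("EDAM:0001813 operation Sequence retrieval")
print(first == second)  # True
print(second)
```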

You may use any ontology, of course. But one strong candidate is the EDAM ontology, which is closely related to EMBOSS; therefore, all EMBOSS applications (ACD files) are already annotated by relations from this ontology.

Building the command-line or request

Building a command-line, creating an HTTP request with correct parameters, or preparing some other kind of request (used perhaps by a Soaplab2 plug-in) is obviously the core task of Soaplab2.

The usual way is to build the command-line from the individual parameters, each of which can be created either as a tagged value:

-<parameter-name> <parameter-value>

or as a non-tagged value (when the parameter attribute parameter is set to true):

<parameter-value>

The first form can be further modified by the tagsepar attribute, resulting, for example, in this:

<parameter-name>=<parameter-value>

If this is not enough, a parameter can have an attribute template that defines how the parameter should be built. The template value is a string that contains special tokens (e.g. "$$" or "&&"). The tokens will be substituted by the real parameter value (sent by a client) or by a parameter name (the latter substitution is not that crucial because you can yourself include the parameter name directly; but it helps with the maintenance of ACD files if you have the parameter name only in one place).

If a parameter does not have any value (a client sent nothing, and there is no default value), nothing is built for this parameter. In other words, the template string is not used at all (even if it has some constant text there).

Admin tool: `ExploreParameters`

The template attribute may be complex, especially when you consider a global template (an application attribute; more about it in a minute). Therefore, there is an admin tool, a command-line client, that can help to test templates without modifying ACD files and converting them into the XML metadata each time you try a new template.

Separation of resulting arguments

A slightly confusing (but quite powerful) aspect of templates is how the resulting string is cut into pieces. In templates you define exactly how parameters should be separated (into individual command-line arguments) before being sent to an application. An example explains it better:

Let's have a service that has one input named greeting, and a client sends there a value "I wish you luck, mate!".

Without templates and with the following ACD parameter:

string: greeting  [
]

we get a command line with exactly two elements (it is shown in the format used by the admin tool for exploring parameters, mentioned above):

Command line
------------
        ( 1) -greeting
        ( 2) I wish you luck, mate!

You can see that Soaplab2 took care and made sure that the whole sent value is not separated by white-spaces.

However, if you use a template, you have to say whether or not you wish the value to be separated by white-spaces. Let's have now the same parameter with a template:

string: greeting  [
   template: "-greeting $$"
]

The token "$$" is replaced by the client value. The result is now, however, different - the command line suddenly gets six elements:

Command line
------------
        ( 1) -greeting
        ( 2) I
        ( 3) wish
        ( 4) you
        ( 5) luck,
        ( 6) mate!

In order to get the same result as without a template, you need to use a slightly different token (a double-quoted $$):

string: greeting  [
   template: '-greeting $"$"'
]

Now, we are getting the same command line with just two elements:

Command line
------------
        ( 1) -greeting
        ( 2) I wish you luck, mate!
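
The difference between the two tokens can be reproduced with a few lines of Python. This is a simplified sketch of the behaviour shown above, not the Soaplab2 implementation; the function name expand_template is hypothetical, and only the two tokens for the current parameter's value are handled.

```python
def expand_template(template, value):
    """Expand a parameter template into a list of command-line arguments."""
    if value is None:
        return []                                    # no value: the template is not used at all
    args = []
    for piece in template.split():
        if '$"$"' in piece:
            args.append(piece.replace('$"$"', value))          # value kept in one piece
        elif "$$" in piece:
            args.extend(piece.replace("$$", value).split())    # value cut at white-spaces
        else:
            args.append(piece)                                 # constant text
    return args


value = "I wish you luck, mate!"
print(expand_template("-greeting $$", value))
# ['-greeting', 'I', 'wish', 'you', 'luck,', 'mate!']
print(expand_template('-greeting $"$"', value))
# ['-greeting', 'I wish you luck, mate!']
```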

Application and parameter attributes

The template can be used both as an application attribute and a parameter attribute. Historically, however, the syntax of these two is slightly different.

A template as an application attribute is specified as:

appl name [
   ...
   comment: "method <string>"
   ...
]

And a template as a parameter attribute is specified as:

data-type name [
   ...
   template: "<string>"
   ...
]

Some tokens can be abbreviated ("$$", "&&", etc.). This abbreviation cannot be used for the application attribute (because the abbreviated form does not tell which parameter is the token meant for).

Parameter ordering

The resulting command line is created with the parameters in the order in which they are defined in the ACD file, unless there is a template as an application attribute that changes the order (by listing parameters in a different order).

Template tokens

Token	Will be replaced by	Explanation
Parameters' values
$id	value of parameter 'id'
${id}	value of parameter 'id'	used when the construct is followed immediately by a letter (like in ${id}A)
$"id"	value of parameter 'id'	value will not be separated into several pieces even if it contains white-spaces
${"id"}	value of parameter 'id'	ditto as above
$$	value of the current element	a simpler notation for $id; cannot be used in application attribute
$"$"	value of the current element	a simpler notation for $"id"; cannot be used in application attribute
Parameters' names
&id	name of parameter 'id'
&{id}	name of parameter 'id'	the same as &id but this protects from being mixed with the subsequent letters
&&	name of the current parameter	a simpler notation for &id; cannot be used in application attribute

In templates, there are more tokens and more features when so-called "repeatable" parameter values are used. Soaplab2 does not yet fully implement them in all places. The table above may be, therefore, extended in the future.
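
To make the difference between $id and ${id} concrete, here is a minimal Python sketch of the value substitution. It is an illustration only (the function substitute_values and the sample value 12345 are made up), and it covers just these two tokens, not the quoted or abbreviated forms.

```python
import re

# Matches $id or ${id}; hypothetical helper, not the Soaplab2 parser.
_VALUE_REF = re.compile(r'\$(?:\{(\w+)\}|(\w+))')

def substitute_values(template, values):
    """Replace $id and ${id} by values[id]; unknown ids stay untouched."""
    def replace(match):
        ident = match.group(1) or match.group(2)
        return values.get(ident, match.group(0))
    return _VALUE_REF.sub(replace, template)

# 12345 is an invented identifier used only for this demonstration.
print(substitute_values("-e+[MEDLINE:'$pmid']+-ascii", {"pmid": "12345"}))
# Braces end the identifier, so the letter A is not read as part of it.
print(substitute_values("${id}A", {"id": "x"}))
# Without braces the identifier would be "idA", which is unknown here.
print(substitute_values("$idA", {"id": "x"}))
```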

Examples

Here is an example with a template as an application attribute:

appl: MedlineSRS [
  documentation: "Get MEDLINE citation (in XML)"
  groups: "Testing"
  nonemboss: "Y"
  supplier: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
  comment: "method -e+[MEDLINE:'$pmid']+-ascii"
]

string: pmid  [
  parameter: "Y"
]

outfile: result  [
]

The example is from Gowlab: it fetches a publication from the bibliographic database Medline via SRS at EBI.

And here is a short example with a template as a parameter attribute:

string: greeting  [
  additional: "Y"
  default: "Hello World"
  comment: "defaults"
  template: "Soaplab2 sends you regards: $$"
]

The result command line will be:

Program and parameters:
/home/senger/soaplab2/run/echo
Soaplab2
sends
you
regards:
Hello
World
--- end of parameters

The same example with a slightly changed template (in order not to separate by white-spaces):

string: greeting  [
  additional: "Y"
  default: "Hello World"
  comment: "defaults"
  template: '"Soaplab2 sends you regards:" $"$"'
]

Now the result command line is:

Program and parameters:
/home/senger/soaplab2/run/echo
Soaplab2 sends you regards:
Hello World
--- end of parameters

Data types and their specific attributes

Each application usually has some inputs and always produces some outputs. The inputs include both real data (outside of the Web services context, they are often referred to as input files) and parameters (or options, or arguments).

In an ACD file, every input and output is described in a separate construct that starts with a data type:

datatype: parameter_name [ 

   parameter attributes

  ]

There are not many data types (EMBOSS applications have many more of them - but we are not creating ACD files for native EMBOSS applications because they already come with their ACD files).

Data types for input real data (like files)

infile

It defines input data. For a command-line application, it is an input file. Soaplab2 gets the data from the user, creates an input file on the server side, and informs the application about this file.

Specific attributes for this data type are:

comment: "data direct"

comment: "data filename"

The input can take two forms: direct input or a reference to an input. Both forms are defined by one ACD parameter - and this optional attribute can limit which form may be used.

If this attribute is not specified, then both direct and reference inputs are acceptable:

infile: input  [
]

In the example above, a client sees (and can use) two input names (both are created from this one ACD parameter):

<parameter-name>_direct_data
<parameter-name>_url

The parameter name is extended by fixed suffixes. The suffix _direct_data means that a user is sending the data directly; _url means that a user is sending a reference to the data. The reference should be a URL from which the input data can be fetched. The usual protocols (http, ftp) are supported.

For historical reasons there is a naming discrepancy: the names on the user/client side use the suffixes '_direct_data' and '_url', but the ACD parameter attribute uses the values 'direct' and 'filename' for the same purpose.

Thus, in the example above, the user can send direct data under the name input_direct_data or a URL under the name input_url.

For EMBOSS, the reference data (on the client side) has the suffix _usa instead of _url (see what USA in EMBOSS means).

If this attribute has the value direct, only direct data can be sent by a client. In this case, the input name is identical with the parameter name (input in our example), no suffixes are used:

infile: input [
  comment: "data direct"
]

Similarly, if this attribute has the value filename, only reference data can be sent by a client. Again in this case, the input name is identical with the parameter name (input in our example), no suffixes are used:

infile: input [
  comment: "data filename"
]
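
The naming rules above can be summarised in a short Python sketch. The function client_input_names is hypothetical and only restates the rules of this section; it is not an API of Soaplab2.

```python
def client_input_names(param, data=None, emboss=False):
    """Return the input names a client sees for one infile parameter.

    data is the value of the 'comment: "data ..."' attribute:
    None (attribute not specified), "direct" or "filename".
    """
    if data in ("direct", "filename"):
        return [param]                       # only one form, no suffixes
    if data is None:
        # both forms are allowed; EMBOSS uses _usa for references
        ref_suffix = "_usa" if emboss else "_url"
        return [param + "_direct_data", param + ref_suffix]
    raise ValueError("unknown data form: %r" % (data,))

print(client_input_names("input"))
print(client_input_names("input", "direct"))
print(client_input_names("input", "filename"))
```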

comment: default_for_direct

When both input forms (direct and reference) are allowed and there is a default attribute, it would not be clear which form the default value represents. That is the role of the boolean attribute default_for_direct: if it is set to true, the default value is used for direct data; otherwise the default value is used for reference data. For example:

infile: input [
  additional: "Y"
  default: "this is my default"
  comment: default_for_direct
]

There is no way to specify default values for both direct and reference data for the same parameter.

default

This attribute was already explained. Just one specific comment for using this attribute within an infile data type: If its value is stdin, it is not used as a default value but it indicates that the underlying application expects this input as its standard input stream.

And yes, you are right: there is no way to specify a default value for the standard input stream.

comment: bindata

A boolean attribute. Its true value specifies that the input data should be treated as binary data (e.g. images). Otherwise the data are considered text data.

comment: "input_adaptor <class-name>"

It allows you to extend Soaplab2 with a class that will be called to adapt input data when they arrive at the Soaplab2 server and before they are used by the usual Soaplab2 mechanism. This class should implement the interface org.soaplab.services.adaptor.DataAdaptor.

filelist

It specifies that the input should be an array (a list) of input data. Soaplab2 treats all such inputs as binary data. When the data arrive at the Soaplab2 server, they are again stored in local files, and a new file - containing the names of the local files, one per line - is created and provided to the underlying application. This means that the application must be able to read such a file list. Here is an example of such an application (a Perl script):

#!/usr/bin/perl -w
#
# It copies a file (-i) (if given) to STDOUT.
# Then it adds to STDOUT contents of files given by names
# by the -l option.
#
# Usage: copy-files.pl -i <input-file> -l <list-file>
# 
# ---------------------------------------------

use strict;
use warnings;
use File::Copy;
use File::Basename;

use Getopt::Std;
my %opts;
getopt ('il', \%opts);

exit 0 unless $opts{i} or $opts{l};

copy ($opts{i}, \*STDOUT) if $opts{i};
if ($opts{l}) {
    my $dir = dirname ($opts{l});
    $ARGV[0] = $opts{l};
    while (<>) {
        chomp;
        copy ($_, \*STDOUT) or copy ("$dir/$_", \*STDOUT) or warn "Copy of $_ failed: $!\n";
    }
}

And here is a full ACD for the "application" above:

appl: Files [
  documentation: "Copying and merging files to standard output"
  groups: "Testing"
  nonemboss: "Y"
  executable: "copy-files.pl"
]

infile: input  [
  additional: "Y"
  qualifier: "i"
]

filelist: list [
  additional: "Y"
  qualifier: "l"
]

outfile: output  [
  default: "stdout"
]

In the filelist data type, all input data are considered direct data. If you need a list of reference data, you would need to write your own input adaptor (see the comment "input_adaptor <class-name>" above, and look for inspiration in the existing input adaptor org.soaplab.services.adaptor.InputManyFiles - the one actually used to implement the functionality of the filelist data type).
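
What happens on the server side for a filelist can be pictured with the following Python sketch. It only illustrates the idea (each item stored in its own file, plus a list file with one name per line); the file names item_N.bin and list.txt are invented, and this is not how Soaplab2 itself is implemented.

```python
import os
import tempfile

def write_file_list(items, directory):
    """Store each item in its own file and return the path of a list
    file that names those files, one per line."""
    names = []
    for number, data in enumerate(items, start=1):
        name = "item_%d.bin" % number          # hypothetical naming scheme
        with open(os.path.join(directory, name), "wb") as out:
            out.write(data)                    # filelist items are binary
        names.append(name)
    list_path = os.path.join(directory, "list.txt")   # hypothetical name
    with open(list_path, "w") as out:
        out.write("".join(name + "\n" for name in names))
    return list_path

with tempfile.TemporaryDirectory() as tmp:
    path = write_file_list([b"first\n", b"second\n"], tmp)
    with open(path) as listing:
        print(listing.read(), end="")
```

The sketch writes bare file names into the list; the Perl script above would still find such files because it also tries each name relative to the directory of the list file.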

Data types for input arguments

string

The most common data type. It does not have any specific attributes (just the general ones, described above).

boolean

A data type that does not have any real value. Its presence (i.e. when sent by a client) is indicated on the command line as a toggle (an option). It does not have any specific attributes.

integer

A data type representing an integer. Its value is checked to be a valid integer. It does not have any specific attributes.

float

A data type representing a decimal-point number. Its value is checked if it is a valid integer or a decimal-point number.

A specific attribute for this data type is:

precision: <number>: The number specifies the maximum number of decimal places.

Here is an example showing the input data types:

string: text [
  additional: "Y"
  default: "this is a default"
  prompt: "An optional string with a default value"
]

string: text_no_default [
  standard: "Y"
  prompt: "A mandatory string input"
]

integer: number_int  [
  additional: "Y"
  default: 42
]

float: number_float  [
  additional: "Y"
  precision: 2
  default: 30.12
]

boolean: bool_false [
  additional: "Y"
  default: false
]
boolean: bool_true [
  additional: "Y"
  default: true
]
boolean: bool_no_default [
  additional: "Y"
]

You can try a service created from the above - the service name (in the default Soaplab2 distribution) is testing.inputtypes.

list

A data type that allows you to list all possible values for a parameter. How this parameter appears on the command line differs considerably depending on the values of the attributes minimum and maximum:

If minimum is 1 and maximum is also 1, then only one value can be sent by a client. For example, if a value "F" is sent for a list named "menu" (the full ACD is shown below), the created parameter will be:
```
-menu F
```
Otherwise, a client can send several (boolean) values (up to as many as there are possible values in the attribute values). The name of each such input is a concatenation of the parameter name, an underscore, and a possible value. The resulting parameter will be a list separated by the character defined by the attribute separator. For example, if a client sends two boolean values "format_jpg" and "format_svg", the resulting parameter will be:
```
-format jpg,svg
```

The specific attributes for this data type are:

minimum: <number>

maximum: <number>

A minimal and maximal number of accepted values for this data type. The number should be a positive integer, and the maximum not smaller than the minimum.

A minimum value of one is misleading: it seems to mean that at least one value is required, but it does not (whether a parameter is mandatory is defined differently, by other attributes). Also, a missing minimum should default to one - but it does not. Consider these irregularities a bug in Soaplab2 that will hopefully be fixed in the future.

separator: "<character>"

This attribute is used only when maximum is greater than 1 (or when minimum is missing - see the bug note above). In this case, when a user sends several values, the resulting command-line parameter uses this attribute to separate them. An example was shown above.

Do not confuse it with the attributes delimiter and codedelimiter - they both define how to write values in an ACD file, not what the resulting parameter will look like.

values: "<list>"

A mandatory attribute that specifies all possible values for this list. The values are separated by a character defined by the attribute delimiter (default is semicolon). Leading and trailing white-spaces of the individual values are trimmed.

Each individual value can be a bit more complex: it can consist of two values, separated by attribute codedelimiter (default is colon). The first value is the one that matters - one that is accepted from client and that appears in the resulting parameter. The second value is just a better human-readable text for the first value (that may be useful for some client's GUIs).

The XML created from ACD has both values, but the human-readable one is not that easy to access. This may change in the future, but for now the second value is rarely used; even the Soaplab2-native Spinet client does not show it.

If a list has a default attribute, its value should be one of the values from the values attribute (if the values are 'complex', use the first part for the default attribute).

delimiter: "<characters>"

The characters that separate individual values in the values attribute. The default value is a semicolon.

codedelimiter: "<characters>"

The characters that separate the two parts of an individual value in the values attribute. The default value is a colon.
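
The interplay of values, delimiter and codedelimiter can be shown with a small Python sketch. The function parse_values is hypothetical; it only follows the rules stated in this section (split on the delimiter, trim white-spaces, then split each item on the codedelimiter) and is not the converter's actual code.

```python
def parse_values(values, delimiter=";", codedelimiter=":"):
    """Split a 'values' attribute into (code, label) pairs.

    The code is what a client sends; the label is the optional
    human-readable text (an empty string if it is missing).
    """
    pairs = []
    for item in values.split(delimiter):
        item = item.strip()            # leading/trailing white-spaces trimmed
        if not item:
            continue                   # ignore empty items (an assumption)
        code, _, label = item.partition(codedelimiter)
        pairs.append((code.strip(), label.strip()))
    return pairs

print(parse_values("canon; dot; fig"))
print(parse_values("F--fungi,I--insect", delimiter=",", codedelimiter="--"))
```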

Here is a full ACD for testing both kinds of lists:

appl: Lists [
  documentation: "How to use lists"
  groups: "Testing"
  nonemboss: "Y"
  executable: "echo"
  comment: defaults
]

list: format  [
  additional: "Y"
  default: "png"
  values: "canon; dot; fig; gd; gif; hpgl; imap; jpg; mif; mp; pcl; pic; plain; png; ps; svg"
  prompt: "Graphical format"
  comment: "separator |"
]

list: menu [
  default: "V"
  minimum: "1"
  maximum: "1"
  values: "F--fungi,I--insect,P--plant,V--vertebrate,O--other,C--Custom"
  delimiter: ","
  codedelimiter: "--"
  prompt: "Transcription Factor Class"
  information: "Select class"
]

outfile: output  [
  additional: "Y"
  default: "stdout"
]

If the service (created from the ACD above) is called (using the command-line client) as:

build/run/run-cmdline-client -name testing.lists -w -r -format_canon -format_fig menu F

the resulting command line will look like:

-format canon|fig -menu F
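
The two kinds of lists can be compared side by side in the following Python sketch. It is a simplified model under stated assumptions, not Soaplab2 code: the function list_arguments is invented, the option is assumed to be a dash followed by the parameter name, the chosen values are assumed to appear in the order of the values attribute, and the comma is only an assumed default separator.

```python
def list_arguments(name, codes, sent, separator=",", minimum=None, maximum=None):
    """Build the command-line pieces for one list parameter.

    codes - accepted values, in the order of the 'values' attribute
    sent  - what the client sent, as {input_name: value}
    """
    if minimum == 1 and maximum == 1:
        # single choice: the value is sent under the parameter name
        value = sent.get(name)
        return ["-" + name, value] if value in codes else []
    # multiple choice: one boolean input per possible value
    chosen = [code for code in codes if sent.get(name + "_" + code)]
    return ["-" + name, separator.join(chosen)] if chosen else []

sent = {"format_canon": True, "format_fig": True, "menu": "F"}
args = (list_arguments("format", ["canon", "dot", "fig"], sent, separator="|")
        + list_arguments("menu", ["F", "I", "P", "V", "O", "C"], sent,
                         minimum=1, maximum=1))
print(" ".join(args))
```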

Data types for outputs

outfile

It defines output data, a result. Each ACD file should have at least one parameter of this data type.

Each Soaplab2 service always has two special outputs ("report" and "detailed_status") that are not defined in the ACD file at all. It is therefore less confusing if you do not use these names for your own outfile parameters.

Specific attributes for this data type are:

default: stdout

default: stderr

Soaplab2 outputs do not really have any default values. Therefore, this attribute is used here (as in infile) to indicate that a standard stream should be used (instead of an output file with its name specified on the command line).

For example, here is an ACD file with all possible streams defined. The application (a Perl script) is distributed with Soaplab2 in file run/all-streams.pl.

appl: Streams [
  documentation: "Filtering stdin into stdout and stderr streams"
  groups: "Testing"
  nonemboss: "Y"
  executable: "all-streams.pl"
]
infile: input  [
  additional: "Y"
  default: "stdin"
]
outfile: std_output  [
  additional: "Y"
  default: "stdout"
]
outfile: std_errors  [
  additional: "Y"
  default: "stderr"
]
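
What "using standard streams" means for the wrapped application can be tried out with this Python sketch. It is an illustration only: the stand-in application below is made up (it is not run/all-streams.pl), and the function run_with_streams is hypothetical, not the way Soaplab2 starts its jobs.

```python
import subprocess
import sys

# A stand-in for a wrapped application (an assumption for this demo):
# it upper-cases stdin to stdout and writes the input length to stderr.
APP = [sys.executable, "-c",
       "import sys; data = sys.stdin.read(); "
       "sys.stdout.write(data.upper()); "
       "sys.stderr.write(str(len(data)))"]

def run_with_streams(command, input_text):
    """Feed input_text to stdin and return (stdout, stderr) as text."""
    done = subprocess.run(command, input=input_text, capture_output=True,
                          text=True, check=True)
    return done.stdout, done.stderr

# Corresponds to the inputs/outputs declared with stdin, stdout and stderr.
std_output, std_errors = run_with_streams(APP, "hello")
print(std_output, std_errors)
```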

comment: bindata

A boolean attribute. Its true value specifies that the output data are binary data (e.g. images). Otherwise the data are considered text data.

extension

It defines a "file extension" for this output. Web services do not really use files but data streams. The file extension is, however, a valuable source of information about the data in a particular output. The extension is, for example, useful for results that are shown directly in a web browser (as in the Spinet client).

comment: "mimetype <value>"

It defines the MIME type of this output. It plays a similar role to the extension above: it tells us something about the syntax or semantics of an output.

comment: make_url

A boolean attribute whose default value is "yes" (true). Unless it is set to "no" (false), a reference (a URL) to this output will be created as well. This allows results to be passed back to the clients by reference. Because Soaplab2 allows sending the input data as a reference, too, these URLs can be conveniently combined when several services are executed in a chain, passing data from one service to another.

You can influence where these "URL results" will be served from by setting several run-time configuration properties (look for properties results.url.ignore, results.url.target.dir and results.url in the configuration guide).

comment: "output_type String[]"

comment: "output_type byte[][]"

Soaplab2 recognizes four types of output: a string and a binary type, each either as a single item or as an array of items. Using the attribute output_type, you can define the type of output: the square brackets symbolize an array, byte indicates a binary type. The string type is the default (so there is no need to use this attribute for it). A single binary output is usually specified (for historical reasons) by the attribute bindata instead.

Once you specify an output type, you should make sure that the underlying application produces such a type. Soaplab2 itself treats data as binary data if the ACD file says so - but it does not create an array of data (except for graphical EMBOSS programs that produce images on several pages). If you want arrays, you need to create your own plug-in. An example of such a plug-in (doing nothing useful) is the class org.soaplab.samples.OutputTypesJob. And here is a complete ACD file for it:

appl: AllOutputTypes [
  documentation: "Showing how a plugin can create all kinds of outputs"
  groups: "Plugins,Testing"
  nonemboss: "Y"
  comment: "class org.soaplab.samples.OutputTypesJobFactory"
]

infile: input  [
  standard: "Y"
  help: "This input will be copied to several outputs. <p>
         For some of outputs, it will be even replicated
         (how many times, it depends on the parameter <em>count</em>)."
]

integer: count [
  additional: "Y"
  default: "3"
  prompt: "How many times to replicate input in the array outputs"
  comment: defaults
]

outfile: simple_text_output  [
]
outfile: simple_binary_output  [
  comment: bindata
]
outfile: array_text_output  [
  comment: "output_type String[]"
]
outfile: array_binary_output  [
  comment: bindata
  comment: "output_type byte[][]"
]

comment: "output_adaptor <class-name>"

It allows you to extend Soaplab2 with a class that will be called to adapt result data just before they are sent to the Soaplab2 clients. This class should implement the interface org.soaplab.services.adaptor.DataAdaptor. A testing example is the class org.soaplab.services.testing.TestingDataAdaptor - it just cuts the results into individual words. Its ACD would look like this:

appl: Results [
  documentation: "Testing an output adaptor"
  groups: "Testing"
  nonemboss: "Y"
  executable: "echo"
]

string: param [
  parameter: "Y"
  default: "this is a result"
  comment: defaults
]

outfile: output [
  additional: "Y"
  default: "stdout"
  comment: "output_adaptor org.soaplab.services.testing.TestingDataAdaptor"
]

(Some) known issues

The main feature that the metadata definition should support (one introduced a long time ago in EMBOSS) is the ability to refer to other parameters' values.
For example, the parameter attribute extension of an output may depend on the value (sent by a client at run time) of an input parameter format. It would be nice to specify in the ACD file:
```
outfile: output [
   extension: ${format}
]
```
More examples and explanations are needed for so-called repeatable parameters. Also, some of their features have yet to be implemented.

Building metadata

The previous chapter described how to create ACD files in order to describe parameters and (up to certain point) behaviour of applications (or resources) that will be accessed as Soaplab2 Web Services. What you need to know now is:

where to create ACD files,
how to convert them into XML files, and
how to tell Soaplab2 about these XML files.

If you are a client developer, you may wish to know also how the clients can access service metadata and how to benefit from them.

In any case, because this chapter is about building, it is recommended to look first into the build guide to learn what the Ant tool is, how to use it, and what the build-time properties are.

Where to create metadata

The easiest way is to create your new ACD files in the same place where the Soaplab2 distribution already has its own - in src/etc/acd/<directory>. The <directory> depends on what application (resource) type your ACD file is going to describe. The Soaplab2 distribution comes with the following directories:

Directory src/etc/acd/sowa is used for command-line applications, which are often the main resource behind Soaplab2 Web services. Use this if you are going to wrap existing command-line tools, or if you plan to have Soaplab2 Web services on top of your own scripts.
This is what was covered in Soaplab1 by the old AppLab server. By the way, the name "sowa" means "SOaplab Without Applab".
Directory src/etc/acd/gowlab is used for Soaplab2 Web services on top of various web pages. Project Gowlab has more details.
Directory src/etc/acd/ebi is used for ACD files that define how to access several Web services at the EBI. It is not likely that you are going to add your ACD files here - consider this directory rather as a set of examples of how to write a Soaplab2 plug-in.
Directory src/etc/acd/test is used for testing purposes.

If you are wondering where the ACD files for EMBOSS are, they are not part of the Soaplab2 distribution but part of EMBOSS itself. More about using them is in the EMBOSS notes.

Give your ACD files names without spaces or other unusual characters (because the file name also becomes part of the resulting Web service name).

Or create your ACD files in your own directory and use the ...acd.dir build-time properties - as shown in the next section.

How to convert ACD to XML files

This chapter will change once Soaplab2 gets a new (pure Java) ACD to XML converter. But the basic principles (such as the build-time properties described below) will likely stay the same.

Other inputs than ACD files

There are a few attributes that can be included in the resulting XML metadata files without being in any particular ACD file. These sources are meant to be shared (used) by all applications (or at least by a group of applications).

They are in the converter's configuration file src/etc/config/generator.config.template. Every time the converter is invoked, this file is copied into metadata/old.generator/al.Cfg.pl. During the copy, Ant substitutes some data there from the following build-time properties:

installation.name

Attribute defining where your Soaplab2 is run (or installed). Default value is Soaplab2 default installation.

gen.supplier

Attribute defining who supplies applications (or services). Default value is an empty string.

gen.version

Attribute defining version of (all) applications. Default value is an empty string.

gen.help.base.url

In ACD files, you can specify (as an application attribute) a URL with a detailed help page for the application. For a group of applications of a similar provenance (like all EMBOSS applications), it is easier to have a more automated way of defining these URLs. You can put a base URL into this property, and the converter appends the individual application names to it.

gen.disabled.apps

Sometimes it is convenient to not generate XML files for some applications - even if there are ACD files for them. You can list such disabled ACD file names (just base names, without file extensions) in this property in a comma-separated list. An example of such list may be seen in xmls/emboss.xml file.

This property can be used together with the build-time property sa (explained in the next section). The difference between the two is that property sa specifies what to include and gen.disabled.apps specifies what to exclude.

gen.job.class

This is an equivalent of the application attribute class. If you set it here, you do not need to replicate it in all ACD files belonging to the same type of services (to the same plug-in).

The same configuration file also contains the name of the directory where the XML files will be generated. There is, however, no property to change it. The target directory is metadata/generated.

Ant tasks: gen...

To convert all ACD files (from the sowa, gowlab and ebi directories shown above) to XML files, use the Ant task gen:

ant gen

By the way, the task gen is also called when you install Soaplab2 (from the task install).

To convert only ACD files from some of those directories, use individual tasks:

ant gensowa
ant gengowlab
ant genebi
ant gentest

All the gen tasks create metadata in directory metadata/generated. For each category there will be a sub-directory.

The gen... tasks can be customized by setting some build-time properties:

sa

Use this property, as a space-separated list of ACD file names (without file extensions), to specify which ACD files should be converted. By default, all ACD files from a given directory are processed (except the list given in the gen.disabled.apps - see above). For example:

ant "-Dsa=helloworld dot sleep" gensowa

Be aware that the converter also creates an application list. If property sa is given, the list will contain only the applications named in this property. The remaining applications are left out - which may be fine during your testing period, but in the end you probably wish to have all applications available as your Soaplab2 services. Therefore, run the gen... task again, or use property sl (see below) to create a list with a new name.

The sa property should not be used with the general gen task.
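
How sa and gen.disabled.apps narrow down the set of converted ACD files can be sketched in Python as follows. This is a model of the description above, not the converter's code: the function select_acd_files is invented, and it assumes that the exclusion list also applies when sa is given.

```python
def select_acd_files(available, sa=None, disabled=""):
    """Return the ACD base names that a gen... task would convert.

    available - base names of all ACD files in the directory
    sa        - value of the sa property (space-separated), or None
    disabled  - value of gen.disabled.apps (comma-separated)
    """
    wanted = sa.split() if sa else list(available)
    excluded = {name.strip() for name in disabled.split(",") if name.strip()}
    # assumption: names missing from the directory are silently skipped
    return [name for name in wanted
            if name in available and name not in excluded]

acds = ["cal", "dot", "helloworld", "sleep"]
print(select_acd_files(acds, sa="helloworld dot sleep"))
print(select_acd_files(acds, disabled="sleep, cal"))
```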

sl

sowa.sl

gowlab.sl

ebi.sl

test.sl

A property defining a file name that will be used for a list of applications (services). This is the list that was mentioned at the very beginning of this document. Soaplab2 learns about this list from the run-time property applist - see the configuration guide for details (there may be, and usually are, several such list files).

The property sl should be used only with specific gen tasks (e.g. gensowa). If you use it with the general gen task, the list will include only the last part, which is not what you want. Therefore, do not do this:

ant -Dsl=MyApps.xml gen

The sl is just a shorter form. These two examples are identical:

ant -Dsl=MySowaApps.xml gensowa
ant -Dsowa.sl=MySowaApps.xml gensowa

With the general gen task, use the not-shortened properties:

ant -Dsowa.sl=MySApps.xml -Dgowlab.sl=MyGApps.xml -Debi.sl=MyEApps.xml gen

Default names are:

for sowa.sl: OtherApplications.xml
for gowlab.sl: GowlabApplications.xml
for ebi.sl: EBIApplications.xml
for test.sl: TUnitApplications.xml

acd.dir

sowa.acd.dir

gowlab.acd.dir

ebi.acd.dir

test.acd.dir

Finally, the properties you have probably been waiting for: they define the directory names where the ACD files are taken from. Again (as above), use the shortened form acd.dir only with the more specific gen... tasks.

Typical example: Create your ACD files in a directory myacd (or copy there some existing ACDs from the Soaplab2 distribution - as I did for this documentation). Then run:

ant -Dacd.dir=myacd -Dsl=MyApps.xml gensowa

You may see a similar report to this one:

_gen:
     [copy] Copying 1 file to /home/senger/soaplab2/metadata/old.generator
     [echo] /home/senger/soaplab2/metadata/old.generator/acd2xml -d -l MyApps.xml -p . -r myacd dot helloworld
  [acd2xml] Processing dot...
  [acd2xml]     using myacd/dot.acd
  [acd2xml]     (generated into module graphics)
  [acd2xml]     Created: /home/senger/soaplab2/metadata/generated/graphics/dot_al.xml
  [acd2xml] Processing helloworld...
  [acd2xml]     using myacd/helloworld.acd
  [acd2xml]     (generated into module classic)
  [acd2xml]     Created: /home/senger/soaplab2/metadata/generated/classic/helloworld_al.xml
  [acd2xml] Created: /home/senger/soaplab2/metadata/generated/MyApps.xml

BUILD SUCCESSFUL
Total time: 4 seconds

How to tell Soaplab about XML files

This is more a run-time configuration topic - you should see, therefore, more details in the configuration guide.

Once the ACD to XML converter has created metadata files, you have to put the names of the application list files (not the names of the individual service metadata files) in the run-time configuration file as one or more applist properties. A typical example is:

base.dir = /home/senger/soaplab2
metadata.dir = ${base.dir}/metadata/generated

applist = ${metadata.dir}/OtherApplications.xml
applist = ${metadata.dir}/GowlabApplications.xml
applist = ${metadata.dir}/EBIApplications.xml
applist = ${metadata.dir}/EMBOSSApplications.xml
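
To see what the configuration above resolves to, here is a small Python sketch that expands the ${...} references and collects the repeated applist keys. It is an illustration under assumptions (the function read_properties is invented, and the real Soaplab2 configuration reader may follow different rules, so consult the configuration guide).

```python
import re

def read_properties(text):
    """Parse 'key = value' lines into {key: [values]}.

    Repeated keys collect into a list; ${key} is replaced by the first
    value of a key defined earlier (a simplifying assumption).
    """
    props = {}

    def resolve(match):
        key = match.group(1)
        return props[key][0] if key in props else match.group(0)

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = re.sub(r"\$\{([^}]+)\}", resolve, value.strip())
        props.setdefault(key.strip(), []).append(value)
    return props

config = """
base.dir = /home/senger/soaplab2
metadata.dir = ${base.dir}/metadata/generated
applist = ${metadata.dir}/OtherApplications.xml
applist = ${metadata.dir}/GowlabApplications.xml
"""
for name in read_properties(config)["applist"]:
    print(name)
```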

Last modified: Fri Aug 6 11:38:39 2010