EMBOSS Notes
Soaplab's main goal, from the very beginning, was to provide Web Services,
especially on top of EMBOSS. These notes give a
few details on how to make that happen.
Soaplab2's plug-in for EMBOSS
EMBOSS, as a set of command-line tools, is a typical target of
Soaplab. But it also requires some special treatment (e.g. collecting
resulting images, or dealing with EMBOSS-specific sequence
references, the famous USA, or Uniform Sequence Address). These
specifics are implemented as a Soaplab plug-in (its main author is
Mahmut Uludag from the EBI).
The plug-in code is in the package org.soaplab.emboss and it
is distributed together with the main Soaplab2 release files. But this
is just a plug-in: EMBOSS itself must also be installed on the computer
where the Soaplab2 Web services will be running (within the Tomcat
servlet container).
EMBOSS installation
Detailed installation instructions are beyond the scope of this
document (and anyway, the EMBOSS documentation and support are
excellent). The basic steps are:
- Download EMBOSS from EMBOSS homepage and
unpack it. Go to the created directory.
- Type: sh ./configure --prefix=<where-to-install-emboss>
- Then: make
- Then: make check
- Then: make install
- EMBOSS needs a list of databases. A default list
is in
<emboss-home>/share/EMBOSS/emboss.default. Consider
editing it. Another example of a list of
databases is the one used while developing Soaplab2 at the EBI.
The file can either be left in
<emboss-home>/share/EMBOSS/emboss.default, or put in the
home directory under the name .embossrc. In the latter case it must be
the home directory of the user who runs the Tomcat server.
- Extract data for a few EMBOSS databases: go to the
directory where you installed EMBOSS and type:
bin/tfextract share/EMBOSS/test/data/site.dat
bin/printsextract share/EMBOSS/test/data/site.dat
bin/prosextract -prositedir share/EMBOSS/test/data
bin/rebaseextract ./share/EMBOSS/test/data/withrefm ${emboss_home_src}/test/data/proto
Above, ${emboss_home_src} is the root directory where you unpacked the EMBOSS distribution file.
To install EMBASSY applications, follow the first five steps above
(download through make install) for each EMBASSY package you want to
install. Using the same prefix in the configure step lets EMBASSY
applications share the directory structure that EMBOSS applications
use, so EMBASSY applications do not need special treatment.
There are also a few EMBOSS applications that need third-party
programs (which are not part of the EMBOSS
distribution) to be installed. For example, emma needs
clustalw. Another one in this category is the EMBOSS tool
eprimer3. If you do not have these programs installed, you will
get errors from the Soaplab2 services, similar to this:
Standard error stream:
Died: The program 'primer3_core' must be on the path.
It is part of the 'primer3' package, version 1.1,
available from the Whitehead Institute.
See: http://primer3.sourceforge.net/
or this one:
Standard error stream:
EMBOSS An error in ajsys.c at line 988:
cannot find program 'clustalw'
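Before deploying the services, it may help to check that these third-party programs are visible on the PATH. The following is a minimal shell sketch, not part of Soaplab2 or EMBOSS; the function name missing_programs is hypothetical, and the program names in the usage note are the two mentioned in the error messages above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Soaplab2 or EMBOSS): print the name of
# every program given as an argument that cannot be found on the PATH.
missing_programs() {
    for prog in "$@"; do
        # 'command -v' is the POSIX way to ask whether a command resolves.
        command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
    done
}
```

For example, missing_programs clustalw primer3_core prints the names that are not found. The check is only meaningful when run with the same PATH that the EMBOSS tools will see at run-time.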
Build Soaplab2 metadata for EMBOSS
Like any Soaplab service, the EMBOSS services also need ACD files. But
they already have them. You only need to convert them into Soaplab's
XML metadata files.
First, you have to tell Soaplab where your EMBOSS is installed. Either
set an environment variable EMBOSS_HOME, or an Ant build-time property
emboss.home, to point there.
You can also set a few other properties. They are build-time
properties and will be propagated into the
generated XML metadata. This means that you should set them either on
Ant's command line (when running ant genemboss) or
(preferably) in your build.properties:
- emboss.supplier
- Any string indicating who is running EMBOSS Web
services. Default value is an empty string.
- emboss.version
- Any string indicating your EMBOSS version. Default value is an
empty string.
For example:
emboss.home = /home/senger/Software/emboss
emboss.supplier = My local laptop
emboss.version = 6.1.0
Then, consider disabling some EMBOSS applications (those that are
meant more or less only for testing or administrative tasks). The list
of already disabled applications is available in
xmls/emboss.xml. Change it (by setting the property
gen.disabled.apps) only if you have a special reason.
Finally, generate Soaplab's metadata (do not worry about a few warnings
such as Duplicated input name (graph_format)...):
ant genemboss
This task also creates the file
metadata/generated/EMBOSSApplications.xml. Add its name to
the Soaplab configuration file as the property applist:
applist = ${metadata.dir}/EMBOSSApplications.xml
You can do it directly in the template
src/etc/config/soaplab.properties.template, but a better way is
to use the build-time property my.soaplab.properties (see more
about the configuration in the configuration guide).
Try now:
ant clean
ant info-list
Do you see all EMBOSS services?
Running "ant clean" is not usually needed, but it
does no harm. It makes sure that the run-time configuration file,
created in build/classes/soaplab.properties, is really
updated.
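To confirm that the applist property really reached the run-time configuration file, one can look for it directly. This is a small sketch, assuming the common Java properties forms "key = value" and "key: value" (the whitespace-only separator is not handled); the function name has_property is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the properties file $1 defines the key $2.
has_property() {
    # Escape dots so that a key such as emboss.home matches literally.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    # Match the key at line start, optional blanks, then "=" or ":".
    grep -Eq "^[[:space:]]*${key}[[:space:]]*[=:]" "$1"
}
```

For example: has_property build/classes/soaplab.properties applist && echo "applist is set".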
Run-time environment for EMBOSS
The run-time properties must be put in the
soaplab.properties file. There are just two mandatory ones,
and a few optional ones:
- emboss.home
- We have already used this property to find the EMBOSS
installation directory at build-time. Of course, the
run-time needs the same knowledge.
Just to repeat the difference: the build-time
properties should be given either on the command line when
ant is invoked, or (better) put into the
build.properties file. The run-time properties must
be put into the soaplab.properties file (which ends up in the
classes directory).
- emboss.data
- This property is needed only if you use a non-standard EMBOSS
installation. By default, the EMBOSS data directory is
<emboss-home>/share/EMBOSS/data. Soaplab2 can find
this directory simply from the emboss.home property. Use
emboss.data only if the data directory is elsewhere.
- emboss.path
- The same situation as above - but with the EMBOSS bin
directory. By default, EMBOSS binaries are in
<emboss-home>/bin where Soaplab2 can find them. Only if
you have installed them elsewhere, set this property.
- applist
- This is a mandatory property - but already described in the
previous section. It keeps a full path to the metadata with the list
of EMBOSS services.
- env.LD_LIBRARY_PATH
- This property sets an environment variable LD_LIBRARY_PATH
before any EMBOSS tool is started. You need it only if your EMBOSS
binaries depend on shared libraries that are not located in
the same folder(s) used during EMBOSS installation. For example, on
systems not running recent versions of Linux, you will have to add a
directory containing libgd.so.2 to LD_LIBRARY_PATH, because
EMBOSS 5.0 requires GD 2.
- env.PATH
- This property sets an environment variable PATH for
any EMBOSS tool. It is not used to find the EMBOSS binaries themselves
(Java does not do that) but to allow EMBOSS binaries to locate their
own third-party applications. A typical example is the EMBOSS tool
emma, which needs to find the clustalw program. For example:
env.PATH = /usr/local/bin:/usr/bin:/bin:/dir2/clustalw:/dir3/hmmer-2.3.2/binaries
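The default locations described above can be verified before starting Tomcat. The sketch below is only an illustration and not part of Soaplab2: check_emboss_layout is a hypothetical name, its optional second and third arguments stand in for the emboss.data and emboss.path overrides, and the default data location assumes the standard EMBOSS layout under share/EMBOSS/data.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Soaplab2).
# $1 = EMBOSS home (the value of emboss.home)
# $2 = data directory override (optional, plays the role of emboss.data)
# $3 = binaries directory override (optional, plays the role of emboss.path)
check_emboss_layout() {
    home=$1
    data=${2:-$home/share/EMBOSS/data}   # assumed standard data location
    bin=${3:-$home/bin}
    status=0
    for dir in "$data" "$bin"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            status=1
        fi
    done
    return $status
}
```

Calling check_emboss_layout /home/senger/Software/emboss returns a non-zero status and names each directory it could not find, which suggests which of the optional properties may need to be set.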
How to test EMBOSS services
Soaplab2 is able to run many services
(sequentially or in parallel) with pre-defined sets of input data as a
testing batch. How to run it is described in the batch-test client document. But
because this ability is particularly useful for EMBOSS (a
package with hundreds of programs), here is a brief overview of how to
run this batch test for EMBOSS services:
- The batch test for EMBOSS requires the
emboss.home property to be set (see above how to do it).
- Another property to be set is
batch.test.file. Here is how it should be set for both
EMBOSS and EMBASSY:
test.data.dir = ${base.dir}/data
batch.test.file = ${base.dir}/data/embossTests.cfg
batch.test.file = ${base.dir}/data/embassyTests.cfg
As you can see, you can use the same property several times to include
more testing definitions. The base.dir property, used in the
example above, should point to the Soaplab2 main directory. If you
have the testing configuration files elsewhere, set
test.data.dir differently.
- And start the batch test:
build/run/run-batch-client
In my case, I have installed neither clustalw nor
primer3_core (which is why I get two errors), and I have
not installed the EMBASSY packages either (which is why I get 61 not
available):
Summary
-------
Successfully: 150
Erroneously: 2
Not available: 61
If you are getting more errors, it may be that your EMBOSS databases
do not contain the same sequences as defined in the testing data
sets. Please check the data/*Tests.cfg files.
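When the batch test runs unattended, the summary can be turned into a number that a script can act on. The following is a sketch under the assumption that the summary reaches standard output exactly in the form shown above; batch_error_count is a hypothetical helper, not part of the batch-test client.

```shell
#!/bin/sh
# Hypothetical helper: read batch-test output on stdin and print the number
# from the "Erroneously:" summary line (prints 0 if no such line is found).
batch_error_count() {
    awk '$1 == "Erroneously:" { n = $2 } END { print n + 0 }'
}
```

It could be used as build/run/run-batch-client | batch_error_count, comparing the result with the number of errors you expect (two in the run above, because of the missing third-party programs).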
Last modified: Mon Feb 4 18:17:17 2008