EMBOSS Notes

Soaplab's main goal, from the very beginning, was to provide Web Services, especially on top of EMBOSS. These notes give a few details on how to make that happen.


Soaplab2's plug-in for EMBOSS
EMBOSS installation
Build Soaplab2 metadata for EMBOSS
Run-time environment for EMBOSS
How to test EMBOSS services

Soaplab2's plug-in for EMBOSS

EMBOSS, as a set of command-line tools, is a typical target for Soaplab. But it also requires some special treatment (e.g. collecting resulting images, or dealing with EMBOSS-specific sequence references, the famous USA, or Uniform Sequence Address). These specifics are implemented as a Soaplab plug-in (its main author is Mahmut Uludag from the EBI).

The plug-in code is in the package org.soaplab.emboss and is distributed together with the main Soaplab2 release files. But this is just a plug-in: EMBOSS itself must also be installed on the computer where the Soaplab2 Web services will run within the Tomcat servlet container.

EMBOSS installation

Going into details is beyond the scope of this document (and anyway, the EMBOSS documentation and support are excellent). Here are just the basic steps:

  1. Download EMBOSS from the EMBOSS homepage and unpack it. Go to the created directory.

  2. Type: sh ./configure --prefix=<where-to-install-emboss>

  3. Then: make

  4. Then: make check

  5. Then: make install

  6. EMBOSS needs a list of databases. A default list is in <emboss-home>/share/EMBOSS/emboss.default. Consider editing it. Here is another example of a list of databases, as used while developing Soaplab2 at the EBI.

    The file can either be left in <emboss-home>/share/EMBOSS/emboss.default or be put in a home directory under the name .embossrc. In the latter case it has to be the home directory of the user who runs the Tomcat server.

  7. Extract data for a few EMBOSS databases. Go to the directory where you have installed EMBOSS and type:
    bin/tfextract share/EMBOSS/test/data/site.dat
    bin/printsextract share/EMBOSS/test/data/site.dat
    bin/prosextract -prositedir share/EMBOSS/test/data
    bin/rebaseextract ./share/EMBOSS/test/data/withrefm ${emboss_home_src}/test/data/proto
    
    Above, ${emboss_home_src} is the root directory where you unpacked the EMBOSS distribution file.
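
Steps 2 to 5 above can be wrapped into one small shell function. This is only a sketch: the names run and install_emboss and the DRY_RUN switch are illustrative assumptions, not part of EMBOSS or Soaplab2. The function has to be called from the unpacked EMBOSS source directory.

```shell
#!/bin/sh
# Sketch of installation steps 2 to 5.
# The function names and the DRY_RUN switch are illustrative assumptions.

# run CMD...: print the command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# install_emboss PREFIX: configure, build, check and install EMBOSS.
# Stops at the first step that fails.
install_emboss() {
    prefix=$1
    if [ -z "$prefix" ]; then
        echo "usage: install_emboss <where-to-install-emboss>" >&2
        return 2
    fi
    run sh ./configure --prefix="$prefix" &&
    run make &&
    run make check &&
    run make install
}

# Example (prints the four commands without running them):
#   DRY_RUN=1 install_emboss /opt/emboss
```

Chaining the steps with && means that a failing "make check" prevents "make install" from running.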

To install EMBASSY applications, follow steps 1 to 5 for each EMBASSY package you want to install. Using the same prefix in step 2 lets EMBASSY applications share the directory structure that EMBOSS applications use, so EMBASSY applications do not need special treatment.

There are also a few EMBOSS applications that need third-party programs (not part of the EMBOSS distribution) to be installed. For example, emma needs clustalw. Another tool in this category is eprimer3. If you do not have these programs installed, you will get errors from the Soaplab2 services, similar to this one:
Standard error stream:
Died: The program 'primer3_core' must be on the path.
It is part of the 'primer3' package, version 1.1,
available from the Whitehead Institute.
See: http://primer3.sourceforge.net/
or this one:
Standard error stream:
   EMBOSS An error in ajsys.c at line 988:
cannot find program 'clustalw'
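
Such errors can be caught before any service is called by checking that the third-party programs are on the PATH. The following is a minimal sketch; the function name missing_programs is an assumption made for illustration. Note that it checks the PATH of the current shell, which may differ from the one seen by EMBOSS tools started from Tomcat.

```shell
#!/bin/sh
# missing_programs NAME...: print every NAME that is not found on PATH.
# Returns 0 if all programs were found, 1 otherwise.
# (The function name is an illustrative assumption.)
missing_programs() {
    status=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            status=1
        fi
    done
    return $status
}

# Example with the two programs named in the error messages above:
#   missing_programs clustalw primer3_core
```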

Build Soaplab2 metadata for EMBOSS

Like any Soaplab service, the EMBOSS services need ACD files. But they already have them. You only need to convert them into Soaplab's XML metadata files:

First, you have to tell Soaplab where EMBOSS is installed. Either set the environment variable EMBOSS_HOME or the Ant build-time property emboss.home to point there.

You can also set a few other properties. They will be propagated into the generated XML metadata. They are build-time properties, which means that you should set them either on the Ant command line (when running ant genemboss) or, preferably, in your build.properties:

emboss.supplier
Any string indicating who is running EMBOSS Web services. Default value is an empty string.

emboss.version
Any string indicating your EMBOSS version. Default value is an empty string.

For example:

emboss.home = /home/senger/Software/emboss
emboss.supplier = My local laptop
emboss.version = 6.1.0
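
Before generating the metadata, it may help to verify that the directory given as emboss.home (or EMBOSS_HOME) really looks like an EMBOSS installation. The sketch below assumes the standard layout produced by "make install", with programs in bin and ACD files in share/EMBOSS/acd. The function name check_emboss_home is hypothetical.

```shell
#!/bin/sh
# check_emboss_home DIR: verify that DIR looks like an EMBOSS installation.
# Assumption: the standard layout created by "make install", that is
#   DIR/bin               (EMBOSS programs)
#   DIR/share/EMBOSS/acd  (ACD files)
# Returns 0 if both directories exist, 1 otherwise.
check_emboss_home() {
    dir=$1
    result=0
    for sub in bin share/EMBOSS/acd; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing directory: $dir/$sub" >&2
            result=1
        fi
    done
    return $result
}

# Example:
#   check_emboss_home "$EMBOSS_HOME" && echo "EMBOSS_HOME looks fine"
```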

Then, consider disabling some EMBOSS applications (those that are meant more or less only for testing or administrative tasks). The list of already disabled applications is in xmls/emboss.xml. Change it (by setting the property gen.disabled.apps) only if you have a special reason.

Finally, generate Soaplab's metadata (do not worry about a few warnings saying "Duplicated input name (graph_format)..."):

ant genemboss
This task also creates the file metadata/generated/EMBOSSApplications.xml. Add its name to the Soaplab configuration file as the property applist:
applist = ${metadata.dir}/EMBOSSApplications.xml
You can do it directly in the template src/etc/config/soaplab.properties.template, but a better way is to use the build-time property my.soaplab.properties (see more about the configuration in the configuration guide).
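
If you keep your own properties in a separate file, the applist line can be added from a script without duplicating it on repeated runs. This is only a sketch: the function name ensure_line and the file name used in the example are assumptions. How such a file is connected to the build through my.soaplab.properties is described in the configuration guide, not here.

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line is
# already there. Creates FILE if it does not exist.
# (The function name is an illustrative assumption.)
ensure_line() {
    file=$1
    line=$2
    [ -f "$file" ] || : > "$file"
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example; the file name is hypothetical. The single quotes keep
# ${metadata.dir} literal, so the shell does not try to expand it.
#   ensure_line my-soaplab.properties 'applist = ${metadata.dir}/EMBOSSApplications.xml'
```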

Try now:

ant clean
ant info-list
Do you see all EMBOSS services?

Running "ant clean" is usually not needed, but it does no harm. It makes sure that the run-time configuration file, created as build/classes/soaplab.properties, is really updated.

Run-time environment for EMBOSS

The run-time properties must be put in the soaplab.properties file. There are just two mandatory ones and a few optional ones:
emboss.home
We have already used this property to find the EMBOSS installation directory at build time. Of course, the run-time needs the same knowledge.

Just to repeat the difference: the build-time properties should be given either on the command line when ant is invoked or (better) put into the build.properties file. The run-time properties must be put into the soaplab.properties file (which ends up in the classes directory).

emboss.data
This property is needed only if you use a non-standard EMBOSS installation. By default, the EMBOSS data directory is <emboss-home>/share/EMBOSS/data. Soaplab2 can find this directory simply from the emboss.home property. Use emboss.data only if the data directory is elsewhere.

emboss.path
The same situation as above - but with the EMBOSS bin directory. By default, EMBOSS binaries are in <emboss-home>/bin where Soaplab2 can find them. Only if you have installed them elsewhere, set this property.

applist
This is a mandatory property, already described in the previous section. It holds the full path to the metadata file with the list of EMBOSS services.

env.LD_LIBRARY_PATH
This property sets the environment variable LD_LIBRARY_PATH before any EMBOSS tool is started. You need it only if your EMBOSS binaries depend on shared libraries that are not located in the folder(s) used during the EMBOSS installation. For example, on systems not running a recent version of Linux, you will have to add a directory containing libgd.so.2 to LD_LIBRARY_PATH, because EMBOSS 5.0 requires GD 2.

env.PATH
This property sets the environment variable PATH for any EMBOSS tool. It is not used to find the EMBOSS binaries themselves (Java does not do that) but to allow EMBOSS binaries to locate the third-party applications they call. A typical example is the EMBOSS tool emma, which needs to find the clustalw program. For example:
env.PATH = /usr/local/bin:/usr/bin:/bin:/dir2/clustalw:/dir3/hmmer-2.3.2/binaries
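
To check that a value intended for env.PATH really leads to a third-party program, the colon-separated list can be searched directly, independently of the PATH of your own shell. The following is a minimal sketch; the function name find_in_path is an assumption, and empty entries in the list are simply skipped.

```shell
#!/bin/sh
# find_in_path PROGRAM PATHLIST: print the first executable file named
# PROGRAM found in the colon-separated PATHLIST; return 1 if there is none.
# (The function name is an illustrative assumption.)
find_in_path() {
    prog=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        [ -n "$dir" ] || continue
        if [ -f "$dir/$prog" ] && [ -x "$dir/$prog" ]; then
            IFS=$old_ifs
            echo "$dir/$prog"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example with the value shown above:
#   find_in_path clustalw /usr/local/bin:/usr/bin:/bin:/dir2/clustalw:/dir3/hmmer-2.3.2/binaries
```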

How to test EMBOSS services

Soaplab2 can run many services (sequentially or in parallel) with predefined sets of input data as a test batch. How to run it is described in the batch-test client document. But because this ability is particularly useful for EMBOSS (a package with hundreds of programs), here is a brief overview of how to run this batch test for EMBOSS services:

If you are getting more errors, it may be that your EMBOSS databases do not contain the same sequences as those defined in the testing data sets. Please check the data/*Test.cfg files.

Last modified: Mon Feb 4 18:17:17 2008