Friday, 24 October 2008

XML ingest

This post is about the XML ingest method of the ICAT API and the tools we use to ingest data from experiments automatically.

The ICAT API has a method called ingestMetadata, which allows you to define and set up a new investigation or dataset, with all its related parameters, in a single call. The method takes two parameters: a sessionId and an XML string.

The XML string must conform to a specific schema that defines which parameters are required and which formats are acceptable. The schema follows the same structure as the ICAT database, though sometimes in a simplified manner: not all possible relationships are described. The values of the lookup tables can also be found in the schema.

The schema can be found in the ICAT API source code, in the icat3-jaxb module, under /src/uk/icat3/jaxb/icatXSD.xsd.

To create the required XML document, we have developed several tools for the different formats used at STFC facilities.
  • The first one, writeRaw, is specific to ISIS, as it only reads the ISIS RAW format.
  • The second one, nxingest, is more generic: it reads NeXus format files and depends on a mapping file to find the necessary information.
nxingest is distributed alongside the NeXus API library, starting with napi 4.2. It has been run on Linux and Windows with only very slight adaptation (its inclusion in napi should ease the installation process).

nxingest creates the XML document but does not send it to ICAT. You need another tool to invoke the ICAT API.
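
As a rough illustration of that last step, the sketch below reads a file written by nxingest and passes its content to ingestMetadata. It assumes a client object, created elsewhere with whatever SOAP library you use, that exposes ingestMetadata(sessionId, xml) as described above. The helper name ingest_file, the UTF-8 assumption and the well-formedness check are our own additions and not part of ICAT.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def ingest_file(client, session_id, xml_path):
    """Send the XML document produced by nxingest to ICAT.

    `client` is assumed to be a web-service proxy exposing
    ingestMetadata(sessionId, xml); how it is created depends on
    the SOAP library in use and is not shown here.
    """
    xml_bytes = Path(xml_path).read_bytes()
    # Fail early on a malformed document. This only checks
    # well-formedness, not conformance to the ICAT schema.
    ET.fromstring(xml_bytes)
    # Assumption: the file is UTF-8 encoded.
    return client.ingestMetadata(session_id, xml_bytes.decode("utf-8"))
```

Keeping the client as a parameter means the same helper can be exercised with a stub object before it is pointed at a real ICAT instance.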


Here is an extract from the help file nxingest.txt.

USAGE

nxingest mapping_file nexus_file [output_file]

DESCRIPTION

nxingest extracts the metadata from a NeXus file to create an XML file according to a mapping file. nxingest also has some reformatting capabilities, such as modifying dates, using the current time, merging several fields together, splitting sentences into keywords, ...

The mapping file defines the structure (names and hierarchy) and content (from the NeXus file, from the mapping file or from the current time) of the output file. See below for a description of the mapping file.

This tool uses the NeXus API, so any of the supported formats (HDF4, HDF5 and XML) can be read.

To be accepted by ICAT, the output XML should match the ICAT3 XML schema.
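
For automated ingestion, the command line from the USAGE section can be driven from a script. The following is a minimal sketch, assuming that nxingest is on the PATH and that it signals failure with a non-zero exit status (the extract does not say so). The function names are ours.

```python
import subprocess


def build_command(mapping_file, nexus_file, output_file=None):
    """Return the argument list for: nxingest mapping_file nexus_file [output_file]."""
    command = ["nxingest", mapping_file, nexus_file]
    if output_file is not None:
        command.append(output_file)
    return command


def run_nxingest(mapping_file, nexus_file, output_file=None):
    """Run nxingest; raises CalledProcessError on a non-zero exit status.

    Assumption: nxingest is installed and reports errors through its
    exit status.
    """
    return subprocess.run(
        build_command(mapping_file, nexus_file, output_file),
        check=True,
    )
```

For example, run_nxingest("mapping.xml", "run.nxs", "run_icat.xml") would execute the tool on one file; the three file names here are placeholders.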

MAPPING FILE SYNTAX

XML Nodes

The structure of the output file is determined by the nodes of the mapping file.
There are several types of node :
  1. 'tbl' or Table node, which defines the hierarchy of the output document.
    e.g. the mapping : {icat type="tbl"}{study type="tbl"}{investigation type="tbl" trusted="false"}
    is mapped into : {icat}{study}{investigation trusted="false"}
    NB : XML snippets are not accepted in this blog, so the characters < and > are replaced by { and }.
  2. 'user_tbl' or User Table node is a special case where the node is scanned several times, once for each {NXuser} type class present in the NeXus file. At each iteration, nxingest replaces the string {NXuser} with the corresponding name found in the file.
  3. 'tag' or record node, which defines a simple metadata record. It has 2 child nodes that contain the name of the output element and the source of the element.
    e.g. the mapping : {record} {icat_name}name{/icat_name} {value type="nexus"} path_to_metadata {/value} {/record}
    is mapped into : {name}metadata_from_nexus_file{/name}
  4. 'param_str' and 'param_num' or Parameter nodes, which define an element of one of the Parameters tables (Dataset, DataFile or Sample).
    e.g. the mapping : {parameter type="param_str"} {icat_name} name{/icat_name} {value type="nexus"} path_to_metadata {/value} {description type="fix"} fixed metadata description{/description} {/parameter}
    is mapped into : {parameter} {name}name {/name} {string_value}metadata_from_nexus_file {/string_value} {units}N/A{/units} {description}fixed metadata description{/description} {/parameter}
  5. 'keyword_tag' splits the source into its individual words and fills the keyword table in the ICAT DB. A mapping with two NeXus datasets looks like : {keyword type="keyword_tag"} {icat_name} name {/icat_name} {value type="mix"} nexus:/{NXentry}/title | fix: , | nexus:/{NXentry}/notes {/value} {/keyword}
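
To make the 'tbl' and 'tag' rules above more concrete, here is a small Python sketch that applies them to a toy mapping. It is not the nxingest implementation, only an illustration of the idea. The mapping content, the type="tag" attribute on the record node, the element names title and facility, and the dictionary standing in for the NeXus file are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file, written with real angle brackets.
MAPPING = """
<icat type="tbl">
  <investigation type="tbl" trusted="false">
    <record type="tag">
      <icat_name>title</icat_name>
      <value type="nexus">/run/title</value>
    </record>
    <record type="tag">
      <icat_name>facility</icat_name>
      <value type="fix">ISIS</value>
    </record>
  </investigation>
</icat>
"""


def resolve(value_node, nexus_values):
    """Return the metadata for a value node of type 'fix' or 'nexus'."""
    kind = value_node.get("type")
    text = (value_node.text or "").strip()
    if kind == "fix":
        return text
    if kind == "nexus":
        # A plain dict stands in for reading the NeXus file.
        return nexus_values[text]
    raise ValueError(f"unsupported source type: {kind}")


def convert(node, nexus_values):
    """Convert one mapping node into the matching output element."""
    kind = node.get("type")
    if kind == "tbl":
        # Table node: same name, same attributes except 'type'.
        attributes = {k: v for k, v in node.attrib.items() if k != "type"}
        out = ET.Element(node.tag, attributes)
        for child in node:
            out.append(convert(child, nexus_values))
        return out
    if kind == "tag":
        # Record node: icat_name gives the element name, value its content.
        out = ET.Element(node.findtext("icat_name").strip())
        out.text = resolve(node.find("value"), nexus_values)
        return out
    raise ValueError(f"unsupported node type: {kind}")


def render(mapping_text, nexus_values):
    """Return the output document as a string."""
    root = convert(ET.fromstring(mapping_text.strip()), nexus_values)
    return ET.tostring(root, encoding="unicode")
```

Calling render(MAPPING, {"/run/title": "Muon run"}) returns a single icat element containing an investigation element with trusted="false" and the two records as children.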

Metadata Sources

The source of the metadata is defined by nodes of type 'fix', 'nexus',
'special' and 'mix'. If the type is 'special', the beginning of the text
contains a modifier (fix:, nexus:, time: or sys:). The value is then the text
without the modifier.

  1. 'fix' : a string from the mapping file itself.
  2. 'nexus' : the metadata is read from the NeXus file according to the path.
  3. 'special'
    • 'fix:' same as above
    • 'nexus:' same as above
    • 'time:' the time can be expressed in multiple formats, so the value after the
      modifier is composed of 3 parts :
      time:source ; input_format ; output_format
      • The source can be 'now' for the current time or 'nexus()' with the path to the time string between the parentheses.
      • The input and output formats are optional. The software expects an integer. Currently the possible values are :
      1. '2007-05-23T12:48:05' (default)
      2. '2007-05-23 12:48:05'
      3. '2007-05-23'
      4. '12:48:05'
      5. '20070523'
      6. '200705'
      7. '2007'
      8. '23/05/2007'
    • 'sys:' information about the NeXus file itself :
      1. sys:filename gives the filename of the NeXus file.
      2. sys:location gives the path of the NeXus file.
      3. sys:size gives the size in bytes of the NeXus file.

  4. 'mix' : to combine several sources, several of the modifiers used with node type 'special' are given, separated by '|'.
    e.g. nexus:/{NXentry}/{NXinstrument}.short_name | fix:_ | time:now ; 0 ; 5
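
The source syntax above can be sketched in a few lines of Python. This is an illustration rather than the nxingest parser: the function names are ours, the whitespace handling is an assumption, and TIME_FORMATS simply lists strftime equivalents of the eight time formats in the order given above. Which integer code nxingest associates with each format is deliberately left open, so the sketch selects patterns by their position in the tuple only.

```python
from datetime import datetime

# strftime patterns for the eight formats, in the order listed above.
TIME_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",   # 2007-05-23T12:48:05
    "%Y-%m-%d %H:%M:%S",   # 2007-05-23 12:48:05
    "%Y-%m-%d",            # 2007-05-23
    "%H:%M:%S",            # 12:48:05
    "%Y%m%d",              # 20070523
    "%Y%m",                # 200705
    "%Y",                  # 2007
    "%d/%m/%Y",            # 23/05/2007
)


def split_sources(value):
    """Split a 'mix' value into (modifier, argument) pairs."""
    pairs = []
    for piece in value.split("|"):
        # Assumption: whitespace around each piece is not significant.
        modifier, _, argument = piece.strip().partition(":")
        pairs.append((modifier, argument))
    return pairs


def parse_time_argument(argument):
    """Split 'source ; input_format ; output_format' into its parts.

    The two format codes are optional and returned as None when absent.
    """
    parts = [part.strip() for part in argument.split(";")]
    codes = [int(part) for part in parts[1:] if part]
    input_code = codes[0] if len(codes) > 0 else None
    output_code = codes[1] if len(codes) > 1 else None
    return parts[0], input_code, output_code


def reformat_time(text, input_pattern, output_pattern):
    """Re-express a time string using two strftime patterns."""
    return datetime.strptime(text, input_pattern).strftime(output_pattern)
```

For instance, split_sources applied to the 'mix' example above yields three pairs with the modifiers nexus, fix and time, and the argument of the time pair can then be handed to parse_time_argument.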

NeXus syntax.

NeXus data is divided into different classes that hold datasets. The datasets may hold any type of data, from a single byte to arrays of unlimited dimensions. The datasets and the classes may also have attributes.

To collect data from a NeXus file, you have to build the path to the data you want.

  • Dataset (single string or number)
    The path is made of the names of the different classes separated by '/'; the last name is the name of the dataset.
    e.g. /run/title
  • Attribute (single string or number)
    The attribute name is separated from the dataset by a '.'
    e.g. /run/data.units
  • Arrays
    Most of the data will be stored as multi-dimensional arrays. We may want to extract particular information from the data.
  • Specific value from an array
    A zero or positive number between square brackets after the dataset name. nxingest treats every dataset as one-dimensional.
    e.g. /run/data_array[3]
  • Derived value
    nxingest can derive a few values from an array. To request one, put the name of the derived parameter between square brackets. Available values are :
    • [AVG] Average
    • [STD] Standard Deviation
    • [MIN] Minimum Value
    • [MAX] Maximum Value
    • [SUM] Sum of all values
    e.g. /run/sample/temperature_log/value[AVG]
  • Generic classes
    NeXus defines generic class types that users can name freely. nxingest can use some of these to generalise the mapping files for similar instruments.
    By writing the class type between curly brackets, like {NXentry}, the program will substitute it with the actual class name from the current file.
    This is currently only available for {NXentry}, {NXinstrument} and {NXuser}.
    e.g. /{NXentry}/{NXinstrument}/source/name is equivalent to /run/MUSR/source/name and /entry_0/I18/source/name

    Also, there may be more than one user defined in a NeXus file. nxingest will loop over each of them if the mapping includes a special node 'user_tbl'.
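
The path syntax described in this section can be illustrated with a short Python sketch. Again, this is not how nxingest is written; it only mirrors the rules above. The function names are ours, the array is assumed to be already flattened to one dimension, and the use of the population standard deviation for [STD] is an assumption, since the help text does not say which definition is used.

```python
import re
import statistics

# Last path component: name, optional .attribute, optional [selector].
_LAST = re.compile(
    r"^(?P<name>[^.\[\]]+)(?:\.(?P<attr>[^.\[\]]+))?(?:\[(?P<sel>[^\]]+)\])?$"
)

DERIVED = {
    "AVG": statistics.mean,
    "STD": statistics.pstdev,  # assumption: population standard deviation
    "MIN": min,
    "MAX": max,
    "SUM": sum,
}


def substitute_generic(path, actual_names):
    """Replace {NXclass} placeholders with the names used in the file."""
    return re.sub(r"\{(NX\w+)\}", lambda m: actual_names[m.group(1)], path)


def parse_path(path):
    """Split a path into classes, final name, attribute and selector."""
    *classes, last = path.strip("/").split("/")
    match = _LAST.match(last)
    if match is None:
        raise ValueError(f"cannot parse path: {path}")
    return {
        "classes": classes,
        "name": match.group("name"),
        "attribute": match.group("attr"),
        "selector": match.group("sel"),
    }


def select(values, selector):
    """Apply an [index] or [derived value] selector to a flat sequence."""
    if selector.isdigit():
        return values[int(selector)]
    return DERIVED[selector](values)
```

With these helpers, substitute_generic("/{NXentry}/{NXinstrument}/source/name", {"NXentry": "run", "NXinstrument": "MUSR"}) gives the first equivalent path from the example above, and parse_path separates the [AVG] selector from /run/sample/temperature_log/value[AVG] so that select can compute the average.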

Tuesday, 21 October 2008

Oracle Express Edition (XE)

Currently, ICAT requires you to use an Oracle database due to various operations that are implemented as triggers.

If you do not have access to a beefed-up Oracle RDBMS, then Oracle Database 10g Express Edition (Oracle Database XE) is an excellent alternative.

XE is an entry-level, small-footprint database based on the Oracle Database 10g Release 2 code base that's free to develop, deploy, and distribute and simple to administer.


Oracle XE uses an embedded HTTP listener that comes with the XML DB (XDB) to serve HTTP requests. The default port for HTTP access is 8080. This may conflict with the default settings of your GlassFish server if you do an ICAT installation.
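
Before installing, a quick way to see whether something is already listening on port 8080 is to try connecting to it. The snippet below is a generic Python check, not an Oracle or GlassFish tool; the function name and defaults are ours.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0
```

If port_in_use(8080) is true on the machine where both XE and GlassFish will run, one of the two will need to be moved to another port.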

You can determine the current configuration using the following commands when you connect to XE as the oracle user SYSTEM (or any other DBA):



You can change the http port and the ftp port to whatever you like (keep in mind that you need special privileges for ports < 1024 on Unix/Linux systems).



Hopefully, this will spare you some of the pain that I had to go through!

Damian

Monday, 6 October 2008

Welcome to the ICAT Developer blog

We are going to use this blog to publish official updates, ICAT tips and tricks, and notify the ICAT developer community of new releases. The easiest way to be notified of changes to ICAT is to subscribe to our Atom feed.

If you haven't already, try our demo site. This ICAT instance can be used to search all ISIS neutron and muon experimental data. The STFC Data Portal is currently being used as the user interface to ICAT (please note, an STFC federal id is needed to log onto this system - contact Damian Flannery if you would like an account created)

Or you can download the ICAT and install your own instance.

Please let us know what you think.

Happy coding!