ICAT: 2008

This post is about the XML ingest method of the ICAT API and the tools we use to ingest experiment data automatically.

The ICAT API has a method called ingestMetadata, which lets you define and set up a new investigation or dataset, with all the related parameters, in a single call. The method takes two parameters: a sessionId and an XML string.

The XML string must conform to a specific schema that defines which parameters are required and which formats are acceptable. The schema follows the structure of the ICAT database, sometimes in a simplified form: not all possible relationships are described. The values of the lookup tables can also be found in the schema.

The schema can be found in the ICAT API source code, in the icat3-jaxb module, under /src/uk/icat3/jaxb/icatXSD.xsd.

To create the required XML document, we have developed several tools for the different formats used at STFC facilities.

The first one, writeRaw, is specific to ISIS, as it reads only the ISIS RAW format.
The second one, nxingest, is more generic: it reads NeXus format files and depends on a mapping file to find the necessary information.

nxingest is distributed alongside the NeXus API library, starting with napi 4.2. It has been run on Linux and Windows with very slight adaptation (its inclusion in napi should ease the installation process).

nxingest creates the XML document but does not send it to ICAT. You need another tool to invoke the ICAT API.

Here is an extract from the help file nxingest.txt.

USAGE

nxingest mapping_file nexus_file [output_file]

DESCRIPTION

nxingest extracts the metadata from a NeXus file to create an XML file according to a mapping file. nxingest also has some reformatting capabilities, such as date modification, use of the current time, merging several fields together, splitting sentences into keywords, ...

The mapping file defines the structure (names and hierarchy) and content (from the NeXus file, from the mapping file or from the current time) of the output file. See below for a description of the mapping file.

This tool uses the NeXus API, so any of the supported formats (HDF4, HDF5 and XML) can be read.

To be accepted by ICAT, the output XML should match the ICAT3 XML schema.
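
As an illustration of the usage line above, here is a minimal Python sketch that builds the nxingest command line and runs it. It assumes that the nxingest executable is on the PATH and that it reports failure with a non-zero exit status; neither point is stated in the help text. The function names and the file names in the usage note are made up for the example.

```python
import subprocess
from pathlib import Path


def build_nxingest_command(mapping_file, nexus_file, output_file=None):
    """Return the argument list for: nxingest mapping_file nexus_file [output_file]."""
    command = ["nxingest", str(mapping_file), str(nexus_file)]
    if output_file is not None:
        # The output file is optional in the usage line.
        command.append(str(output_file))
    return command


def run_nxingest(mapping_file, nexus_file, output_file):
    """Run nxingest and return the generated XML document as a string.

    Assumption: nxingest is on the PATH and exits with a non-zero status
    on failure, so check=True raises CalledProcessError in that case.
    """
    command = build_nxingest_command(mapping_file, nexus_file, output_file)
    subprocess.run(command, check=True)
    return Path(output_file).read_text()
```

For example, run_nxingest("mapping.xml", "run.nxs", "icat.xml") would return the XML string that another tool can then pass to the ICAT API.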

MAPPING FILE SYNTAX

XML Nodes

The structure of the output file is determined by the nodes of the mapping file.
There are several types of node:

'tbl' or Table node, which defines the hierarchy of the output document.
e.g. the mapping : {icat type="tbl"}{study type="tbl"}{investigation type="tbl" trusted="false"}
is mapped into : {icat }{study}{investigation trusted="false"}
NB: XML snippets are not accepted in this blog, so the characters lt and gt are replaced by { and }.
'user_tbl' or User Table node is a special case where the node is scanned several times, once for each {NXuser} type class present in the NeXus file. At each iteration, nxingest replaces the string {NXuser} with the corresponding name found in the file.
'tag' or record node, which defines a simple metadata record. It has 2 child nodes that contain the name of the output element and the source of the element.
e.g. the mapping : {record} {icat_name}name{/icat_name} {value type="nexus"} path_to_metadata {/value} {/record}
is mapped into : metadata_from_nexus_file
'param_str' and 'param_num' or Parameter nodes define an element of the Parameters table (Dataset, DataFile or Sample).
e.g. the mapping : {parameter type="param_str"} {icat_name} name{/icat_name} {value type="nexus"} path_to_metadata {/value} {description type="fix"} fixed metadata description{/description} {/parameter}
is mapped into : {parameter} {name}name {/name} {string_value}metadata_from_nexus_file {/string_value} {units}N/A{/units} {description}fixed metadata description{/description} {/parameter}
'keyword_tag' splits the source into its individual words and fills the keyword table in the ICAT DB. A mapping with two NeXus datasets looks like: {keyword type="keyword_tag"} {icat_name} name {/icat_name} {value type="mix"} nexus:/{NXentry}/title | fix: , | nexus:/{NXentry}/notes {/value} {/keyword}
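
Because the snippets in this post use { and } in place of the XML angle brackets, it can help to convert one back and look at it with an XML parser. The Python sketch below does this for the record example above, using only the standard library. The helper name blog_to_xml is ours and is not part of nxingest.

```python
import xml.etree.ElementTree as ET


def blog_to_xml(text):
    """Turn the {tag} notation used in this post back into <tag> notation."""
    return text.replace("{", "<").replace("}", ">")


# The record example from the list above, in the post's notation.
record_in_blog_notation = (
    '{record} {icat_name}name{/icat_name} '
    '{value type="nexus"} path_to_metadata {/value} {/record}'
)

record = ET.fromstring(blog_to_xml(record_in_blog_notation))

# Name of the output element, and where its content comes from.
icat_name = record.findtext("icat_name").strip()
value = record.find("value")
source_type = value.get("type")
source_text = value.text.strip()

print(icat_name, source_type, source_text)
```

This simple replacement would also rewrite the brackets of class placeholders such as {NXentry}, so it only suits snippets that do not contain them.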

Metadata Sources

The source of the metadata is defined by nodes of type 'fix', 'nexus',
'special' and 'mix'. If the type is 'special', the beginning of the text
contains a modifier (fix:, nexus:, time: or sys:). The value is then the text
without the modifier.

'fix' string from the mapping file itself.
'nexus' metadata read from the NeXus file according to the path.
'special'
- 'fix:' same as 'fix' above
- 'nexus:' same as 'nexus' above
- 'time:' the time can be expressed in multiple formats, so the value after the
  modifier is composed of 3 parts:
  time:source ; input_format ; output_format
  - The source can be 'now' for the current time or 'nexus()' with the path to the time string between the parentheses.
  - The input and output formats are optional. The software expects an integer. Currently the possible values are:
  1. '2007-05-23T12:48:05' (default)
  2. '2007-05-23 12:48:05'
  3. '2007-05-23'
  4. '12:48:05'
  5. '20070523'
  6. '200705'
  7. '2007'
  8. '23/05/2007'
- 'sys:' information about the NeXus file itself:
  1. sys:filename gives the filename of the NeXus file.
  2. sys:location gives the path of the NeXus file.
  3. sys:size gives the size in bytes of the NeXus file.
'mix' combines several sources: several of the modifiers used with node type 'special' are given, separated by '|'.
e.g. nexus:/{NXentry}/{NXinstrument}.short_name | fix:_ | time:now ; 0 ; 5
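
To make the 'mix' syntax more concrete, here is a small Python sketch that splits such a value into its parts and then splits a 'time:' part into source, input format and output format. It is our reading of the syntax described above, not the nxingest implementation. In particular, trimming the whitespace around each part is an assumption, and the format codes are returned as plain integers without being mapped to a date layout.

```python
MODIFIERS = ("fix", "nexus", "time", "sys")


def split_mix(value):
    """Split a 'mix' value into (modifier, text) pairs.

    Assumption: parts are separated by '|' and surrounding whitespace
    is not significant.
    """
    parts = []
    for chunk in value.split("|"):
        modifier, separator, text = chunk.strip().partition(":")
        if not separator or modifier not in MODIFIERS:
            raise ValueError(f"unknown modifier in {chunk!r}")
        parts.append((modifier, text))
    return parts


def split_time(text):
    """Split 'source ; input_format ; output_format' into three values.

    The two format fields are optional; a missing one is returned as None.
    """
    fields = [field.strip() for field in text.split(";")]
    source = fields[0]
    codes = [int(field) if field else None for field in fields[1:3]]
    codes += [None] * (2 - len(codes))
    return source, codes[0], codes[1]


example = "nexus:/{NXentry}/{NXinstrument}.short_name | fix:_ | time:now ; 0 ; 5"
for modifier, text in split_mix(example):
    print(modifier, repr(text))
```

Each pair can then be handled by the code for the matching modifier, and the results concatenated in order to give the final value.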

NeXus syntax.

NeXus data is divided into classes that hold datasets. A dataset may hold any type of data, from a single byte to arrays of unlimited dimension. Datasets and classes may also have attributes.

To collect data from a NeXus file, you have to build the path to the data you want.

Dataset (singular string or number)
The path is the names of the different classes separated by '/'; the last name is the name of the dataset.
e.g. /run/title

Attribute (singular string or number)
The attribute name is separated from the dataset by a '.'
e.g. /run/data.units

Arrays
Most of the data will be stored as multidimensional arrays. We may want to extract particular information from the data.

Specific value from an array
A zero or positive number between square brackets after the dataset name. nxingest treats every dataset as one-dimensional.
e.g. /run/data_array[3]
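
The three path forms seen so far (dataset, attribute, array element) can be taken apart with one regular expression. The Python sketch below is only an illustration of the syntax, not the parser used by nxingest. It assumes that class and dataset names contain neither '.' nor square brackets, and it returns the bracketed part as an integer when it is a number and as a string otherwise.

```python
import re

PATH_PATTERN = re.compile(
    r"^(?P<location>/[^.\[\]]+)"         # classes and final name, '/'-separated
    r"(?:\.(?P<attribute>[^.\[\]]+))?"   # optional .attribute
    r"(?:\[(?P<selector>[^\[\]]+)\])?$"  # optional [selector]
)


def parse_nexus_path(path):
    """Split a path into its names, attribute and bracketed selector."""
    match = PATH_PATTERN.match(path.strip())
    if match is None:
        raise ValueError(f"not a recognised path: {path!r}")
    selector = match.group("selector")
    if selector is not None and selector.isdigit():
        selector = int(selector)
    return {
        "names": match.group("location").strip("/").split("/"),
        "attribute": match.group("attribute"),
        "selector": selector,
    }


for example in ("/run/title", "/run/data.units", "/run/data_array[3]"):
    print(example, parse_nexus_path(example))
```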

Derived value
nxingest can derive a few values from an array. To request one, put the name of the derived parameter between square brackets. Available values are:
- [AVG] Average
- [STD] Standard Deviation
- [MIN] Minimum Value
- [MAX] Maximum Value
- [SUM] Sum of all values
e.g. /run/sample/temperature_log/value[AVG]
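
For readers who want to check a derived value by hand, here is a rough Python equivalent of the five keywords, using only the standard library. It flattens the array first, since nxingest treats datasets as one-dimensional. Which standard deviation nxingest computes (population or sample) is not stated in the help text; the sketch assumes the population form. The sample data is made up.

```python
import statistics

DERIVED = {
    "AVG": statistics.mean,
    "STD": statistics.pstdev,  # assumption: population standard deviation
    "MIN": min,
    "MAX": max,
    "SUM": sum,
}


def flatten(values):
    """Yield the scalars of a nested list in order, as one flat sequence."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from flatten(value)
        else:
            yield value


def derive(name, values):
    """Compute the derived value called `name` over all elements of `values`."""
    return DERIVED[name](list(flatten(values)))


# Hypothetical temperature log stored as a 2 x 2 array.
temperature_log = [[10.0, 12.0], [14.0, 16.0]]
for name in DERIVED:
    print(name, derive(name, temperature_log))
```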

Generic classes
NeXus defines generic class types that users can name freely. nxingest can use some of these to generalise the mapping files for similar instruments.
When the class type is written between brackets, as in {NXentry}, the program substitutes it with the actual class name from the current file.
This is currently only available for {NXentry}, {NXinstrument} and {NXuser}.
e.g. /{NXentry}/{NXinstrument}/source/name is equivalent to /run/MUSR/source/name and /entry_0/I18/source/name
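
The substitution can be pictured with a few lines of Python. This is a sketch of the idea only: in nxingest the class names come from the NeXus file being read, whereas here they are passed in as a dictionary. Placeholders that have no entry in the dictionary are left untouched, which is our choice for the example.

```python
import re


def resolve_generic_classes(path, class_names):
    """Replace each {NXtype} placeholder with the name given in class_names."""

    def replace(match):
        class_type = match.group(1)
        # Leave the placeholder as it is when no name is known for it.
        return class_names.get(class_type, match.group(0))

    return re.sub(r"\{(NX\w+)\}", replace, path)


generic_path = "/{NXentry}/{NXinstrument}/source/name"
print(resolve_generic_classes(generic_path, {"NXentry": "run", "NXinstrument": "MUSR"}))
print(resolve_generic_classes(generic_path, {"NXentry": "entry_0", "NXinstrument": "I18"}))
```

The two calls give /run/MUSR/source/name and /entry_0/I18/source/name, matching the example above.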

There may also be more than one user defined in a NeXus file. nxingest will loop over each of them if the mapping includes a special node 'user_tbl'.
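
The loop over users amounts to expanding one path template once per user class. A minimal Python sketch, in which the template path and the user class names are invented for the example:

```python
def expand_user_paths(template, user_names):
    """Return one path per user, with {NXuser} replaced by that user's class name."""
    return [template.replace("{NXuser}", name) for name in user_names]


# Hypothetical file with two NXuser classes called user_1 and user_2.
print(expand_user_paths("/run/{NXuser}/name", ["user_1", "user_2"]))
```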

ICAT

Friday, 24 October 2008

XML ingest
