Creating OPEX XML from Spreadsheets

28 Jan 2023

This is a brief user guide for creating OPEX XML documents using the 3rd party Spreadsheet Converter.

Spreadsheet Converter

This website allows you to upload a spreadsheet containing descriptive metadata and generate XML documents compatible with Preservica.

Background

The spreadsheet converter makes a few simple assumptions about how data is stored in the spreadsheet. It assumes that each row of the spreadsheet contains descriptive metadata for a single Preservica Asset. For example one row corresponds to the metadata for a single document or image etc.

Your spreadsheet should contain a header row, the column names in the header will become the metadata attributes within the exported XML documents.

For example, for Dublin Core metadata you may have a spreadsheet similar to:

Spreadsheet Converter

You may also have a column which contains the name of the digital object to which the metadata refers such as filename etc. The name of this column is not important. If you dont have a special column such as filename etc then one of the other column names must contain some unique information.

Spreadsheet Converter

The spreadsheet column names can also contain prefixes, these can be used to add additional XML namespaces into the XML documents. The prefixes are seperated by “:” from XML attribute names.

Prefixes can be useful for XML schema’s such as Dublin Core where the XML attributes live inside a different namespace to the main XML root element.

Spreadsheet Converter

The metadata is added to the rows under the column headings

Spreadsheet Converter

and the spreadsheet should be exported to UTF-8 CSV.

Spreadsheet Converter

You are now ready to upload the metadata.

Namespaces

The first field to enter on the website askes for a Root Element name

Spreadsheet Converter

This field controls the name of the XML metadata root element. This is the element that all the spreadsheet column names are children of.

For example, to create OAI-DC type metadata such as:

<?xml version="1.0" encoding="UTF-8"?>
<oai_dc:dc  xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
	<oai_dc:title>Title 001</oai_dc:title>
	<oai_dc:creator>James Carr</oai_dc:creator>
	<oai_dc:subject>Sheffield</oai_dc:subject>
	<oai_dc:description/>
	<oai_dc:publisher/>
	<oai_dc:contributor/>
	<oai_dc:date/>
	<oai_dc:type/>
	<oai_dc:format/>
	<oai_dc:identifier/>
	<oai_dc:source/>
	<oai_dc:language/>
	<oai_dc:rights/>
</oai_dc:dc>

The root element name is dc and the root element namespace is http://www.openarchives.org/OAI/2.0/oai_dc/

By default all the element names are within the root element namespace.

Spreadsheet Converter

In this case the root element namespace and the Dublin Core elements should be in different namespaces, (15 term Dublin Core elements actually live inside the “http://purl.org/dc/elements/1.1/” namespace) so this is a good example of where we add a prefix to the column names, this allows us to associate additional namespaces to the elements.

Spreadsheet Converter

Formatting the Output

After the CSV has been uploaded you need to select which column should be used to name the XML documents. This should be a column containing either the filename or other unique information

Spreadsheet Converter

Naming Convention

Spreadsheet Converter

This is where you can decide on what type of metadata you would like to export.

The first two options (.xml and .metatadata) create simple XML documents which would look like:

<?xml version="1.0" encoding="UTF-8"?>
<dc xmlns="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<filename>image-0001.jpg</filename>
	<dc:title>Title 001</dc:title>
	<dc:creator>James Carr</dc:creator>
	<dc:subject>Sheffield</dc:subject>
	<dc:description/>
	<dc:publisher/>
	<dc:contributor/>
	<dc:date/>
	<dc:type/>
	<dc:format/>
	<dc:identifier/>
	<dc:source/>
	<dc:language/>
	<dc:rights/>
</dc>

The only difference being the file extension. The 3rd option wraps the XML in a OPEX header element ready for upload through the PUT tool.

<?xml version="1.0" encoding="UTF-8"?>
<opex:OPEXMetadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:opex="http://www.openpreservationexchange.org/opex/v1.2">
	<opex:DescriptiveMetadata>
		<oai_dc:dc>
			<dc:title>Title 001</dc:title>
			<dc:creator>James Carr</dc:creator>
			<dc:subject>Sheffield</dc:subject>
			<dc:description/>
			<dc:publisher/>
			<dc:contributor/>
			<dc:date/>
			<dc:type/>
			<dc:format/>
			<dc:identifier/>
			<dc:source/>
			<dc:language/>
			<dc:rights/>
		</oai_dc:dc>
	</opex:DescriptiveMetadata>
</opex:OPEXMetadata>

XML Formatting

This option allows you to format the XML to improve its readability by adding new lines and indenting elements.

Unique CSV Column

This option allows you to remove the additional column you may have added to store the object filename that should not be part of the XML. For example if you added a filename column in the spreadsheet.

Additional Namespaces

This field allows you to specify any additional namespaces which may be needed. For example every column prefix will need an additional namespace.

Spreadsheet Converter

Download

Once the XML has been created, you are shown 4 download buttons.

Spreadsheet Converter

The first button downloads a ZIP file containing a metadata document for each row in the spreadsheet. The name of the XML document is taken from the column used to identify the row.

The other buttons allow you to downloads an XSD schema for your metadata template, a template to allow you to custom the search index and a basic CMIS transform to allow the metadata to be displayed in UA.

James Carr

Creating OPEX XML from Spreadsheets

Background

Namespaces

Formatting the Output

Naming Convention

XML Formatting

Unique CSV Column

Additional Namespaces

Download

Related Posts

Event-Driven Integrations. Using The Webhook API To Create Custom Business Processes 10 Jun 2023

Custom Reporting via the Preservica Content API 19 Feb 2023

Customising Packages using the pyPreservica Python SDK 22 Jan 2023