The One to Many (OTM) Gateway API Specification is one of two APIs used to support communication between digital content repository systems (Repository) and distributed digital preservation systems (DDP). These APIs work in tandem to allow content captured in Repository systems to be copied to DDP systems for preservation.

The preservation Gateway functions as an aggregating cache for preservation requests originating with a repository and destined for a DDP via the One to Many Bridge API. The Gateway API provides a synchronous interface for requests related to preservation of content, even when the preservation system's interactions are fundamentally asynchronous.

Status of This Document

This document is a draft of a specification, created as part of the One to Many grant, funded by the Andrew W. Mellon Foundation.

Introduction

This specification describes APIs enabling digital object repository systems to manage preservation of content. The interfaces described here are intended to fit within the larger context of the One to Many Preservation Workflow.

This specification defines a set of interfaces for synchronous interactions for depositing, restoring, and purging objects across multiple distributed digital preservation (DDP) systems, using a single system managed by the repository administrators. Because DDPs (and the One to Many Bridge API) perform deposit and retrieval actions asynchronously, the Gateway's synchronous interfaces are needed to support real-time feedback for repository managers and curators. The Gateway also serves as a short-term cache of content to be preserved, allowing the repository system to continue its normal object life-cycle while ensuring exact bitwise preservation of the objects at the time they were selected for preservation. The Gateway relieves the repository of the requirement to manage preservation beyond calls to the Gateway API. To this end, Gateway implementations should seek to provide robust guarantees to the repository about the eventual preservation of content.

For background, it may be helpful to read the One to Many Project Overview and Goals and the One to Many User Stories.

The gateway interfaces are designed with substantial overlap with the S3 API ([[S3API]]). Interactions between the repository and the gateway closely follow patterns used by S3-compatible object stores. This design decision is intended to support ease of implementation and code reuse for repository systems that may already support S3 for object storage, or may wish to replicate preserved content in an object store as well as a DDP.

A note about Objects

The Gateway's repository-facing interfaces are concerned with the management of objects, represented by object-id URL arguments and packaged in the BagIt format. In the context of the repository, it's normal for these objects to be meaningful and interpretable beyond their bitwise content.

The Gateway exposes the object content as files within file groups for use by the Bridge. This distinction reflects a separation of concerns: the Bridge and DDP are ultimately concerned with guaranteeing the bitwise preservation and retrieval of files; the repository and Gateway are concerned with the presentation and interpretation of those files as representations of meaningful resources. By making this separation explicit, the Gateway and Bridge disclaim any out-of-band coordination of meaning between the repository and the DDP.

The semantics of the object-id, the scope of the objects (i.e. which files are included), and the relationships between the files are decisions left to the repository administrators. Repositories should include in each object sufficient information for an agent to determine its schema, understand its structure, and reconstruct it within the repository given only the file group, its file group identifier, and the contents of the individual files.

Repository API

Gateway Service Description

Provides a description of this Gateway, including the API version and the available preservation providers.

Deposit Object

Creates or updates a Gateway Object for preservation of an Object from the repository.

Upon acceptance of the request, the Gateway MUST create a unique version identifier for the object, corresponding to the exact content in this request. This identifier MUST be returned to the client in an x-otm-version-id response header.

The request body MUST be a BagIt bag as defined by [[RFC8493]], packaged and compressed in the media type specified by the Content-Type request header. The Gateway MUST request deposit of the payload files to the preservation provider given in x-otm-preservation-provider via the Bridge's Deposit Content endpoint.

For efficiency, the requester MAY omit some or all payload files from the bag, instead providing a fetch.txt as specified in Section 2.2.3 of [[RFC8493]]. The URL given in the fetch.txt MUST be reachable by the Bridge (i.e. on the public internet) using the credentials and authentication methods described in Transfer File. In the case of files provided in this way, the Gateway MAY implement Transfer File as a redirect to the given URL.
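
As a non-normative illustration, the following Python sketch assembles a zipped bag in which some payload files are stored inline and others are only referenced through fetch.txt. The function name build_bag, the use of the object identifier as the bag's base directory, the zip packaging, and the SHA-256 manifest are assumptions made for this example; the tag file contents and line layouts follow [[RFC8493]].

```python
import hashlib
import io
import zipfile


def build_bag(object_id, inline_files, fetched_files):
    """Return the bytes of a zipped BagIt bag (hypothetical helper).

    inline_files:  {payload path: bytes} stored inside the bag.
    fetched_files: {payload path: (url, sha256 hex digest, length in bytes)}
                   listed in fetch.txt instead of being stored in the bag.
    """
    manifest_lines = []
    fetch_lines = []
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bag:
        for path, content in sorted(inline_files.items()):
            digest = hashlib.sha256(content).hexdigest()
            manifest_lines.append(f"{digest}  data/{path}\n")
            bag.writestr(f"{object_id}/data/{path}", content)
        for path, (url, digest, length) in sorted(fetched_files.items()):
            # RFC 8493 requires fetched files to appear in the payload manifest too.
            manifest_lines.append(f"{digest}  data/{path}\n")
            # fetch.txt line layout: URL, length, path within the bag.
            fetch_lines.append(f"{url} {length} data/{path}\n")
        bag.writestr(
            f"{object_id}/bagit.txt",
            "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        )
        bag.writestr(f"{object_id}/manifest-sha256.txt", "".join(manifest_lines))
        if fetch_lines:
            bag.writestr(f"{object_id}/fetch.txt", "".join(fetch_lines))
    return buffer.getvalue()
```

The sketch does not percent-encode whitespace in URLs or paths, which a real fetch.txt writer would need to do, and it does not check that the listed URLs are reachable by the Bridge.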

Get Object Audit

Describes the status and history of the object's preservation. For more information about auditing, see OTM Appendix - Audit Events.

The information exposed by this endpoint is intended to be informational, allowing repository administrators to monitor deposit activity and resolve issues encountered during the deposit process. The Gateway MUST report any errors that would prevent it from requesting deposit via the Bridge in the gateway-errors field. The Gateway MAY choose to poll status information from the Bridge asynchronously, and is not required to provide real-time information. The repository SHOULD NOT rely on the information in fields other than gateway-errors to be up-to-date in real time.
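
A client that follows this guidance might act only on gateway-errors and treat every other field as advisory. The sketch below is non-normative and rests on assumptions: this section does not define the audit representation, so a JSON document whose gateway-errors field holds a list is assumed purely for illustration, and the function name blocking_errors is hypothetical.

```python
import json


def blocking_errors(audit_document):
    """Return the gateway-errors entries from an audit document.

    ASSUMPTION: the audit document is JSON and gateway-errors is a list.
    Other fields are deliberately ignored because they may be stale.
    """
    audit = json.loads(audit_document)
    errors = audit.get("gateway-errors") or []
    return list(errors)
```

An empty result means only that the Gateway has reported nothing that blocks the deposit request; it does not mean that preservation has completed.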

Initiate Restore

Request restore of an Object and all its contents for later retrieval.

The client can request restore of a specific version of the object by providing a versionId URL parameter. If this parameter is present, the Gateway MUST restore the exact content of the requested version. When the client does not specify a version, the Gateway SHOULD seek to restore the most recent version.

          POST /af48c3d?restore HTTP/1.1
          Host: preservation-gateway.institution.edu
          Date: Tue, 02 Jul 2019 20:35:00 GMT
          Content-Length: 0
        
          POST /af48c3d?restore&versionId=20190702T201500.001 HTTP/1.1
          Host: preservation-gateway.institution.edu
          Date: Tue, 02 Jul 2019 20:35:00 GMT
          Content-Length: 0
        
          HTTP/1.1 202 Accepted
          Date: Tue, 02 Jul 2019 20:35:00 GMT
          Content-Length: 0
          Server: OTM Preservation Gateway
        
          HTTP/1.1 200 OK
          Date: Tue, 02 Jul 2019 20:35:00 GMT
          Content-Length: 0
          Server: OTM Preservation Gateway
        
          HTTP/1.1 409 Conflict
          Date: Tue, 02 Jul 2019 20:35:00 GMT
          Content-Type: application/xml
          Server: OTM Preservation Gateway

          <?xml version="1.0" encoding="UTF-8"?>
          <Error>
            <Code>RestoreAlreadyInProgress</Code>
            <Message>Object restore is already in progress.</Message>
            <Resource>/af48c3d</Resource>
          </Error>
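
The request targets and response codes shown in these examples can be handled with a small client-side helper. The following Python sketch is non-normative; the function names and the returned labels are hypothetical, and the text above does not state what distinguishes a 200 response from a 202 response, so the sketch only tells them apart without assigning further meaning.

```python
from urllib.parse import quote, urlencode


def restore_target(object_id, version_id=None):
    """Build the request target for a restore POST (hypothetical helper)."""
    target = "/" + quote(object_id, safe="") + "?restore"
    if version_id is not None:
        # Pin the restore to one exact version of the object.
        target += "&" + urlencode({"versionId": version_id})
    return target


def describe_restore_status(status):
    """Map the status codes used in the examples to illustrative labels."""
    labels = {202: "accepted", 200: "ok", 409: "conflict"}
    return labels.get(status, "unexpected")
```

For example, restore_target("af48c3d", "20190702T201500.001") yields /af48c3d?restore&versionId=20190702T201500.001, matching the second request shown above.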
        

Retrieve Object

Retrieve the content of an object.

The client can request retrieval of a specific version of the object by providing a versionId URL parameter. If this parameter is present, the Gateway MUST provide the exact content of the requested version or return a failure response code. When the client does not specify a version, the Gateway MUST return the most recent available version. In either case, the response MUST include an x-otm-version-id header specifying the identifier for the returned version.

          GET /af48c3d HTTP/1.1
          Host: preservation-gateway.institution.edu
          Date: Tue, 02 Jul 2019 20:45:00 GMT
          Content-Length: 0
        
          GET /af48c3d?versionId=20190702T201500.001 HTTP/1.1
          Host: preservation-gateway.institution.edu
          Date: Tue, 02 Jul 2019 20:45:00 GMT
          Content-Length: 0
        
          HTTP/1.1 200 OK
          Date: Tue, 02 Jul 2019 20:45:00 GMT
          ETag: "4efcb3d98ce0fabfd585eb6c4332859"
          Content-Length: 493285
          Content-Type: application/zip
          x-otm-version-id: 20190702T201500.001
          Server: OTM Preservation Gateway

          [493285 bytes of object data]
        
          HTTP/1.1 403 Forbidden
          Date: Tue, 02 Jul 2019 20:45:00 GMT
          ETag: "4efcb3d98ce0fabfd585eb6c4332859"
          Content-Type: application/xml
          Server: OTM Preservation Gateway

          <?xml version="1.0" encoding="UTF-8"?>
          <Error>
            <Code>InvalidObjectState</Code>
            <Message>The Object is not available</Message>
            <Resource>/af48c3d</Resource>
          </Error>
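
A client can verify the x-otm-version-id header on success and read the XML error body on failure. The following Python sketch is non-normative; the function names are hypothetical, and the error layout is assumed to be the flat Code, Message, and Resource structure shown in the example above.

```python
import xml.etree.ElementTree as ET


def parse_error(body):
    """Parse an <Error> document (bytes) into a dict keyed by element name."""
    root = ET.fromstring(body)
    return {child.tag: (child.text or "") for child in root}


def check_version(headers, requested_version=None):
    """Return the version identifier reported by the Gateway.

    Raises ValueError if the header is missing, or if a specific version
    was requested and a different one came back.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    returned = lowered.get("x-otm-version-id")
    if returned is None:
        raise ValueError("response lacks x-otm-version-id")
    if requested_version is not None and returned != requested_version:
        raise ValueError("Gateway returned a different version than requested")
    return returned
```

Recording the value returned by check_version lets a client that did not pin a version learn which version it actually received.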
        

Purge Object

Initiates a purge of the object from the preservation system. This will result in eradication of the object's content. If a versionId is provided, the Gateway MUST request deletion of the objects matching the specified version. Otherwise, it MUST request deletion of all versions of the requested object.

          DELETE /af48c3d HTTP/1.1
          Host: preservation-gateway.institution.edu
          Date: Tue, 02 Jul 2019 20:25:00 GMT
          Content-Type: text/plain
        
          HTTP/1.1 204 No Content
          Date: Tue, 02 Jul 2019 20:25:00 GMT
          Content-Length: 0
          Server: OTM Preservation Gateway
        
          DELETE /af48c3d?versionId=20190702T201500.001 HTTP/1.1
          Host: preservation-gateway.institution.edu
          Date: Tue, 02 Jul 2019 20:25:00 GMT
          Content-Type: text/plain
        
          HTTP/1.1 204 No Content
          Date: Tue, 02 Jul 2019 20:25:00 GMT
          Content-Length: 0
          Server: OTM Preservation Gateway
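
The version selection rule for a purge can be sketched from the Gateway's side as follows. This Python fragment is non-normative: the function name versions_to_purge and the idea that the Gateway holds a list of known version identifiers per object are assumptions made for illustration, and the behavior for an unknown versionId is not specified here, so the sketch simply returns an empty list in that case.

```python
def versions_to_purge(stored_versions, version_id=None):
    """Select the versions whose deletion the Gateway would request.

    stored_versions: version identifiers known for one object (assumed input).
    version_id:      the versionId URL parameter, or None when absent.
    """
    if version_id is None:
        # No versionId given: every version of the object is purged.
        return list(stored_versions)
    # versionId given: only the matching version is purged.
    return [version for version in stored_versions if version == version_id]
```

Because omitting versionId removes every version, a repository may want to require an explicit confirmation step before issuing an unversioned DELETE.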
        

Bridge API

Transfer File

Transfer a cached file. This endpoint allows a Bridge to pull content for storage in a preservation service.

The client MUST provide a versionId parameter to guarantee the object fetched for preservation is the exact version requested. For the same reason, the client SHOULD use the If-Match header.

This endpoint MUST support HTTP Basic Authentication as described in [[RFC7617]]. Appropriate credentials should be provided to the Bridge via the Register endpoint. Gateways SHOULD use different credentials for each Bridge/preservation system.

          GET /af48c3d/file1?versionId=20190702T201500.001 HTTP/1.1
          Host: preservation-gateway.institution.edu
          Date: Thu, 02 Jul 2020 12:00:00 GMT
          If-Match: "a93eddb6387aaaa61f6192926214d338"
          Authorization: Basic QWxhZGRpbjpPcGVuU2VzYW1l
        
          HTTP/1.1 200 OK
          Date: Thu, 02 Jul 2020 12:00:01 GMT
          ETag: "4efcb3d98ce0fabfd585eb6c4332859"
          Content-Length: 2357
          Content-Type: application/octet-stream
          x-otm-version-id: 20190702T201500.001
          Server: OTM Preservation Gateway

          [2357 bytes of object data]
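
A Bridge client could assemble the request target and headers for this endpoint as in the following non-normative Python sketch. The function names are hypothetical; the Basic credential encoding follows [[RFC7617]], and the entity tag is wrapped in double quotes as HTTP requires for If-Match values.

```python
import base64
from urllib.parse import quote, urlencode


def transfer_target(object_id, file_path, version_id):
    """Build the request target for a Transfer File GET (hypothetical helper)."""
    return (
        "/" + quote(object_id, safe="")
        + "/" + quote(file_path)  # keeps "/" so nested file paths survive
        + "?" + urlencode({"versionId": version_id})
    )


def transfer_headers(username, password, etag):
    """Build the Authorization and If-Match headers for the request."""
    if ":" in username:
        # RFC 7617 does not allow a colon in the user-id.
        raise ValueError("username must not contain ':'")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {
        "Authorization": f"Basic {token}",
        # Entity tags are quoted strings; the caller passes the bare value.
        "If-Match": f'"{etag}"',
    }
```

Sending If-Match together with versionId means the Gateway can refuse the request when the cached file no longer matches the content the Bridge was asked to preserve.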