This document is licensed under a Creative Commons Attribution 4.0 License.
The One To Many (OTM) Gateway API Specification is one of two APIs used to support communication between digital content repository systems (Repository) and distributed digital preservation systems (DDP). These APIs work in tandem to allow content captured in Repository systems to be copied to DDP systems for preservation.
The preservation Gateway functions as an aggregating cache for preservation requests originating with a repository and destined for a DDP via the One to Many Bridge API. The Gateway API provides a synchronous interface for requests related to preservation of content, even when the preservation system's interactions are fundamentally asynchronous.
This document is a draft of a specification, created as part of the One to Many grant, funded by the Andrew W. Mellon Foundation.
This specification describes APIs enabling digital object repository systems to manage preservation of content. The interfaces described here are intended to fit within the larger context of the One to Many Preservation Workflow.
This specification defines a set of interfaces for synchronous interactions for depositing, restoring, and purging objects across multiple distributed digital preservation (DDP) systems, using a single system managed by the repository administrators. Because DDPs (and the One to Many Bridge API) perform deposit and retrieval actions asynchronously, the Gateway's synchronous interfaces are needed to support real-time feedback for repository managers and curators. The Gateway also serves as a short-term cache of content to be preserved, allowing the repository system to continue its normal object life-cycle while ensuring exact bitwise preservation of the objects at the time they were selected for preservation. The Gateway relieves the repository of the requirement to manage preservation beyond calls to the Gateway API. To this end, Gateway implementations should seek to provide robust guarantees to the repository about the eventual preservation of content.
For background, it may be helpful to read the One to Many Project Overview and Goals and the One to Many User Stories.
The gateway interfaces are designed with substantial overlap with the S3 API ([[S3API]]). Interactions between the repository and the gateway closely follow patterns used by S3-compatible object stores. This design decision is intended to support ease of implementation and code reuse for repository systems that may already support S3 for object storage, or may wish to replicate preserved content in an object store as well as a DDP.
The Gateway's repository-facing interfaces are concerned with the management of objects, represented by object-id URL arguments and packaged in the BagIt format. In the context of the repository, it's normal for these objects to be meaningful and interpretable beyond their bitwise content.
The Gateway exposes the object content as files within file groups for use by the Bridge. This distinction reflects a separation of concerns: the Bridge and DDP are ultimately concerned with guarantee of bitwise preservation and retrieval of files; the repository and Gateway are concerned with the presentation and interpretation of those files as representations of meaningful resources. By making this separation explicit, the Gateway and Bridge disclaim any out-of-band coordination of meaning between the repository and the DDP.
The semantics of the object-id, the scope of the objects (i.e. which files are included), and the relationships between the files are decisions left to the repository administrators. Repositories should include in each object sufficient information for an agent to determine its schema, understand its structure, and reconstruct it within the repository given only the file group, its file group identifier, and the contents of the individual files.
Provides a description of this Gateway, including the API version, and the available preservation providers.
GET
/
JSON
gateway-version: The current version of the Gateway API.
providers: The preservation providers available through this Gateway.
name: The name of the preservation provider; e.g. "chronopolis". This name MUST be unique in the context of this gateway. Gateways SHOULD NOT change the names of preservation providers.
200 (on success)
{ "gateway-version" : "0.1.0", "providers" : [{ "name": "ddp1_name" }, { "name": "ddp2_name" }] }
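As an illustration of how a repository client might consume this description, the following minimal sketch parses the example body above and collects the provider names. The function name `provider_names` is hypothetical, and the uniqueness check simply mirrors the MUST requirement stated for `name`; it is not part of the specification.

```python
import json


def provider_names(description_json: str) -> list[str]:
    """Return the provider names listed in a Gateway description document."""
    description = json.loads(description_json)
    names = [provider["name"] for provider in description["providers"]]
    # Names MUST be unique within a gateway, so treat duplicates as an error.
    if len(names) != len(set(names)):
        raise ValueError("duplicate preservation provider name")
    return names


# Example body copied from the response shown above.
example = (
    '{ "gateway-version" : "0.1.0", '
    '"providers" : [{ "name": "ddp1_name" }, { "name": "ddp2_name" }] }'
)
print(provider_names(example))
```

A client could use the returned names to populate the x-otm-preservation-provider header on later deposit requests.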
Creates or updates a Gateway Object for preservation of an Object from the repository.
Upon acceptance of the request, the Gateway MUST create a unique version identifier for the object, corresponding to the exact content in this request. This identifier MUST be returned to the client in an x-otm-version-id response header.
The request body MUST be a BagIt bag as defined by [[RFC8493]], packaged and compressed in the media type specified by the Content-Type request header. The Gateway MUST request deposit of the payload files to the preservation provider given in x-otm-preservation-provider via the Bridge's Deposit Content endpoint.
For efficiency, the requester MAY omit some or all payload files from the bag, instead providing a fetch.txt as specified in Section 2.2.3 of [[RFC8493]]. The URL given in the fetch.txt MUST be reachable by the Bridge (i.e. on the public internet) using the credentials and authentication methods described in Transfer File. In the case of files provided in this way, the Gateway MAY implement Transfer File as a redirect to the given URL.
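To make the fetch.txt option concrete, here is a minimal sketch of producing fetch.txt lines in the `url length filepath` form described in Section 2.2.3 of RFC 8493. The helper names and the example URL are hypothetical; the percent-encoding of CR, LF, and % in the file path and the use of "-" for an unknown length follow that section of the RFC.

```python
from typing import Optional


def encode_fetch_path(path: str) -> str:
    """Percent-encode the characters RFC 8493 requires to be escaped in paths."""
    # Encode '%' first so the escapes added afterwards are not re-encoded.
    return path.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def fetch_line(url: str, length: Optional[int], path: str) -> str:
    """Build one fetch.txt line: url, length in octets (or '-'), file path."""
    if any(ch.isspace() for ch in url):
        raise ValueError("whitespace in the URL must be percent-encoded")
    size = "-" if length is None else str(length)
    return f"{url} {size} {encode_fetch_path(path)}"


# Hypothetical payload file hosted by the repository.
print(fetch_line("https://repo.example.org/files/file1", 2357, "data/file1"))
```

Whether the repository can safely list a file this way depends on the URL staying reachable and the content staying unchanged until the Bridge has retrieved it.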
PUT
/{object-id}
Content-Type: Media type of the object to be deposited.
Content-Length: Size in bytes of the object to be deposited.
x-otm-preservation-provider: The name of the preservation system targeted for this deposit.
200 (on success)
ETag: Entity tag for the deposited object.
x-otm-version-id: Version of the object.

PUT /af48c3d HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:15:00 GMT
Content-Type: application/zip
Content-Length: 493285
Content-MD5: 4efcb3d98ce0fabfd585eb6c4332859
[493285 bytes of object data]

HTTP/1.1 200 OK
Date: Tue, 02 Jul 2019 20:15:00 GMT
ETag: "4efcb3d98ce0fabfd585eb6c4332859"
Content-Length: 0
Server: OTM Preservation Gateway
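The request headers listed for this endpoint could be assembled roughly as follows. This is a sketch under stated assumptions, not a reference client: `deposit_request` and `returned_version` are hypothetical helpers, the default media type is taken from the example above, and the actual HTTP transport is left out.

```python
from urllib.parse import quote


def deposit_request(object_id: str, bag: bytes, provider: str,
                    media_type: str = "application/zip"):
    """Return (method, path, headers) for depositing a packaged bag."""
    headers = {
        "Content-Type": media_type,
        "Content-Length": str(len(bag)),
        # Name of the provider as advertised by the Gateway description.
        "x-otm-preservation-provider": provider,
    }
    return "PUT", "/" + quote(object_id, safe=""), headers


def returned_version(response_headers: dict) -> str:
    """Read x-otm-version-id from response headers, ignoring header case."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return lowered["x-otm-version-id"]


method, path, headers = deposit_request("af48c3d", b"\x00" * 16, "ddp1_name")
print(method, path, headers)
```

Keeping the returned version identifier alongside the object in the repository lets later restore, retrieve, and purge calls target the exact deposited content.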
fetch.txt allows the repository to avoid the cost of packaging large payload files and transmitting them to the Gateway. In exchange for this efficiency, the repository sacrifices the caching properties of the Gateway, increasing the likelihood of local changes causing checksum mismatches and resulting deposit failures. Repositories that can guarantee the availability of content indefinitely (e.g. because files are immutably versioned within the repository itself) won't suffer this drawback and may wish to use this method of deposit exclusively.
Describes the status and history of the object's preservation. For more information about auditing, see OTM Appendix - Audit Events.
The information exposed by this endpoint is intended for informational use, allowing repository administrators to monitor deposit activity and resolve issues encountered during the deposit process. The Gateway MUST report any errors that would prevent it from requesting deposit via the Bridge in the gateway-errors field. The Gateway MAY choose to poll status information from the Bridge asynchronously, and is not required to provide real-time information. The repository SHOULD NOT rely on the information in fields other than gateway-errors to be up-to-date in real time.
GET
/{object-id}/audit ? versionId=
JSON
object-id: The id of the object this describes.
deposits: Data about the deposited versions; this exposes any errors within the Gateway (e.g. failed bag validation) and the Bridge status.
version: The object version this deposit corresponds to.
gateway-errors: Human readable description of any errors encountered by the gateway.
status: The status text given by the Bridge.
file-count: The count of files included in the deposit, as reported by the Bridge.
details: Additional details about the state of the deposit.
audit-events: A list of audit events intended to be formatted for display to curators and repository administrators; see the Bridge's Get Audit Log for more information.
200 (on success)
{ "object-id": "af48c3d", "deposits": [ { "version": "20190702T201500.001", "gateway-errors": null, "status": "DEPOSIT_ACCEPTED", "file-count": "2", "details": "" } ], "audit-events": [] }
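Because gateway-errors is the only field the repository can expect to be current, a monitoring script might focus on it. The sketch below assumes the response is a JSON object with the fields listed above; the function name and the sample error text are hypothetical.

```python
import json


def versions_with_gateway_errors(audit_json: str) -> dict[str, str]:
    """Map each deposited version to its gateway error text, if it has one."""
    audit = json.loads(audit_json)
    return {
        deposit["version"]: deposit["gateway-errors"]
        for deposit in audit["deposits"]
        # A null or empty gateway-errors value means nothing was reported.
        if deposit.get("gateway-errors")
    }


# Hypothetical audit document: one clean deposit and one failed deposit.
sample = json.dumps({
    "object-id": "af48c3d",
    "deposits": [
        {"version": "20190702T201500.001", "gateway-errors": None,
         "status": "DEPOSIT_ACCEPTED", "file-count": "2", "details": ""},
        {"version": "20190703T090000.001",
         "gateway-errors": "bag validation failed",
         "status": "", "file-count": "0", "details": ""},
    ],
    "audit-events": [],
})
print(versions_with_gateway_errors(sample))
```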
Request restore of an Object and all its contents for later retrieval.
The client can request restore of a specific version of the object by providing a versionId URL parameter. If this parameter is present, the Gateway MUST restore the exact content of the requested version. When the client does not specify a version, the Gateway SHOULD seek to restore the most recent version.
POST
/{object-id} ? restore & versionId=
202 (if the object restore has been initiated and the object is not yet available for retrieval)
200 (if the object is restored and available for retrieval)
409 (if there is a restore already in progress)

POST /af48c3d?restore HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Length: 0

POST /af48c3d?restore&versionId=20190702T201500.001 HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Length: 0

HTTP/1.1 202 Accepted
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Length: 0
Server: OTM Preservation Gateway

HTTP/1.1 200 OK
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Length: 0
Server: OTM Preservation Gateway

HTTP/1.1 409 Conflict
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Type: application/xml
Server: OTM Preservation Gateway
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>RestoreAlreadyInProgress</Code>
<Message>Object restore is already in progress.</Message>
<Resource>/af48c3d</Resource>
</Error>
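One way a client might use these response codes is to repeat the restore request until the Gateway reports the object as available. The following sketch is an assumption about client behaviour, not something the specification mandates: `wait_for_restore` is a hypothetical helper, the HTTP call is injected as a callable that returns a status code, and both 202 and 409 are treated as "still in progress".

```python
import time
from typing import Callable


def wait_for_restore(request_restore: Callable[[], int],
                     attempts: int = 5, delay: float = 30.0) -> bool:
    """Repeat a restore request until 200 is returned or attempts run out."""
    for _ in range(attempts):
        status = request_restore()
        if status == 200:
            return True  # restored and available for retrieval
        if status not in (202, 409):
            raise RuntimeError(f"unexpected restore status {status}")
        time.sleep(delay)  # restore initiated or already in progress
    return False


# Simulated Gateway responses: accepted, in progress, then available.
responses = iter([202, 409, 200])
print(wait_for_restore(lambda: next(responses), delay=0.0))
```

Injecting the request as a callable keeps the polling logic independent of any particular HTTP library.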
Retrieve the content of an object.
The client can request retrieval of a specific version of the object by providing a versionId URL parameter. If this parameter is present, the Gateway MUST provide the exact content of the requested version or return a failure response code. When the client does not specify a version, the Gateway MUST return the most recent available version. In either case, the response MUST include an x-otm-version-id header specifying the identifier for the returned version.
GET
/{object-id} ? versionId=
If-Match
If-None-Match
Accept
200 (on success)
403 (when attempting to access an unavailable preserved object; i.e. one that has not been restored)
412 (when an If-Match or If-None-Match header fails)
Content-Type
ETag
x-otm-version-id: Version of the object.

GET /af48c3d HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:45:00 GMT
Content-Length: 0

GET /af48c3d?versionId=20190702T201500.001 HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:45:00 GMT
Content-Length: 0

HTTP/1.1 200 OK
Date: Tue, 02 Jul 2019 20:45:00 GMT
ETag: "4efcb3d98ce0fabfd585eb6c4332859"
Content-Length: 493285
Content-Type: application/zip
x-otm-version-id: 20190702T201500.001
Server: OTM Preservation Gateway
[493285 bytes of object data]

HTTP/1.1 403 Forbidden
Date: Tue, 02 Jul 2019 20:45:00 GMT
ETag: "4efcb3d98ce0fabfd585eb6c4332859"
Content-Type: application/xml
Server: OTM Preservation Gateway
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>InvalidObjectState</Code>
<Message>The Object is not available</Message>
<Resource>/af48c3d</Resource>
</Error>
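The request paths and response codes for retrieval can be summarised in a short sketch. The helper names are hypothetical and the outcome strings are paraphrases of the response code descriptions above; any other status is simply reported as unexpected.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def object_path(object_id: str, version_id: Optional[str] = None) -> str:
    """Build the request path for an object, optionally pinned to a version."""
    path = "/" + quote(object_id, safe="")
    if version_id is not None:
        path += "?" + urlencode({"versionId": version_id})
    return path


def retrieval_outcome(status: int) -> str:
    """Translate a retrieval status code into a short description."""
    outcomes = {
        200: "content returned",
        403: "object has not been restored",
        412: "If-Match or If-None-Match precondition failed",
    }
    return outcomes.get(status, "unexpected status")


print(object_path("af48c3d", "20190702T201500.001"))
print(retrieval_outcome(403))
```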
Initiates a purge of the object from the preservation system. This will result in eradication of the object's content. If a versionId is provided, the Gateway MUST request deletion of objects matching the specified version. Otherwise it MUST request deletion of all versions of the requested object.
DELETE
/{object-id} ? versionId=
204 (on success)

DELETE /af48c3d HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:25:00 GMT
Content-Type: text/plain

HTTP/1.1 204 No Content
Date: Tue, 02 Jul 2019 20:25:00 GMT
Content-Length: 0
Server: OTM Preservation Gateway

DELETE /af48c3d?versionId=20190702T201500.001 HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:25:00 GMT
Content-Type: text/plain

HTTP/1.1 204 No Content
Date: Tue, 02 Jul 2019 20:25:00 GMT
Content-Length: 0
Server: OTM Preservation Gateway
Transfer a cached file. This endpoint allows a Bridge to pull content for storage in a preservation service.
The client MUST provide a versionId parameter to guarantee the object fetched for preservation is the exact version requested. For the same reason, the client SHOULD use the If-Match header.
This endpoint MUST support HTTP Basic Authentication as described in [[RFC7617]]. Appropriate credentials should be provided to the Bridge via the Register endpoint. Gateways SHOULD use different credentials for each Bridge/preservation system.
GET
/{object-id}/{fileName} ? versionId=
Authorization
If-Match
200 (on success)
302 (when the file is not cached by the Gateway, but is available to the Bridge at an external URL)
401 (when the given credentials do not allow access to the file)
404 (if the file is not present)
412 (if the requested file does not match the If-Match checksum)
Content-Type
ETag
x-otm-version-id

GET /af48c3d/file1?versionId=20190702T201500.001 HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2020 12:00:00 GMT
If-Match: a93eddb6387aaaa61f6192926214d338
Authorization: Basic QWxhZGRpbjpPcGVuU2VzYW17

HTTP/1.1 200 OK
Date: Tue, 02 Jul 2020 12:00:01 GMT
ETag: "4efcb3d98ce0fabfd585eb6c4332859"
Content-Length: 2357
Content-Type: application/octet-stream
x-otm-version-id: 20190702T201500.001
Server: OTM Preservation Gateway
[2357 bytes of object data]
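On the Bridge side, the Authorization and If-Match headers for this endpoint might be built as in the sketch below. The Basic credential encoding follows RFC 7617 (base64 of "user-id:password"); the helper name, the sample credentials, and the choice to wrap the checksum in double quotes as an HTTP entity-tag are assumptions for illustration.

```python
import base64


def transfer_headers(user_id: str, password: str, checksum: str) -> dict:
    """Build request headers for pulling one cached file from the Gateway."""
    if ":" in user_id:
        # RFC 7617 does not allow a colon in the user-id.
        raise ValueError("user-id must not contain ':'")
    credentials = f"{user_id}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {
        "Authorization": f"Basic {token}",
        # Assumption: the expected checksum is sent as a quoted entity-tag.
        "If-Match": f'"{checksum}"',
    }


# Hypothetical per-Bridge credentials issued by the Gateway.
headers = transfer_headers("bridge1", "s3cret", "a93eddb6387aaaa61f6192926214d338")
print(headers)
```

Since Basic authentication sends credentials in a reversible encoding, deployments would normally serve this endpoint over HTTPS.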