The One To Many (OTM) Specification defines two APIs to support communication between digital content repository systems (Repository) and distributed digital preservation systems (DDP). These APIs work in tandem to allow content captured in Repository systems to be copied to DDP systems for preservation. The APIs defined are the OTM Repository Gateway API (Gateway) for the Repository and the OTM Bridge API (Bridge) for the DDP. The Gateway and Bridge APIs handle intermediary communication between the Repository and the DDP, allowing each system to operate without any knowledge of the internals of the other. Each API is designed to be deployed either as part of, or an extension to, its host system (the Repository in the case of the Gateway, the DDP in the case of the Bridge) or as a stand-alone application. Both provide an HTTP-based approach for authentication, communication, and data transfer.

The descriptions and diagrams below reference the OTM Repository Gateway Specification and OTM Bridge API Specification and are intended to capture the context in which the API calls are expected to be used.

The DDP workflow sections below assume that an intermediary service will be used to communicate with the OTM Bridge and transform data provided by the OTM Bridge into a format acceptable to the DDP. The implementation of this piece will be dependent on the architecture and capabilities of the DDP. Allowing this service to remain separate from the Bridge ensures that the Bridge implementation is able to support a wide variety of DDPs.

The primary purpose of the systems and integrations described by the OTM Specifications is to support the deposit and recovery of content. Content is to be considered recoverable only after it has completed a successful deposit into the DDP. Content that has been deposited from a Repository into a DDP is intended to be recoverable even if all other OTM system components have failed. There are no guarantees of recoverability if content has not first completed a successful deposit.

Additional notes are provided below describing the minimum steps required for an implementation in Chronopolis.

Status of This Document

This document is an overview of a specification created as part of the One to Many grant, funded by the Andrew W. Mellon Foundation.

Initialize

The initialization operation allows a DDP and Repository to connect their respective OTM Bridge and OTM Gateway applications so that data can be transferred between the two systems.

Flow

  1. An agreement is reached between a repository owner and DDP system that will allow repository content to be deposited into the DDP; appropriate SLA/MOU and other legal documentation is signed and arrangements for billing/invoicing are made
  2. The DDP administrator calls the Bridge Add Account endpoint to add the repository to the Bridge system and generate the credentials needed for the repository's Gateway to connect to the Bridge
  3. The DDP administrator provides the Bridge credentials to the Gateway administrator
  4. The Gateway administrator enters the Bridge credentials into the Gateway and the Gateway calls the Bridge Register endpoint to provide the Bridge with the details necessary to make calls back to the Gateway
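The registration step above can be sketched as follows. This is a minimal illustration only: the endpoint's actual field names and credential format are defined in the OTM Bridge API Specification, so everything in this payload (field names, URL, account identifier) is an assumption for illustration.

```python
import json

def build_register_payload(gateway_url, account_id, callback_user, callback_pass):
    """Assemble a body the Gateway might send to the Bridge Register endpoint.

    All field names here are hypothetical; consult the OTM Bridge API
    Specification for the real wire format.
    """
    return {
        "account-id": account_id,        # issued by the Bridge Add Account call
        "gateway-url": gateway_url,      # base URL for Bridge -> Gateway calls
        "callback-credentials": {        # lets the Bridge call GET File, etc.
            "user": callback_user,
            "password": callback_pass,
        },
    }

payload = build_register_payload(
    "https://repo.example.edu/otm-gateway", "acct-001", "bridge-user", "s3cret")
print(json.dumps(payload, indent=2))
```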

Deposit

The Deposit workflow describes the process by which an OTM Gateway requests that a filegroup be preserved.

As part of this workflow, a version identifier is passed from the OTM Gateway through the system so that a deposit can be related to a point in time. It is up to the DDP to determine how to store this information in a manner suited to long-term preservation.

System to System Flow

  1. The Repository administrator selects a set of objects to be deposited
  2. The Repository calls the Gateway PUT Object endpoint once for each object to be deposited; this starts the deposit process
  3. The Gateway resolves each object into a set of files to be deposited; each file is either copied to the Gateway staging storage area or a link to the file is captured to allow transfer to the Bridge
  4. The Gateway calls the Bridge Deposit Content endpoint using the object ID as the filegroup identifier and providing an identifier for each file to be deposited
  5. The Bridge initiates a deposit action for each filegroup in the deposit request
  6. For each file in each filegroup the Bridge calls the Gateway GET File endpoint to transfer the file to the Bridge staging storage location
  7. As each file transfer into the Bridge staging storage completes, the Bridge compares the checksum of the transferred file to the checksum provided in the deposit request; any mismatches trigger a re-transfer
  8. Once all files in a filegroup are in Bridge staging storage and all checksums are validated, the status of the deposit is updated to `DEPOSIT_STAGED`
  9. The DDP calls the Bridge List Deposits endpoint on a regular schedule to check for new deposits in the `DEPOSIT_STAGED` state
  10. For each staged deposit in the Bridge the DDP copies the files from Bridge staging storage into the DDP ingest pipeline and performs a deposit (and replication)
  11. When the deposit into the DDP is finished, the DDP calls the Bridge Complete Deposit endpoint to inform the Bridge that the deposit is complete
  12. The Bridge clears the files associated with the completed deposit from Bridge staging storage and transitions the deposit into a completed status
  13. The Gateway calls the Bridge Get Deposit Status endpoint in order to provide the Repository administrator with deposit status information
Deposit Workflow
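The checksum comparison in step 7 can be sketched with the standard library. The specification text does not mandate a particular algorithm here, so SHA-256 is an assumption for illustration; in practice the algorithm would match whatever the deposit request supplies.

```python
import hashlib

def checksum_matches(data: bytes, expected_hex: str) -> bool:
    """Compare a transferred file's SHA-256 digest with the manifest value."""
    return hashlib.sha256(data).hexdigest() == expected_hex

transferred = b"example file contents"
expected = hashlib.sha256(transferred).hexdigest()

assert checksum_matches(transferred, expected)       # match: deposit proceeds
assert not checksum_matches(b"corrupted", expected)  # mismatch: triggers re-transfer
```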

DDP Workflow

The DDP portion of the Deposit workflow assumes that the OTM Bridge has already performed initial processing in order to prepare the filegroup for ingestion into a DDP.

Chronopolis Implementation Notes

New Deposit

  1. Query the OTM Bridge for incoming deposits
  2. Extract `filegroup-id` set from deposits response
  3. Look for the Deposit based on the `filegroup-id` in Chronopolis
    • Expect there to be no existing object in the DDP with an identifier of `filegroup-id`
  4. Ensure space exists in Chronopolis for this Deposit
    • If reserving space, an API similar to the [[APTrust-Volume-Service]] will need to be built
    • Otherwise, wait for the staging filesystem to have available space
  5. Package the Deposit for Chronopolis
    • Create an OCFL Object
    • Create ACE Tokens
  6. Use Chronopolis Ingest API to distribute the Deposit
  7. Query Chronopolis Ingest API for Deposit status
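Steps 1-3 above can be sketched as a polling pass over the Bridge List Deposits response. The response shape used here (`deposits`, `filegroup-id`, `status` keys) is inferred from the flow text and is an assumption; the authoritative format is in the OTM Bridge API Specification.

```python
def staged_filegroups(list_deposits_response: dict) -> set:
    """Return filegroup-ids for deposits in the DEPOSIT_STAGED state.

    The key names are hypothetical; they mirror the workflow text rather
    than a confirmed wire format.
    """
    return {
        d["filegroup-id"]
        for d in list_deposits_response.get("deposits", [])
        if d.get("status") == "DEPOSIT_STAGED"
    }

response = {
    "deposits": [
        {"filegroup-id": "fg-001", "status": "DEPOSIT_STAGED"},
        {"filegroup-id": "fg-002", "status": "DEPOSIT_COMPLETE"},
    ]
}
print(staged_filegroups(response))  # only fg-001 still needs ingest
```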

Updated Deposit

  1. Query the OTM Bridge for incoming deposits
  2. Extract `filegroup-id` set from deposits response
  3. Look for the Deposit based on the `filegroup-id` in Chronopolis
    • Expected to find existing content based on this flow
  4. Request to have the Deposit re-staged
    • Should only need OCFL metadata (inventory.json)
    • This could also handle storage reservation
  5. Using the JSON response from the OTM Bridge:
    • Mint a new version using the `version-id` from the OTM Bridge
      • The OCFL Object version will increment as usual
      • Store the version from the OTM Gateway in the `inventory.json`
    • New files will be added to the OCFL inventory and distributed in full
    • Existing files referenced in the OCFL inventory that have not been modified will not be redistributed
    • Files omitted from the JSON response will not be included in the current version
    • Create additional ACE Tokens for OCFL payload files
  6. Use Chronopolis Ingest API to distribute the updated Deposit
  7. Query Chronopolis Ingest API for Deposit status
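The version-minting step above can be sketched against an OCFL-style inventory. Recording the Gateway's version identifier in the new version's `message` field is one possible convention, not something the specification prescribes; the inventory fields follow the OCFL `inventory.json` layout (`head`, `versions`, per-version `state`).

```python
def mint_version(inventory: dict, gateway_version_id: str, state: dict) -> dict:
    """Increment the OCFL head (vN -> vN+1) and record the Gateway version-id.

    Storing the Gateway version in the message field is an illustrative
    choice; a real implementation might use a different inventory field.
    """
    next_num = int(inventory["head"].lstrip("v")) + 1
    head = f"v{next_num}"
    inventory["head"] = head
    inventory["versions"][head] = {
        "message": f"OTM Gateway version {gateway_version_id}",
        "state": state,  # digest -> logical paths for this version
    }
    return inventory

inv = {"head": "v1", "versions": {"v1": {"state": {}}}}
mint_version(inv, "2024-06-01T00:00:00Z", {"abc123": ["data/file.txt"]})
print(inv["head"])  # v2
```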

DDP Deposit Workflow

Audit

The Audit workflow retrieves actions taken on filegroups that have been deposited. It is expected that the Audit Log will be maintained by the OTM Bridge and will contain events provided by the DDP.

See also: Audit Event Appendix

Flow

  1. The Repository manager selects an object and requests a preservation audit history
  2. The Repository calls the Gateway GET Object Audit endpoint for the object
  3. The Gateway calls the Bridge Get Audit Log endpoint, specifying the object ID as the filegroup identifier
  4. The Bridge gathers audit data for the given filegroup and associated files from its internal data store and responds to the Gateway with the requested audit history data
  5. The Gateway translates the Bridge audit data into a format familiar to the repository and responds to the Repository request
  6. The Repository displays the audit data to the Repository manager
Get Audit Workflow
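The translation in step 5 can be sketched as a simple mapping from Bridge audit events to repository-facing rows. The event field names used here (`timestamp`, `event-type`, `detail`) are illustrative assumptions; see the Audit Event Appendix for the actual event vocabulary.

```python
def translate_events(bridge_events: list) -> list:
    """Flatten Bridge audit entries into (timestamp, event, detail) rows,
    oldest first. Field names are hypothetical placeholders."""
    return [
        (e["timestamp"], e["event-type"], e.get("detail", ""))
        for e in sorted(bridge_events, key=lambda e: e["timestamp"])
    ]

events = [
    {"timestamp": "2024-02-01T10:00:00Z", "event-type": "DEPOSIT_COMPLETE"},
    {"timestamp": "2024-01-31T09:00:00Z", "event-type": "DEPOSIT_STARTED"},
]
for row in translate_events(events):
    print(row)
```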

Restore

The Restore workflow handles returning previously deposited data to a Repository.

System to System Flow

  1. The Repository manager selects an object to be restored from preservation storage
  2. The Repository calls the Gateway POST Object Restore endpoint for the object to be restored
  3. The Gateway calls the Bridge Get Content Details endpoint to resolve the set of files to be restored
  4. The Gateway calls the Bridge Restore Content endpoint with the list of files to be restored
  5. The Bridge initiates a restore action for all files in the restore request and creates a directory in Bridge staging storage for the restored files
  6. The DDP calls the Bridge List Restores endpoint on a regular schedule to check for new restore requests
  7. The DDP copies each file in the restore request to the specified directory in Bridge staging storage
  8. When all files have been copied into Bridge staging storage the DDP calls the Bridge Complete Restore endpoint to inform the Bridge that the restored files are available
  9. The Bridge validates that all file checksums match the checksums provided in the restore request (when checksums are provided)
  10. The Bridge updates the status of the restore action to `RESTORE_STAGED`
  11. The Gateway calls the Bridge Restore Status endpoint on a regular basis to determine if the status of the restore is `RESTORE_STAGED`
  12. The Gateway calls the Bridge Get Restored Content endpoint for each file in the restore request and stores each file in the Gateway staging storage
  13. The Repository calls the Gateway Get Object endpoint and pulls the content into repository storage
  14. The Repository sends a notification to the Repository manager that requested the restore
Restore workflow
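The Gateway's polling in step 11 can be sketched with an injected status fetcher, so the loop can be exercised without a live Bridge. The polling interval and retry limit are assumptions; the specification only says the Gateway checks "on a regular basis".

```python
def wait_for_staged(fetch_status, restore_id, max_polls=10):
    """Poll until the restore reports RESTORE_STAGED.

    fetch_status is any callable returning the current status string for
    a restore id (e.g. a wrapper around the Bridge Restore Status call).
    Returns True if the staged state is reached within max_polls.
    """
    for _ in range(max_polls):
        if fetch_status(restore_id) == "RESTORE_STAGED":
            return True
        # a real client would sleep between polls here
    return False

statuses = iter(["RESTORE_IN_PROGRESS", "RESTORE_IN_PROGRESS", "RESTORE_STAGED"])
assert wait_for_staged(lambda _id: next(statuses), "restore-42")
```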

DDP Workflow

Chronopolis Implementation Notes

  1. Query the OTM Bridge for Restores
  2. Identify storage where the restore can be staged
    • Must be accessible by the OTM Gateway
    • If space is insufficient, wait or reject the request
    • Guarantees are needed about available space while re-staging data
  3. Restage data
    • OTM With RO Mount: Create symbolic links to pull from
    • OTM Without RO Mount: Need process for retrieving data from Chronopolis nodes
  4. Notify the OTM Bridge that the Restore is staged and accessible with a given TTL
  5. Upon expiration of the TTL
    • Remove staged content
DDP Restore Workflow
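Steps 4-5 above can be sketched as a TTL check on staged content. The TTL mechanics shown (a fixed window measured from staging time) are an assumption; the notes only say that staged content is removed after the TTL expires.

```python
from datetime import datetime, timedelta, timezone

def is_expired(staged_at: datetime, ttl: timedelta, now: datetime) -> bool:
    """True once the restore's staging window has elapsed and the
    staged content may be removed."""
    return now >= staged_at + ttl

staged = datetime(2024, 3, 1, tzinfo=timezone.utc)
ttl = timedelta(days=7)  # illustrative window, not a spec-mandated value

print(is_expired(staged, ttl, staged + timedelta(days=3)))  # False: keep staged
print(is_expired(staged, ttl, staged + timedelta(days=8)))  # True: remove content
```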

Delete

The Delete workflow handles removing data from a DDP. This operation is assumed to be non-recoverable: it permanently removes data from the DDP. Deletes are expected when preserved content is discovered to be subject to legal or administrative restrictions that require its removal. It is recommended that repositories restrict the ability to delete content.

System to System Flow

  1. The Repository manager selects an object to be deleted from preservation storage
  2. The Repository calls the Gateway Purge Object endpoint for the object or version to be deleted
  3. The Gateway calls the Bridge Get Content Details endpoint and resolves the object into a set of files to be deleted
  4. The Gateway calls the Bridge Delete Content endpoint with the list of files to be deleted
  5. The Bridge initiates a delete action for all files in the delete request
  6. The DDP calls the Bridge List Deletes endpoint on a regular schedule to check for new delete requests
  7. The DDP performs a delete on each requested file; when all deletes are completed, the DDP calls the Bridge Complete Delete endpoint to inform the Bridge that the delete is complete
  8. The Repository administrator checks the object status in the Repository; the Repository requests updated object information from the Gateway
Delete Workflow
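Steps 3-4 can be sketched as resolving a Get Content Details response into a Delete Content request body. The key names (`filegroup-id`, `files`, `file-id`) mirror the flow text but are assumptions about the wire format, which is defined in the OTM Bridge API Specification.

```python
def build_delete_request(content_details: dict) -> dict:
    """Collect a filegroup's file identifiers into a delete request body.

    Field names are hypothetical placeholders based on the workflow text.
    """
    return {
        "filegroup-id": content_details["filegroup-id"],
        "files": [f["file-id"] for f in content_details["files"]],
    }

details = {
    "filegroup-id": "fg-001",
    "files": [{"file-id": "f-1"}, {"file-id": "f-2"}],
}
print(build_delete_request(details))
```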

DDP Workflow

Chronopolis Implementation Notes

Delete filegroup

  1. Query the OTM Bridge for deletions
  2. Identify the Chronopolis package associated with the `filegroup-id`
  3. Remove the package
    • Start the Chronopolis deprecation workflow
  4. When the Chronopolis package is deprecated, update the deletion status in the OTM Bridge

Delete files

  1. Query the OTM Bridge for deletions
  2. Query Chronopolis for the `filegroup-id`
    • Identify the Chronopolis package associated with the `filegroup-id`
    • Identify the files within the package to remove
  3. Remove the files/version
    • Follow OCFL recommendations for deletion
    • Tombstoning may be necessary
  4. Push package changes throughout Chronopolis
    • If this is considered a new package, treat it like a new deposit
    • Otherwise need workflow to cover overwriting files in a package
  5. When propagation of the deletes is complete, update the deletion status in the OTM Bridge
DDP Delete Workflow
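The file-removal step above can be sketched against an OCFL-style inventory: following OCFL practice, the file's logical path is dropped from the current version's state while earlier versions are left untouched. Whether the content blob itself is then purged (the "tombstoning" mentioned above) is a policy decision outside this sketch.

```python
def remove_from_head_state(inventory: dict, logical_path: str) -> dict:
    """Drop a logical path from the head version's state mapping.

    OCFL state maps content digests to lists of logical paths; a digest
    entry is removed entirely once no paths reference it.
    """
    state = inventory["versions"][inventory["head"]]["state"]
    for digest in list(state):
        state[digest] = [p for p in state[digest] if p != logical_path]
        if not state[digest]:
            del state[digest]  # no logical paths left for this content blob
    return inventory

inv = {"head": "v2",
       "versions": {"v2": {"state": {"abc": ["data/a.txt", "data/b.txt"]}}}}
remove_from_head_state(inv, "data/a.txt")
print(inv["versions"]["v2"]["state"])  # {'abc': ['data/b.txt']}
```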