4.2. CouchDB Replication Protocol

The CouchDB Replication protocol is a protocol for synchronizing documents between two peers over HTTP/1.1.

4.2.1. Language

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

4.2.2. Goals

The CouchDB Replication protocol synchronizes documents between two peers over HTTP/1.1.

In theory, the CouchDB Replication protocol can be used between any products that implement it. The reference implementation, written in Erlang, is provided by the couch_replicator module in Apache CouchDB.

The CouchDB Replication protocol uses the CouchDB REST API, so it is based on HTTP and the Apache CouchDB MVCC data model. The primary goal of this specification is to describe the CouchDB replication algorithm.

4.2.3. Definitions

ID:
An identifier, which may be a UUID as described in RFC 4122.
Sequence:
An ID provided by the changes feed. It can be numeric, but this is not required.
Revision:
(to define)
Document:
A JSON entity with a unique ID and revision.
Database:
A collection of Documents with a unique URI.
URI:
A URI as defined by RFC 2396. It can be a URL as defined in RFC 1738.
Source:
Database from which the Documents are replicated.
Target:
Database to which the Documents are replicated.
Checkpoint:
Last Source sequence ID.

4.2.4. Algorithm

  1. Get unique identifiers for the Source and Target based on their URIs if a replication task ID is not available.
  2. Save this identifier in a special Document named _local/<uniqueid> on the Target database. This Document is not replicated. It holds the last Source sequence ID, the Checkpoint, from the previous replication process.
  3. Get the Source changes feed by calling the /<source>/_changes URL, passing the Checkpoint in the since parameter. The changes feed only returns a list of current revisions.

Note

This step can be done continuously using the feed=longpoll or feed=continuous parameters. The feed will then keep delivering changes as they occur.

  4. Collect a group of Document ID/revision ID pairs from the changes feed and send them to the Target database at the /<target>/_revs_diff URL. The result will contain the list of revisions NOT in the Target.
  5. GET each revision from the Source database by calling the URL /<source>/<docid>?revs=true&open_revs=<revision>. This returns the Document with its parent revisions. Also fetch any attachments that are not already stored on the Target. As an optimisation, you can use the HTTP multipart API to get everything in one request.
  6. Collect a group of revisions fetched in the previous step and store them on the Target database using the Bulk Docs API with the new_edits: false JSON property to preserve their revision IDs.
  7. After the group of revisions is stored on the Target, save the new Checkpoint on the Source.
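The algorithm above can be sketched with in-memory stand-ins for the two peers. This is a minimal illustration, not the couch_replicator implementation: the MemoryPeer class, the replication_id and replicate functions, the SHA-256 hashing scheme, and the integer sequences are all assumptions made for the example. The sketch also writes the Checkpoint to both peers so that the next run can read it back from the Target; that is a choice of this sketch.

```python
import hashlib


class MemoryPeer:
    """Hypothetical in-memory stand-in for a database reachable over HTTP."""

    def __init__(self, uri):
        self.uri = uri
        self.revs = {}     # (doc_id, rev) -> document body
        self.changes = []  # ordered list of (seq, doc_id, rev)
        self.local = {}    # _local/<id> documents, never replicated

    def put(self, doc_id, rev, body):
        # Store a revision unchanged, as Bulk Docs does with new_edits: false.
        if (doc_id, rev) not in self.revs:
            self.revs[(doc_id, rev)] = body
            self.changes.append((len(self.changes) + 1, doc_id, rev))

    def changes_since(self, since):
        # Stand-in for GET /<db>/_changes?since=<since>.
        return [c for c in self.changes if c[0] > since]

    def revs_diff(self, pairs):
        # Stand-in for the revision diff call: pairs this peer lacks.
        return [p for p in pairs if p not in self.revs]


def replication_id(source, target):
    # Derive an identifier from both URIs (hashing scheme is an assumption).
    return hashlib.sha256(f"{source.uri}|{target.uri}".encode()).hexdigest()


def replicate(source, target, batch_size=100):
    """Copy missing revisions from source to target; return how many were copied."""
    rep_id = replication_id(source, target)
    # Read the previous Checkpoint from the Target, defaulting to 0.
    checkpoint = target.local.get(rep_id, {}).get("last_seq", 0)
    # List the Source changes that happened after the Checkpoint.
    changes = source.changes_since(checkpoint)
    copied = 0
    for start in range(0, len(changes), batch_size):
        batch = changes[start:start + batch_size]
        # Ask the Target which revisions of this group it does not have.
        missing = target.revs_diff([(doc_id, rev) for _, doc_id, rev in batch])
        # Fetch each missing revision from the Source and store it as-is.
        for doc_id, rev in missing:
            target.put(doc_id, rev, source.revs[(doc_id, rev)])
            copied += 1
        # Record the last sequence of the group, even if nothing was missing.
        checkpoint = batch[-1][0]
        source.local[rep_id] = {"last_seq": checkpoint}
        target.local[rep_id] = {"last_seq": checkpoint}
    return copied
```

Running replicate twice in a row on unchanged peers should copy nothing the second time, because the saved Checkpoint moves the start of the changes feed past everything already processed.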

Note

  • Even if some revisions have been ignored, their sequences should still be taken into consideration for the Checkpoint.
  • To compare the ordering of non-numeric sequences, you will have to keep an ordered list of the sequence IDs as they appear in the _changes feed and compare their indices.
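The index-based comparison described in the note can be sketched as follows. The function name make_seq_comparator and the sample sequence strings are hypothetical; the only idea taken from the text is that ordering comes from position in the feed, not from the sequence values themselves.

```python
def make_seq_comparator(seen_sequences):
    """Build a comparator from sequence IDs listed in the order the feed returned them."""
    index = {seq: i for i, seq in enumerate(seen_sequences)}

    def compare(a, b):
        # Negative if a came before b, positive if after, zero if identical.
        return (index[a] > index[b]) - (index[a] < index[b])

    return compare


# Opaque sequence IDs, in the order they appeared in the feed.
feed_order = ["5-g1A", "12-g1B", "7-xyz"]
compare = make_seq_comparator(feed_order)
```

With this ordering, "12-g1B" sorts before "7-xyz" because it appeared earlier in the feed, even though a numeric or lexical comparison of the strings would not agree on that.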

4.2.5. Filter replication

Replication can be filtered by passing the filter parameter, with a function name, to the changes feed. The function is called on each change. If it returns true, the document is added to the feed.
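The effect of a filter on the feed can be modelled in a few lines. This sketch only illustrates the semantics described above; it is not how a filter function is declared or invoked in CouchDB, and the names filtered_changes and only_posts, as well as the data shapes, are hypothetical.

```python
def filtered_changes(changes, docs, filter_fn):
    """Keep only the changes whose document passes the filter function."""
    return [change for change in changes if filter_fn(docs[change["id"]])]


def only_posts(doc):
    # Example predicate: let through documents marked as posts.
    return doc.get("type") == "post"


docs = {"a": {"type": "post"}, "b": {"type": "comment"}}
changes = [{"id": "a"}, {"id": "b"}]
kept = filtered_changes(changes, docs, only_posts)
```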

4.2.6. Optimisations

  • The system should run the steps in parallel to reduce latency.
  • The number of revisions passed to steps 3 and 6 should be large enough to reduce both bandwidth use and latency.
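Grouping revisions before sending them can be done with a small batching helper. This is a generic sketch; the name batches and the batch size are assumptions, and a suitable size depends on document sizes and network conditions.

```python
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Because the helper consumes its input lazily, it can also be applied to a changes feed that is still being read.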

4.2.7. API Reference

  • HEAD /{db} – Check Database existence
  • POST /{db}/_ensure_full_commit – Ensure that all changes are stored on disk
  • GET /{db}/_local/{id} – Read the last Checkpoint
  • PUT /{db}/_local/{id} – Save a new Checkpoint

Push Only

Pull Only

  • GET /{db}/_changes – Locate changes on Source since the last pull. The request uses the following query parameters:
    • style=all_docs
    • feed=feed, where feed is normal or longpoll
    • limit=limit
    • heartbeat=heartbeat
  • GET /{db}/{docid} – Retrieve a single Document from Source with attachments. The request uses the following query parameters:
    • open_revs=revid, where revid is the actual Document Revision at the moment of the pull request
    • revs=true
    • atts_since=lastrev
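The two pull requests above can be assembled with the Python standard library. This is a sketch under stated assumptions: the helper names changes_url and doc_url and the example host are hypothetical, and open_revs and atts_since are encoded here as JSON arrays, which is an assumption about the HTTP API that should be checked against its reference.

```python
import json
from urllib.parse import quote, urlencode


def changes_url(db_url, since, feed="normal", limit=None, heartbeat=None):
    """Build the _changes URL used to locate changes since the last pull."""
    params = {"style": "all_docs", "feed": feed, "since": since}
    if limit is not None:
        params["limit"] = limit
    if heartbeat is not None:
        params["heartbeat"] = heartbeat
    return f"{db_url}/_changes?{urlencode(params)}"


def doc_url(db_url, doc_id, revs, atts_since=None):
    """Build the URL that fetches the given revisions of one Document."""
    # Assumption: revision lists are passed as JSON arrays.
    params = {"revs": "true", "open_revs": json.dumps(revs)}
    if atts_since:
        params["atts_since"] = json.dumps(atts_since)
    return f"{db_url}/{quote(doc_id, safe='')}?{urlencode(params)}"


url = changes_url("http://localhost:5984/db", since=0, limit=100)
```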