Internals
This section goes over some of the internals of Yorkie, such as the CRDT, ordering of messages via Lamport clocks, etc.
References
For more details on the techniques used for understanding Yorkie, we recommend reading the following papers.
- H.-G. Roh, M. Jeon, J.-S. Kim, and J. Lee, “Replicated abstract data types: Building blocks for collaborative applications,” J. Parallel Distrib. Comput., vol. 71, no. 3, pp. 354–368, Mar. 2011. [Online].: Link
- Weihai Yu, “Supporting String-Wise Operations and Selective Undo for Peer-to-Peer Group Editing.“: Link
- Loïck Briot, Pascal Urso, Marc Shapiro, “High Responsiveness for Group Editing CRDTs“.: Link
- Kaleido: Implementing a Novel Data System for Multi-Device Synchronization: Link
Garbage Collection
One of the most important issues in CRDT systems is handling tombstones. Tombstones are used to properly synchronize the document even when remote peer's operations refer to the elements that have already been locally deleted. This causes the problem that the document keeps growing even if elements are deleted.
Yorkie provides Garbage Collection to solve this problem.
How it works
Garbage collection checks that deleted nodes are no longer referenced remotely and purges them completely.
Server records the logical timestamp of the last change pulled by the client whenever the client requests PushPull. And Server returns the smallest logical timestamp, min_synced_seq
of all clients in response PushPull to the client. min_synced_seq
is used to check whether deleted nodes are no longer to be referenced remotely or not.
An example of garbage collection:
State 1

In the initial state, both clients have "ab"
.
State 2

Client c1
deletes "b"
, which is recorded as a change with logical timestamp 3
. The text node of "b"
can be referenced
by remote, so it is only marked as tombstone. And the client c1
sends change 3
to server through PushPull API and receives
as a response that min_synced_seq
is 2
. Since all clients did not receive the deletion change 3
, the text node is not
purged by garbage collection.
Meanwhile, client c2
inserts "c"
after textnode "b"
.
State 3

Client c2
pushes change 4
to server and receives as a response that min_synced_seq
is 3
. After the client applies change 4
, the contents of document are changed to ac
. This time, all clients have received change 3
, so textnode "b"
is completely removed.
State 4

Finally, after client c1
receives change 4
from server, purges textnode "b"
because it is no longer referenced remotely.