Housekeeping
Housekeeping is a background service that periodically cleans up resources and data that are no longer needed in Yorkie. It keeps memory usage and storage under control in the CRDT-based collaborative system by deactivating stale clients and compacting document change history.
Overview
As documents are edited over time, two types of overhead accumulate:
- Inactive clients prevent Garbage Collection from reclaiming tombstoned nodes.
- Change history grows continuously, increasing storage and memory costs.
Housekeeping addresses both by running two scheduled tasks:
| Task | Purpose |
|---|---|
| Client Deactivation | Deactivates clients that have been inactive beyond a threshold, enabling more effective garbage collection |
| Document Compaction | Consolidates old change history into a single snapshot to reduce storage overhead |
Client Deactivation for Garbage Collection
Why It Matters
In Yorkie's CRDT system, Garbage Collection uses the minVersionVector to determine which tombstoned nodes can be safely removed. The minVersionVector represents the minimum of all active clients' version vectors -- the set of changes that every active client has definitely received.
If a client becomes inactive but remains registered, its outdated version vector holds back the minVersionVector, preventing garbage collection from reclaiming potentially large amounts of data.
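To make the effect concrete, here is a minimal sketch, assuming a version vector can be modeled as a plain dictionary from an actor ID to a counter. The function name `min_version_vector` and the sample data are illustrative only and are not taken from Yorkie's codebase.

```python
from typing import Dict, Iterable

VersionVector = Dict[str, int]  # actor ID -> highest counter seen from that actor


def min_version_vector(vectors: Iterable[VersionVector]) -> VersionVector:
    """Return the element-wise minimum over the given version vectors.

    An actor missing from a vector is treated as counter 0, meaning that
    client has not seen any change from that actor.
    """
    vectors = list(vectors)
    if not vectors:
        return {}
    actors = set().union(*vectors)
    return {actor: min(v.get(actor, 0) for v in vectors) for actor in actors}


# Hypothetical data: client "c" went idle early and never caught up.
active = {
    "a": {"a": 10, "b": 8, "c": 2},
    "b": {"a": 9, "b": 8, "c": 2},
    "c": {"a": 1, "b": 1, "c": 2},  # stale client
}

with_stale = min_version_vector(active.values())
without_stale = min_version_vector(v for k, v in active.items() if k != "c")
```

With the stale client included, the minimum stays at `{"a": 1, "b": 1, "c": 2}`, so tombstones created after those counters cannot be collected. Once the stale client is excluded, the minimum advances to `{"a": 9, "b": 8, "c": 2}`.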
How It Works
- The scheduler triggers the deactivation task at the configured interval.
- For each Project, the system queries for clients that have not communicated with the server for longer than the client-deactivate-threshold (default: 24 hours).
- Each candidate client is deactivated, removing it from the active client set.
- The minVersionVector can now advance, unblocking garbage collection.
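The selection and deactivation steps above can be sketched as follows. This is an in-memory illustration under stated assumptions: the `Client` class, its `last_seen` field, and the oldest-first ordering are hypothetical, and the real server queries its database rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Client:
    id: str
    active: bool
    last_seen: datetime  # hypothetical field: last time the client contacted the server


def find_candidates(clients: List[Client], now: datetime,
                    threshold: timedelta, limit: int) -> List[Client]:
    """Select active clients that have been silent for longer than the threshold."""
    stale = [c for c in clients if c.active and now - c.last_seen > threshold]
    stale.sort(key=lambda c: c.last_seen)  # oldest first; the ordering is an assumption
    return stale[:limit]


def deactivate(candidates: List[Client]) -> int:
    """Mark each candidate inactive and return how many were deactivated."""
    for client in candidates:
        client.active = False
    return len(candidates)


now = datetime(2024, 1, 2, 12, 0, 0)
clients = [
    Client("fresh", True, now - timedelta(hours=1)),
    Client("stale", True, now - timedelta(hours=30)),
    Client("already-off", False, now - timedelta(days=7)),
]
candidates = find_candidates(clients, now, threshold=timedelta(hours=24), limit=500)
deactivated = deactivate(candidates)
```

Only the client that is both active and silent for more than 24 hours is selected. The `limit` argument plays the role of housekeeping-candidates-limit-per-project.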
Projects are processed in a round-robin fashion across runs, distributing load over time rather than processing all projects in a single cycle.
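A minimal sketch of round-robin batching, assuming the scheduler remembers a cursor between runs. The function `next_project_batch` and the in-memory project list are hypothetical; they only illustrate how a fixed fetch size spreads projects across runs.

```python
from typing import List, Tuple


def next_project_batch(project_ids: List[str], cursor: int,
                       fetch_size: int) -> Tuple[List[str], int]:
    """Return the projects for this run and the cursor for the next run."""
    if not project_ids:
        return [], 0
    n = len(project_ids)
    count = min(fetch_size, n)  # never visit the same project twice in one run
    batch = [project_ids[(cursor + i) % n] for i in range(count)]
    return batch, (cursor + count) % n


projects = ["p1", "p2", "p3", "p4", "p5"]
cursor = 0
runs = []
for _ in range(3):
    batch, cursor = next_project_batch(projects, cursor, fetch_size=2)
    runs.append(batch)
```

With five projects and a fetch size of 2, three consecutive runs process `["p1", "p2"]`, `["p3", "p4"]`, and `["p5", "p1"]`, wrapping around to the start once the end of the list is reached.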
Document Compaction
Over time, a document accumulates a large history of individual changes. Document Compaction reduces storage overhead by:
- Removing old change history that is no longer needed for synchronization.
- Creating a new initial change that represents the current document state.
- Maintaining document integrity while reducing metadata size.
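The idea can be illustrated with a toy model, assuming a document is a flat key-value map and each change is a list of `set` or `remove` operations. This is a deliberate simplification: real Yorkie documents are CRDT structures, and the names `replay` and `compact` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Operation = Tuple[str, str, Any]  # (kind, key, value); kind is "set" or "remove"
Change = List[Operation]


def replay(history: List[Change]) -> Dict[str, Any]:
    """Rebuild the document state by applying every change in order."""
    state: Dict[str, Any] = {}
    for change in history:
        for kind, key, value in change:
            if kind == "set":
                state[key] = value
            elif kind == "remove":
                state.pop(key, None)
    return state


def compact(history: List[Change]) -> List[Change]:
    """Replace the whole history with one initial change holding the current state."""
    state = replay(history)
    initial_change: Change = [("set", key, value) for key, value in state.items()]
    return [initial_change]


history: List[Change] = [
    [("set", "title", "Draft")],
    [("set", "body", "Hello")],
    [("set", "title", "Final"), ("remove", "body", None)],
    [("set", "body", "Hello, world")],
]
compacted = compact(history)
```

Replaying `compacted` yields the same state as replaying `history`, but the history now contains one change instead of four.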
Compaction Criteria
A document is eligible for compaction when:
- It has accumulated at least CompactionMinChanges changes (default: 1000).
- It is not currently attached to any client.
The second condition ensures that compaction does not interfere with active editing sessions. Document content remains identical after compaction -- only the internal change history is consolidated.
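Both criteria can be expressed as a single predicate. The sketch below assumes the change count and the number of attached clients are already known; the function and parameter names are illustrative, not Yorkie's API.

```python
def is_compaction_eligible(change_count: int, attached_clients: int,
                           min_changes: int = 1000) -> bool:
    """Return True when a document meets both compaction criteria."""
    has_enough_changes = change_count >= min_changes
    is_detached = attached_clients == 0
    return has_enough_changes and is_detached


eligible = is_compaction_eligible(change_count=1500, attached_clients=0)
too_few_changes = is_compaction_eligible(change_count=999, attached_clients=0)
still_attached = is_compaction_eligible(change_count=1500, attached_clients=1)
```

Only the first call returns `True`; the other two fail on the change count and the attachment check respectively.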
Configuration
Housekeeping behavior is configured through server startup flags or the CLI. The key parameters are:
| Parameter | Description | Default |
|---|---|---|
| housekeeping-interval | Time between housekeeping runs | 30s |
| housekeeping-candidates-limit-per-project | Maximum candidates returned per project in a single run | 500 |
| housekeeping-project-fetch-size | Number of projects fetched per run | 100 |
| housekeeping-compaction-min-changes | Minimum number of changes before a document is eligible for compaction | 1000 |
| client-deactivate-threshold | Time after which an inactive client is deactivated | 24h |
These can be set when starting the server:
$ yorkie server \
    --housekeeping-interval 30s \
    --housekeeping-candidates-limit-per-project 500 \
    --housekeeping-compaction-min-changes 1000 \
    --client-deactivate-threshold 24h
Or updated per project using the CLI:
$ yorkie project update <project-name> \
    --client-deactivate-threshold 12h
Configuration by Environment
For development, use shorter intervals and lower thresholds for faster feedback:
$ yorkie server \
    --housekeeping-interval 10s \
    --housekeeping-candidates-limit-per-project 10 \
    --housekeeping-compaction-min-changes 100
For production, use longer intervals and higher limits to balance throughput with resource usage:
$ yorkie server \
    --housekeeping-interval 1m \
    --housekeeping-candidates-limit-per-project 1000 \
    --housekeeping-compaction-min-changes 5000
Cluster Mode Behavior
In a Cluster Mode deployment, only the leader server executes housekeeping tasks. This is coordinated through leader election, preventing duplicate work across cluster nodes.
For more on leader election, see Cluster Mode: Architecture Components.
Monitoring
Housekeeping logs its activity for observability:
HSKP: candidates 150, deactivated 45, 2.3s
HSKP: candidates 89, compacted 12, 1.8s
These logs show the number of candidates processed, the actions taken, and the duration of each run. Use these to tune configuration parameters for your workload.
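For tuning, it can help to turn these lines into structured records. The sketch below assumes log lines have exactly the shape of the two samples in this section, with the duration given in seconds. Real output may carry extra prefixes or other duration units, and lines that do not match the pattern are skipped. `HousekeepingRun` and `parse_line` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class HousekeepingRun(NamedTuple):
    candidates: int
    action: str      # e.g. "deactivated" or "compacted"
    count: int
    seconds: float


# Matches lines shaped like the samples; durations are assumed to be in seconds.
LOG_PATTERN = re.compile(
    r"HSKP: candidates (\d+), (\w+) (\d+), (\d+(?:\.\d+)?)s\s*$"
)


def parse_line(line: str) -> Optional[HousekeepingRun]:
    """Parse one housekeeping log line, or return None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    candidates, action, count, seconds = match.groups()
    return HousekeepingRun(int(candidates), action, int(count), float(seconds))


lines = [
    "HSKP: candidates 150, deactivated 45, 2.3s",
    "HSKP: candidates 89, compacted 12, 1.8s",
]
runs = [parse_line(line) for line in lines]
```

If the parsed `candidates` value regularly equals housekeeping-candidates-limit-per-project, runs are probably being capped by that limit, which may be a reason to raise it or shorten the interval.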
Further Reading
- Housekeeping design document -- Full technical design
- CRDT Concepts: Garbage Collection -- How GC uses version vectors to reclaim tombstones
- Synchronization: Document Compaction -- How compaction fits into the sync lifecycle
- Garbage Collection design document -- Deep dive into the GC mechanism
- Projects -- Per-project configuration including housekeeping thresholds
- CLI: Updating the Project -- How to configure client deactivation threshold
- Cluster Mode -- Leader election and distributed coordination
- Glossary -- Definitions of all key terms