Background: We are currently facing a disk space issue with the MongoDB collections we use for logs. These collections are configured with an expiration time and are deleted automatically once the timestamp expires. However, during periods of very high activity, the disk fills up with a flood of log data before the timestamps expire. MongoDB eventually runs out of disk space and goes offline, which triggers a page to Ops that must be handled.
We are using MongoDB v3.0.10, and because of the limitations in our production environments we cannot upgrade.
To solve this problem, we looked at several options.
1) Run a scheduled job that automatically removes a collection after a defined period. However, the disk space is not returned to the file system immediately, and commands like db.repairDatabase() have to be run to reclaim it. This process can be tedious, and it adds the overhead of running another job as a service.
2) Work based on dates: a script could automatically drop databases older than a given date, which would also return the disk space afterwards. However, because our problem involves a sudden flood of data, this is still not the right solution.
3) Use a built-in MongoDB feature known as a capped collection, which can solve the problem.
Optimal Solution:
A capped collection is one mechanism for limiting the disk space used by a MongoDB collection (verified in the production environment, together with a fix in the code).
In MongoDB, a capped collection is a special type of collection with a fixed maximum size, so it can hold only a certain amount of data. Once a capped collection reaches its maximum size, it automatically starts overwriting its oldest documents with new ones. This makes capped collections useful for rolling datasets such as logs or event tracking: we can keep a large amount of recent data while older data is discarded over time, and the disk space used stays the same.
- E.g.: Create a capped collection with a maximum size of 200 bytes:
- db.createCollection("log", { capped: true, size: 200 })
- E.g.: Create a capped collection with a maximum size of 200 bytes and a maximum of 50 documents:
- db.createCollection("log", { capped: true, size: 200, max: 50 })
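The MongoDB capped-collection documentation states that a `size` of 4096 bytes or less becomes a 4096-byte cap, and that larger values are rounded up to a multiple of 256 bytes. It also notes that `size` is always required. When `max` is set as well, older documents are removed as soon as the byte limit is reached, even if the document count has not been reached yet. The sketch below shows that rounding rule and assumes it applies to v3.0.10 as written, which is worth confirming against the 3.0 manual. `effectiveCapSize` is a hypothetical helper, not a MongoDB API.

```javascript
// Hypothetical helper (not a MongoDB API): estimates the byte cap MongoDB
// applies for a requested `size`, following the documented rounding rule.
function effectiveCapSize(requestedBytes) {
  if (!Number.isInteger(requestedBytes) || requestedBytes <= 0) {
    throw new RangeError("size must be a positive integer number of bytes");
  }
  // Requests of 4096 bytes or less get a 4096-byte cap.
  if (requestedBytes <= 4096) {
    return 4096;
  }
  // Larger requests are rounded up to the next multiple of 256 bytes.
  return Math.ceil(requestedBytes / 256) * 256;
}

console.log(effectiveCapSize(200));     // 4096
console.log(effectiveCapSize(1000000)); // 1000192
```

Under this rule, the 200-byte examples above would actually reserve 4096 bytes. That is still tiny, but it is larger than the value passed in.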
Convert an existing collection to a capped collection (convertToCapped takes only a size; a document limit such as max is set through createCollection)
- E.g.: db.runCommand({ "convertToCapped": "log", size: 500 })
Check whether a collection is a capped collection
- E.g.: db.log.isCapped()
Advantages:
- A capped collection preserves insertion order, so queries can return documents in insertion order without needing an index. Avoiding that indexing overhead lets capped collections support high insertion throughput.
- Because it keeps data in the order in which events were inserted, a capped collection is well suited to storing log information.
Disadvantages:
- A capped collection can’t be sharded.
- A capped collection can’t have TTL indexes.
Here are some key differences between capped and normal collections in MongoDB:
- Size: Capped collections have a fixed size limit, while normal collections can grow dynamically.
- Overwriting: When a capped collection reaches its maximum size, it will automatically overwrite its oldest documents. Normal collections don’t have this feature.
- Ordering: Documents in a capped collection are stored in insertion order. Normal collections do not guarantee insertion order.
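To make the overwriting and ordering behavior above concrete, here is a minimal in-memory model of a capped collection in plain JavaScript. It is not how MongoDB implements capped collections. `CappedLog` is a hypothetical class, and it approximates each document's size with the length of its JSON encoding rather than its real BSON size. Before each insert, it evicts the oldest documents until the new one fits within both the byte limit and the optional document-count limit. It always returns documents in insertion order.

```javascript
// Hypothetical in-memory model of a capped collection (not MongoDB's implementation).
// Document size is approximated by the length of its JSON encoding, not its BSON size.
class CappedLog {
  constructor(maxBytes, maxDocs = Infinity) {
    this.maxBytes = maxBytes;
    this.maxDocs = maxDocs;
    this.docs = [];      // kept in insertion order, oldest first
    this.usedBytes = 0;
  }

  static sizeOf(doc) {
    return JSON.stringify(doc).length;
  }

  insert(doc) {
    const size = CappedLog.sizeOf(doc);
    if (size > this.maxBytes) {
      throw new RangeError("document is larger than the whole collection");
    }
    // Evict the oldest documents until the new one fits within both limits.
    while (
      this.docs.length > 0 &&
      (this.usedBytes + size > this.maxBytes || this.docs.length + 1 > this.maxDocs)
    ) {
      const oldest = this.docs.shift();
      this.usedBytes -= CappedLog.sizeOf(oldest);
    }
    this.docs.push(doc);
    this.usedBytes += size;
  }

  // Returns documents in insertion order (a copy, so callers cannot mutate state).
  find() {
    return this.docs.slice();
  }
}

// Each {seq, msg} document below is 23 bytes as JSON.
const byCount = new CappedLog(100, 3);  // the 3-document limit is hit first
const byBytes = new CappedLog(50);      // only 2 documents fit in 50 bytes
for (let i = 1; i <= 5; i++) {
  byCount.insert({ seq: i, msg: "event" });
  byBytes.insert({ seq: i, msg: "event" });
}
console.log(byCount.find().map((d) => d.seq)); // seq values 3, 4, 5
console.log(byBytes.find().map((d) => d.seq)); // seq values 4, 5
```

Both instances keep only the newest documents, and their footprint never grows past the configured limits. That bounded footprint is the property we rely on to keep the log collections from filling the disk.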
Runbook (CLI): Converting a collection to a capped collection
- SSH to the machine where MongoDB is hosted.
- Connect to Hybrik MongoDB:
- mongo --port 38916
- List the databases:
- show dbs
- Switch to the relevant database:
- use load_svc
- Check the size of the log collection:
- db.log.stats()
- Determine the recommended size for the collection and remove the extra logs.
- Once the recommended size has been confirmed against the available disk space, convert the collection to a capped collection of that size (e.g. 1 MB):
- db.runCommand({ "convertToCapped": "log", size: 1000000 })
- Check whether the collection was converted to a capped collection:
- db.log.isCapped()
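The runbook leaves the recommended size to judgment. One way to make that step repeatable is sketched below under our own assumptions. It multiplies the average document size reported by `db.log.stats()` (its `avgObjSize` field) by the number of log documents we want to retain. It then limits the result to a fixed share of the free disk space. `recommendCapSize`, the retention count, and the 50% disk-share default are all hypothetical policy choices, not MongoDB recommendations.

```javascript
// Hypothetical sizing policy for the runbook's "recommended size" step.
// avgObjSizeBytes: the avgObjSize field from db.log.stats()
// docsToRetain:    how many recent log documents we want to keep (our choice)
// freeDiskBytes:   free space on the MongoDB data volume (e.g. from `df`)
// maxDiskShare:    fraction of free space the log collection may use (assumed 0.5)
function recommendCapSize(avgObjSizeBytes, docsToRetain, freeDiskBytes, maxDiskShare = 0.5) {
  if (avgObjSizeBytes <= 0 || docsToRetain <= 0 || freeDiskBytes <= 0) {
    throw new RangeError("all inputs must be positive");
  }
  const needed = Math.ceil(avgObjSizeBytes * docsToRetain);
  const budget = Math.floor(freeDiskBytes * maxDiskShare);
  // Never recommend more than the disk budget allows.
  return { size: Math.min(needed, budget), limitedByDisk: needed > budget };
}

// Example: ~250-byte log documents, keep 4000 of them, 10 GB free.
const rec = recommendCapSize(250, 4000, 10e9);
// Build the command document to paste into db.runCommand(...).
console.log(JSON.stringify({ convertToCapped: "log", size: rec.size }));
// {"convertToCapped":"log","size":1000000}
```

If `limitedByDisk` comes back true, the disk cannot hold the desired retention window. That case should be raised with Ops rather than silently accepted.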