MongoDB Disk Space Issue

Background: We are currently facing a disk space issue with the MongoDB collections we use for logs. These collections are configured with an expiration time, and their entries are deleted automatically once the timestamp expires. However, under very high activity the disk fills with a flood of log data before anything has expired. MongoDB eventually runs out of disk space and goes offline, which triggers a page to Ops that must be handled.

We are running MongoDB v3.0.10 and, because of limitations in our production environments, we cannot upgrade.

To solve this problem, we looked at multiple options.

1) Run a scheduled job that automatically removes a collection after a defined period. However, the disk space is not returned to the file system immediately, and a command such as db.repairDatabase() has to be run to reclaim it. This process can be tedious, and it adds the overhead of another job running as a service.

2) Organize the data by date and have a script automatically drop databases based on dates, which does return the disk space afterwards. However, because we can receive a flood of data almost instantly, this is still not the right solution.

The third option is a built-in MongoDB feature known as a capped collection, which can help us solve the problem.

Optimal Solution:

A capped collection is one mechanism for limiting the disk space used by a MongoDB collection (verified on the Prod environment, with the fix in the code).

In MongoDB, a capped collection is a specialized type of collection that differs from a regular collection. Unlike regular collections, capped collections have a fixed size, meaning they can only hold a certain amount of data. Once a capped collection reaches its maximum size, it automatically begins overwriting the oldest documents in the collection with new ones. This makes capped collections useful for rolling datasets such as logs or event tracking: they keep a large amount of recent data, discard older data as time goes by, and use the same amount of disk space throughout.
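The overwrite behaviour described above can be illustrated with a small in-memory model. This is only a sketch and not MongoDB code: CappedModel is a hypothetical name, and document size is approximated by the length of the document's JSON text rather than its real on-disk size.

```javascript
// Toy in-memory model of a capped collection (NOT MongoDB itself).
// maxBytes plays the role of the `size` option; document size is
// approximated by the length of the document's JSON text.
function CappedModel(maxBytes) {
  this.maxBytes = maxBytes;
  this.docs = [];      // oldest first, i.e. insertion order
  this.usedBytes = 0;
}

CappedModel.prototype.insert = function (doc) {
  var docBytes = JSON.stringify(doc).length;
  if (docBytes > this.maxBytes) {
    throw new Error("document is larger than the cap");
  }
  // Evict the oldest documents until the new one fits.
  while (this.usedBytes + docBytes > this.maxBytes) {
    var oldest = this.docs.shift();
    this.usedBytes -= JSON.stringify(oldest).length;
  }
  this.docs.push(doc);
  this.usedBytes += docBytes;
};

// Each sample document is 23 characters of JSON, so a 60-byte cap
// holds two of them at a time.
var logModel = new CappedModel(60);
for (var i = 1; i <= 5; i++) {
  logModel.insert({ seq: i, msg: "event" });
}
// logModel.docs now holds only the two newest documents (seq 4 and 5).
```

However many documents are inserted, usedBytes never exceeds the cap, which is the property that keeps disk usage bounded during a flood of log data.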

Create a Capped Collection 

  • Eg: Create a capped collection with a maximum size of 200 bytes
    • db.createCollection("log", { capped: true, size: 200 })
  • Eg: Create a capped collection with a maximum size of 200 bytes and a maximum of 50 documents
    • db.createCollection("log", { capped: true, size: 200, max: 50 })
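One caveat on the size values above: the MongoDB capped collection documentation describes the requested size as being adjusted by the server. A size of 4096 bytes or less is raised to 4096, and larger sizes are rounded up to a multiple of 256. The helper below is a sketch of that documented rule (effectiveCappedSize is a hypothetical name), so a request for 200 bytes would likely yield a 4096-byte cap. The behaviour should be confirmed on v3.0.10 before relying on it.

```javascript
// Sketch of the size adjustment described in the MongoDB capped collection
// documentation. Assumption: the same rule applies to the server version
// in use; check db.<collection>.stats() after creation to confirm.
function effectiveCappedSize(requestedBytes) {
  if (requestedBytes <= 4096) {
    return 4096; // documented minimum cap
  }
  // Round up to the next multiple of 256.
  return Math.ceil(requestedBytes / 256) * 256;
}
```

For example, effectiveCappedSize(200) gives 4096 and effectiveCappedSize(5000) gives 5120.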

Convert an existing collection to a capped collection

  • Eg: db.runCommand({ "convertToCapped": "log", size: 500, max: 50 })

Check whether a collection is capped

  • Eg: db.log.isCapped()
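The check and the conversion can be combined so that the conversion only runs when it is needed. The following is a minimal sketch, assuming it is loaded into the mongo shell and passed the shell's db object. ensureCapped is a hypothetical helper name; getCollection(), isCapped() and runCommand() are the shell calls already used in this document.

```javascript
// Convert a collection to capped only if it is not capped already.
// `db` is the mongo shell database handle; `sizeBytes` is the cap in bytes.
function ensureCapped(db, collectionName, sizeBytes) {
  if (db.getCollection(collectionName).isCapped()) {
    return "already-capped";
  }
  var result = db.runCommand({ convertToCapped: collectionName, size: sizeBytes });
  // Command results carry ok: 1 on success.
  if (!result || result.ok !== 1) {
    throw new Error("convertToCapped failed: " + JSON.stringify(result));
  }
  return "converted";
}

// Usage in the mongo shell (after `use load_svc`):
//   ensureCapped(db, "log", 1000000)
```

The function is written in ES5 style (var, function) so that it should also load in the older shell shipped with MongoDB 3.0.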

Advantages:

  • A capped collection preserves insertion order, so queries that return documents in that order do not need an index. Without this indexing overhead, it can sustain high insertion throughput.
  • A capped collection is well suited to storing log information because the data is kept in event order.

Disadvantages:

  • A capped collection can’t be sharded.
  • A capped collection can’t have TTL indexes.
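Because the log collections currently rely on an expiration time, it can help to list any TTL indexes before converting. The sketch below assumes the index list comes from db.log.getIndexes(), where TTL indexes carry an expireAfterSeconds field. findTtlIndexNames is a hypothetical helper, and the createdAt field in the sample is illustrative only.

```javascript
// Return the names of TTL indexes from a getIndexes()-style array.
// A TTL index is recognised by the presence of `expireAfterSeconds`.
function findTtlIndexNames(indexes) {
  var names = [];
  for (var i = 0; i < indexes.length; i++) {
    if (typeof indexes[i].expireAfterSeconds !== "undefined") {
      names.push(indexes[i].name);
    }
  }
  return names;
}

// Illustrative index list; field and index names are assumptions.
var sampleIndexes = [
  { v: 1, key: { _id: 1 }, name: "_id_", ns: "load_svc.log" },
  { v: 1, key: { createdAt: 1 }, name: "createdAt_1", ns: "load_svc.log",
    expireAfterSeconds: 86400 }
];
var ttlNames = findTtlIndexNames(sampleIndexes);
// ttlNames contains "createdAt_1" only.

// Usage in the mongo shell:
//   findTtlIndexNames(db.log.getIndexes())
```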

Here are some key differences between capped and normal collections in MongoDB:

  • Size: Capped collections have a fixed size limit, while normal collections can grow dynamically.
  • Overwriting: When a capped collection reaches its maximum size, it will automatically overwrite its oldest documents. Normal collections don’t have this feature.
  • Ordering: Documents in a capped collection are stored in insertion order.

Runbook (CLI): Converting a collection to a capped collection

  1. SSH to the machine where MongoDB is hosted.
  2. Connect to the Hybrik MongoDB instance
    1. mongo --port 38916
  3. List the databases
    1. show dbs
  4. Switch to the relevant database
    1. use load_svc
  5. Check the size of the collection (log)
    1. db.log.stats()
  6. Determine the recommended size for the collection and remove any extra logs.
  7. Once the recommended size is confirmed against the available disk space, convert the collection to a capped collection of that size (Eg: 1 MB)
    1. db.runCommand({ "convertToCapped": "log", size: 1000000 })
  8. Check whether the collection was converted
    1. db.log.isCapped()
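For step 6, one possible way to arrive at a recommended size is to start from the average document size reported by db.log.stats() and the number of documents that should be retained, then add some headroom. This is only a sketch of one sizing approach, not the method used in production: recommendCapBytes, docsToKeep and headroomPercent are hypothetical names, and the result still has to be checked against the available disk space.

```javascript
// Estimate a cap size in bytes from collection stats.
// stats:           output of db.<collection>.stats(); only avgObjSize is read
// docsToKeep:      how many recent documents should fit under the cap
// headroomPercent: extra margin, e.g. 20 for 20%
function recommendCapBytes(stats, docsToKeep, headroomPercent) {
  if (typeof stats.avgObjSize !== "number" || stats.avgObjSize <= 0) {
    throw new Error("stats.avgObjSize is missing; is the collection empty?");
  }
  var base = stats.avgObjSize * docsToKeep;
  return Math.ceil(base * (100 + headroomPercent) / 100);
}

// Usage in the mongo shell:
//   recommendCapBytes(db.log.stats(), 2000, 20)
```

With an average document size of 500 bytes, 2000 documents to keep and 20% headroom, the function returns 1200000 bytes.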
