MongoDB Interview Questions and Answers For Experienced and Beginner Developers

Software Engineering Aug 9, 2022

NoSQL databases have gained enormous popularity and a mass of advocates. MongoDB is one of the most popular NoSQL databases, with a user base that keeps expanding at almost the same rate as that of relational databases (RDBs).

MongoDB is often described as a next-generation document storage system because of its high performance, flexibility, and scalability when working with large sets of distributed data.

Below, we'll go over the most common MongoDB interview questions to help you prepare for your system design, NoSQL Databases, or MongoDB interview. We will start with the basics by simulating real-world interviews with recruiters and then gradually increase the complexity.

What is MongoDB?

MongoDB is an open-source document-oriented NoSQL database that was created in 2007 by Dwight Merriman, Eliot Horowitz, and Kevin Ryan. Rather than tables, rows, and columns, MongoDB organizes data into collections of JSON-like documents.

What are the advantages of MongoDB?

MongoDB's popularity rests on the following key features:

  • Schemaless: documents in the same collection can have different structures and different sets of fields. If a schema is needed, we can define one in application code or enforce it with schema validation rules on the collection.
  • High performance: MongoDB keeps frequently accessed data and indexes in an in-memory cache, which speeds up query execution.
  • High availability: MongoDB supports replication through replica sets, which provide redundancy and automatic failover.
  • Easy scalability: MongoDB supports easy horizontal scaling through sharding (distributing data across multiple servers).
  • Cost-effective: in the cloud-based MongoDB Atlas, we can adjust the cluster to automatically scale when needed.
  • Rich query language: MongoDB provides an easy-to-use aggregation framework that reduces the need for complex joins. It also supports ad hoc queries.
  • Indexing: any field in MongoDB can be indexed; we can also create compound indexes.

What are the different types of NoSQL databases?

NoSQL stands for "Not Only SQL." It refers to non-relational databases that can handle large volumes of structured, semi-structured, and unstructured data. NoSQL database types include:

  • Key-Value
  • Graph
  • Column Oriented
  • Document Oriented

What are MongoDB documents and collections?

A document in MongoDB is an object that represents a single record; it is analogous to a row in a SQL table. A MongoDB document has a key/value structure.

A collection is a set of related documents; it is the equivalent of a table in a relational database (RDB). A MongoDB database is simply a group of collections, each holding a set of similar or partially similar documents.

What syntax is used to create and drop a collection in MongoDB?

db.createCollection(name,options) // create collection in MongoDB
db.collection.drop() //drop collection in MongoDB

How do you perform Insertion or Creation operations in MongoDB?

db.collection.insertOne(document) // insert a single document
db.collection.insertMany([document1, document2]) // insert many documents at once (takes an array)

How do you perform update operations in MongoDB?

Syntax to update one document in MongoDB is:

db.collection.updateOne({filter}, {update})

Syntax to update many documents in MongoDB is:

db.collection.updateMany({filter}, {update})

How does MongoDB store data?

MongoDB stores documents in BSON, a binary-encoded serialization of JSON-like documents. BSON supports more data types than JSON, such as dates and binary data. However, in some cases it uses more space than JSON.

List some of the data types supported by MongoDB.

Some data types are:

  • Numbers (e.g., int, long, double, and decimal),
  • String,
  • Array,
  • Binary data,
  • Boolean,
  • Date,
  • Regular expressions,
  • ObjectId,
  • and Null.

Explain Namespace in MongoDB.

The combination of a database name with a collection or an index name using the . separator is called a namespace.

[database-name].[collection-or-index-name]
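
As a minimal sketch of how a namespace breaks down, the plain JavaScript below splits a namespace string at its first dot. The function name parseNamespace and the sample namespace are made up for illustration; the sketch relies on the fact that database names cannot contain a dot while collection names can.

```javascript
// Split a namespace string into its database and collection parts.
// The database name ends at the first dot; collection names may contain dots.
function parseNamespace(namespace) {
  const firstDot = namespace.indexOf(".");
  if (firstDot <= 0 || firstDot === namespace.length - 1) {
    throw new Error(`Not a valid namespace: ${namespace}`);
  }
  return {
    db: namespace.slice(0, firstDot),
    collection: namespace.slice(firstDot + 1),
  };
}

const ns = parseNamespace("shop.orders.archive");
console.log(ns.db);         // shop
console.log(ns.collection); // orders.archive
```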

Does MongoDB support primary-key, foreign-key relationships?

MongoDB does not enforce primary key-foreign key relationships the way a relational database does. Every document contains an _id field that uniquely identifies it within its collection and acts as its primary key. Relationships can still be modeled, either by embedding one document inside another or by storing the _id of a related document as a reference.

Explain different data models in MongoDB.

MongoDB provides the following types of data models:

  1. Embedded data model: in this model, we store related pieces of information in a single database record. As a result, applications will need to issue fewer database calls to retrieve or update data. This embedded document model is also known as a de-normalized data model.
  2. Normalized data model: the traditional references technique, in which a child document references its parent document by the parent's _id field.
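
To make the two models concrete, here is a small sketch that uses plain JavaScript objects in place of MongoDB documents. The field names (address, userId, total) and the helper ordersForUser are hypothetical; in a real application the second lookup would be a query against another collection.

```javascript
// Embedded (denormalized): the address lives inside the user document.
const embeddedUser = {
  _id: 1,
  name: "Ada",
  address: { city: "London", country: "UK" },
};

// Normalized: each order (child) stores the _id of its user (parent).
const users = [{ _id: 1, name: "Ada" }];
const orders = [
  { _id: 101, userId: 1, total: 40 },
  { _id: 102, userId: 1, total: 15 },
];

// Resolving a reference takes a second lookup, here a plain array filter.
function ordersForUser(userId) {
  return orders.filter((order) => order.userId === userId);
}

console.log(embeddedUser.address.city); // London
console.log(ordersForUser(1).length);   // 2
```

The embedded version answers "where does Ada live?" with one read, while the normalized version avoids duplicating user data in every order.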

What is a Relational Database Management System?

A relational database management system (RDBMS) stores data points that are related to one another and makes them accessible.

It is built on the relational model, an easy-to-understand way of representing data in tables of rows and columns.

What are the guidelines for designing a good database schema in MongoDB?

Here are some general guidelines for schema design in a document-oriented database. They highlight important considerations when modeling data relationships:

  • Storing related data that needs to be accessed together in the same document.
  • Modeling one-to-one relationships with embedded documents.
  • Modeling one-to-few relationships with embedded documents.
  • Modeling one-to-many and many-to-many relationships with child-parent references.

Explain Aggregation in MongoDB.

In MongoDB, aggregation is a multi-stage data processing pipeline used to run a series of complex operations on a collection of documents.

Each stage in the aggregation pipeline will receive input documents, transform them, and then forward the results as input to the next step down the pipeline until our goal is achieved. The stages in a pipeline can filter, sort, group, reshape and modify documents that pass through the pipeline.

Data Aggregation use cases include:

  1. Generate a summary of many documents in a collection (e.g., average, count, sum, etc.).
  2. Perform joins using the $lookup aggregation stage.
  3. Sort and paginate data.
  4. Generate helpful business metrics.
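
The stage-by-stage flow can be sketched in plain JavaScript. The pipeline array below uses real stage names in the shape that would be passed to db.orders.aggregate() in the shell; the orders data and the three stand-in functions are assumptions made for illustration and are not how MongoDB implements these stages.

```javascript
const orders = [
  { customer: "ann", status: "shipped", amount: 30 },
  { customer: "bob", status: "pending", amount: 20 },
  { customer: "ann", status: "shipped", amount: 25 },
  { customer: "bob", status: "shipped", amount: 10 },
];

// The pipeline as it would be written for db.orders.aggregate(pipeline).
const pipeline = [
  { $match: { status: "shipped" } },
  { $group: { _id: "$customer", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
];

// Plain-JavaScript stand-ins for the three stages above.
const matchShipped = (docs) => docs.filter((d) => d.status === "shipped");
const groupByCustomer = (docs) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d.customer, (totals.get(d.customer) || 0) + d.amount);
  }
  return [...totals].map(([customer, total]) => ({ _id: customer, total }));
};
const sortByTotalDesc = (docs) => [...docs].sort((a, b) => b.total - a.total);

// Each stage consumes the output of the previous stage.
const result = [matchShipped, groupByCustomer, sortByTotalDesc].reduce(
  (docs, stage) => stage(docs),
  orders
);
console.log(result); // ann with total 55 first, then bob with total 10
```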

List some of the aggregate stages of MongoDB.

Some of the aggregate stages of MongoDB are:

  • $match
  • $count
  • $lookup
  • $unwind
  • $sort
  • $project
  • $limit
  • $merge
  • $facet

Is there any need to create a database manually in MongoDB?

No. MongoDB creates a database automatically the first time data is stored in one of its collections, so there is no need to create one manually.

Explain the importance of the dot notation.

We use dot notation in MongoDB to access the elements of an array and the fields of an embedded document.
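
As a rough illustration, the helper below (a hypothetical getByPath, written in plain JavaScript) resolves a dot-notation path against a document. MongoDB's own query matching has extra rules for arrays, so treat this only as a sketch of the path syntax.

```javascript
// Resolve a dot-notation path such as "address.city" or "tags.0" on a document.
function getByPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const user = {
  name: "Ada",
  address: { city: "London" },
  tags: ["admin", "beta"],
};

console.log(getByPath(user, "address.city")); // London
console.log(getByPath(user, "tags.0"));       // admin
console.log(getByPath(user, "address.zip"));  // undefined

// The same paths appear as quoted keys in a query filter:
const filter = { "address.city": "London", "tags.0": "admin" };
```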

Can we perform the SQL JOIN equivalent in MongoDB?

MongoDB is not a relational database; however, join-like operations have been possible since MongoDB 3.2. The $lookup aggregation stage works like a left outer join:

{
   $lookup:
      {
         from: <foreign collection>,
         localField: <field from local collection's documents>,
         foreignField: <field from foreign collection's documents>,
         let: { <var_1>: <expression>, …, <var_n>: <expression> },
         pipeline: [ <pipeline to run> ],
         as: <output array field>
      }
}
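
To show what a left outer join produces, here is an in-memory stand-in for the basic localField/foreignField form of $lookup. The lookup function and the sample orders and inventory arrays are assumptions for illustration; in MongoDB the from option names a collection rather than holding an array.

```javascript
// In-memory stand-in for a basic $lookup (left outer join); illustration only.
function lookup(localDocs, { from, localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const orders = [
  { _id: 1, item: "almonds" },
  { _id: 2, item: "pecans" },
];
const inventory = [{ sku: "almonds", qty: 120 }];

const joined = lookup(orders, {
  from: inventory, // in MongoDB this would be a collection name
  localField: "item",
  foreignField: "sku",
  as: "stock",
});

console.log(joined[0].stock.length); // 1
console.log(joined[1].stock.length); // 0 (the unmatched order is still returned)
```

Every local document appears in the output, and the new array field is simply empty when nothing matches, which is what makes the join "left outer".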

Explain indexes and how to create an index in MongoDB.

Indexes are special data structures that hold a subset of the collection's data. The index can store the value of a field or set of fields, sorted by field value.

We use indexes to ensure the efficient execution of queries.

Without indexes, MongoDB must perform a full collection scan to find the documents a query requests. If a suitable index exists, MongoDB scans only that index, which limits the number of documents it must examine. In addition, MongoDB can return sorted results efficiently by using the ordering in the index.
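
The difference between a collection scan and an index lookup can be sketched with a toy index in plain JavaScript. Here a sorted array with binary search stands in for MongoDB's B-tree indexes, and the people data and function names are made up for this example.

```javascript
const people = [
  { name: "Cy", age: 41 },
  { name: "Al", age: 29 },
  { name: "Di", age: 35 },
  { name: "Bo", age: 29 },
];

// Collection scan: every document is examined.
function scanByAge(docs, age) {
  const matches = [];
  let examined = 0;
  for (const doc of docs) {
    examined += 1;
    if (doc.age === age) matches.push(doc);
  }
  return { matches, examined };
}

// Toy index on "age": entries of (key, position) sorted by key.
const ageIndex = people
  .map((doc, position) => ({ key: doc.age, position }))
  .sort((a, b) => a.key - b.key);

// Binary search for the first entry with the key, then walk forward.
function findByAgeIndexed(docs, index, age) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < age) lo = mid + 1;
    else hi = mid;
  }
  const matches = [];
  for (let i = lo; i < index.length && index[i].key === age; i++) {
    matches.push(docs[index[i].position]);
  }
  return { matches, examined: matches.length };
}

console.log(scanByAge(people, 29).examined);                  // 4
console.log(findByAgeIndexed(people, ageIndex, 29).examined); // 2
```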

How do you create an index in MongoDB?

The syntax for creating an index in MongoDB is:

db.people.createIndex({ fieldName: 1 }) // creates an ascending index

db.people.createIndex({ fieldName: -1 }) // creates a descending index

What are the various kinds of Indexes in MongoDB?

Following are the various kinds of Indexes in MongoDB:

  • Default index: MongoDB creates a default index on the “_id” field for each collection.
  • Single field index: used to create a sorted index on a single field of a document.
  • Compound index: used to create an index by combining multiple fields.
  • Multi-key index: used to index an array field by indexing every element in an array.
  • Geospatial index: used for querying location data; comes in two types, 2d indexes and 2dsphere indexes.
  • Text Index: used for searching some text or string in a collection.
  • Hashed index: used for indexing the hashed value of a field; the hashed index is useful for sharding.

What happens if an index is too large to fit into the RAM?

When an index is too large to fit in RAM, MongoDB must read parts of it from disk, which is substantially slower than reading from RAM.

What is the covered query in MongoDB?

For a query to be covered, all the fields used in the query filter and all the fields returned in the results must be part of the same index. Note that _id is returned by default, so it must be excluded in the projection unless the index includes it.

Because every field the query needs is in the index, MongoDB can match the query conditions and return the result by scanning only that index, without reading the documents themselves. Since indexes are usually cached in RAM, reading from an index is much faster than scanning documents.
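
The rule can be expressed as a simplified check in plain JavaScript. The isCovered function and its inputs are hypothetical, and MongoDB applies further conditions (for example around array fields) that this sketch ignores.

```javascript
// Simplified check: a query is covered when every filtered field and every
// returned field belongs to the same index.
function isCovered(indexFields, filter, projection) {
  const inIndex = (field) => indexFields.includes(field);
  const filterFields = Object.keys(filter);
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id is returned by default unless the projection sets _id: 0.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return filterFields.every(inIndex) && returned.every(inIndex);
}

// Fields of an index such as createIndex({ status: 1, total: 1 }).
const index = ["status", "total"];

console.log(isCovered(index, { status: "A" }, { total: 1, _id: 0 })); // true
console.log(isCovered(index, { status: "A" }, { total: 1 }));         // false
console.log(isCovered(index, { status: "A" }, { name: 1, _id: 0 }));  // false
```

The second call fails only because _id is returned implicitly, which is a common reason a query that looks covered is not.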

Explain data replication and replica sets in MongoDB

Data replication is a horizontal scaling technique provided by most databases to ensure high availability, data protection, and increased fault tolerance. Essentially, replication refers to copying the same data from one database to another, creating a cluster of synchronized databases or nodes. If one of the nodes goes down, the application will still be available to users because the other nodes in the cluster are available and can respond to user requests.

Another advantage of data replication across multiple databases or nodes is load balancing. With replication, read requests can be distributed across the available nodes in the cluster instead of exhausting a single node.

In MongoDB, a replica set is a cluster of replicated nodes. The primary node is the only member that accepts write operations. The other members are called secondary nodes; they replicate the primary's changes to keep the data consistent, and they can serve read operations when the client's read preference allows it (by default, reads also go to the primary).

What is the minimum number of nodes a replica set requires?

A replica set should have at least three members for production use, typically a primary node and two secondary nodes. If the primary node goes down, the remaining members hold an election, and a secondary node is chosen to take over the primary's role.
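
Electing a primary requires a majority of the voting members, which is why three members is the usual starting point. The arithmetic can be sketched as follows; the function names are made up for illustration.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can fail while an election is still possible.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 5, 7]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 2 members: majority 2, tolerates 0 failure(s)
// 3 members: majority 2, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
// 7 members: majority 4, tolerates 3 failure(s)
```

A two-member set cannot survive the loss of either member, while a three-member set can lose one and still elect a primary.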

What is the maximum number of nodes in a replica set?

A MongoDB replica set can hold up to 50 nodes, of which at most 7 can be voting members.

What is the maximum document size in MongoDB?

The maximum size of a single MongoDB document is 16MB with a maximum nested depth of 100 levels.

What is sharding in MongoDB?

In MongoDB, sharding is a horizontal scaling technique that partitions data records across multiple machines, placing a subset of the data on each one. Each partition is referred to as a shard, and each shard is typically deployed as its own replica set. Sharding is mainly used to handle very large data sets and high-throughput workloads.

MongoDB uses a query router (the mongos process), which acts like a reverse proxy: it accepts a query and routes it to the appropriate shard(s).

What is Shard Key in MongoDB?

The shard key, which controls how evenly the collection's documents are distributed throughout the cluster's shards, can either be a single indexed field or several fields covered by a compound index.

The optimum shard key enables MongoDB to support common query patterns while distributing documents uniformly across the cluster.
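
The routing idea can be illustrated with a toy hash in plain JavaScript. This is only a sketch: toyHash and shardFor are invented for this example, and MongoDB's hashed sharding uses its own hash function and assigns ranges of hashed values to shards rather than taking a simple remainder.

```javascript
// Toy string hash, for illustration only.
function toyHash(value) {
  let hash = 0;
  for (const char of String(value)) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Pick a shard number from a shard key value.
function shardFor(shardKeyValue, shardCount) {
  return toyHash(shardKeyValue) % shardCount;
}

// Count how many of ten sample keys land on each of three shards.
const counts = [0, 0, 0];
for (let i = 0; i < 10; i++) {
  counts[shardFor(`user${i}`, 3)] += 1;
}
console.log(counts);
```

The same key always maps to the same shard, so a query that includes the shard key can be sent to a single shard, while a query without it has to be sent to all of them.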

What are the advantages of sharding?

Sharding Benefits include:

  • Increased read/write throughput.
  • Increased storage capacity.
  • High availability.

What is the GridFS in MongoDB?

GridFS is a MongoDB specification for storing and retrieving large files that exceed the document size limit of 16MB, such as images, audio files, and video files. GridFS breaks a file into chunks and stores each chunk in a separate document. By default, each chunk is 255 KB in size.
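
As a quick worked example, the sketch below computes how many chunk documents a file would need, assuming the default chunk size of 255 KB (255 × 1024 bytes). The function name chunkCount and the 20 MB file size are made up for illustration.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes

// Number of chunk documents needed to store a file of the given size.
function chunkCount(fileSizeBytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  return Math.ceil(fileSizeBytes / chunkSize);
}

const fileSize = 20 * 1024 * 1024; // a 20 MB file, above the 16MB document limit
console.log(chunkCount(fileSize)); // 81

// The last chunk holds only the remainder.
console.log(fileSize % DEFAULT_CHUNK_SIZE); // 81920
```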

Explain Journaling in MongoDB.

Journaling is a form of write-ahead logging: MongoDB records write operations in journal files, kept in a journal subdirectory of the data directory, until the changes are flushed to the core data files. So, instead of immediately writing data to the data files, MongoDB logs the write operation and the index modifications in an on-disk journal file first, then writes them to the core data files at intervals.

One advantage of journaling is that journal records are appended sequentially, so writing them is faster than writing randomly distributed records (it avoids most disk seek time). Another benefit is crash recovery: after a system failure, MongoDB can replay the journal to restore writes that had not yet reached the data files.

In general, Journaling in MongoDB increases database durability and availability.

Does MongoDB push the writes to disk immediately or lazily?

In MongoDB, data is pushed to disk lazily. Write operations are first recorded in the journal, which is synced to disk frequently; writing the data from the journal to the data files is done lazily.

How frequently does MongoDB write updates to the disk?

By default, MongoDB flushes data to the data files every 60 seconds and syncs the journal to disk every 100 milliseconds. These intervals can be altered using the parameters syncPeriodSecs and commitIntervalMs, respectively.

What are the different Storage Engines used by MongoDB?

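MongoDB's default storage engine is WiredTiger (the default since version 3.2). The older MMAPv1 engine was deprecated in version 4.0 and removed in version 4.2. MongoDB Enterprise also offers an In-Memory storage engine.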

How do we configure the cache size in MongoDB?

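With the default WiredTiger storage engine, we can set the size of the internal cache through the storage.wiredTiger.engineConfig.cacheSizeGB setting (or the --wiredTigerCacheSizeGB command-line option). By default, the cache uses the larger of 50% of (RAM minus 1 GB) or 256 MB. The older MMAPv1 engine had no configurable cache; it relied on memory-mapped files and used free memory on the system automatically.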

What are some utilities for backup and restoring in MongoDB?

MongoDB has provided several utilities for accomplishing database backups and restoring databases in bulk. These utility scripts are:

  • mongoexport: a utility that exports data stored in a MongoDB instance in Extended JSON or CSV format.
  • mongoimport: a utility for loading data from a JSON, CSV, or TSV file (such as an export created by mongoexport) into a MongoDB instance.
  • mongodump: a utility for exporting a BSON dump of a running MongoDB instance.
  • mongorestore: a utility for restoring data from a BSON dump into a MongoDB instance.

How does MongoDB provide concurrency?

When multiple clients attempt to read or write the same data simultaneously, it becomes crucial to protect data consistency and avoid conflicts. MongoDB handles concurrent operations using multi-granular locking with reader-writer locks at the database or collection level that provide concurrent readers with shared access to a resource but exclusive access to a single write operation.

For example, suppose one write operation acquires an exclusive (W) lock on a database. In that case, all other operations on the same database (even if they target a separate collection) are blocked until the lock is released.

There are four modes of locking:

  • R (shared lock)
  • W (exclusive lock)
  • r (intent shared lock)
  • w (intent exclusive lock)
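
How these modes interact can be sketched as a compatibility table. The table below follows the standard rules of multi-granularity locking, and the canGrant helper is a made-up name for illustration; it is not MongoDB's lock manager.

```javascript
// For each requested mode, the modes that may already be held by others.
const compatible = {
  r: ["r", "w", "R"], // intent shared
  w: ["r", "w"],      // intent exclusive
  R: ["r", "R"],      // shared
  W: [],              // exclusive: compatible with nothing
};

// A lock can be granted only if it is compatible with every lock already held.
function canGrant(requested, heldModes) {
  return heldModes.every((held) => compatible[requested].includes(held));
}

console.log(canGrant("w", ["w", "r"])); // true: intent locks can coexist
console.log(canGrant("R", ["w"]));      // false: a shared lock must wait for writers
console.log(canGrant("W", ["r"]));      // false: an exclusive lock waits for everyone
console.log(canGrant("W", []));         // true: nothing else is held
```

Because two intent exclusive (w) locks are compatible, several writers can work in the same database at once as long as they do not need an exclusive lock on it.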

Are there any MongoDB operations that can lock more than one database?

Yes. Some operations lock more than one database. In older MongoDB versions, for example, db.copyDatabase() and db.repairDatabase() did so; both have since been removed from MongoDB.

What are transactions in MongoDB?

By default, MongoDB write operations are atomic (i.e., provide an "all-or-nothing" proposition) only at the level of a single document.

However, for use cases that demand atomicity of reads and writes to multiple documents, MongoDB supports multi-document ACID transactions: on replica sets since version 4.0 and on distributed sharded clusters since version 4.2.

So, a transaction is a process of modifying multiple documents as part of a single logical operation that will only succeed if every operation within the transaction has been executed correctly.

What is the importance of Profiler in MongoDB?

Poor schema or query design, improper index usage, or even flaws in the query itself can result in very slow queries. Because test datasets are often small, these performance issues are difficult to detect during the development phase. Additionally, manually evaluating the performance of each query is a highly tedious operation.

MongoDB offers a handy tool called Profiler that can evaluate operations based on specific criteria and log information about how all database operations are executed. The database profiler stores this data in a capped collection called “system.profile”.

Profiler provides three profiling levels.

  • Level 0 - No data will be stored by Profiler.
  • Level 1 - Only slow operations exceeding a certain threshold (slowms, 100 milliseconds by default) are logged by Profiler.
  • Level 2 - All operations logged by Profiler.
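
The three levels boil down to a simple decision, sketched here in plain JavaScript. The function shouldProfile is hypothetical; the 100 millisecond default mirrors the profiler's slowms threshold, which can be changed in the shell with a call such as db.setProfilingLevel(1, { slowms: 50 }).

```javascript
// Decide whether an operation would be recorded at a given profiling level.
function shouldProfile(level, durationMs, slowMs = 100) {
  if (level === 0) return false;              // profiler off
  if (level === 1) return durationMs > slowMs; // only slow operations
  return true;                                // level 2: every operation
}

console.log(shouldProfile(0, 500));    // false
console.log(shouldProfile(1, 20));     // false
console.log(shouldProfile(1, 250));    // true
console.log(shouldProfile(1, 80, 50)); // true, with the threshold lowered to 50 ms
console.log(shouldProfile(2, 1));      // true
```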

What is the use of the capped collection in MongoDB?

Capped collections are fixed-size collections. They support high-throughput operations by inserting and retrieving documents based on insertion order.

Capped collections are like circular buffers in the way they work. Capped collections automatically make room for new documents by overwriting their oldest entries.
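
The circular-buffer behavior can be sketched with a small class in plain JavaScript. CappedBuffer is a made-up name, and it limits only the number of documents; a real capped collection is limited primarily by its size in bytes.

```javascript
// A fixed-capacity buffer that drops its oldest entry when full.
class CappedBuffer {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) {
      this.docs.shift(); // make room by discarding the oldest document
    }
  }

  // Documents come back in insertion order.
  find() {
    return [...this.docs];
  }
}

const log = new CappedBuffer(3);
for (const message of ["a", "b", "c", "d", "e"]) {
  log.insert({ message });
}
console.log(log.find().map((doc) => doc.message)); // [ 'c', 'd', 'e' ]
```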

What are alternatives to MongoDB?

Cassandra, CouchDB, Redis, Riak, and HBase are some excellent alternatives to MongoDB.

Explain Capped Collection.

Capped collections preserve insertion order. As a result, queries can return documents in insertion order without needing an index, and without this indexing overhead, capped collections can support higher insertion throughput.

The syntax to create a capped collection is as follows:

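db.createCollection(<collection_name>, {
    capped: Boolean,      // true to create a capped collection
    autoIndexId: Boolean, // deprecated; omit in new code
    size: Number,         // maximum size in bytes (required when capped is true)
    max: Number           // optional maximum number of documents
})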

Why is MongoDB not used with a 32-bit system?

MongoDB is not used on 32-bit systems because 32-bit builds could address only about 2GB of data, including indexes, while MongoDB also relies on plenty of RAM to cache data. This limit is far too small for production use, and later MongoDB releases dropped 32-bit support altogether.

Amany Mounes

Amany is an ambitious Software Engineer specializing in web technologies. She is passionate about sharing her knowledge and helping others.
