Deploy a Replica Set With High Availability in MongoDB 3.4 Using Keyfile for Access Control on Ubuntu 16.04

Published on: Fri, Dec 15, 2017 at 12:06 pm EST

Since its release in 2009, MongoDB has become one of the most widely used NoSQL databases. One of its core concepts is the Replica Set, so before working with it, let's review the concept.

About Replica Set

The simplest model of communication used in database replication is the master-slave architecture. As its name suggests, this model has two roles: a single master and one or more slaves. The master processes the read and write operations issued by clients, and the slaves are replicas of the master.

The most important advantage of this model is that backup operations do not compromise the master's performance, because the slaves are updated asynchronously. However, this becomes a serious problem when the master node fails: slave nodes are read-only and have to be promoted to master manually, and data can be lost in the meantime.

One option to solve this availability problem is to have more than one master, but that introduces data-consistency problems between the masters and adds configuration complexity.

With that context, we can present MongoDB's Replica Set. A Replica Set is a master-slave architecture with automatic failover: when the master (here called the primary) stops functioning properly, an election is triggered and a new primary is elected from the remaining slaves (here called secondaries).

Primary node

The primary node is the only member that performs write operations. By default, read operations are handled by the primary too, but this behavior can be changed through the read preference.

Operations are recorded in the oplog (operations log), and secondary nodes then apply them asynchronously to update their own copy of the data.

Note: the oplog is a capped collection, which means it has a fixed maximum size. You can inspect its contents from a mongo shell on any set member by querying the local.oplog.rs collection.

Secondary node

Besides keeping a full copy of the primary's data, a secondary node:

  • Can accept read operations if needed.
  • Can trigger an election if a primary node fails.
  • Can vote in elections.
  • Can become the new primary if needed.

By adjusting these capabilities, we can configure different types of secondary nodes:

  • Priority 0: These members cannot become primary and cannot trigger an election, but they can still vote in elections, hold a complete copy of the data, and accept read operations. They can be helpful in multi-data-center deployments.
  • Hidden: These are priority 0 members that are also invisible to client applications, so they do not serve client reads. They can still vote in elections. They are well suited to dedicated tasks such as reporting and backups.
  • Delayed: These members keep "historical" data by applying the primary's operations after a configured delay. A delayed member must be a priority 0 member, and it is recommended that it also be hidden.

Prerequisites

  • At least 3 Ubuntu 16.04 x64 instances, all with the same server size.

Design the Replica set

Before deploying an infrastructure, it is important to design it. Consider the following points.

Choosing the number of members

Keep in mind that the recommended minimum size for a replica set is 3 members. These can be one primary and two secondaries, or one primary, one secondary, and an arbiter (a member that votes in elections but holds no data).

In this guide we are deploying 3 members, one primary and two standard secondaries.

Note: A replica set can have up to 50 members, but at most 7 of them can be voting members; the voting members can be a mix of data-bearing members and arbiters.

Choose a name

The name identifies the set and is used in its configuration. Because you may run more than one replica set in your production environment, choose a meaningful name.

This tutorial lets you choose the name of the set; replace YOUR_SET_NAME with it wherever it appears.

Distribution of the members in different data centers

This tutorial deploys all members in the same data center to avoid communication problems.

Note: If you deploy across different data centers, it is recommended to connect your nodes through a VPN.

Deployment instructions

Step 1: Deploy the minimum nodes for your infrastructure

Launch 3 Ubuntu 16.04 x64 nodes from your customer portal, in the same region if possible. Name them according to the project they belong to, and make sure all of them have the same server size.

After you have deployed your 3 nodes, make sure that every node can communicate with the others. SSH into the nodes and ping the others with ping -c 4 EXAMPLE_IP, replacing EXAMPLE_IP with the actual IP of each node.

Here you can see an example of successful communication between two nodes.

root@foo_node:~# ping -c 4 EXAMPLE_IP
PING EXAMPLE_IP (EXAMPLE_IP) 56(84) bytes of data.
64 bytes from EXAMPLE_IP: icmp_seq=1 ttl=59 time=0.594 ms
64 bytes from EXAMPLE_IP: icmp_seq=2 ttl=59 time=0.640 ms
64 bytes from EXAMPLE_IP: icmp_seq=3 ttl=59 time=0.477 ms
64 bytes from EXAMPLE_IP: icmp_seq=4 ttl=59 time=0.551 ms

--- EXAMPLE_IP ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3021ms
rtt min/avg/max/mdev = 0.477/0.565/0.640/0.064 ms
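Checking every pair by hand gets tedious as the set grows. The following bash sketch runs the same ping -c 4 check against a list of IPs and reports which nodes answer; check_nodes is a hypothetical helper name, not part of MongoDB or Ubuntu.

```bash
#!/usr/bin/env bash
# Ping each node given as an argument, 4 times like the manual check,
# and print whether it answered. The exit status is non-zero if any node failed.
check_nodes() {
  local ip failed=0
  for ip in "$@"; do
    # Only ping's exit status matters here, so its output is discarded
    if ping -c 4 "$ip" > /dev/null 2>&1; then
      echo "$ip reachable"
    else
      echo "$ip UNREACHABLE"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, on one node you could run check_nodes 203.0.113.11 203.0.113.12, where both IPs are placeholders for your other two nodes.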

Step 2: Install MongoDB in each node of your infrastructure

You can use the MongoDB package from Ubuntu's own repositories, but the official MongoDB community repository is preferable because it provides up-to-date releases. This repository contains these packages:

  • mongodb-org, the metapackage that installs the four components below.
  • mongodb-org-server, this contains the mongod daemon (primary process that handles data requests).
  • mongodb-org-mongos, this contains the mongos daemon (routing service for sharded deployments).
  • mongodb-org-shell, this is the mongo shell JavaScript interface.
  • mongodb-org-tools, some tools for administration activities.

Now install the packages.

Import the public key to the package management system.

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6

Create the list file for MongoDB '/etc/apt/sources.list.d/mongodb-org-3.4.list'.

echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list

Update the package database.

sudo apt-get update

Install the MongoDB metapackage.

sudo apt-get install -y mongodb-org

Start the MongoDB service.

sudo service mongod start

You can now open the mongo shell from any bash session by running the mongo command. You will be greeted by something similar to this:

MongoDB shell version v3.4.7
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.7
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
Server has startup warnings:
(additional startup warnings omitted)
>

Then stop the service with sudo service mongod stop, because later we will start mongod again with some extra parameters. Repeat this process on all 3 nodes of the set.

Step 3: Configure the access keyfile

Using a keyfile enables two mechanisms in replica set administration. The first is internal authentication: the members of the set use the shared keyfile to authenticate to each other. The second is role-based access control, in other words authorization: enabling internal authentication also enforces client access control, so clients must authenticate as users whose roles allow the operations they perform.

By default, you can start a mongo shell session without a user, and that session has full control of the database. With a keyfile in place and no users created yet, a mongo shell connected through the localhost interface runs under the localhost exception, which only lets you initiate the replica set and create the first administrator user.

Create your keyfile

The keyfile acts as a shared password for the set, so its content must be the same on all members. To increase security, generate a random key with the tool of your choice.
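As one possible approach, assuming bash and the standard head and base64 utilities (any good random source works), the sketch below writes 756 random bytes encoded as base64, which yields 1008 characters, and then checks a file against the length and character rules described next. The function names create_keyfile and check_keyfile are hypothetical.

```bash
#!/usr/bin/env bash
# Write a random keyfile: 756 random bytes encode to 1008 base64 characters.
create_keyfile() {
  local path="$1"
  # umask 077 (inside a subshell) keeps the new file private from the start
  ( umask 077 && head -c 756 /dev/urandom | base64 > "$path" )
}

# Validate a keyfile: base64 characters only and 6 to 1024 characters long.
# Whitespace, such as the line breaks added by base64, is ignored by MongoDB.
check_keyfile() {
  local content
  content="$(tr -d '[:space:]' < "$1")" || return 1
  case "$content" in
    *[!A-Za-z0-9+/=]*) echo "invalid: non-base64 characters"; return 1 ;;
  esac
  if [ "${#content}" -lt 6 ] || [ "${#content}" -gt 1024 ]; then
    echo "invalid: ${#content} characters"
    return 1
  fi
  echo "ok: ${#content} characters"
}
```

For example, mkdir -p /opt/mongodb && create_keyfile /opt/mongodb/keyfile && check_keyfile /opt/mongodb/keyfile creates and checks a keyfile at a hypothetical path. The file is created with mode 600, so it still needs the chmod 400 below.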

The content must be between 6 and 1024 characters long and may only contain characters from the base64 set. The keyfile must also not be readable by group or others, so make it read-only for its owner:

chmod 400 PATH_OF_YOUR_KEYFILE

Place the keyfile in each set member

Now copy your keyfile to every set member. Use the same directory on all members for future reference, and do not store the keyfile on removable media.

Also make sure the directory is accessible to the user that runs mongod.
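Assuming root SSH access from the node that holds the keyfile (as in the ping example, where the prompt is root@foo_node), the following sketch copies it to the same path on each remaining member with scp and restricts its permissions. The distribute_keyfile helper, the destination path, and the IPs are hypothetical; if mongod will run as a user other than root, that user must also be able to read the file.

```bash
#!/usr/bin/env bash
# Copy a keyfile to the same path on every remote member over SSH,
# then make it readable only by its owner (mode 400).
distribute_keyfile() {
  local keyfile="$1" dest="$2" node
  shift 2
  for node in "$@"; do
    ssh "root@$node" "mkdir -p '$(dirname "$dest")'" &&
      scp "$keyfile" "root@$node:$dest" &&
      ssh "root@$node" "chmod 400 '$dest'" ||
      { echo "failed on $node" >&2; return 1; }
    echo "copied to $node"
  done
}
```

For example: distribute_keyfile /opt/mongodb/keyfile /opt/mongodb/keyfile 203.0.113.12 203.0.113.13. The function stops at the first member where any of the three steps fails.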

Enforce using the keyfile in the Replica set

In this step we start mongod on every set member. There are two ways to configure the mongod process: with a configuration file or with command-line options. Both are straightforward, but for simplicity this tutorial uses the command line.

Command line configuration

Run the following command, using the set name you chose earlier and the path of your keyfile.

mongod --keyFile PATH_OF_YOUR_KEYFILE --replSet "YOUR_SET_NAME"

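By default, mongod runs in the foreground rather than as a daemon. To run it as a daemon, add the --fork option (which also requires --logpath or --syslog) or run it through an init system such as systemd. This tutorial keeps mongod in the foreground so you can see its logs directly in your terminal. Note that, started this way, mongod stores its data in /data/db by default; if that directory does not exist, create it first with sudo mkdir -p /data/db, or choose another directory with --dbpath.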
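If you prefer the configuration-file method, the two command-line options above correspond to the security.keyFile and replication.replSetName settings of mongod's YAML configuration file. The sketch below, using a hypothetical write_mongod_conf helper, writes a minimal file with just those two settings; other settings you need, such as storage.dbPath, can be added in the same format.

```bash
#!/usr/bin/env bash
# Write a minimal mongod configuration file equivalent to:
#   mongod --keyFile KEYFILE --replSet SET_NAME
write_mongod_conf() {
  local out="$1" keyfile="$2" set_name="$3"
  cat > "$out" <<EOF
security:
  keyFile: $keyfile
replication:
  replSetName: $set_name
EOF
}
```

For example, write_mongod_conf /etc/mongod-replset.conf PATH_OF_YOUR_KEYFILE YOUR_SET_NAME followed by mongod --config /etc/mongod-replset.conf, where the file path is only an illustration.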

Note: Carefully type the name of the Replica set because once created you cannot change it.

Step 4: Connect to the localhost interface from one of the set members

Note: If you run mongod as a non-daemon process, then you will have to open another ssh connection to continue working.
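A successful ping does not guarantee that mongod's port is reachable, for example when a firewall blocks it. Once mongod is running on every member, this sketch uses bash's built-in /dev/tcp redirection, together with the timeout utility, to test a TCP connection to each member, appending port 27017 when none is given. The check_ports name and the IPs in the example are hypothetical.

```bash
#!/usr/bin/env bash
# Test a TCP connection to each member (host or host:port; 27017 if no port).
# bash's /dev/tcp redirection opens the connection; timeout caps each try at 3 s.
check_ports() {
  local member host port failed=0
  for member in "$@"; do
    case "$member" in
      *:*) ;;                       # port already given
      *) member="$member:27017" ;;  # default mongod port
    esac
    host="${member%:*}"
    port="${member##*:}"
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2> /dev/null; then
      echo "$member open"
    else
      echo "$member unreachable"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, check_ports 203.0.113.11 203.0.113.12 203.0.113.13 run from each member checks the whole set.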

Run the mongo command to open the mongo shell. You can do this on any member of the set.

At this moment you are under the localhost exception. Because mongod was started with a keyfile, you must create a database administrator before you can perform read and write operations; Step 6 covers this.

Step 5: Initiating the Replica set

This is a delicate part. Inside the mongo shell from Step 4, we will run rs.initiate(). Before using this command, let's review it:

rs.initiate(
  {
    _id : "YOUR_SET_NAME",
    members: [
      { _id : 0, host : "example1.net:27017" },
      { _id : 1, host : "example2.net:27017" },
      { _id : 2, host : "example3.net:27017" }
    ]
  }
)

The top-level _id field is a string and must match the --replSet value passed to mongod. Each host value must be the IP address or domain name of a member of the replica set, followed by the port that member's mongod instance listens on (27017 by default).
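Typing the member list by hand is error-prone. The sketch below, a hypothetical print_rs_initiate helper, prints an rs.initiate() call in the same shape as the one above from a set name and a list of hosts, adding the default port 27017 to hosts given without one. It only prints text; you still run the result in the mongo shell.

```bash
#!/usr/bin/env bash
# Print an rs.initiate() call for the given set name and member hosts.
# Hosts without an explicit port get MongoDB's default port, 27017.
print_rs_initiate() {
  local set_name="$1" id=0 members="" sep="" host
  shift
  for host in "$@"; do
    case "$host" in
      *:*) ;;                    # assume host:port was given
      *) host="$host:27017" ;;
    esac
    members+="${sep}      { _id : $id, host : \"$host\" }"
    sep=$',\n'
    id=$((id + 1))
  done
  printf 'rs.initiate(\n  {\n    _id : "%s",\n    members: [\n%s\n    ]\n  }\n)\n' \
    "$set_name" "$members"
}
```

For example, print_rs_initiate YOUR_SET_NAME 203.0.113.11 203.0.113.12 203.0.113.13 prints a command you can paste into the mongo shell from Step 4 (the IPs are placeholders).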

Now run the command with your own values. This triggers an election, and a primary is elected automatically.

Your shell prompt should now change to YOUR_SET_NAME:PRIMARY> or YOUR_SET_NAME:SECONDARY>, which means the set was created successfully.

To continue working, you need to be on the primary. If you are not already on it, use the rs.status() command to display information about the replica set and locate the primary: look for the member with "stateStr" : "PRIMARY".
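If you prefer not to scan the full rs.status() output, the isMaster command reports the current primary directly in its primary field. This sketch, with a hypothetical find_primary name, asks one member through mongo --eval and prints the host:port of the primary; it assumes, as MongoDB allows for isMaster, that the command can run before you authenticate.

```bash
#!/usr/bin/env bash
# Print the host:port of the current primary, as reported by one member.
# Defaults to the local mongod on the standard port.
find_primary() {
  local host="${1:-127.0.0.1:27017}"
  mongo --quiet --host "$host" \
    --eval 'var p = db.isMaster().primary; print(p ? p : "no primary")'
}
```

For example, find_primary with no arguments asks the local member, and find_primary 203.0.113.12:27017 asks another one.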

Step 6: Creating the administrator

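Once you have located the primary, open a mongo shell on that node through its localhost interface (SSH into it first if needed, because the localhost exception only applies to local connections) and run the following command with your own values.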

admin = db.getSiblingDB("admin")
admin.createUser(
  {
    user: "YOUR_USER",
    pwd: "YOUR_PASSWORD",
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
  }
)

The line admin = db.getSiblingDB("admin") returns a handle to the admin database without changing the shell's current database, and stores it in a variable called admin so we can run commands through it.

If the operation is a success you will get a notification that the user has been added.

Successfully added user: {
    "user" : "YOUR_USER",
    "roles" : [
        {
            "role" : "userAdminAnyDatabase",
            "db" : "admin"
        }
    ]
}

At this point, we only have a user administrator, which can manage users and roles on every database. To administer the replica set itself (for example, running rs.status() or adding members), we also need a user with the clusterAdmin role. We will create a separate user with only that role to keep concerns apart.

Step 7: Authenticating as the administrator

Creating the first user closes the localhost exception, so from now on we have to authenticate as the user created in the previous step.

Inside the mongo shell, authenticate with the user name and password you created in Step 6 (YOUR_ADMIN and YOUR_PASSWORD below):

db.getSiblingDB("admin").auth("YOUR_ADMIN", "YOUR_PASSWORD" )

If you are not connected to the mongo shell yet, connect and authenticate in one step with this command instead:

mongo -u "YOUR_ADMIN" -p "YOUR_PASSWORD" --authenticationDatabase "admin"

If db.auth() succeeds, it returns 1. Once you are authenticated, continue to the next step.

Step 8: Creating the cluster administrator

The clusterAdmin role gives the user full control over administering the replica set. Creating this user works the same way as creating the admin user, but it needs its own user name.

db.getSiblingDB("admin").createUser(
  {
    "user" : "YOUR_CLUSTER_ADMIN",
    "pwd" : "YOUR_PASSWORD",
    roles: [ { "role" : "clusterAdmin", "db" : "admin" } ]
  }
)

Note that this time the role is changed to clusterAdmin.

Step 9: Inserting data into the Replica set

At this moment we have 2 admin users: one that can manage users and roles on every database, and another that can perform administrative tasks at the replica set level. We still lack a user that can actually read and write data in a database, so we will create one now.

admin = db.getSiblingDB("admin")
admin.createUser(
  {
    user: "YOUR_APP_USER",
    pwd: "YOUR_PASSWORD",
    roles: [ { role: "readWrite", db: "cars" } ]
  }
)

Notice that this time both the role and the db field change: readWrite lets the user read and write data in the database named in db, in this case a database named cars.

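The cars database does not exist yet; MongoDB creates it implicitly the first time you write to it. The administrator you are authenticated as can manage users but cannot write application data, so first reconnect to the primary as the user you just created, using the mongo command with -u, -p, and --authenticationDatabase "admin" as shown in Step 7. Then switch to the cars database.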

use cars

You will get a notification: switched to db cars.

The database still does not exist until you write something to it. Insert a document, for example:

db.models.insert({ make: "Dodge", model: "Viper", year: 2010 })

This time you will be notified with WriteResult({ "nInserted" : 1 }).

If you want, you can retrieve all the documents in the collection with the find() method:

db.models.find()
{ "_id" : ObjectId("59acd8b55334882863541ff4"), "make" : "Dodge", "model" : "Viper", "year" : 2010 }

Note that _id will be different in your output, but the other fields should match. Because replication is asynchronous, the secondaries receive this data shortly afterwards.
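To confirm that replication worked, you can read the collection directly from a secondary. In MongoDB 3.4, the shell must first call rs.slaveOk() to allow reads on a secondary. The sketch below, where read_from_secondary and its arguments are hypothetical, does this through mongo --eval using the application user created in this step.

```bash
#!/usr/bin/env bash
# Read cars.models directly from a secondary to confirm replication.
# rs.slaveOk() allows reads on a secondary for this shell connection.
read_from_secondary() {
  local host="$1" user="$2" password="$3"
  mongo --quiet --host "$host" -u "$user" -p "$password" \
    --authenticationDatabase admin \
    --eval 'rs.slaveOk(); db.getSiblingDB("cars").models.find().forEach(printjson)'
}
```

For example, read_from_secondary 203.0.113.12:27017 YOUR_APP_USER YOUR_PASSWORD should print the Dodge Viper document once it has replicated. Passing the password on the command line exposes it in the process list, so treat this as a quick check rather than a pattern for production scripts.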

Conclusion

Creating a Replica Set can be challenging at first because there is a lot to understand, but once you grasp the ideas behind it, you can deploy one with ease, so do not give up if it does not click the first time. Keep in mind that Replica Sets are central to MongoDB administration because they provide redundancy and automatic failover, and they open the door to advanced features such as load balancing.
