This article is outdated and may not work correctly for current operating systems or software.
Since its release in 2009, MongoDB has been one of the leading NoSQL databases. One of its core concepts is the Replica Set, so let's review the concept before working with it.
The simplest communication model used in database replication is the Master-Slave architecture. As its name suggests, this model has two roles: a single master and many slaves. The master processes the read and write operations issued by clients, and each slave keeps a replica of the master.
The most important advantage of this model is that backup operations do not compromise the master's performance, because they are performed asynchronously. That asynchrony becomes a serious problem when the master node fails: slave nodes are read-only and must be manually promoted to master, and any writes that had not yet been copied to the slaves can be lost in the meantime.
One way to solve the availability problem is to have more than one master, but this makes it harder to keep data consistent between those instances and adds complexity to the configuration.
Now that we have the context, we can present MongoDB's Replica Set technology. A Replica Set is a Master-Slave architecture with automatic failover: the moment the master (now called the primary) node stops functioning properly, an election is triggered and a new primary node is elected from the remaining slaves (now called secondaries).
The primary node is the only one that performs write operations. By default, read operations are handled by the primary too, but this behavior can be changed later.
The operations are recorded in the oplog (operations log), and secondary nodes then update their content asynchronously based on the content of the oplog.
Note: The oplog is a capped collection, which means it has a fixed maximum size. You can inspect its content in local.oplog.rs from a mongo shell on any set member.
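As a small illustration of inspecting the oplog, the sketch below wraps the query in a helper that takes a mongo shell database handle. The function name latestOplogEntries is made up for this example, and it assumes the handle exposes the usual shell methods (getSiblingDB, getCollection, find, sort, limit, toArray).

```javascript
// Return the newest `count` oplog entries, given a mongo shell database handle.
// latestOplogEntries is a hypothetical helper name, not part of MongoDB.
function latestOplogEntries(database, count) {
  return database.getSiblingDB("local")
    .getCollection("oplog.rs")
    .find()
    .sort({ $natural: -1 }) // reverse insertion order, so newest entries come first
    .limit(count)
    .toArray();
}

// Inside a mongo shell session on any set member you could call:
//   latestOplogEntries(db, 1)
```

Passing the handle in as a parameter keeps the helper independent of whichever database the shell is currently using.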
Besides keeping a full copy of the database, a secondary node has these capabilities:
Can accept read operations if needed.
Can trigger an election if a primary node fails.
Can vote in elections.
Can become the new primary if needed.
Thanks to these characteristics we can have different types of secondary nodes:
Priority 0: These nodes cannot become a primary and cannot trigger an election, but they can still vote in elections, hold a complete replica, and accept read operations. They can be helpful in multi-data-center deployments.
Hidden: These are Priority 0 members that, in addition, cannot process read operations. They may vote if necessary. Preferred tasks for these members are reporting and backups.
Delayed: These nodes hold "historical data" by staying a set amount of time behind the primary. A delayed member must be a priority 0 node, and it is recommended that it also be a hidden member.
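To make the member types above more concrete, the sketch below writes them as member documents of a replica set configuration. It assumes the option names used by the MongoDB 3.4 series that this article installs (priority, hidden, slaveDelay; later releases rename the delay option). The host names, the one-hour delay, and the helper name violatesMemberRules are placeholders invented for the example.

```javascript
// Hypothetical hosts; only the option fields matter for this illustration.
var members = [
  // Standard secondary: default priority, can become primary.
  { _id: 0, host: "node1.example.net:27017" },
  // Priority 0: keeps a full replica and can vote, but never becomes primary.
  { _id: 1, host: "node2.example.net:27017", priority: 0 },
  // Hidden: a priority 0 member that clients do not read from.
  { _id: 2, host: "node3.example.net:27017", priority: 0, hidden: true },
  // Delayed: stays 3600 seconds behind; priority 0 and (as recommended) hidden.
  { _id: 3, host: "node4.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600 }
];

// Check the rule from the text: hidden and delayed members must be priority 0.
function violatesMemberRules(member) {
  var needsPriorityZero = member.hidden === true || (member.slaveDelay || 0) > 0;
  return needsPriorityZero && member.priority !== 0;
}

// Members that break the rule; empty for the list above.
var invalid = members.filter(violatesMemberRules);
```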
Before deploying an infrastructure it is important to design it, and there are points to consider in this design.
Keep in mind that the recommended minimum number of members in a Replica set is 3. You can mix the three types of nodes as long as there is at least one primary and one secondary node.
In this guide we are deploying 3 members, one primary and two standard secondaries.
Note: A Replica set can have at most 7 voting members, which can be a mix of arbiters and secondary members.
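The reason for starting with three members becomes clearer with a little arithmetic. The sketch below assumes the usual rule that a primary is elected by a strict majority of the voting members. The function names majority and faultTolerance are invented for this illustration.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can fail while a majority is still reachable.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// Summarize a few set sizes, up to 7 voting members.
var table = [1, 2, 3, 5, 7].map(function (n) {
  return { voting: n, majority: majority(n), tolerates: faultTolerance(n) };
});
```

Under that assumption, a set with 2 voting members tolerates no failures while a set with 3 tolerates one, which is why three members is a common starting point.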
The name is only a label, but you will use it in the configuration of the set. Keep in mind that you can have more than one Replica set in your production environment, so do not neglect your set's name.
Choose a name for your set before you continue.
This tutorial suggests deploying in the same data center to avoid communication problems.
Note: If you deploy across different data centers, it is recommended to connect your nodes through a VPN.
Launch 3 nodes running Ubuntu 16.04 x64 from your customer portal, in the same region if possible. Name them according to the type of project you are dealing with, and be sure to use the same server size for all of them.
After you have deployed your 3 nodes, make sure that every node can talk to the others. SSH into two of the nodes and reach the others using ping -c 4 EXAMPLE_IP. Replace EXAMPLE_IP with the actual IPs of your nodes.
Here you can see an example of successful communication between two nodes.
root@foo_node:~# ping -c 4 EXAMPLE_IP
PING EXAMPLE_IP (EXAMPLE_IP) 56(84) bytes of data.
64 bytes from EXAMPLE_IP: icmp_seq=1 ttl=59 time=0.594 ms
64 bytes from EXAMPLE_IP: icmp_seq=2 ttl=59 time=0.640 ms
64 bytes from EXAMPLE_IP: icmp_seq=3 ttl=59 time=0.477 ms
64 bytes from EXAMPLE_IP: icmp_seq=4 ttl=59 time=0.551 ms
--- EXAMPLE_IP ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3021ms
rtt min/avg/max/mdev = 0.477/0.565/0.640/0.064 ms
In general you can use the MongoDB package of Ubuntu, but it is better to use the official community repo because it is always up to date. This repo contains these packages:
mongodb-org, the metapackage that installs the four component packages below.
mongodb-org-server, which contains the mongod daemon (the primary process that handles data requests).
mongodb-org-mongos, which contains the mongos daemon (the routing service for sharded deployments).
mongodb-org-shell, which contains the mongo shell JavaScript interface.
mongodb-org-tools, which contains tools for administration activities.
Proceed to installing the packages.
Import the public key to the package management system.
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
Create the list file for MongoDB '/etc/apt/sources.list.d/mongodb-org-3.4.list'.
echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list
Update the package database.
sudo apt-get update
Install the MongoDB metapackage.
sudo apt-get install -y mongodb-org
Start the MongoDB service.
sudo service mongod start
Now you can open the mongo shell in any bash session. To do this, use the mongo command. You will be greeted by something similar to this.
MongoDB shell version v3.4.7
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.7
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
Server has startup warnings:
*Some extra logs are cut by the way*
>
Do not forget to stop the service with sudo service mongod stop, because later we will start mongod again with some parameters. Repeat this process on all 3 nodes of the set.
Using a keyfile enforces two concepts in Replica Set administration. The first is Internal Authentication. By default you can start a mongo shell session without a user, and that session has full control of the database. When you use a keyfile for authentication, however, your mongo shell session starts in a state called the localhost exception, which only lets you create the administrator user and the Replica set. The second concept is Role-Based Access Control, or in other words, authorization. It governs which administrative actions each user may perform on the Replica set.
The keyfile acts as the password for the set, and it must be the same on all members of the set. To increase security, generate a random key with the tool of your choice.
The content must be between 6 and 1024 characters long. You must also set read-only permission on the keyfile.
chmod 400 PATH_OF_YOUR_KEYFILE
Now copy your keyfile to every set member. Use a consistent folder for future reference, and do not store the file on removable media. Also make sure the folder is one that mongod can access.
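The article leaves the choice of key generator open. As one possible option, the sketch below assumes Node.js is available on your workstation (it is not mongo shell code) and uses its built-in crypto and fs modules. The function names generateKey and writeKeyfile are invented for this example, and the target path must not exist yet.

```javascript
// Node.js sketch (not mongo shell): generate a random keyfile.
var crypto = require("crypto");
var fs = require("fs");

// 756 random bytes encode to 1008 base64 characters, inside the allowed length range.
function generateKey() {
  return crypto.randomBytes(756).toString("base64");
}

// Write the key to a new file that only its owner can read (same effect as chmod 400).
function writeKeyfile(path) {
  var key = generateKey();
  fs.writeFileSync(path, key, { mode: 0o400 });
  fs.chmodSync(path, 0o400); // set the mode explicitly as well
  return key;
}

// Example call (hypothetical path):
//   writeKeyfile("/path/to/your/keyfile");
```

Generate the key once and copy the same file to every member, since each member must present an identical key.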
In this step we need to start the mongod daemon on every set member. There are two ways of starting the mongod process: using a config file or using the command line. Both methods are easy, but for simplicity this tutorial uses the command line.
Use the name you chose earlier in this command.
mongod --keyFile PATH_OF_YOUR_KEYFILE --replSet "YOUR_SET_NAME"
By default mongod does not run as a daemon. To run it fully as a daemon you need to use the --fork parameter or a service manager such as systemd (the default on Ubuntu 16.04). In this tutorial we do not run mongod as a daemon, so that you can see the logs directly in your terminal.
Note: Carefully type the name of the Replica set because once created you cannot change it.
Note: If you run mongod as a non-daemon process, you will have to open another ssh connection to continue working.
Use the mongo command to open the mongo shell. This can be done on any member of the set.
At this moment we are in a state called the localhost exception. When a keyfile is used to set up the mongod process, you are required to create a database administrator before you can perform read and write operations, but we will go into that later.
This is a delicate part: we are going to use the rs.initiate() command inside the mongo shell from Step 4. Before using this command, let's review it.
rs.initiate(
  {
    _id : "YOUR_SET_NAME",
    members: [
      { _id : 0, host : "example1.net:27017" },
      { _id : 1, host : "example2.net:27017" },
      { _id : 2, host : "example3.net:27017" }
    ]
  }
)
The first _id field is a string and must match the --replSet value that was passed to mongod earlier. Each host value must be either the IP or the domain name of a member of the Replica set. Do not forget to append the port that the mongod instance uses on each member.
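The structure described above can also be built from a list of hosts, which helps avoid typos when the set has several members. This is only a sketch: buildReplicaSetConfig is a hypothetical helper, the IP addresses are documentation placeholders, and it assumes every member listens on the same port.

```javascript
// Build the document expected by rs.initiate() from a set name and a host list.
// buildReplicaSetConfig is a hypothetical helper, not part of the mongo shell.
function buildReplicaSetConfig(setName, hosts, port) {
  port = port || 27017; // MongoDB's default port
  return {
    _id: setName, // must match the --replSet value given to mongod
    members: hosts.map(function (host, index) {
      return { _id: index, host: host + ":" + port };
    })
  };
}

// Placeholder addresses; replace them with the IPs or domain names of your nodes.
var config = buildReplicaSetConfig("YOUR_SET_NAME", [
  "203.0.113.10",
  "203.0.113.11",
  "203.0.113.12"
]);

// In the mongo shell you could then run: rs.initiate(config)
```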
Now it is time to execute the command with your own data. This will trigger an election, and a primary will be elected automatically.
Here you should note that your shell prompt has changed to YOUR_SET_NAME:PRIMARY> or YOUR_SET_NAME:SECONDARY>. This means that creating the set was a success.
To continue working you need to find the primary, if you are not already on it. Use the rs.status() command to show the information of the Replica set and locate the primary. You are looking for the property "stateStr" : "PRIMARY".
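Scanning the rs.status() output by eye works, but the lookup can also be scripted. The sketch below assumes the status document has a members array whose entries carry name and stateStr fields, as in the output described above. The helper name findPrimary is made up for this example.

```javascript
// Return the "host:port" name of the primary, or null if no member is primary.
// findPrimary is a hypothetical helper; pass it the document returned by rs.status().
function findPrimary(status) {
  var primaries = status.members.filter(function (member) {
    return member.stateStr === "PRIMARY";
  });
  return primaries.length > 0 ? primaries[0].name : null;
}

// In the mongo shell you could call: findPrimary(rs.status())
```

A null result would suggest that the election has not finished yet or that no member can currently be elected.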
After you have located the primary, enter the mongo shell on it and run the next command using your data.
admin = db.getSiblingDB("admin")
admin.createUser(
  {
    user: "YOUR_USER",
    pwd: "YOUR_PASSWORD",
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
  }
)
The admin = db.getSiblingDB("admin") part lets us write into admin from a different database. It stores a handle to the admin database in a variable called admin, so we can execute commands through it.
If the operation is a success you will get a notification that the user has been added.
Successfully added user: {
"user" : "YOUR_USER",
"roles" : [
{
"role" : "userAdminAnyDatabase",
"db" : "admin"
}
]
}
At this point, we only have an administrator for all the servers, but having a Replica set forces us to have a user with the clusterAdmin role. We will create another user with only that role to separate concerns.
We have reached the limit of the localhost exception, which is why we have to authenticate as the user created in the previous step.
You can change users inside the mongo shell with the following.
db.getSiblingDB("admin").auth("YOUR_ADMIN", "YOUR_PASSWORD" )
If you have not already connected to the mongo shell, use this command instead.
mongo -u "YOUR_ADMIN" -p "YOUR_PASSWORD" --authenticationDatabase "admin"
You will be notified of the user change, and you can go to the next step.
The clusterAdmin role gives the user full control of the Replica set. Creating this user is as easy as creating the admin user.
db.getSiblingDB("admin").createUser(
{
"user" : "YOUR_USER",
"pwd" : "YOUR_PASSWORD",
roles: [ { "role" : "clusterAdmin", "db" : "admin" } ]
}
)
Note that this time the role is changed to clusterAdmin.
At this moment we have 2 admin users: one that has total control over the server and another that has access to administrative tasks at the Replica set level. We still lack a user who has access to "use" a database, so we will create that user now.
admin = db.getSiblingDB("admin")
admin.createUser(
  {
    user: "YOUR_USER",
    pwd: "YOUR_PASSWORD",
    roles: [ { role: "readWrite", db: "cars" } ]
  }
)
Notice that this time we changed both the role and the db part. The readWrite role lets the user read and write data, and db names the database accessible to the user; in this case we are using a database named cars. Authenticate as this user, in the same way as before, before working with the cars database.
The database has not been created yet. MongoDB creates it implicitly when you first write to it. Start by switching to the cars database.
use cars
You will get a notification: switched to db cars.
The database has still not been created; to create it you need to write something to it. We are using the following example.
db.models.insert({ make: "Dodge", model: "Viper", year: 2010 })
This time you will be notified with WriteResult({ "nInserted" : 1 }).
If you want, you can retrieve all the objects in the collection with the find() method:
db.models.find()
{ "_id" : ObjectId("59acd8b55334882863541ff4"), "make" : "Dodge", "model" : "Viper", "year" : 2010 }
Note that _id will be different in your output, but the other data should be the same. Given enough time, this data will be replicated to the other members.
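One way to judge whether "enough time" has passed is to compare how far each secondary is behind the primary. The sketch below assumes the rs.status() document exposes an optimeDate (a Date) for every member alongside name and stateStr. The helper name replicationLagSeconds is invented for this example.

```javascript
// Map each secondary's name to its lag behind the primary, in seconds.
// replicationLagSeconds is a hypothetical helper; pass it the result of rs.status().
function replicationLagSeconds(status) {
  var primary = status.members.filter(function (member) {
    return member.stateStr === "PRIMARY";
  })[0];
  if (!primary) {
    return null; // no primary at the moment, so lag cannot be computed
  }
  var lag = {};
  status.members.forEach(function (member) {
    if (member.stateStr === "SECONDARY") {
      // Subtracting two Date objects yields milliseconds.
      lag[member.name] = (primary.optimeDate - member.optimeDate) / 1000;
    }
  });
  return lag;
}

// In the mongo shell you could call: replicationLagSeconds(rs.status())
```

A lag of 0 for a secondary suggests that it has applied the same operations as the primary, including the insert above.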
Creating a Replica Set can be challenging at first because there is a lot of information to understand, but once you get the idea behind it you can deploy one with ease, so do not give up if you cannot grasp it the first time. Keep in mind that Replica sets are important in MongoDB administration because they open the possibility of adding advanced features like Load Balancing.