Author: Adriano Barbosa
Last Updated: Thu, Jan 12, 2023
Anaconda is a powerful, open-source software distribution for Python and R programming languages. It is designed to solve many problems associated with managing and using Python packages and their dependencies. It also provides a wealth of powerful tools for data science and machine learning. The Anaconda repository is a vast database of over 8,000 open-source packages, offering a comprehensive range of data science and machine learning options.
This guide covers the deployment, use, and security of an Anaconda instance at Vultr. It explains how to deploy and secure the instance, manage environments and packages from the Anaconda command line, and run data science and machine learning tasks on it.
You should have a working knowledge of Python to follow this guide.
Deploy a Cloud GPU server with the Vultr Marketplace Anaconda App. The deployment may take a few minutes to complete, and you can follow the initialization process on the instance console.
After the deployment, log in to the server via SSH, then update the system and conda, and reboot the machine to apply the updates.
# apt-get update && apt-get dist-upgrade -y
# conda init
# conda update conda
# reboot
If you don't have an SSH key pair installed yet, create one and install it on your instance. You need your SSH key pair installed because you cannot log in via SSH with your password after making the next set of changes.
Configure SSH to reject password logins: open /etc/ssh/sshd_config with your favorite editor, then uncomment the PasswordAuthentication line and make sure it is set to no:
PasswordAuthentication no
Restart the SSH server.
# systemctl restart ssh
It is an excellent policy to close all ports on the instance firewall and open only the required ports. Here's an example that closes all ports except SSH.
# ufw reset
# ufw enable
# ufw default allow outgoing
# ufw default deny incoming
# ufw allow ssh/tcp
The last command makes the firewall accept connections on port 22 from any host. If you have a static IP, you can change the last line to only allow connections from that specific IP.
# ufw allow proto tcp from YOUR_IP to INSTANCE_IP port 22
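A typo in an address can lock you out or leave the rule ineffective, so it may help to validate both addresses before running the command. The sketch below is only an illustration: the helper name ufw_ssh_rule is hypothetical, and the addresses come from the reserved documentation ranges, not from a real setup. It builds the same rule shown above and does not run ufw itself.

```python
import ipaddress


def ufw_ssh_rule(your_ip: str, instance_ip: str, port: int = 22) -> str:
    """Build the ufw command that allows TCP traffic from one address only."""
    # ip_address() raises ValueError for anything that is not a valid
    # IPv4 or IPv6 address, such as an unreplaced placeholder.
    source = ipaddress.ip_address(your_ip)
    target = ipaddress.ip_address(instance_ip)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"ufw allow proto tcp from {source} to {target} port {port}"


# Placeholder addresses from the documentation ranges (assumptions).
print(ufw_ssh_rule("203.0.113.10", "198.51.100.20"))
```

Passing a leftover placeholder such as YOUR_IP raises ValueError instead of producing a broken rule.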
The ufw firewall supports connection rate limiting, which is useful for protecting against brute-force login attacks. When a limit rule is used, ufw will normally allow the connection but will deny connections if an IP address attempts to initiate 6 or more connections within 30 seconds.
# ufw limit ssh/tcp
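To make the limit rule more concrete, here is a simplified Python model of "deny when an address makes 6 or more connection attempts within 30 seconds". It is not how ufw is implemented (the firewall enforces this in the kernel); the class name ConnectionLimiter and the example address are assumptions made for the illustration.

```python
from collections import defaultdict, deque


class ConnectionLimiter:
    """Simplified model of a 'deny on the 6th attempt within 30 seconds' rule."""

    def __init__(self, max_hits: int = 6, window: float = 30.0):
        self.max_hits = max_hits
        self.window = window
        self._attempts = defaultdict(deque)  # source IP -> recent attempt times

    def allow(self, ip: str, now: float) -> bool:
        attempts = self._attempts[ip]
        # Forget attempts that have fallen out of the time window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        # Every attempt is recorded, including ones that end up denied.
        attempts.append(now)
        return len(attempts) < self.max_hits


limiter = ConnectionLimiter()
# Six attempts one second apart, then one more after a long pause.
results = [limiter.allow("203.0.113.10", t) for t in range(6)]
results.append(limiter.allow("203.0.113.10", 45.0))
print(results)  # [True, True, True, True, True, False, True]
```

The first five attempts pass, the sixth is denied, and the address is accepted again once its earlier attempts are older than the window.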
Verify the firewall rules using the command below.
# ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip

To                         Action      From
--                         ------      ----
22/tcp                     LIMIT IN    Anywhere
22/tcp (v6)                LIMIT IN    Anywhere (v6)
It is recommended to use virtual environments while working on Python projects. Here are some examples.
Create a virtual environment with Anaconda:
# conda create --name myvenv
Create the virtual environment specifying the Python version:
# conda create --name myvenv python=3.10
Clone an existing environment:
# conda create --name clonevenv --clone myvenv
Activate a virtual environment:
# conda activate myvenv
Deactivate an environment:
# conda deactivate
Remove a virtual environment:
# conda env remove -n myvenv
Install a package:
# conda install pandas
View the installed packages:
# conda list
Remove an installed package:
# conda remove pandas
The commands above are executed on the active virtual environment. You can also execute those commands on another virtual environment by using the -n parameter:
# conda install -n anothervenv pandas
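When these commands are driven from a script, it helps to know which environment is active and to be explicit about the target. The following sketch only builds command lines and prints them; it does not run conda. The helper names active_conda_env and conda_command are hypothetical, while CONDA_DEFAULT_ENV is the variable that conda activate exports in the current shell.

```python
import os
import shlex


def active_conda_env(environ=os.environ):
    """Return the name of the activated conda environment, or None."""
    # 'conda activate' exports CONDA_DEFAULT_ENV in the current shell.
    return environ.get("CONDA_DEFAULT_ENV")


def conda_command(action, *packages, env=None):
    """Build a conda command line, optionally targeting another environment."""
    args = ["conda", action]
    if env is not None:
        args += ["-n", env]  # -n is the short form of --name
    args += packages
    return shlex.join(args)


print("Active environment:", active_conda_env())
print(conda_command("install", "pandas"))
print(conda_command("install", "pandas", env="anothervenv"))
print(conda_command("list", env="anothervenv"))
```

Without env, the command applies to whichever environment is active; with it, the command names its target explicitly.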
Here's an example that shows how to use the Anaconda Marketplace app to run the K-Nearest Neighbor (KNN) algorithm.
Create an environment for the example.
# conda create --name knn_example
# conda activate knn_example
# conda install scikit-learn
Create the file knn_example.py with the following code:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

# Load data
iris = datasets.load_iris()
print("Features:", iris.feature_names)
print("Labels:", iris.target_names)
print("Data size:", iris.data.shape)
print("Examples:\n", iris.data[0:3])

# Split train and test instances. 70% training, and 30% test
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3)

# Model fit
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Predict
y_pred = knn.predict(X_test)

# Result
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
This code loads the iris flower dataset and prints some information about it. The dataset is split randomly into train and test subsets. The train portion is used to train the KNN classifier, and the test portion is used to validate the model by comparing the labels predicted by the model with the real labels. Because the split is random, the accuracy can vary slightly from run to run.
Run the code with:
# python knn_example.py
Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Labels: ['setosa' 'versicolor' 'virginica']
Data size: (150, 4)
Examples:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]]
Accuracy: 0.9555555555555556
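To show what the classifier does conceptually, here is a minimal pure-Python sketch of KNN: it measures the Euclidean distance from a query point to every training point and takes a majority vote among the k closest. It is not scikit-learn's implementation, which is more optimized. The function names and the toy data are made up for the illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=5):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Vote among the k closest neighbours.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, made up for illustration (not the iris dataset).
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["small", "small", "small", "large", "large", "large"]
X_test = [(1.1, 1.0), (5.1, 5.0)]
y_test = ["small", "large"]

y_pred = [knn_predict(X_train, y_train, q, k=3) for q in X_test]
print("Predictions:", y_pred)
print("Accuracy:", accuracy(y_test, y_pred))
```

The accuracy helper computes the same quantity the script above reports: the share of test points whose predicted label matches the real one.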
Anaconda comes with many packages already installed in the base environment, and one of those packages is Jupyter. By default, Jupyter runs a webserver only accessible from localhost. To be able to use it from a remote machine, you can use SSH port forwarding.
Connect to the Anaconda instance using SSH and port forwarding. Jupyter uses port 8888 by default.
$ ssh -L 8888:127.0.0.1:8888 root@INSTANCE_IP
After you connect to the Anaconda instance, start Jupyter Notebook:
# jupyter nbclassic --allow-root --no-browser
...
To access the notebook, open this file in a browser:
    file:///root/.local/share/jupyter/runtime/nbserver-17248-open.html
Or copy and paste one of these URLs:
    http://localhost:8888/?token=864fa232b1c37127616370df9e6bf1f867658c240ac9d97f
 or http://127.0.0.1:8888/?token=864fa232b1c37127616370df9e6bf1f867658c240ac9d97f
Copy the URL and paste it into the browser on your local machine.
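The connection is secure because traffic between your local port 8888 and the instance travels through the encrypted SSH tunnel, and Jupyter itself stays reachable only from localhost on the server.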
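If the browser cannot reach the notebook, a quick check is whether anything is listening on the forwarded port of your local machine. The sketch below uses only the standard library; the helper name port_is_open is hypothetical. An open local port means the SSH client is listening for the tunnel, not necessarily that Jupyter is already running on the instance.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if port_is_open("127.0.0.1", 8888):
    print("Local port 8888 accepts connections (the SSH tunnel is listening).")
else:
    print("Nothing is listening on local port 8888 yet.")
```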
You can also make Jupyter available publicly using HTTPS with password protection.
Generate the Jupyter configuration file:
# jupyter server --generate-config
Define a secure password.
# jupyter server password
You can run this command anytime you need to change the password.
Generate a self-signed SSL certificate with OpenSSL to secure the HTTPS connections.
# openssl req -x509 -newkey rsa:4096 -keyout /etc/ssl/private/server.key -out /etc/ssl/server.crt -days 365 -nodes
Or, if you have a domain, use a certificate authority like Let's Encrypt to issue the certificate.
Edit the following lines in the Jupyter configuration file:
c.ServerApp.allow_password_change = False
c.ServerApp.allow_root = True
c.ServerApp.certfile = u'/etc/ssl/server.crt'
c.ServerApp.keyfile = u'/etc/ssl/private/server.key'
c.ServerApp.ip = '0.0.0.0'
c.ServerApp.open_browser = False
c.ServerApp.port = 8888
Open the Jupyter port on the firewall:
# ufw allow 8888/tcp
Starting the Jupyter server with no parameters is enough, as the needed parameters are already set in the configuration file:
# jupyter nbclassic
Anyone on the internet can reach your server at https://INSTANCE_IP:8888/, but access is protected with your password. Note that a self-signed SSL certificate triggers a warning in your browser when you access your Jupyter webserver, because the certificate is not issued by a certificate authority the browser trusts. The warning is expected in this setup, and you can proceed past it.
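Before accepting the browser warning, you may want to confirm that the certificate being presented is the one you generated. One way, sketched below under the assumption that the certificate lives at the /etc/ssl/server.crt path used earlier, is to compute its SHA-256 fingerprint and compare it with the fingerprint shown in the browser's certificate details. The helper name pem_fingerprint is hypothetical.

```python
import hashlib
import os
import ssl


def pem_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint of a PEM certificate as colon-separated hex."""
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text)
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


CERT_PATH = "/etc/ssl/server.crt"  # path used in the openssl command of this guide

if os.path.exists(CERT_PATH):
    with open(CERT_PATH) as handle:
        print("Local certificate:", pem_fingerprint(handle.read()))
else:
    print(f"{CERT_PATH} not found; run this on the instance.")

# From another machine, the certificate the server presents could be fetched
# with ssl.get_server_certificate(("INSTANCE_IP", 8888)) and fingerprinted
# the same way (INSTANCE_IP is a placeholder).
```

If the two fingerprints match, the warning concerns your own self-signed certificate and not an unexpected one.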
Jupyter Lab is also available by appending /lab to the URL of whichever access method you use, for example https://INSTANCE_IP:8888/lab for the public HTTPS setup.
This guide covered deploying and securing an Anaconda instance, managing Python virtual environments, and installing and uninstalling packages using Anaconda. It also showed two usage examples: KNN classification using scikit-learn and remote access to Jupyter Notebook.