How to Import CSV Data to Vultr Managed Databases for PostgreSQL

Updated on January 4, 2023

Introduction

A Comma-Separated Values (CSV) file is a plain-text file that stores data in tabular format. CSV files are commonly used to share data between spreadsheet applications and database systems. They separate field values with commas and records with new line characters. When working with PostgreSQL databases, you can easily import data from CSV files using the \COPY meta-command available in the psql client. Because \COPY reads the input file from the client machine, it works with managed database clusters where you have no access to the database server's filesystem. This guide shows you how to copy CSV data from an Ubuntu 20.04 server to a managed PostgreSQL database cluster.

Prerequisites

To proceed with this guide:

  • Deploy an Ubuntu 20.04 server.

  • Deploy a Vultr Managed Database for PostgreSQL cluster.

1. Create a Sample CSV File

This guide uses a sample CSV file. In a production environment, you may generate CSV files automatically from spreadsheet applications or other database systems. Follow the steps below to create the sample file:

  1. SSH to your server and open a new products.csv file in a text editor.

     $ nano products.csv
  2. Enter the following information into the products.csv file.

     product_id,product_name,retail_price,available,created_on
     1,20L FOUNTAIN MINERAL WATER,5.25,Y,2022-12-12
     2,QUARTZ WALL CLOCK,30.20,N,2022-12-10
     3,RECHARGEABLE LED LAMP,14.40,Y,2022-12-08
     4,TYPE C 30W CHARGER,29.95,Y,2022-12-10
     5,DOUBLE SIDED TAPE,3.45,N,2022-12-12
  3. Save and close the products.csv file.
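
Optionally, verify the file before loading it. For example, the following commands print the file contents and count its lines; the count should be 6, which covers the header line plus the five data records:

    $ cat products.csv
    $ wc -l products.csv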

After setting up the CSV file, proceed to the next step to copy data into a PostgreSQL database cluster.

2. Copy Data into a PostgreSQL Database Cluster

To import CSV data into a managed PostgreSQL database cluster, follow the steps below:

  1. Log in to a managed PostgreSQL database cluster.

  2. Create a database.

  3. Create a table structure on the target PostgreSQL database that matches the data types of the CSV file fields.

  4. Use the psql \COPY command to load the CSV data into the table.

2.1. Log in to the Managed PostgreSQL Database Cluster

  1. Install the postgresql-client package. This package allows you to use the PostgreSQL psql command from the Linux server.

     $ sudo apt update 
     $ sudo apt install -y postgresql-client
  2. Run the following psql command to log in to the managed PostgreSQL cluster. Replace the -h (host), -p (port), and -U (username) values with the correct credentials.

     $ psql -h SAMPLE_POSTGRESQL_DB_HOST_STRING.vultrdb.com -p 16751 -U vultradmin defaultdb
  3. Enter the password for the managed database cluster and press Enter to proceed.

    Output.

     defaultdb=>
  4. Create a sample xyz_db database.

     defaultdb=> CREATE DATABASE xyz_db;

    Output.

     CREATE DATABASE
  5. Connect to the new xyz_db database.

     defaultdb=> \c xyz_db

    Output.

     psql ...
     ...
     You are now connected to database "xyz_db" as user "vultradmin".

After creating a database, initialize the table structure in the next step.

2.2. Create a Table Structure

After setting up and connecting to the target database, follow the steps below to create a table:

  1. Match the products.csv fields' values with the correct PostgreSQL data types as illustrated in the following table.

     +---------------+---------------------+
     | Field Name    | Data Type           | 
     +---------------+---------------------+
     | product_id    | INTEGER             |
     | product_name  | VARCHAR(50)         |
     | retail_price  | FLOAT               |
     | available     | CHAR(1)             |
     | created_on    | DATE                |
     +---------------+---------------------+
  2. Use the PostgreSQL CREATE TABLE command to set up a products table.

     xyz_db=> CREATE TABLE products (
                  product_id INTEGER PRIMARY KEY,
                  product_name VARCHAR(50),
                  retail_price FLOAT,
                  available CHAR(1),
                  created_on DATE
               );

    Output.

     CREATE TABLE
  3. Run the following \d meta-command against the products table to verify that it has the correct schema.

     xyz_db=> \d products

    Output.

         Column    |         Type          | Collation | Nullable | Default
     --------------+-----------------------+-----------+----------+---------
      product_id   | integer               |           | not null |
      product_name | character varying(50) |           |          |
      retail_price | double precision      |           |          |
      available    | character(1)          |           |          |
      created_on   | date                  |           |          |
     Indexes:
         "products_pkey" PRIMARY KEY, btree (product_id)

You have now set up a target database and a table for the data. Follow the next step to load data into the database.

2.3. Use the PostgreSQL \COPY Command to Load Data

The psql \COPY command follows the syntax below. The command is split across multiple lines here for readability; in psql, enter the entire command on a single line:

    => \COPY SAMPLE_TABLE (SAMPLE_COMMA_SEPARATED_COLUMNS)
        FROM 'SAMPLE_ABSOLUTE_FILE_PATH'
        WITH (SAMPLE_OPTIONS);

The \COPY syntax explained:

  1. The command starts with the \COPY keyword.

  2. The SAMPLE_TABLE is the name of the target table where you want to place the data (For example, products).

  3. The (SAMPLE_COMMA_SEPARATED_COLUMNS) is an optional parameter that defines the order of the columns in the target table.

  4. The FROM 'SAMPLE_ABSOLUTE_FILE_PATH' clause defines the path to the CSV file on the client machine (For example, ~/products.csv).

  5. The WITH (SAMPLE_OPTIONS) clause defines the options that the PostgreSQL server should apply when executing the \COPY command, as illustrated below:

    • FORMAT: This is the input data format (For example, FORMAT CSV).

    • DELIMITER: The character that separates field values in the input file (For example, DELIMITER ',').

    • HEADER: This option tells the PostgreSQL server whether the input CSV file contains a header row. If you set it to true (HEADER true), PostgreSQL discards the first line when loading data. If your CSV file doesn't contain a header, set it to false (HEADER false) to load every line in the file. A variant example that uses these options follows this list.
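
For instance, if you had a hypothetical semicolon-delimited file named items.csv with no header row, you could load it into a matching items table by adjusting the options. The file name, table name, and columns below are placeholders, not part of this guide's sample data:

    xyz_db=> \COPY items (item_id, item_name) FROM '~/items.csv' WITH (FORMAT CSV, DELIMITER ';', HEADER false);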

After understanding the PostgreSQL \COPY syntax, follow the steps below to load and verify the CSV data.

  1. Run the following command to copy data from the '~/products.csv' file into the products table.

     xyz_db=> \COPY products (product_id, product_name, retail_price, available, created_on) FROM '~/products.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER true);

    Output.

     COPY 5
  2. Query the products table to ensure the data is in place.

     xyz_db=> SELECT
                  product_id,
                  product_name,
                  retail_price,
                  available,
                  created_on
             FROM products;

    Output.

      product_id |        product_name        | retail_price | available | created_on
     ------------+----------------------------+--------------+-----------+------------
               1 | 20L FOUNTAIN MINERAL WATER |         5.25 | Y         | 2022-12-12
               2 | QUARTZ WALL CLOCK          |         30.2 | N         | 2022-12-10
               3 | RECHARGEABLE LED LAMP      |         14.4 | Y         | 2022-12-08
               4 | TYPE C 30W CHARGER         |        29.95 | Y         | 2022-12-10
               5 | DOUBLE SIDED TAPE          |         3.45 | N         | 2022-12-12
     (5 rows)

The above output confirms that you've successfully loaded data from your CSV file into the managed PostgreSQL database cluster.
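
As an additional check, you can confirm that the number of rows in the table matches the number of data records in the CSV file:

    xyz_db=> SELECT COUNT(*) FROM products;

    Output.

      count
     -------
          5
     (1 row)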

Conclusion

This guide illustrates the process of copying data from a CSV file into a managed PostgreSQL database cluster from an Ubuntu 20.04 server. If you have several CSV files, repeat the above process to load each file into its respective table.
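
If the CSV files and target tables share the same names, you can also script the imports instead of running each \COPY command manually. The following shell loop is a minimal sketch that assumes a hypothetical ~/csv directory with one CSV file per existing table, and a configured ~/.pgpass file so psql does not prompt for a password on every run:

    $ for f in ~/csv/*.csv; do
          # derive the target table name from the file name (products.csv -> products)
          table="$(basename "$f" .csv)"
          psql -h SAMPLE_POSTGRESQL_DB_HOST_STRING.vultrdb.com -p 16751 -U vultradmin xyz_db \
              -c "\copy ${table} FROM '${f}' WITH (FORMAT CSV, DELIMITER ',', HEADER true)"
      done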
