Feast is an open-source feature store that enables efficient management and serving of machine learning (ML) features for real-time applications. It provides a unified interface for storing, discovering, and accessing features, which are the individual measurable properties or characteristics of the data used for ML modeling. Feast follows a distributed architecture that consists of several components working together. These include the Feast Registry, Stream Processor, Batch Materialization Engine, and Stores.
Feast supports offline and online stores. An offline store works with historical time-series feature values that are stored in data sources, while an online store serves the latest feature values at low latency. Feature values are loaded from data sources into the online store through materialization, which you can trigger with the materialize or materialize-incremental command.
One of the supported online stores in Feast is Redis, which is an open-source, in-memory data structure store. This article explains how to use a Vultr Managed Database for Redis as an online feature store for Feast.
One of the crucial factors in the success of a feature store is the ability to serve features at low latency, because high latency can harm model performance and the overall user experience. Using Redis as an online feature store offers several advantages:
Reading features does not require disk I/O operations, which can introduce delays.
Features can be retrieved and served quickly, resulting in faster response times.
Machine learning models can offer efficient and timely predictions.
Data is stored in memory rather than on disk, which reduces I/O overhead and improves overall processing times.
To follow the instructions in this article, make sure you:
Deploy a Vultr Managed Database for Redis.
When deployed, copy your Vultr Managed Database for Redis instance connection information, and take note of the host, password, and port to establish a connection to the database.
Deploy an Ubuntu 22.04 management server on Vultr.
Use SSH to access the server as a non-root sudo user.
Update the server packages.
To connect to a Vultr Managed Database for Redis and use Feast, set up Python and the Redis CLI, and install the Feast SDK as described in this section.
Install Python 3.10 on the server.
$ sudo apt-get install python3.10
Install the pip3 Python package manager.
$ sudo apt-get -y install python3-pip
Install the Redis CLI tool.
$ sudo apt-get install redis
Install the Feast SDK and CLI.
$ pip install feast
To use Redis as the online store, install the redis dependency.
$ pip install 'feast[redis]'
Using Feast, bootstrap a new feature repository.
$ feast init feast_vultr_redis
Output:
Creating a new Feast repository in <full path to your directory>
Switch to the feature_repo directory inside the newly created repository.
$ cd feast_vultr_redis/feature_repo
Using a text editor such as Nano, open the feature_store.yaml file in the feature_repo directory.
$ nano feature_store.yaml
Replace the contents of the file with the following configuration, substituting your actual database details for VULTR_REDIS_HOST, VULTR_REDIS_PORT, and VULTR_REDIS_PASSWORD.
project: feast_vultr_redis
registry: data/registry.db
provider: local
online_store:
  type: redis
  connection_string: "VULTR_REDIS_HOST:VULTR_REDIS_PORT,ssl=true,password=VULTR_REDIS_PASSWORD"
Save and close the file.
To register feature definitions, run the following command.
$ feast apply
The apply command scans Python files in the current directory (example_repo.py in this case) for feature view and entity definitions, registers the objects, and deploys infrastructure.
When successful, your output should look like the one below.
....
Created entity driver
Created feature view driver_hourly_stats_fresh
Created feature view driver_hourly_stats
Created on demand feature view transformed_conv_rate
Created on demand feature view transformed_conv_rate_fresh
Created feature service driver_activity_v1
Created feature service driver_activity_v3
Created feature service driver_activity_v2
Create a new file named generate_training_data.py.
$ nano generate_training_data.py
Add the following code to the file.
from datetime import datetime

import pandas as pd

from feast import FeatureStore

entity_df = pd.DataFrame.from_dict(
    {
        # entity's join key -> entity values
        "driver_id": [1001, 1002, 1003],
        # "event_timestamp" (reserved key) -> timestamps
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
        ],
        # (optional) label name -> label values. Feast does not process these
        "label_driver_reported_satisfaction": [1, 5, 3],
        # values we're using for an on-demand transformation
        "val_to_add": [1, 2, 3],
        "val_to_add_2": [10, 20, 30],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())
print()
print("----- Example features -----\n")
print(training_df.head())
Save and close the file.
Generate training data.
$ python3 generate_training_data.py
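get_historical_features() performs a point-in-time join: for each row of entity_df, Feast picks the latest feature value whose timestamp is at or before that row's event_timestamp, and ignores values older than the feature view's TTL. The pure-Python sketch below illustrates only those semantics for a single feature. The history rows, the point_in_time_lookup helper, and the one-day TTL are hypothetical; Feast's offline store performs the real join.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical feature history for one driver, sorted by time: (event_timestamp, conv_rate).
history = [
    (datetime(2021, 4, 12, 8, 0, 0), 0.10),
    (datetime(2021, 4, 12, 10, 0, 0), 0.20),
    (datetime(2021, 4, 12, 12, 0, 0), 0.30),
]


def point_in_time_lookup(rows, entity_ts, ttl):
    """Return the latest value at or before entity_ts, or None if none is within ttl."""
    timestamps = [ts for ts, _ in rows]
    # Index of the first row strictly after entity_ts; the row before it is the candidate.
    i = bisect_right(timestamps, entity_ts)
    if i == 0:
        return None  # No feature value existed yet at entity_ts.
    ts, value = rows[i - 1]
    if entity_ts - ts > ttl:
        return None  # The latest value is older than the TTL allows.
    return value


ttl = timedelta(days=1)  # Hypothetical feature view TTL.
print(point_in_time_lookup(history, datetime(2021, 4, 12, 10, 59, 42), ttl))  # 0.2
print(point_in_time_lookup(history, datetime(2021, 4, 12, 7, 0, 0), ttl))     # None
print(point_in_time_lookup(history, datetime(2021, 4, 14, 0, 0, 0), ttl))     # None
```

Because values recorded after an entity row's timestamp are never used, the training data does not leak information from the future into the label.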
Materialize the latest feature values into the online store to prepare them for serving.
$ CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S") && feast materialize-incremental $CURRENT_TIME
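As we understand Feast's behavior, materialize-incremental loads each feature view from the end of its previous materialization run up to the end time you pass; on the very first run it instead looks back one TTL from the current time. The sketch below models only that windowing rule and the end-time format used by the date command above. The materialization_window helper and its arguments are hypothetical, not part of the Feast API, and edge cases such as a zero TTL are not covered.

```python
from datetime import datetime, timedelta, timezone


def materialization_window(end, ttl, last_end=None, now=None):
    """Hypothetical helper: return the (start, end) window of one incremental run."""
    if last_end is not None:
        # Later runs start where the previous run for this feature view ended.
        return last_end, end
    if ttl is None:
        # Without a TTL or an earlier run there is no start point to infer.
        raise ValueError("first incremental run needs a TTL or a previous materialize run")
    # First run: look back one TTL from the current time.
    if now is None:
        now = datetime.now(timezone.utc)
    return now - ttl, end


# Build the end time in the same format as: date -u +"%Y-%m-%dT%H:%M:%S"
end = datetime.now(timezone.utc).replace(microsecond=0)
print(end.strftime("%Y-%m-%dT%H:%M:%S"))

start, _ = materialization_window(end, ttl=timedelta(days=1), now=end)
print(f"first run loads {start:%Y-%m-%dT%H:%M:%S} to {end:%Y-%m-%dT%H:%M:%S}")
```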
When Feast uses Redis as the online store, it stores feature data as a two-level map built on Redis Hashes. The first level of the map, the Redis key, is made up of the entity key (entity names and values) and the Feast project name. The second level key (in Redis terminology, the "field" in a Redis Hash) is derived from the feature view name and the feature name, and the Redis Hash value contains the feature value. Feast hashes these field names, so they are not human-readable in redis-cli, and it stores each feature view's event timestamp in a separate _ts:<feature view> field.
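The two-level layout can be modeled with nested dictionaries. This is a conceptual sketch only: readable tuples and strings stand in for the binary keys and hashed field names Feast actually writes, and the write_features helper and the feature values are hypothetical.

```python
from collections import defaultdict

# Conceptual model: Redis key -> Redis Hash (field -> value).
online_store = defaultdict(dict)


def write_features(project, join_key, entity_value, feature_view, features, event_ts):
    """Hypothetical writer that mimics the two-level layout for one entity."""
    # Level 1: one Redis key per (entity key, project) pair.
    fields = online_store[(join_key, entity_value, project)]
    # Level 2: one hash field per feature, named after its feature view and feature.
    for name, value in features.items():
        fields[f"{feature_view}:{name}"] = value
    # One extra field per feature view records the event timestamp of these values.
    fields[f"_ts:{feature_view}"] = event_ts


# Hypothetical values for driver 1005.
write_features("feast_vultr_redis", "driver_id", 1005, "driver_hourly_stats",
               {"conv_rate": 0.5, "acc_rate": 0.8}, "2023-07-11T13:00:00")
print(online_store[("driver_id", 1005, "feast_vultr_redis")])
```

Because every feature view for the same entity shares one hash, a single hash read can return all of that entity's features at once, which helps keep online lookups fast.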
In a new terminal window, paste your Vultr Managed Database for Redis connection string to establish a connection to the database.
$ redis-cli -u rediss://default:[DATABASE_PASSWORD]@[DATABASE_HOST]:[DATABASE_PORT]
Replace DATABASE_PASSWORD, DATABASE_HOST, and DATABASE_PORT with your actual Vultr Managed Database values.
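The same host, port, and password appear in both feature_store.yaml and the redis-cli URL, so generating both strings from one set of values avoids mismatches. The helper names and sample values in this sketch are hypothetical. Percent-encoding the password assumes your redis-cli version decodes percent-encoded URL credentials; if it does not, a password without special characters is the simpler option.

```python
from urllib.parse import quote


def redis_cli_url(host, port, password, user="default"):
    """Build the rediss:// URL passed to redis-cli -u (rediss means TLS)."""
    # Percent-encode the password so characters such as "@" or "/" cannot
    # be mistaken for URL delimiters (assumes your redis-cli decodes them).
    return f"rediss://{user}:{quote(password, safe='')}@{host}:{port}"


def feast_connection_string(host, port, password):
    """Build the connection_string value used in feature_store.yaml."""
    return f"{host}:{port},ssl=true,password={password}"


# Hypothetical values: substitute your own database host, port, and password.
host, port, password = "vultr-redis.example.com", 16752, "p@ss/word"
print(redis_cli_url(host, port, password))
print(feast_connection_string(host, port, password))
```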
When connected, your shell prompt changes to the Redis CLI prompt, which ends with >. Run the following command to view all stored keys.
keys "*"
Your output should look like the one below:
1) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
2) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xec\x03\x00\x00feast_vultr_redis"
3) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xeb\x03\x00\x00feast_vultr_redis"
4) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xe9\x03\x00\x00feast_vultr_redis"
5) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xea\x03\x00\x00feast_vultr_redis"
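The keys are binary. Judging from the bytes in the output above, each key consists of the type code 2 (which appears to be Feast's STRING value type) as a 4-byte little-endian integer, the join-key name driver_id, the type code 4 (INT64), the value length 4, the driver ID as a 4-byte little-endian integer (\xed\x03 is 1005), and finally the project name. The encode_key and decode_key helpers below are hypothetical, not part of Feast, and reproduce only this layout; other entity-key serialization versions encode values differently, for example as 8-byte integers.

```python
import struct

# Type codes visible in the keys above: 2 before the join-key name, 4 before the value.
STRING_TYPE, INT64_TYPE = 2, 4


def encode_key(project: str, join_key: str, value: int) -> bytes:
    """Rebuild a Redis key using the byte layout seen above (4-byte entity values)."""
    parts = [struct.pack("<I", STRING_TYPE), join_key.encode("utf-8")]
    raw = struct.pack("<i", value)  # entity value as a 4-byte little-endian int
    parts += [struct.pack("<I", INT64_TYPE), struct.pack("<I", len(raw)), raw]
    parts.append(project.encode("utf-8"))  # the project name ends the key
    return b"".join(parts)


def decode_key(key: bytes, join_key: str) -> tuple[int, str]:
    """Recover (entity value, project) from a key built with encode_key's layout."""
    offset = 4 + len(join_key.encode("utf-8"))  # skip the type code and join-key name
    value_type, length = struct.unpack_from("<II", key, offset)
    if value_type != INT64_TYPE or length != 4:
        raise ValueError("unexpected value encoding")
    (value,) = struct.unpack_from("<i", key, offset + 8)
    project = key[offset + 8 + length:].decode("utf-8")
    return value, project


key = encode_key("feast_vultr_redis", "driver_id", 1005)
print(key)
print(decode_key(key, "driver_id"))  # (1005, 'feast_vultr_redis')
```

The five keys in the output correspond to driver IDs 1001 through 1005 (\xe9\x03 through \xed\x03).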
Check the Redis data type:
> type "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
Output:
hash
Verify the contents of the hash.
> hgetall "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
Your output should look like the one below.
1) "_ts:driver_hourly_stats"
2) "\b\xd0\xa4\xb5\xa5\x06"
3) "a`\xe3\xda"
4) "5\xf20Q?"
5) "\xfa^X\xad"
6) "5\x83\x7f\xcb>"
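HGETALL prints fields and values alternately: entry 1 is the _ts:driver_hourly_stats field and entry 2 is its value, entries 3 and 5 are hashed feature fields, and entries 4 and 6 are their values. The values are serialized protocol buffer messages. Assuming the timestamp is a google.protobuf.Timestamp and each feature value is a Feast Value message with float_val as field 6, the sketch below decodes the bytes above by hand. The helper functions are hypothetical; in practice you would let protobuf classes parse these bytes.

```python
import struct
from datetime import datetime, timezone


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a protobuf base-128 varint at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_timestamp_seconds(data: bytes) -> int:
    """Read field 1 (seconds) from a serialized google.protobuf.Timestamp."""
    if data[0] != 0x08:  # tag for field 1, wire type 0 (varint)
        raise ValueError("expected a seconds field")
    seconds, _ = read_varint(data, 1)
    return seconds


def decode_float_value(data: bytes) -> float:
    """Read a serialized Value message whose only field is float_val (field 6)."""
    if data[0] != 0x35:  # tag for field 6, wire type 5 (fixed 32-bit)
        raise ValueError("expected float_val")
    (value,) = struct.unpack_from("<f", data, 1)
    return value


# Values copied from the HGETALL output above.
seconds = decode_timestamp_seconds(b"\b\xd0\xa4\xb5\xa5\x06")
print(datetime.fromtimestamp(seconds, tz=timezone.utc))  # 2023-07-11 13:00:00+00:00
print(decode_float_value(b"5\xf20Q?"))
print(decode_float_value(b"5\x83\x7f\xcb>"))
```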
At inference time, you can read the latest feature values for different drivers from the online feature store using get_online_features(). In this section, fetch feature vectors for inference as described below.
Create a new fetch_feature_vectors.py file.
$ nano fetch_feature_vectors.py
Add the following code to the file.
from pprint import pprint

from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(feature_vector)
Save and close the file.
Run the script to fetch the feature vectors.
$ python3 fetch_feature_vectors.py
Your output should look like the one below.
{
'acc_rate': [0.1056235060095787, 0.7656288146972656],
'avg_daily_trips': [521, 45],
'conv_rate': [0.24400927126407623, 0.48361605405807495],
'driver_id': [1004, 1005]
}
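to_dict() returns column-oriented data: each name maps to a list with one value per entity row, in the same order as entity_rows. Many models expect one record per entity instead. The to_rows helper below is a hypothetical convenience, not a Feast API, and reuses the example output above; the response object also offers to_df() if you prefer a pandas DataFrame.

```python
# The dictionary printed above, copied from the example output.
feature_vector = {
    "acc_rate": [0.1056235060095787, 0.7656288146972656],
    "avg_daily_trips": [521, 45],
    "conv_rate": [0.24400927126407623, 0.48361605405807495],
    "driver_id": [1004, 1005],
}


def to_rows(columns):
    """Turn a column-oriented dict (name -> list of values) into one dict per entity row."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


for row in to_rows(feature_vector):
    print(row["driver_id"], row["conv_rate"], row["avg_daily_trips"])
```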
In this article, you used a Vultr Managed Database for Redis as the online store for Feast, materialized features into it, inspected how Feast stores them, and retrieved them for training and inference. For more information about Feast, visit the official documentation.