Vertica DB performance test with locust.io

Dima Statz
6 min readFeb 3, 2020

“Define user behavior with Python code, and swarm your system with millions of simultaneous users” — locust.io

Introduction

Vertica is a very powerful analytic database designed to handle massive amounts of data. When configured right, Vertica enables a great query performance even in very intensive scenarios. Some Vertica users report that their Vertica cluster handles thousands of concurrent users running sub-second queries. Designing and configuring a database with such a great performance requires a lot of knowledge and experience and it is out of the scope of this article. But once the cluster is designed, configured and loaded with data, another step is to test the performance of the database before going to production. And here, I would like to show how locust.io can be used to simulate thousands of virtual users, just on a single machine and perform load testing on Vertica DB.

Database load testing

Apache Jmeter is widely used for database load testing. And it makes a lot of sense — JMeter has a built-in JDBC connector that allows connecting to many existing databases, it has a ‘Test Plan’ feature where you can load all your SQL queries and run it by distributing the test across different machines. JMeter is just perfect for simple test scenarios that can be created by using Jmeter GUI mode. Even a non-experienced engineer can read and write basic scenarios without any trouble. However, if you need something more complex, and you need to write the code, you will quickly realize the complexity of scripts implementation in JMeter. The reason for such a complex scripting model is that JMeter was designed to be used mainly with GUI mode.

And here where Locust.io has a clear advantage. Locust is all about coding. That's true that you must have a Python coding experience to feel comfortable with performance tests creation, on the other hand, having all your tests in code is a big advantage. You can manage all your tests in source control, share with your team, you can easily add, remove, fix any test and you can automatically deploy it to any environment. It also much faster to develop and change script than opening a GUI application and committing the required changes via the console.

Locust.io

Locust was built mainly for HTTP web-based testing. The easiest way to start with locust.io just run the install

pip install locustio

Now create locustfile.py python file with the following content

from locust import HttpLocust, TaskSet, task, between

class WebsiteTasks(TaskSet):
def on_start(self):
self.client.post("/login", {
"username": "test_user",
"password": ""
})

@task
def index(self):
self.client.get("/")

@task
def about(self):
self.client.get("/about/")

class WebsiteUser(HttpLocust):
task_set = WebsiteTasks
wait_time = between(5, 15)

And you are ready to. Run the following command

locust -f locustfile.py

You will see the following output on your terminal

Now open your browser and navigate to localhost:8089 and you will see the following web page

And that’s all, you are ready to test ‘login’ and ‘about’ URLs of any HTTP API. As you can see locust.io has support for HTTP testing just out of the box. And now let’s see how easy the development of Vertica support is.

Locust.io and Vertica plugin

Let’s start by installing the official native Python client for the Vertica Analytics Database.

pip install vertica-python

Now we are ready to start implementation. Create a python file — test_vertica.py. A locustfile is a normal python file that declares at least one class inheriting from the Locust class. Let’s start with VerticaLocust class

class VerticaLocust(Locust):
def __init__(self, *args, **kwargs):
pass

A locust class represents one user. Locust will spawn one instance of the locust class for each user that is being simulated. There are a few attributes that a locust class should typically define. The task_set attribute should point to a TaskSet class that defines the behavior of the user. In addition to the task_set attribute, one should also declare a wait_time method. It’s used to determine how long a simulated user will wait between executing tasks. Another attribute that you will see a lot in Locust implementations is the host. The host attribute is a URL to the host that is to be loaded. Since we are working with Vertica, you can skip it.

class VerticaTaskSet(TaskSet):
@task
def execute_query(self):
pass


class VerticaLocust(Locust):
task_set = VerticaTaskSet
wait_time = between(0.1, 1)

def __init__(self):
super(VerticaLocust, self).__init__()
self.client = VerticaClient()

So, if the Locust class represents a swarming locust, the TaskSet class represents the brain of the locust. Here we have to add an implementation of the Vertica query itself

def get_sample_query():
query = '''
SELECT COUNT(*) FROM tst.test_table'''

conn = {
'host': os.environ['vertica_host'],
'port': os.environ['vertica_port'],
'database': os.environ['vertica_database'],
'user': os.environ['vertica_user'],
'password': os.environ['vertica_password'],
'read_timeout': 600,
'unicode_error': 'strict',
'ssl': False
}
return conn, query
def execute_query(conn_info, query):
with vertica_python.connect(**conn_info) as conn:
cur = conn.cursor()
cur.execute(query)
return [x for x in cur.iterate()]


class VerticaClient:
def __getattr__(self, name):
def wrapper(*args, **kwargs):
start_time = time.time()
try:
res = execute_query(*args, **kwargs)
events.request_success.fire(request_type="vertica",
name=name,
response_time=int((time.time() - start_time) * 1000),
response_length=len(res))
except Exception as e:
events.request_failure.fire(request_type="vertica",
name=name,
response_time=int((time.time() - start_time) * 1000),
exception=e)

logging.info('error {}'.format(e))
return wrapper


class VerticaTaskSet(TaskSet):
@task
def execute_query(self):
self.client.execute_query(get_sample_query()[0], get_sample_query()[1])


class VerticaLocust(Locust):
task_set = VerticaTaskSet
wait_time = between(0.1, 1)

def __init__(self):
super(VerticaLocust, self).__init__()
self.client = VerticaClient()

So, the VericaClient class contains a wrapper method that is responsible for running the query and measuring performance, failure/success rate, etc. The method execute_query contains the Vertica specific implementation: creating a connection, running the query and parsing the query results. All Vertica’s required setting like a user, password, the database name will be fetched from env variables. Of course, you can replace it with a config file, secret storage, etc.

Load testing results

Now we are ready to start the performance test. Run the following command:

locust -f test_vertica.py

Open a browser of your choice and navigate to http://127.0.0.1:8089/. Enter the number of users to simulate and the hatch rate. Leave the Host field empty and hit the ‘start swarming’ button. Go to the ‘charts’ tab and you will see RPS, Number of users and the Response time charts

Also, you can see the detailed report of sent requests on the ‘statistics’ tab.

You can also see all exceptions, failures and request data in corresponding tabs. By using Locust reports and Vertica system metrics you will get a clear picture of how your DB performs for this specific query.

Conclusion

Using and extending Locust.io is extremely easy. It is a matter of hours to implement a new adapter for locust.io. It is a matter of minutes to create a load testing scenario. If you want to maintain your load test as code, you want to share it with your team and to deploy it during CI/CD to the test environment, Locust.io is definitely the framework to use.

--

--