Underperforming system issues: using cloud platform metric services

This article describes what led us to consider adding an automated system capable of detecting poor server performance.

What happened?

The problem occurred after an app deployment (as such problems often do). It came from the combination of a slow (underperforming) database query and the way the data was pulled from the server (AJAX polling). A problem like this should naturally lead you to check the metric tools provided by your cloud platform.

Checking the metrics does not guarantee that similar issues will not recur, since such problems are not necessarily apparent during the development and testing phases.

Why is it not detected before deployment?

1. You can't really simulate real-life usage of the application, which makes it challenging to cover every step. Your team normally consists of 1-4 concurrent testers clicking around the app to see if everything behaves as it should.

2. Quite often developers and testers will be unaware of the issues that cause responses to take 150-250 ms.

How to prevent performance issues on production?

"Think about having automated testing of server's performance."

Tools used:

  1. JMeter
  2. PostgreSQL
  3. access.log

JMeter reads the recorded user requests from access.log and replays them with a configurable number of concurrent users (the upper limit depends on your machine).
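
Before pointing JMeter at a log, it can help to see which requests the log actually contains. The snippet below is a minimal sketch, not part of our JMeter setup: it assumes the log uses the Common or Combined Log Format, and the names `parse_requests`, `busiest_endpoints` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted request and the status code of a Common/Combined
# Log Format line (assumed format; adjust if your server logs differently).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def parse_requests(lines):
    """Return (method, path, status) tuples for every line that parses."""
    requests = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:  # silently skip lines that are not request entries
            requests.append(
                (match["method"], match["path"], int(match["status"]))
            )
    return requests


def busiest_endpoints(requests, top=5):
    """Count requests per (method, path), ignoring the query string."""
    counts = Counter(
        (method, path.split("?", 1)[0]) for method, path, _ in requests
    )
    return counts.most_common(top)


# Hypothetical sample lines, written by hand for this example.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /api/items?page=2 HTTP/1.1" 200 512',
    '203.0.113.7 - - [10/Oct/2023:13:55:37 +0000] "GET /api/items HTTP/1.1" 200 498',
    '203.0.113.9 - - [10/Oct/2023:13:55:38 +0000] "POST /api/login HTTP/1.1" 302 0',
    "not a request line",
]

if __name__ == "__main__":
    for endpoint, hits in busiest_endpoints(parse_requests(SAMPLE_LOG)):
        print(endpoint, hits)
```

Endpoints that dominate the log (polling endpoints usually do) are the first candidates for a load test.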

Steps taken:

  1. Use JMeter to test endpoints and observe responses
  2. Find a problematic response and jump into a deeper analysis (in our case a query was causing the issue)
  3. Use PostgreSQL to analyze the performance of all queries that have been observed to be slower than expected
  4. Use the pg_stat_statements module, which provides a means of tracking planning and execution statistics for all SQL statements executed by a server
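
Steps 3 and 4 can be sketched as a small helper that asks pg_stat_statements for the slowest statements. This is an illustration under assumptions, not our exact tooling: the module must be listed in shared_preload_libraries and created with CREATE EXTENSION, the column names are those of PostgreSQL 13 and later, the `%s` placeholder follows the psycopg2 parameter style, and `slowest_queries` is a hypothetical name.

```python
# Column names match PostgreSQL 13+; older releases expose
# total_time / mean_time instead of total_exec_time / mean_exec_time.
SLOWEST_QUERIES_SQL = """
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT %s
"""


def slowest_queries(conn, limit=10):
    """Return the statements with the highest mean execution time (ms).

    `conn` is assumed to be a DB-API 2.0 connection to PostgreSQL that
    accepts %s placeholders, e.g. one created with psycopg2.connect().
    """
    cur = conn.cursor()
    try:
        cur.execute(SLOWEST_QUERIES_SQL, (limit,))
        rows = cur.fetchall()
    finally:
        cur.close()
    # Turn the positional rows into dictionaries that are easier to report.
    return [
        {"query": query, "calls": calls, "mean_ms": mean, "total_ms": total}
        for query, calls, mean, total in rows
    ]
```

Sorting by mean time surfaces individually slow statements; sorting by total time instead would surface cheap statements that are called very often, which matters for polling endpoints.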

The tricky bit is to understand why the query is so slow (according to the query plan).
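
One way to make a query plan easier to read is to run the query with EXPLAIN (ANALYZE, FORMAT JSON) and walk the resulting tree in code. The sketch below looks for sequential scans over many rows, which is one common cause of slow queries and not necessarily the one we hit. The function names, the `min_rows` cut-off and the sample plan (with its `orders` and `users` tables) are hypothetical and written by hand, not real output.

```python
def iter_plan_nodes(node):
    """Yield a plan node and all of its children, depth first."""
    yield node
    for child in node.get("Plans", []):
        yield from iter_plan_nodes(child)


def sequential_scans(explain_json, min_rows=1000):
    """Return (relation, actual_rows, total_ms) for large sequential scans.

    `explain_json` is the already-parsed output of
    EXPLAIN (ANALYZE, FORMAT JSON): a list holding one object with a "Plan" key.
    """
    root = explain_json[0]["Plan"]
    return [
        (
            node.get("Relation Name"),
            node.get("Actual Rows", 0),
            node.get("Actual Total Time", 0.0),
        )
        for node in iter_plan_nodes(root)
        if node.get("Node Type") == "Seq Scan"
        and node.get("Actual Rows", 0) >= min_rows
    ]


# Hand-written plan for illustration only; the numbers are invented.
SAMPLE_PLAN = [
    {
        "Plan": {
            "Node Type": "Hash Join",
            "Actual Rows": 120,
            "Actual Total Time": 430.2,
            "Plans": [
                {
                    "Node Type": "Seq Scan",
                    "Relation Name": "orders",
                    "Actual Rows": 250000,
                    "Actual Total Time": 412.7,
                },
                {
                    "Node Type": "Hash",
                    "Actual Rows": 40,
                    "Actual Total Time": 0.3,
                    "Plans": [
                        {
                            "Node Type": "Index Scan",
                            "Relation Name": "users",
                            "Actual Rows": 40,
                            "Actual Total Time": 0.2,
                        }
                    ],
                },
            ],
        }
    }
]

if __name__ == "__main__":
    for relation, rows, ms in sequential_scans(SAMPLE_PLAN):
        print(f"Seq Scan on {relation}: {rows} rows, {ms} ms")
```

A flagged scan is only a hint: whether an index would help still depends on the filter conditions and on how selective they are.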

There are 2 types of tests we conduct:

  1. Access-log-load-test: simulates real user activity
  2. Database-load-test: carefully analyzes data from the access-log-load-test to simulate the queries behind those endpoints
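
The idea behind the first type of test can be sketched without JMeter: replay a list of requests with a few concurrent workers and flag everything slower than a threshold. This is a simplified illustration, not a replacement for JMeter. The 250 ms default is taken from the response-time range mentioned earlier, and `fake_send` is a hypothetical stand-in for a real HTTP call so that the example runs without a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send, request):
    """Run send(request) and return (request, elapsed milliseconds)."""
    start = time.perf_counter()
    send(request)
    return request, (time.perf_counter() - start) * 1000.0


def replay(requests, send, concurrency=4):
    """Replay requests through `send` using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map keeps the results in the same order as the input.
        return list(pool.map(lambda request: timed_call(send, request), requests))


def slow_requests(timings, threshold_ms=250.0):
    """Return the timings above threshold_ms, slowest first."""
    slow = [timing for timing in timings if timing[1] > threshold_ms]
    return sorted(slow, key=lambda timing: timing[1], reverse=True)


def fake_send(request):
    """Hypothetical stand-in for an HTTP call; sleeps to mimic latency."""
    time.sleep(0.2 if request == "/slow" else 0.001)


if __name__ == "__main__":
    timings = replay(["/fast", "/slow", "/fast"], fake_send, concurrency=2)
    for request, elapsed_ms in slow_requests(timings, threshold_ms=100.0):
        print(f"{request} took {elapsed_ms:.0f} ms")
```

In an automated run, a non-empty result from `slow_requests` could fail the build, which is what turns a one-off investigation into a regression test.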

Conclusion

Using these types of tests helps investors in startup apps gain a better understanding of server costs. Once the tests are automated, they add little to development time, but they will certainly help reduce unexpected costs and enable startups to scale.