For a while now, I’ve been curious about what impacts running a service mesh will have on the performance of my software. I’ve spoken with vendors who have estimated around a 10% performance hit, but I wanted to see for myself the actual impacts. Actually, it’s not just to satisfy my own curiosity, I’ve also had customers ask what the runtime penalty is, and would I recommend they run a service mesh. Some of our customers run real-time systems and so this is an important question to be able to answer for them.
Istio provides a lot of value for microservice architectures, providing features such as security, throttling, and telemetry. That is a lot of goodness, but nothing comes for free. I wanted to find out if running a service mesh is worth the performance cost and just what those costs might be. As will be seen in the details below, there are measurable performance impacts running an Istio service mesh in your Kubernetes cluster.
First, let’s cover the test setup. In a previous article, I had created a tiny service mesh with two microservices: capitol-client and capitol-info. Capitol-client is a microservice that enables users to query for country capitols, enter new capitols, or update existing ones. The microservice runs a website and API gateway written in AngularJS and hosted on NGINX. The second microservice, capitol-info, provides CRUD capabilities via REST operations: the first returning capitols via an HTTP GET call; and the second inserting/updating capitols via an HTTP POST. This microservice is written in Java and runs with Spring Boot. As shown in the diagram below, the capitol-client microservice calls the back end capitol-info microservice, and it is this microservice architecture that will be used as the test framework. Finally, in order to generate load on the service mesh, I use Postman to call the capitol-client API gateway.
In order to quantify the performance impacts, I ran two sets of tests. The first test set ran against Docker Desktop with Kubernetes (aka K8s), hosted on my local machine. The second test used my local machine to generate the calls to a service mesh running on Red Hat OpenShift 4.4, hosted in AWS.
For each test I conducted three runs of 1000 batched calls to the API gateway in capitol-client…