I was a speaker at KubeCon 2019, alongside Sachin Manpathak, technical lead at Platform 9. I gave a presentation on our Kubernetes MySQL Operator and the reasons that led us to build our own operator.
Platform 9 were early adopters of our operator, which they now use in production. Sachin presented a case study on migrating from cloud-managed to Kubernetes-managed MySQL. I felt honored to give this joint presentation, and to give it at KubeCon, one of the biggest enterprise tech conferences.
The MySQL presentation I held was centered around 5 main topics:
- The context in which the operator was born
- The needs that we had in mind when we first started building
- What we have achieved, the operator overview
- Challenges that we encountered during development
- Project status and future plans
We are a managed WordPress hosting company that has been in the market for more than 10 years. Presslabs started as a WordPress development agency, then we pivoted towards the hosting business.
After serving both publishers and Enterprise clients for several years, we came to realize that all companies in the global top Enterprise tier, including us, were doing the same thing. That’s when we started thinking about the Stack, an open-source infrastructure that could become the standard in WordPress hosting.
As part of our mission to democratize WordPress hosting infrastructure, we have 2 key objectives. The first is building an open infrastructure that uses Kubernetes to run and operate WordPress: the Presslabs Stack. The other is building the MySQL Operator, because half of WordPress hosting is about MySQL.
But why did we choose Kubernetes, you may ask. Kubernetes runs everywhere, from developers’ laptops to bare metal servers and public clouds. Our core services had been running on Kubernetes since version 1.7, and since we were already experienced with containers and Docker, the switch to Kubernetes was easier. It also supports a lot of integrations, like certificate manager, NGINX (Ingress) and Prometheus (monitoring). Last but not least, it’s open-source: everyone can use it and contribute to its development.
As part of the Stack, we needed a way to automate operations such as deploying, scaling, maintaining and backing up MySQL.
For that, we’ve identified some key requirements the infrastructure should focus on:
- Ease of operations – we wanted something that is easy to operate, that doesn’t get in our way
- Elasticity – we needed an elastic service, to help us scale with the demand
- Service availability – in hosting, service uptime is paramount, so we had to maximize the availability of our service
- Data safety – no one wants to lose data, especially when it comes to someone else’s data
- Observability – in order to reliably operate the service, we needed a way to observe what’s happening, from the top down to the request level.
Knowing exactly what we needed, we checked some of the available solutions and concluded that they were not suitable for us. For example, both the Oracle and Percona operators use group replication, which means they require at least 3 nodes to operate. This was not suitable for us because it greatly increases costs.
So, as all great engineers do, we ended up building our own solution — the Presslabs MySQL Operator, a Kubernetes Operator for managing MySQL Clusters with asynchronous or semi-synchronous replication.
Some must-have features that we’ve integrated include:
- Self-healing clusters – the operator has to continuously reconcile and solve replication issues; without this feature, the operator doesn’t make sense
- Highly available reads – when more nodes are available
- Virtually highly available writes – fast failovers keep downtime to a minimum
- Replication lag detection and mitigation – takes lagging nodes out of rotation when lag is above a set threshold or in case of unhealthy nodes
- Resource abuse control – limits noisy queries that may slow down the cluster
- Automated backups and restores
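To make the lag mitigation feature more concrete, here is a minimal Python sketch of the idea: a node stays in rotation only if it is reachable and its replication lag is below a threshold. This is not the operator's actual code. The `NodeStatus` type, the `nodes_in_rotation` function and the 30 second threshold used below are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one MySQL node, as a lag detector might report it."""
    name: str
    lag_seconds: Optional[float]  # None means the lag could not be measured
    reachable: bool = True


def nodes_in_rotation(nodes: List[NodeStatus], max_lag_seconds: float) -> List[str]:
    """Return the names of the nodes that may keep serving traffic."""
    in_rotation = []
    for node in nodes:
        # Unhealthy nodes (unreachable or with unknown lag) are taken out.
        if not node.reachable or node.lag_seconds is None:
            continue
        # Nodes lagging above the configured threshold are taken out too.
        if node.lag_seconds > max_lag_seconds:
            continue
        in_rotation.append(node.name)
    return in_rotation
```

Calling `nodes_in_rotation(nodes, max_lag_seconds=30.0)` would keep only the reachable nodes whose measured lag is at most 30 seconds. A node returns to rotation on a later check, once its lag drops back under the threshold.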
#MySQL Operator Architecture
The architecture is split into 3 main parts: control plane, data plane, and monitoring.
The control plane consists of the operator and its components, which are deployed using Helm, usually in a dedicated namespace. Here we have the controller itself and Orchestrator, a MySQL high availability and replication management tool.
The data plane represents a MySQL deployment, made of basic Kubernetes resources (like pods, services, etc.) which can be spread across multiple namespaces.
And last but not least, we have monitoring, which is performed by Prometheus, the standard Kubernetes monitoring system.
Going deeper into the data plane, you can see that the MySQL cluster has multiple components.
The StatefulSet is the main resource, which provisions the pods and the PVs for each MySQL node. There are 2 services for each cluster: the Master service and the Healthy nodes service. The Master service always points to the master MySQL node, while the Healthy nodes service points to all the pods that the operator considers healthy.
The services select pods based on Kubernetes labels, which the operator sets using information gathered from Orchestrator. Your application interacts with those two services for writes and for reads (and it’s the application’s responsibility to split them, using app-specific logic or dedicated software like ProxySQL).
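The label-based selection can be illustrated with a small Python sketch. It only mimics the mechanism: the label keys (`role`, `healthy`), the two selectors and the function names are hypothetical, not the labels the operator really sets.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical selectors for the two services described above.
MASTER_SELECTOR = {"role": "master"}
HEALTHY_SELECTOR = {"healthy": "yes"}


def compute_labels(pods: Iterable[str], master: str, healthy: Set[str]) -> Dict[str, Dict[str, str]]:
    """Derive pod labels from topology information (here passed in directly)."""
    return {
        pod: {
            "role": "master" if pod == master else "replica",
            "healthy": "yes" if pod in healthy else "no",
        }
        for pod in pods
    }


def select(labels: Dict[str, Dict[str, str]], selector: Dict[str, str]) -> List[str]:
    """Mimic an equality-based label selector: every key/value pair must match."""
    return sorted(
        pod
        for pod, pod_labels in labels.items()
        if all(pod_labels.get(key) == value for key, value in selector.items())
    )
```

With this model, a failover only requires relabeling: once the `role` label moves to another pod, the Master service follows it without any change on the application side.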
Internally, a node consists of several components: init containers, a main container, and sidecar containers.
The init containers are used for MySQL initialization and configuration. The main container is the Percona Server for MySQL. We chose Percona because it’s battle-tested in enterprise environments and is a drop-in replacement for MySQL.
The sidecar containers are based on Percona Toolkit and are responsible for actions such as lag detection, MySQL monitoring and resource limit policy enforcement. There is also an extra container that provides an endpoint for node initialization and for backups.
But as with all great achievements, you run into challenges along the way, and developing our MySQL operator was no exception.
It was a challenge to integrate Orchestrator, a third-party tool which we needed for handling MySQL topology and failovers, so that we didn’t have to reinvent the wheel.
We also had to manage the persistent volumes ourselves, because the way Kubernetes manages PVs is not suitable for MySQL. Furthermore, operator upgrades are a common problem for operators, since Helm provides very modest CRD support. MySQL upgrades are also difficult, especially on Kubernetes, because they are usually done by humans and are hard to automate.
Orchestrator is a subcomponent of the operator: a MySQL high availability and replication management tool. However, a big downside is that it’s not meant to be stateless, as operators usually are.
Both Kubernetes and Orchestrator keep a state, and the operator doesn’t know which one to listen to, which leads to an information flow conflict. To fix this, we chose to implement a reconciliation loop that reconciles the state between Orchestrator and Kubernetes every few seconds.
On the one hand, Orchestrator is responsible for updating the replication topology in emergency situations and for observing the current status of the MySQL cluster. On the other hand, the operator reconciles the desired replication topology into Orchestrator and provides service discovery. Even if the Orchestrator data is lost, the operator is able to restore all of it.
In conclusion, the operator makes decisions based only on the information found in Kubernetes, which is kept up to date by the reconciliation loop.
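A rough way to picture one pass of that reconciliation is the Python sketch below. It uses plain dictionaries as stand-ins for the Kubernetes resources and for the Orchestrator database, and the names (`reconcile_once`, the `nodes` and `status` keys) are assumptions made for this example rather than the operator's real data model.

```python
from typing import Dict


def reconcile_once(k8s_clusters: Dict[str, dict], orchestrator: Dict[str, dict]) -> None:
    """One pass of a two-way sync between Kubernetes and Orchestrator.

    k8s_clusters: cluster name -> {"nodes": [...], "status": {...}}
    orchestrator: node name -> observed state, e.g. {"role": "master"}
    """
    for cluster in k8s_clusters.values():
        # Kubernetes -> Orchestrator: register every node the cluster should
        # have. This also restores entries if Orchestrator lost its data.
        for node in cluster["nodes"]:
            orchestrator.setdefault(node, {"role": "unknown"})
        # Orchestrator -> Kubernetes: copy the observed state into the status,
        # so that later decisions can rely on Kubernetes alone.
        cluster["status"] = {node: dict(orchestrator[node]) for node in cluster["nodes"]}

    # Forget nodes that no longer belong to any cluster.
    known = {node for cluster in k8s_clusters.values() for node in cluster["nodes"]}
    for node in list(orchestrator):
        if node not in known:
            del orchestrator[node]
```

In a real controller, a pass like this would run on a timer (every few seconds, as described above), and Orchestrator would fill in the actual role and lag of each node between passes.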
Another challenge was how Kubernetes manages Persistent Volumes (PVs). The MySQL data is stored in PVs managed by the StatefulSet. When a cluster is scaled down, the volume is not deleted, so its data becomes obsolete after a while. When the StatefulSet is scaled up again, the node comes back with that obsolete data and replication can fail.
To fix this, we implemented a cleaner that deletes the Persistent Volume Claim (PVC) when the cluster is scaled down. Node 0 is a special case: its data is kept for as long as the cluster exists, to avoid losing cluster data.
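The cleaner's rule can be sketched in a few lines of Python. The sketch relies on the fact that StatefulSet PVC names end with the pod ordinal; the function name `pvcs_to_delete` and the example PVC names are made up for illustration, and a real cleaner would also have to call the Kubernetes API to perform the deletion.

```python
import re
from typing import List


def pvcs_to_delete(pvc_names: List[str], replicas: int) -> List[str]:
    """Return the PVCs left behind by a scale-down to `replicas` nodes."""
    doomed = []
    for name in pvc_names:
        # StatefulSet PVC names end with "-<ordinal>", e.g. "data-db-mysql-2".
        match = re.search(r"-(\d+)$", name)
        if match is None:
            continue  # not a StatefulSet-style name, leave it alone
        ordinal = int(match.group(1))
        if ordinal == 0:
            continue  # node 0 is kept for as long as the cluster exists
        if ordinal >= replicas:
            doomed.append(name)
    return doomed
```

For a cluster scaled from 3 nodes down to 1, this returns the claims of nodes 1 and 2 and keeps the claim of node 0, even if the requested size is 0.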
#Operator upgrades / deployment
A common problem in the world of operators is CRD management. Currently, the de facto standard for packaging applications is Helm. If you are a Helm user, you probably know that CRD management is still very painful, because Helm does not provide an upgrade path for CRDs.
Moreover, the MySQL Operator is still in development and the CRD specifications are still subject to change. This made us install CRDs without validation, to minimize user intervention at upgrades. However, we hope this is only a temporary solution until Helm improves its support for managing CRDs.
A specific challenge for this operator is how MySQL upgrades are performed. Kubernetes already provides update policies such as rolling updates. But a rolling update is not exactly gentle with MySQL, as it can upgrade the master first, which forces a failover to a replica. When that replica is updated in turn, it triggers another failover, which is unnecessary and can be avoided if the master is the last one to be updated. Conclusion: the master should be the last one standing, to avoid failover flip-flop or downtime.
A contributor came up with the idea of using the On Delete policy, which better fits our needs since the operator can choose which pod to update. This way, we control the order in which the pods are upgraded.
We tried other techniques as well, like pod finalizers, to block pod deletion until the failover is done. However, we hit a dead end because we had misunderstood how Kubernetes finalizers work. Using container lifecycle hooks to trigger a failover also proved to be too complicated. So we chose to implement the ‘On Delete’ policy, which is still a work in progress.
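The ordering rule itself is simple, and a short Python sketch shows it: outdated replicas go first and the master goes last. Since this part of the operator is still a work in progress, the code is only an illustration of the rule, with hypothetical function names and revision strings.

```python
from typing import Dict, List, Optional


def upgrade_order(pod_revisions: Dict[str, str], target_revision: str, master: str) -> List[str]:
    """Return the outdated pods in the order they should be updated."""
    outdated = [pod for pod, rev in pod_revisions.items() if rev != target_revision]
    # Replicas first, in a stable order.
    replicas = sorted(pod for pod in outdated if pod != master)
    # The master goes last, so at most one failover is needed.
    return replicas + ([master] if master in outdated else [])


def next_pod_to_update(pod_revisions: Dict[str, str], target_revision: str, master: str) -> Optional[str]:
    """Pick the single pod to delete next, or None when everything is up to date."""
    order = upgrade_order(pod_revisions, target_revision, master)
    return order[0] if order else None
```

With the On Delete policy, deleting the pod returned by `next_pod_to_update` lets the StatefulSet recreate it at the new revision. The operator would wait for that pod to become healthy before picking the next one.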
The Operator is still in alpha, and we’re really close to beta. We have received great feedback from the community, and some major platforms, including Platform 9, Heureka, Agri Terra and Kinvolk, actively use and contribute to it.
We invite you to visit the project page on GitHub and, for any questions, to join the #mysql-operator Slack channel.
We’ve also written detailed tutorials on how to set up a MySQL cluster using the Presslabs MySQL Kubernetes Operator on Google, AWS, Azure, and DigitalOcean. Be sure to check them out.
We plan to integrate the MySQL Operator with various marketplaces, including Google Cloud Marketplace, OperatorHub, and AWS Marketplace, to make it easier for end users to install it.
We also plan to add CRD validation and webhooks, and multiple backup policies for granular control over backups.
To make it easy for your application to connect to the cluster, we will integrate ProxySQL. Instead of using 2 services (the Master service and the Healthy nodes service), the app will connect only to ProxySQL, which will do the routing for you.