Over the past five months, we’ve been developing a high-availability architecture using Kubernetes. One heck of a job to take on, but rewarding nonetheless.
We prepared for the new setup by first migrating our Presslabs dashboard to Kubernetes. The next step — migrating all of our backends to Kubernetes — relied heavily on us working up a sweat and developing operators encompassing all the features we needed.
As a WordPress hosting provider, we know that 100% uptime is key. The operator essential to our high-availability architecture is the MySQL Kubernetes Operator — made by us. Its main role is to create the necessary resources for a MySQL cluster.
Learning Kubernetes means taking in a whole suite of concepts, and yet there are times when you still fall short due to their sheer number. No worries, though, we’re here to shed some light on a few of them.
Generally speaking, an operator extends Kubernetes’ know-how with the functionality you need it to perform — in our case, backups and failover. Specifically, it’s an application controller that uses the Kubernetes API to manage custom resources. On top of basic resources, the operator defines complex ones.
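For illustration, a custom resource is registered with the API server through a CustomResourceDefinition. Here is a minimal sketch — the group and kind match the cluster example later in this post, but the manifest itself is ours, not taken from the operator’s chart:

```yaml
# Registers a new resource type with the Kubernetes API, so that
# MysqlCluster objects (and `kubectl get mysqlclusters`) become available.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # must be <plural>.<group>
  name: mysqlclusters.titanium.presslabs.net
spec:
  group: titanium.presslabs.net
  version: v1alpha1
  scope: Namespaced
  names:
    kind: MysqlCluster
    plural: mysqlclusters
    singular: mysqlcluster
```

Once the definition is in place, the operator watches for objects of this kind and reacts to them.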
Like Kubernetes itself, operators are usually written in Go, to benefit from the support and documentation available.
Let’s start with the cornerstone concepts that give meaning to web app deployment. To deploy a web app, you’ll need resources like:
- a MySQL deployment, which creates the necessary pods
- a Service for the database — for routing purposes
- a Web deployment, which again creates pods, but specific to the application in question
- a Web Service — for selecting the desired pods within the app
- an Ingress — in case you need external communication, as it sets a public hostname
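As a sketch, the web part of such a setup could look like the manifest below — the names and the image tag are hypothetical placeholders, not our production configuration:

```yaml
# A Deployment that keeps two stateless web pods running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: wordpress
          image: wordpress:4.9   # hypothetical image tag
          ports:
            - containerPort: 80
---
# A Service that selects the web pods and load-balances traffic to them.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - port: 80
```

An Ingress would then point a public hostname at the `web` Service.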
We’ve mentioned pods a lot — but what are they? A pod is a group of containers, with shared storage and a specification for how to run those containers. It is a basic resource, used to define more complex resources.
Pods are created by a deployment or a stateful set. In the first case, all the pods are created at the same time, while in the second case, the pods are created in order — e.g. the second pod needs to get data from the first pod in order to be initialized.
What’s cool about Kubernetes is that it manages both stateless and stateful services, making it easy to use in different contexts. The pods created by a deployment are stateless; a persistent volume claim is used to keep the state in case the containers are shut down. Additionally, the Service is in charge of load balancing (routing).
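A StatefulSet ties these two ideas together: pods come up one by one with stable names, and each pod gets its own persistent volume claim. A minimal sketch — the image and sizes are illustrative, not the operator’s actual manifest:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql       # headless Service that gives pods stable DNS names
  replicas: 2              # mysql-0 is created and ready before mysql-1 starts
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: percona:5.7   # illustrative image
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:    # each pod gets its own PersistentVolumeClaim
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
```

If `mysql-1` is rescheduled, it reattaches to its own claim, so the data survives the container.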
For failover and monitoring, Orchestrator — a tool open-sourced by GitHub — does it best. It continuously checks server status, verifies eligibility, and keeps track of which server is the master.
We wanted to give 110% to create a high-availability, performant infrastructure. Although we aimed at using the best tools on the market today, sometimes those tools didn’t contain all the features we wanted — or didn’t exist at all.
Initially, it didn’t cross our minds to develop an operator; we just wanted a deployment process. So we used the mighty Helm, a tool that streamlines installing and managing Kubernetes apps. We generated a chart with the necessary configurations (sort of a cookie cutter) and created deployments. Simply put, it wasn’t enough — we needed more resources and features.
To sum it up, we encountered some drawbacks while working with Helm:
- In order to configure a new MySQL cluster, we had to run Helm with that cluster’s configuration
- We couldn’t register nodes in the orchestrator
- Backups couldn’t be listed — there was no overview of them, nor details of their creation
- Likewise, clusters couldn’t be listed
This is why we decided to create our own MySQL Kubernetes operator, called the MySQL Operator. Fancy, right?
The operator makes use of Orchestrator and of stateful sets to provide a performant infrastructure for running MySQL servers. Its main features:
- Resilient, thanks to Orchestrator, which does automatic failover
- Recurrent backups, stored in Cloud
- Easy deployment of a MySQL cluster
- High-availability MySQL
- Built-in monitoring. Here, Prometheus provides instance monitoring
On the whole, the operator works as follows: once a site is created on our platform, a MySQL Cluster is also created, and then, an instance of the site is created on Kubernetes — based on a WordPress chart (which is still in the works).
Creating a cluster means creating a MysqlCluster object, defined via a custom resource definition. The operator then analyzes the object and creates the necessary resources for the cluster.
The MySQL Operator is a deployment that creates a pod — in which the controller runs. The controller starts a number of workers that handle events and create resources. Additionally, there is a stateful set running Orchestrator — also deployed, and highly available thanks to its built-in distributed capabilities, based on Raft. The Orchestrator chart is separate from the rest and can be found on our GitHub page.
Starting the deployment is easy-peasy — just type the following command, using Helm:
```shell
helm repo add presslabs https://presslabs.github.io/charts
helm install presslabs/mysql-operator
```
The resources per cluster are as follows:
- Cluster secret — given as user input. The cluster takes as input a secret, provided by the user, containing ROOT_PASSWORD. The operator automatically adds credentials for internal use.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cluster-secret
type: Opaque
data:
  # root password is required to be specified
  ROOT_PASSWORD:
```
- Secrets for backups (optional) — given as user input. The backup bucket secret holds the credentials needed to connect to storage, either to upload backups or to download a backup for initialization. Its configuration takes two input fields, Backup Bucket Name and Backup Bucket Secret Name, and works with Amazon S3, HTTP, and Google Cloud Storage.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: backup-secret
type: Opaque
data:
  # AWS_ACCESS_KEY_ID: ?
  # AWS_SECRET_KEY: ?
  # AWS_REGION: us-east-1
  # AWS_ACL: ?
  # GCS_SERVICE_ACCOUNT_JSON_KEY: ?
  # GCS_PROJECT_ID: ?
  # GCS_OBJECT_ACL: ?
  # GCS_BUCKET_ACL: ?
  # GCS_LOCATION: ?
  # GCS_STORAGE_CLASS: MULTI_REGIONAL
  # HTTP_URL: ?
  # for more details check docker entrypoint: hack/docker/toolbox/docker-entrypoint.sh
```
- Stateful Set — creates pods
- Service — handles load-balancing
- Config maps — in order to configure the cluster
- Secrets — for internal use, like credentials for the replication user
The cluster is defined by the following parameters:
- Specs — being the most important — composed of:
  - Replicas — the number of pods
  - Secret Name — the name of the cluster credentials secret
  - MySQL Version — the Percona Docker image tag
  - Init Bucket URI — if set, a newly created cluster is initialized from that backup
  - Backup Schedule — takes a cron scheduler expression as input
  - Backup URI
  - Backup Secret Name
  - MySQL Config — key-value pairs for fine-tuning
  - Pod Specs — extra specs for the pods
  - Volume Specs — extra specs for the volume, like how big the disk should be
You can find an example below of a cluster definition. For more information, check our repository.
```yaml
apiVersion: titanium.presslabs.net/v1alpha1
kind: MysqlCluster
metadata:
  name: foo
spec:
  replicas: 3
  secretName: the-secret
  # mysqlVersion: 5.7
  # initBucketURI: gs://bucket_name/backup.xtrabackup.gz
  # initBucketSecretName:
  ## For recurrent backups set backupSchedule with a cronjob expression
  # backupSchedule:
  # backupUri: s3://bucket_name/
  # backupSecretName:
  ## Configs that will be added to my.cnf for cluster
  # mysqlConf:
  #   innodb-buffer-size: 128M
  ## Specify additional pod specification
  # podSpec:
  #   resources:
  #     requests:
  #       memory: 1G
  #       cpu: 200m
  ## Specify additional volume specification
  # volumeSpec:
  #   accessModes: [ "ReadWriteOnce" ]
  #   resources:
  #     requests:
  #       storage: 1Gi
```
The features currently contained by the MySQL Operator are just the beginning. We plan to integrate SQLProxy, which will come in front of the Service, to be able to make queries to both master and slave servers.
Moreover, we’ll create out-of-the-box dumps, automated and on demand, and point-in-time restore — so that when you wreak havoc on your site, you can go back to half an hour before the said havoc began and use that backup to restore your site to its initial brilliance. Not to mention, we’ll be adding a lot more cool small stuff to the MySQL Operator. Don’t be afraid to contribute, too! We are always open to suggestions.
The MySQL Operator is part of a bigger plan — creating an open standard in scaling WordPress, by using Kubernetes. We’ll be presenting our vision at WordCamp Vienna today. If you’re around, come check out our presentation.
P.S. If you’re reading between the lines, you’ll see that this blog post is some sort of a love letter to Kubernetes. Well, we are awfully fond of Kubernetes, and we’re not afraid to admit it!