Until recently our internal services have been deployed using Ansible on bare metal servers. For a long while, we wanted to move our infrastructure to the cloud to make deployments faster and easier. But we also wanted to migrate all our actual infrastructure to Kubernetes and that wasn’t an easy task. So we started looking into the Google Cloud Platform learning more about it and how it can help us.
Today we can say that Google Cloud has enabled major progress, allowing us to grow from deploying once a week to as many as 10 deployments per day, with zero downtime. Not only that, we also managed to automate deployments, resulting in a major reduction in operational efforts.
The big change, however, was how all these have shaped our vision on how the future of modern WordPress hosting should look. That change has a name – the Stack. A prototype
Tipping pointTipping point
Presslabs’ migration to Google Cloud Platform could have come into effect much later if it wasn’t for the urge to find a truly reliable, highly available solution to provide a healthy scale-up for our dashboard services and our enterprise customers.
Initially, we considered using our own cluster. Despite the enthusiasm of building the solution tailored to our specific needs, we refused to comply with the prolonged setup time and the strenuousness configuration process. It takes days to order, install and provision the machine, whereas we installed the GCP services in just five minutes. In the same way, decommissioning physical machines takes time and needs to be done in a certain time frame for monthly commitment; same for cleaning up the content of the storage. Google Cloud Platform, on the other hand, gives us the flexibility to mount on virtual machines as needed. Finally, the maintenance costs and short-lived nature of physical servers on which a Kubernetes cluster would be installed and maintained, as well as the need for a highly qualified engineer to perform these actions instead of focusing on building our Presslabs platform did not turn this alternative into a choice.
Using a Kubernetes-friendly solution was another critical aspect which has contributed to our preference for GCP in our pursuit of high availability. The previous environment barely supported one deployment per week. It was also easier to make the integration tests as the configuration of the entire system is easier to replicate—we deploy a clone of the system to test various components. We now perform up to 10 deployments, daily.
Current solutionCurrent solution
Our managed WordPress hosting platform is based on several Google services which have significantly simplified and accelerated the development, management, and deployment of our services.
Our platform is now comprised of three pods for each service; if one node fails, there are other two to cover it. This configuration has ensured the high availability we were looking for, as we have experienced zero downtime in the past 10 months since using the Google Cloud Platform.
Presslabs uses the Google Kubernetes Engine for deployment of microservices used both for the client side and the SaaS management processes. We use GKW in the user-facing part of our product, mainly the dashboard (Oxygen), but also in the billing system (Silver), DNS management (Zinc) and customer code repo (Gitea). On the SaaS side, we use it for server fleet management (Lattice), edge infrastructure cache management (Carbon-Cache), internal error reporting (Sentry), MySQL Operator, CI tool (Drone).
Official documents, such as customer contracts and invoices are securely hosted on Google Cloud Storage.
Another key component is the Google Key Management System for password management, credentials generation, and encryptions, as well as deployment secrets (helm and sops).
The benefits of using Google Cloud can be crystallized in 3 major ideas: automation, scaling and cost optimization.
When we migrated the system, the system in its entirety did not undergo a rewrite, only the deployment and configuration parts underwent significant changes over the course of four months. As a general outcome, the system has become more stable, highly available, easier to maintain, update, upgrade and, on top of all, for a similar price tag.
For application development, we use Django as API back-end and Celery for handling asynchronous tasks. Previously, we used to deploy our dedicated application which performed the client operations and management on a single machine. Running on a single machine limits the power of Celery (distributed applications) which allows running on multiple workers, on multiple machines. In other words, using only one machine made it almost impossible to scale the application.
Using this design pattern (Django and Celery) facilitated our migration to Google Cloud, also helping us to easily scale the Celery workers on multiple machines.
Presslabs Platform Specifications
Curious to know more about how a high performance WordPress infrastructure looks like? Get a birdseye view on how Presslabs is built.
Kubernetes allows us to scale up the cache management service. Once we add new edge nodes, the number of cache requests grows very much and we need a dynamic way to scale; we couldn’t do this with a physical solution. We currently have 20K cache management tasks per hour, as our app is build using Celery that executes cache related tasks and each celery worker is deployed in a Kubernetes Pod. Scaling the app is simple—in implies increasing the number of worker Pods. This is a distributed application pattern which allows us to scale easily at the ease of a button. Kubernetes keeps us safe from our own errors because it performs rolling updates.
GKE has automated nod updates, which implies that we don’t have to patch machines anymore. Migrating to Kubernetes helps us use preemptible nodes which are 4 times cheaper than regular nodes; yet, we had to find a more resilient method. Currently, we rely on preemptible nodes for all our services, except for the DBs which have their peculiarities. Furthermore, preemptible nodes allow us to design taking into consideration a chaos monkey that permanently works on the infrastructure, thus forcing us to design embracing systems failure and recovery.
Our Presslabs CTO, Calin Don, shares his point of view: “We’re very satisfied with the Google Cloud Platform experience and we will not stop here. We’re currently trying to bring these capabilities in the client zone, which implies a complete rewrite of our hosting platform based on several Kubernetes operators.”
Generally speaking, at a customer management level, we perform 100K tasks daily, without counting the content management related tasks. We were surprised to see it allowed us to scale up to tens of millions of validation requests per day, and, more than that, to manage thousands of DNS domains and to successfully serve thousands of millions of pageviews per month.
To top it off, we can easily scale up to as much as needed; and nothing compares to the freedom of granularly scaling it to prevent over provisioning and exceeding costs.