Presslabs is a managed WordPress hosting platform which brings together highly available hosting and WordPress Intelligence for globally reputed publishers, agencies and Enterprise companies. Founded in 2011, Presslabs is one of the top tier WordPress-centric hosting products for Enterprise.
Until recently, our internal services have been deployed using Ansible on bare metal servers. We knew for sure that we wanted to move our infrastructure to the Cloud to make deployments faster and easier. Google Cloud has enabled us major progress in this direction, allowing us to grow from deploying once a week to as many as 10 deployments per day, with zero downtime. Additionally, we have managed to automate deployments, resulting in a major reduction in operational efforts.
Externally, we wanted to migrate on Kubernetes since the very beginning of it; we started testing it almost two years ago. Our intention was to migrate all our actual infrastructure to Kubernetes. In order to do this, we have evaluated the Google Cloud platform, which has generated two major changes: the migration of existing services and the Presslabs Stack. The first one is successfully concluded, whereas the latter is still in progress.
Presslabs’ migration to Google Cloud Platform could have come into effect much later if it wasn’t for the urge to find a truly reliable, highly available solution to provide a healthy scale-up for our dashboard services.
Initially, we have considered using our own cluster. Despite the enthusiasm of building the solution tailored to our specific needs, we refused to comply with the prolonged setup time and the strenuousness configuration process—it takes some days to order, install the machine and provision it, whereas we have installed the GCP services in just five minutes. In the same way, decommissioning physical machines takes time and needs to be done in a certain time frame for monthly commitment, as well as cleaning up the content of the storage, whereas GCP gives us the flexibility to mount on virtual machines as needed. Finally, the maintenance costs and short-lived nature of physical servers on which a Kubernetes cluster would be installed and maintained, as well as the need for a highly qualified engineer to do perform these actions instead of focusing on building our Presslabs platform did not turn this alternative into a choice.
Using a Kubernetes-friendly solution was another critical aspect which has contributed to our preference for GCP in our pursuit of high availability. The previous environment barely supported one deployment per week. It was also easier to make the integration tests as the configuration of the entire system is easier to replicate—we deploy a clone of the system to test various components. We now perform up to 10 deployments, daily.
Our managed WordPress hosting platform is based on several Google services which have significantly simplified and accelerated the development, management, and deployment of our services.
Our platform is now comprised of three pods for each service; if one node fails, there are other two to cover it. This configuration has ensured the high availability we were looking for, as we have experienced zero downtime in the past 10 months since using the Google Cloud Platform.
Presslabs uses the Google Kubernetes Engine for deployment of microservices used both for the client side and the SaaS management processes. We use GKW in the user-facing part of our product, mainly the dashboard (Oxygen), but also in the billing system (Silver), DNS management (Zinc) and customer code repo (Gitea). On the SaaS side, we use it for server fleet management (Lattice), edge infrastructure cache management (Carbon-Cache), internal error reporting (Sentry), MySQL Operator, CI tool (Drone).
Official documents, such as customer contracts and invoices are securely hosted on Google Cloud Storage.
Another key component is the Google Key Management System for password management, credentials generation, and encryptions, as well as deployment secrets (helm and sops).
The benefits of using Google Cloud can be crystallized in 3 major ideas: automation, scaling and cost optimization.
When we migrated the system, the system in its entirety did not undergo a rewrite, only the deployment and configuration parts underwent significant changes over the course of four months. As a general outcome, the system has become more stable, highly available, easier to maintain, update, upgrade and, on top of all, for a similar price tag.
For application development, we use Django as API back-end and Celery for handling asynchronous tasks. Previously, we used to deploy our dedicated application which performed the client operations and management on a single machine. Running on a single machine limits the power of Celery (distributed applications) which allows running on multiple workers, on multiple machines. In other words, using only one machine made it almost impossible to scale the application.
Using this design pattern (Django and Celery) facilitated our migration to Google Cloud, also helping us to easily scale the Celery workers on multiple machines.
Kubernetes allows us to scale up the cache management service. Once we add new edge nodes, the number of cache requests grows very much and we need a dynamic way to scale; we couldn’t do this with a physical solution. We currently have 20K cache management tasks per hour, as our app is build using Celery that executes cache related tasks and each celery worker is deployed in a Kubernetes Pod. Scaling the app is simple—in implies increasing the number of worker Pods. This is a distributed application pattern which allows us to scale easily at the ease of a button. Kubernetes keeps us safe from our own errors because it performs rolling updates.
GKE has automated nod updates, which implies that we don’t have to patch machines anymore. Migrating to Kubernetes helps us use preemptible nodes which are 4 times cheaper than regular nodes; yet, we had to find a more resilient method. Currently, we rely on preemptible nodes for all our services, except for the DBs which have their peculiarities. Furthermore, preemptible nodes allow us to design taking into consideration a chaos monkey that permanently works on the infrastructure, thus forcing us to design embracing systems failure and recovery.
Our Presslabs CTO, Calin Don, shares his point of view: “We’re very satisfied with the Google Cloud Platform experience and we will not stop here. We’re currently trying to bring these capabilities in the client zone, which implies a complete rewrite of our hosting platform based on several Kubernetes operators.”
Generally speaking, at a customer management level, we perform 100K tasks daily, without counting the content management related tasks. We were surprised to see it allowed us to scale up to tens of millions of validation requests per day, and, more than that, to manage thousands of DNS domains and to successfully serve thousands of millions of pageviews per month.
To top it off, we can easily scale up to as much as needed; and nothing compares to the freedom of granularly scaling it to prevent over provisioning and exceeding costs.