SaaS Control Planes 2.0 #3 How not to build your Infra SaaS Control Planes - Part 1
Disclaimer: The following reflects the personal opinions of the author and does not represent the views of any contracted clients. A subjective view.
In my career, I've been lucky to work with multiple infra SaaS startups, engaging with several of them from their inception. The control plane, which deploys SaaS applications to a Dataplane, is the linchpin of this journey. As startups progress from pre-seed to seed to Series A, changing customer demands and the need to support multiple cloud providers force continuous evolution. This evolution breeds iterations (v1, v2, and beyond), each reshaping the backend architecture.
Engaging with potential customers for my product—a centralised control plane catering to shared, dedicated, and BYOC environments—has unveiled a spectrum of design philosophies. Notably, not all prioritise the ethical integrity of their designs—a notion inherently subjective. Our backgrounds and past experiences, as engineers, heavily influence our problem-solving methodologies and solution architectures.
Personally entrenched in infrastructure development, I often liken our role to that of road builders amidst a fascination with sleek automobiles. While cars garner admiration, it's the roads that underpin seamless journeys. Similarly, infrastructure developers pave the way, prioritising the application's essence while constructing the necessary framework. And yes, it's not just Terraform or cloud; it requires an actual understanding of the core concepts of distributed systems. I have built controllers, reconcilers, VM orchestrators, state machines, chaos controllers and more. Before Kubernetes I worked on BOSH and Cloud Foundry, and now I work on Kubernetes. For someone like me, Kubernetes is not just an orchestration platform, it's a control plane.
While organisations typically start by building SaaS on shared infrastructure, they eventually need to scale up and support various customer requirements for deploying software. Having a hybrid/private SaaS offering is critical for attracting larger customers. Organisations often end up rearchitecting their SaaS architecture, building three different architectures to support Shared, Dedicated and Private SaaS.
Siloed - No Abstraction between Product, Metadata and Infrastructure Layers.
The most naive approach is to build a single backend API which embeds product knowledge, writes to a metadata DB and then triggers provisioning of software.
By Product I specifically mean the teams who decide the user flow for onboarding, the terminology used, the UI/UX flows, etc.
For example, product teams decide on flows such as:
What is an organisation? How is an org onboarded?
Which tier or plan does the org belong to? How are plans defined?
Can an org create multiple workspaces based on its plan?
How can an org admin add users to a workspace?
and many more.
Problems
Lack of Backend API Consistency:
As product teams introduce new plans or tiers, the backend API undergoes frequent changes, often tightly coupled with specific product knowledge.
Absence of Business Plans-to-Infrastructure Mapping:
The absence of a clear mapping between product definitions and infrastructure provisioning leads to uncertainty regarding whether different plans utilise the same underlying infrastructure.
State Management Deficiency:
The lack of a coherent state management strategy results in backend APIs resorting to polling Kubernetes clusters for state information, leading to scalability issues and potential architectural disarray.
Scalability Challenges for Different SaaS Models:
While the current architecture may suffice for a Shared SaaS model, accommodating customer demands for dedicated clusters poses significant challenges, necessitating a robust solution.
Integration Complexity with Cloud Providers:
With the expansion of services to different cloud environments, there arises a need to integrate backend APIs with cloud platforms for automated cluster creation and infrastructure provisioning.
Addressing Customer Data Privacy Concerns:
Deploying software within a customer's network poses unique challenges, particularly concerning data privacy and security, necessitating careful consideration and possibly tailored deployment strategies.
Solution
The provisioning of services should remain consistent and independent of the evolving nature of products.
Based on the above statement, design your control plane.
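One way to picture this principle is a small translation table that sits between product plans and what the provisioning layer understands. The sketch below is only an illustration under assumed names: the plan names, the InfraProfile fields and the numbers are all hypothetical, not taken from any real product. The point is that plan names can change freely while the provisioning layer only ever sees an infrastructure profile.

```python
from dataclasses import dataclass
from enum import Enum


class Tenancy(Enum):
    SHARED = "shared"
    DEDICATED = "dedicated"
    BYOC = "byoc"


@dataclass(frozen=True)
class InfraProfile:
    """What the provisioning layer understands: no plan or tier names."""
    tenancy: Tenancy
    replicas: int
    cpu_millicores: int


# Hypothetical product-owned table: plan names and values are made up.
PLAN_TO_PROFILE = {
    "free": InfraProfile(Tenancy.SHARED, replicas=1, cpu_millicores=250),
    "team": InfraProfile(Tenancy.SHARED, replicas=2, cpu_millicores=500),
    "enterprise": InfraProfile(Tenancy.DEDICATED, replicas=3, cpu_millicores=2000),
}


def resolve_profile(plan: str) -> InfraProfile:
    """Translate a product plan into an infrastructure profile."""
    try:
        return PLAN_TO_PROFILE[plan]
    except KeyError:
        raise ValueError(f"no infrastructure profile mapped for plan {plan!r}") from None


def shares_infrastructure(plan_a: str, plan_b: str) -> bool:
    """Answer whether two plans land on the same tenancy model."""
    return resolve_profile(plan_a).tenancy == resolve_profile(plan_b).tenancy
```

With a table like this, adding or renaming a plan is a change to the mapping only, and the question of whether two plans use the same underlying infrastructure has an explicit answer.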
Infra SaaS Architecture Layers
Application Layer:
The application layer encompasses the UI/UX design, onboarding workflows, and user management aspects. This layer focuses on delivering a seamless user experience, tailored to the unique requirements of each application. Workflows can define mapping of a single organisation to a workspace or multiple workspaces.
Data Integration Layer:
The data integration layer handles tasks such as data persistence to relational databases, authentication mechanisms, and data-related operations. It ensures efficient data management and integration and acts as a middleware.
Service Provisioning Layer:
The service provisioning layer is responsible for managing the lifecycle of applications deployed to data planes and in most cases it is also responsible for the lifecycle of data planes. It involves tasks like application deployment, scaling, monitoring, and health checks.
From here on, we focus on the service provisioning layer. It should always be decoupled from the application and data integration layers, so that it can be ported to any cloud on any network.
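A rough sketch of what that decoupling could look like in code: the upper layers talk to the provisioning layer through one narrow interface, and the request type carries no product vocabulary. Everything here (ServiceRequest, Provisioner, InMemoryProvisioner, create_workspace, the "dp-1" data plane name) is a hypothetical name chosen for illustration, and the in-memory class merely stands in for a real provisioning layer.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class ServiceRequest:
    """Infrastructure-only request: no organisations, plans, or workspaces."""
    service_id: str
    dataplane: str  # hypothetical data plane identifier
    version: str
    replicas: int


class Provisioner(Protocol):
    """The only surface the upper layers are allowed to call."""

    def apply(self, request: ServiceRequest) -> None: ...

    def status(self, service_id: str) -> str: ...


class InMemoryProvisioner:
    """Stand-in for a real provisioning layer, for illustration only."""

    def __init__(self) -> None:
        self._desired: Dict[str, ServiceRequest] = {}

    def apply(self, request: ServiceRequest) -> None:
        # Idempotent: applying the same request twice leaves one record.
        self._desired[request.service_id] = request

    def status(self, service_id: str) -> str:
        return "accepted" if service_id in self._desired else "unknown"


def create_workspace(org: str, workspace: str, provisioner: Provisioner) -> str:
    """Upper-layer code: turns product terms into a ServiceRequest."""
    service_id = f"{org}-{workspace}"
    provisioner.apply(
        ServiceRequest(service_id, dataplane="dp-1", version="1.0.0", replicas=1)
    )
    return service_id
```

Because create_workspace depends only on the Provisioner interface, the implementation behind it can be swapped per cloud or per network without touching onboarding flows.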
Abstractions, Separation of Concerns and State Management.
In this architecture, we divide responsibilities clearly. The Backend API manages metadata exclusively, while the underlying service provisioning layer comprises several controllers based on state machines. These controllers continuously take actions and reconcile states. This setup operates on a push-based model, where the service provisioning layer (control plane) pushes configurations to Dataplanes. Push-based models are suitable when Dataplanes are within your network. In our next post, we'll delve deeper into the comparison between push and pull-based models for infrastructure SaaS.
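To make the idea of state-machine controllers that reconcile and push a little more concrete, here is a minimal sketch. It is not the implementation behind any particular product: the states, the transition table, the Controller class and the FakeDataplane are assumptions made for illustration. The controller records desired state, compares it with what the data plane currently has, and pushes configuration only where the two differ.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    PROVISIONING = "provisioning"
    READY = "ready"
    FAILED = "failed"


# Allowed transitions (hypothetical); anything else is rejected.
TRANSITIONS = {
    State.PENDING: {State.PROVISIONING},
    State.PROVISIONING: {State.READY, State.FAILED},
    State.FAILED: {State.PROVISIONING},
    State.READY: {State.PROVISIONING},
}


class FakeDataplane:
    """Stand-in for a data plane that accepts pushed configuration."""

    def __init__(self):
        self.applied = {}

    def push(self, service_id, config):
        self.applied[service_id] = dict(config)


class Controller:
    def __init__(self, dataplane):
        self.dataplane = dataplane
        self.desired = {}
        self.state = {}

    def submit(self, service_id, config):
        """Record desired state; the next reconcile pass acts on it."""
        self.desired[service_id] = dict(config)
        self.state.setdefault(service_id, State.PENDING)

    def _move(self, service_id, new_state):
        current = self.state[service_id]
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current.name} -> {new_state.name}")
        self.state[service_id] = new_state

    def reconcile_once(self):
        """One pass: push config wherever the data plane differs from desired."""
        pushed = []
        for service_id, config in self.desired.items():
            in_sync = self.dataplane.applied.get(service_id) == config
            if in_sync and self.state[service_id] is State.READY:
                continue  # nothing to do, the pass is idempotent
            self._move(service_id, State.PROVISIONING)
            try:
                self.dataplane.push(service_id, config)
            except Exception:
                self._move(service_id, State.FAILED)  # retried on the next pass
                continue
            self._move(service_id, State.READY)
            pushed.append(service_id)
        return pushed
```

In a real system reconcile_once would run continuously and the push would be a network call into the Dataplane, which is why this model assumes the control plane can reach it.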




