so, what really is platform engineering?

Although they’re often non-existent in startups, most larger companies have internal developer “platforms”. A platform is an abstraction over the underlying infrastructure of a system, so that developers don’t have to know or worry about the intricate details of underlying processes that don’t concern them. A platform can serve many purposes: getting a service online, monitoring, storing large volumes of data, developing machine learning models, serving those models, and so on. Software engineering infrastructure teams around the world work primarily on such platforms, to make things easier to develop and to keep them reliable in production.

hard technical problems, and why they are hard

Platform engineering is hard because it often tries to abstract really complicated infrastructure. Abstracting infrastructure means platform teams very often work on internal tools and processes that let product engineers and data scientists build things without knowing the implementation details underneath. For example, when a data scientist runs experiments in an internal notebook service, or deploys an inference endpoint to production, they don’t need to know how the machines are provisioned, how they are scaled up and down, how they store data, or how the endpoints are monitored, because their primary concern is just getting their experiments done. There’s a lot of domain knowledge and hard (but fun) engineering involved in building such platforms. But, to be honest, the engineering would be much easier if the engineers working on infrastructure could simply dictate how the platforms should work. In real life it’s not so simple.
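To make the inference endpoint example a bit more concrete, here is a minimal sketch of what such an abstraction might look like. Everything in it is hypothetical and for illustration only: `EndpointRequest`, `DeploymentPlan`, `plan_deployment`, and the default values (machine type, replica counts) are assumptions, not any real platform’s API. The point is only that the user states what they care about and the platform fills in the rest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointRequest:
    """What the data scientist provides: only the things they care about."""
    model_name: str
    model_version: str


@dataclass(frozen=True)
class DeploymentPlan:
    """What the platform decides on the user's behalf (hypothetical fields)."""
    service_name: str
    machine_type: str
    min_replicas: int
    max_replicas: int
    metrics_enabled: bool


def plan_deployment(request: EndpointRequest) -> DeploymentPlan:
    """Translate a user-facing request into infrastructure decisions.

    The defaults below are made-up placeholders; a real platform would
    derive them from its own policies and capacity.
    """
    return DeploymentPlan(
        service_name=f"{request.model_name}-{request.model_version}",
        machine_type="gpu-small",   # assumed default, hidden from the user
        min_replicas=1,             # assumed scaling floor
        max_replicas=4,             # assumed scaling ceiling
        metrics_enabled=True,       # monitoring is on without the user asking
    )


# The user never mentions machines, scaling, or monitoring.
plan = plan_deployment(EndpointRequest(model_name="churn-model", model_version="v3"))
print(plan)
```

Notice that every default in `plan_deployment` is a decision the platform team made for the user. Whether those are the right decisions is exactly the part that is hard to judge from the infrastructure side.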

socio-technical nature of platform engineering

Platform engineering is hard largely because platforms often serve other developers whose domain knowledge is dramatically different. Platform engineers are usually domain experts in infrastructure, so it’s hard for them to take the perspective of the end users and to evaluate whether the platform they built is actually easy enough to use. Almost any abstraction over the bare infrastructure can look like a good abstraction to a platform engineer, because they know the underlying systems in fine detail, but it may not look that way to the developers using the platform.

Simple abstractions are easier to make, and if the target users are other infra engineers, they’re often good enough. But if the end users are not familiar with the infrastructure, making it simple enough for them is a hard problem to solve. Abstractions often seem good at first, but over-abstraction can leave the platform unable to evolve, because making changes in a highly abstracted system can be very painful.

Enforcing the use of a platform is also hard. As the workloads running on the platform grow more complex, users can become increasingly frustrated with it and start looking for ways to bypass it to get their work done. When a platform becomes something users try to work around, it is no longer serving its purpose. Platform engineering, along with the architecture decisions made at the system level, is a hard technical problem because of these constraints.

so what can be done?

I can’t tell you, because I’m still looking for the answer to that question myself, and I don’t think it’s possible to build the perfect platform. The sign of a good platform, however, is that it can evolve with increasingly complex workloads and that it was built with regular feedback from its end users. It’s a problem that is equally social and technical.

things to check out