The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.
The Delivery Group enables GitLab Engineering to deliver features in a safe, scalable, and efficient fashion to both GitLab.com and self-managed customers. The Group deploys changes to GitLab.com continuously throughout the day and also ensures that GitLab's monthly, security, and patch releases are made available to our SaaS Platforms and self-managed customers on a predictable schedule.
Our 3-year Delivery strategy contributes to the company strategy primarily by maturing the platform. In order to meet our planned growth (internal), GitLab is evolving its architecture toward Cells. To support this, Delivery must scale rollout strategies to provide safe and efficient rollouts to a fleet of potentially thousands of Cells in the future. With deep expertise in the application, its components, and how to deploy GitLab at a scale that serves millions, the Delivery Group is uniquely positioned to become the orchestrator of the fleet, ensuring that we can roll out changes safely at scale.
To ensure that the evolution to Cells is successful, technical as well as cultural changes will be needed. For example, enabling gradual rollout of changes across an entire fleet will require much stronger backward and forward compatibility than GitLab.com currently requires.
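To make that compatibility requirement concrete, here is a minimal Python sketch, using entirely hypothetical names that are not taken from the GitLab codebase, of the additive, tolerant-reader style of change that lets old and new application versions coexist during a gradual rollout:

```python
import json
from typing import Optional


def serialize_event_v2(project_id: int, action: str, cell_id: Optional[str] = None) -> str:
    """New writer: keeps every v1 field and only adds an optional 'cell_id'."""
    event = {"schema": 2, "project_id": project_id, "action": action}
    if cell_id is not None:
        event["cell_id"] = cell_id  # additive change; nothing removed or renamed
    return json.dumps(event)


def handle_event_v1(raw: str) -> None:
    """Old reader: ignores fields it does not recognize (forward compatible)."""
    event = json.loads(raw)
    print(f"project {event['project_id']}: {event['action']}")


def handle_event_v2(raw: str) -> None:
    """New reader: treats 'cell_id' as optional so v1 payloads still parse (backward compatible)."""
    event = json.loads(raw)
    cell = event.get("cell_id", "unknown-cell")
    print(f"[{cell}] project {event['project_id']}: {event['action']}")


if __name__ == "__main__":
    newer_payload = serialize_event_v2(42, "deploy", cell_id="cell-7")
    handle_event_v1(newer_payload)  # old code still understands the new payload
    handle_event_v2('{"schema": 1, "project_id": 42, "action": "deploy"}')  # and vice versa
```

The design point is that the new writer only adds optional fields and the old reader ignores fields it does not recognize, so either version can process data produced by the other while a rollout is in flight.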
To facilitate this longer-term organizational change, Delivery has changed how it thinks about deployments and releases and now considers GitLab.com to be a “fleet of one”. This change allows us to focus on learning about how best to apply our tools and processes to effectively coordinate rollouts across a fleet of Cells.
For deployments to be efficient across a fleet of Cells, our tools and processes must be fully automated (automatic rollout & rollback, deployment observability, self-healing, scalable). To achieve this goal iteratively, we will keep learning as much as we can about operating effective rollouts across a fleet of Cells while operating effective continuous delivery to GitLab.com.
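As one hedged illustration of what that level of automation could look like, the sketch below (hypothetical names and a simulated health signal, not the actual Delivery tooling) deploys a new version to a fleet in waves, evaluates health after each wave, and rolls back automatically when the signal regresses:

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    version: str


def deploy(cell: Cell, version: str) -> None:
    print(f"deploying {version} to {cell.name}")
    cell.version = version


def healthy(cell: Cell) -> bool:
    # Stand-in for real deployment observability signals (error rates, apdex, saturation).
    return random.random() > 0.05


def rollout(fleet: list, new_version: str, old_version: str, wave_size: int = 2) -> bool:
    for start in range(0, len(fleet), wave_size):
        wave = fleet[start:start + wave_size]
        for cell in wave:
            deploy(cell, new_version)
        time.sleep(0.1)  # bake time before evaluating health
        if not all(healthy(cell) for cell in wave):
            print("health signal regressed; rolling back automatically")
            for cell in fleet:
                if cell.version == new_version:
                    deploy(cell, old_version)
            return False
    return True


if __name__ == "__main__":
    fleet = [Cell(name=f"cell-{i}", version="16.9.0") for i in range(6)]
    succeeded = rollout(fleet, new_version="16.9.1", old_version="16.9.0")
    print("rollout complete" if succeeded else "fleet rolled back")
```

In practice the health check would be backed by real monitoring and the rollback would need to handle Cells in mixed states, but the wave/observe/rollback loop is the shape of automation this goal describes.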
Over the course of the next 3 years, we expect to see strong strategic partnerships evolve across the Platforms Section to drive prioritization, swarm to solve the most important problem, and clear roadblocks for other groups in the section. The Dedicated Group has solved a considerable number of the problems encountered when operating a single tenant of GitLab in an automated way. Additionally, the Scalability Group supports our SaaS platforms and has expertise in the kind of monitoring and logging that will be required to run Cells.
Finally, we will require assistance from other groups within the company where their domain expertise is invaluable as part of moving toward the next iteration of deployments. For example, the tests that we run may be prohibitively expensive to scale horizontally, and health checks may be a better fit. As the DRI for our test coverage, the Quality Department is positioned to help us navigate this. Another example is the Database Group, which can help us with the longer-term direction of post-deployment migrations to unblock rollbacks for all users.
Since GitLab is public by default and operates both SaaS platforms and a self-managed offering, there are some unique challenges in day-to-day operations. GitLab's business model is based on an open core, and we believe in maintaining transparency over the source code that is part of GitLab, even where this introduces additional challenges.
However, as a typical part of business for a software company in 2023, we are required to implement security fixes and remediations on a regular basis. To keep both ourselves and our customers safe and secure, we must discuss and implement these security fixes while maintaining confidentiality until the fixed release is made available to all of our customers. As a result, we have two streams of code flowing into GitLab, and divergence between them can add complexity.
GitLab is made up of a series of components that have a tight logical coupling and strong forward/backward compatibility requirements. Our GitLab.com deployments and managed versioning release processes span many parts of the organization and reflect our organization structure (see Conway’s Law). This can make the process highly resistant to change, as there is a significant organizational burden to coordinate and align on the changes that need to be made across a process where each department has its own metrics and responsibilities. Visibility across processes is low, which has prevented Delivery from truly evolving the process rather than just iterating on sections of it over time.
Delivery models the complicated subsystem team pattern and is responsible for ensuring that GitLab is delivered to customers, without outages, multiple times a day. The team has deep expertise in the architecture, deployment patterns, and hands-on remediations involved in deploying and releasing GitLab. Onboarding, operating, and maintaining these systems carries a high cognitive and operational load. As a consequence, project work that could evolve our deployment capabilities and unlock new business opportunities can be deprioritized in favor of “keep the lights on” work that prevents or mitigates user impact.
For FY24 we've identified the following four key themes and aligned them with the Infrastructure & Quality department’s direction.
Release management currently operates through a combination of manual and automated steps. As GitLab grows and our feature velocity increases, we need to evolve the current process to handle the increased demand for scale. Our current release processes are unlikely to scale much further, and we risk becoming a bottleneck to throughput if we do not invest in evolving and streamlining the release and deployment tools & processes. This should also contribute to the infrastructure goal of achieving 50% year-on-year growth in engagement survey results compared to FY23.
Our specific goal is to reduce stress and cognitive load by measuring and reducing Release Manager workload, and as a result increase the amount of time we can spend improving our tools and removing tech debt. We'll achieve that by:
Our release processes are complex, involve a lot of manual touch points, and currently require deep domain expertise to execute. As a result, the Delivery group can become a bottleneck or a gate for many teams wanting to make changes to their deployments, fix bugs, and support older versions. Additionally, things like backports and major dependency upgrades often represent sudden and unplanned work that has to be prioritized against current needs. As a result, non-critical fixes and upgrades can be rejected, and their benefit is therefore not realized by customers.
Moving toward self-service, with the Delivery group as a maintainer of the tools rather than an executor of the process, will allow Stage teams to deploy independently and own their features across the entire feature development lifecycle, increasing their efficiency and removing bottlenecks. This also supports our department goal of preparing self-servicing for stage group teams to enable end-to-end development. In order to get there, the first steps we have to take are:
Delivery's mean time to production (MTTP) performance indicator has tracked our work to make deployments to GitLab.com fast and reliable, but it is limited to an overview of whether things are going well or not so well. It doesn't give us the level of insight we need to make adjustments that improve our processes and tools. For us to effectively and efficiently improve the way we do things, we need a more granular level of detail and instrumentation. This will help us increase the efficiency of the deployment and release processes, and every feature released will benefit from it.
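As a simple illustration of the gap between an overview metric and granular instrumentation, the sketch below (hypothetical data, not the real PI pipeline) derives a coarse MTTP-style figure from merge and production-deployment timestamps; the granularity this theme calls for comes from recording per-stage timestamps rather than a single mean:

```python
from datetime import datetime
from statistics import mean

# (merged_at, deployed_to_production_at) pairs for a handful of merge requests.
changes = [
    (datetime(2023, 6, 1, 9, 0), datetime(2023, 6, 1, 13, 30)),
    (datetime(2023, 6, 1, 10, 15), datetime(2023, 6, 1, 13, 30)),
    (datetime(2023, 6, 2, 8, 45), datetime(2023, 6, 2, 19, 5)),
]

lead_times_hours = [
    (deployed - merged).total_seconds() / 3600 for merged, deployed in changes
]

print(f"MTTP: {mean(lead_times_hours):.1f} hours")
# A single mean tells us whether things are going well or not, but per-stage
# timestamps (build, staging, canary, production) are needed to see where the
# time is actually spent, which is the granularity this theme aims to add.
```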
In FY24 we'll review existing group metrics and create a set that represents all of the Delivery group's responsibilities, allowing us to measure our impact. Additional metrics will include:
In addition, we are improving deployment pipeline observability to increase the insight we get into areas for improvement, reliability, and deployment duration. We will:
As we review and increase the number of metrics in use, we'll need to focus on how and where we track them so that we maintain a usable overview of the Delivery group.
GitLab is growing consistently, and the demands on the platform are constantly evolving. As part of the move toward becoming the best-in-class AI-enabled DevSecOps platform, we have a renewed focus on experimentation. We get the best data and insights for our experiments by exposing them to real customer traffic and getting feedback in a production environment. However, this can introduce risk and friction in the deployment process, and our approach to experimentation is more manual than we would like. To increase the number of concurrent experiments we can run and remove risk, we have to add more flexibility to our deployment options.
This approach will also allow us to trial major platform upgrades (Ruby 2.x to 3.x, Rails 6.x to 7.x, etc.) in a way that is safe and gives us confidence. This strategy could drastically reduce the coordination needed between teams, as well as the time and risk involved in making this type of change. It's also an area where our competitors are able to leverage the latest features in a way that we can't yet. In order to continue to win against GitHub, we must reduce the cost of change to our platform.
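One building block that could provide this kind of flexibility is deterministic percentage-based assignment, sketched below in illustrative form (the function name, experiment name, and percentages are assumptions, not an existing GitLab mechanism), which routes a stable slice of real traffic to an experiment or to a canary running a newer platform stack:

```python
import hashlib


def in_rollout(user_id: int, experiment: str, percentage: int) -> bool:
    """Stable assignment: a given user always lands in the same bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percentage


if __name__ == "__main__":
    # Route roughly 5% of a simulated user base to a canary running a newer stack.
    enrolled = sum(in_rollout(uid, "rails-upgrade-canary", percentage=5) for uid in range(10_000))
    print(f"{enrolled / 100:.1f}% of users routed to the canary")
```

Because the assignment is a pure function of the user and experiment identifiers, the exposed slice stays consistent across requests and can be widened gradually as confidence grows.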
We’ll start by:
As part of being a complicated subsystem team with a high operational load, we have to be deliberate about the work that we take on. There are a few things that we’re interested in, but can’t take on right now:
Because Delivery is responsible for deploying our multi-tenant SaaS offering (GitLab.com) as well as releasing GitLab packages for Dedicated and self-managed customers, we prioritize "Keep the lights on" activities (e.g. deployment failures, incidents, release management) above all else to ensure we provide customers with a high level of service that continually meets our reliability and performance SLAs. Aside from this, our work follows the normal product prioritization process, and the top priority is simply a reflection of our operational responsibilities.