Product Stage Direction - ModelOps

The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.

Stage Overview

GitLab ModelOps aims empower GitLab customers to build and integrate data science workloads within GitLab. This stage also supports our FY24 Product Theme: GitLab for Data Science and the GitLab company vision of developing an AllOps platform.

Our ModelOps stage aims to accomplish the following:

Bring the best practices of DevSecOps to data sciences by providing improved Python Notebook experience across GitLab, support for more powerful compute within GitLab runner, and simplified CI configuration for popular data science toolchains.
Reduce complexity and streamlining systems by providing native integrations to popular data science toolchains and open-source frameworks, first-party solutions for DataOps and MLOps workloads, and open APIs for increased flexibility.
Enabling seamless handoffs across the SDLC through model registry and integrations that better support data science workloads across existing GitLab features.

The ModelOps Stage is currently outside of the GitLab DevOps lifecycle. We believe that data science features can span across all DevOps stages, making existing features more intelligent and automated.

Stage Direction Walkthrough & Commit Talk

Watch VP of Product David DeSanto, Engineering Manager Monmayuri Ray, and Principal Product Manager Taylor McCaslin discuss an overview of the GitLab ModelOps stage. They discuss the three pillars of ModelOps, including how to integrate Data Science into DevOps. This includes a brief history on how we got here, as well as where we are going. It discusses GitLab’s recent acquisition of UnReview and how GitLab plans to leverage ML/AI within our platform to improve user experience, as well as empower users to include ML/AI within their applications.

One of our primary goals for our ModelOps stage is to reduce the complexities of data science workloads and integrate these workloads to easily be managed and developed within GitLab.

Lack of Repeatability

Data scientists do not have the experience of DevOps engineers (and vice-versa). Their skills are not focused on building robust and production-ready systems. Much of data science work is experimentation, cobbling together whatever is needed to identify and produce value. Throughout this experimentation, lots of data, packages, tools, and code get written on a data scientist's machine. This creates a bespoke environment that is hard to reproduce, adds friction to handoffs, and diverges from production systems.

It works on my machine

We want to help data scientists create repeatable environments with source code management and CI/CD at the heart of them. It should be easy for anyone on the team to explore the latest model experiment and iterate on it.

Difficult handoffs

Because of the challenges with complex toolchains and lack of repeatable environments, handoffs can be a challenge with data science teams. These teams may produce amazingly valuable models and insights for an organization but when it comes time to deploy those models to production, it can take months. We want to help different teams across the software development lifecycle (SDLC) to better collaborate and handoff data, code, and models. We want to do that with the toolchain software engineering teams are already using.

Smoother Handoffs

All together, these challenges lead data science teams to use specialized tools that don't integrate with each other or the existing software development lifecycle tools organizations already use. It leads teams to work in silos creating handoff friction and finger-pointing as well as guesswork and lack of predictability. Applications end up not leveraging data well and models take months to get into production and security is an afterthought. This creates risk for organizations, slows innovation, increases complexity, and increases the time to value. All of this could be avoided with an integrated DevOps platform that natively supports data science workloads. That's exactly what we are building.

ModelOps of Tomorrow

We are taking best practices from DevOps and applying them to data science workloads: From the processing of data workloads with Dataops to the productionization of data science models. Teams streamline handoffs because they are working in the same platform based on source code management with CI/CD and integrated security testing. Organizations can reduce risks associated with ML/AI, speed up innovation, reduce complexity, and reduce time to value.

Groups

There are two areas of relevance to GitLab ModelOps which we believe are critical to having end to end functioning data science workloads on GitLab:

MLOps - Enabling customer data science use cases which include accessing and interacting with data, AI/ML toolchain integrations, and compute environment integrations.
DataOps - Enabling data processing use cases like building, running & orchestrating Extract, Load, Transform (ELT) data pipelines to shape and process data for useful analysis.

Team and Investment

GitLab ModelOps is currently composed of two groups. Data Science use cases are one of the four product investment themes for 2023 which align with GitLab's vision of fully integrating ModelOps within our DevOps platform.

These groups will be expanding with more roles opening throughout 2023:

MLOps - An experimental single-engineer group focused on supporting AI/ML workloads within GitLab
- Future openings include: Data Scientist, QA, and UX Design, Engineers.
DataOps - currently unstaffed, but a planned future product group focused on supporting data workloads that power modern software applications built with GitLab.
- Future openings include: Data Scientist, QA, and UX Design, Engineers.

Accomplishments, News, and Updates

Section & team member updates

2022 the ModelOps stage was actively staffing up and laying a foundation to build ML/AI powered features into GitLab. We established our ModelOps teams, introduced a new language to the GitLab stack (Python), and developed engineering processes to effectively enable GitLab to pursue the wider Data Science vision.

In 2023, we'll continue to grow teams in our ModelOps stage, expanding the use cases and features we are developing into the GitLab platform.

Internal team members can watch/read our latest updates from our latest ModelOps Group Conversation from October ( slides, video )

Important PI milestones

We've established a ModelOps internal handbook PI page (internal link) which will be updated monthly as part of PI review meetings. We're still working to actively orchestrate all our performance indicator metrics.

Recent accomplishments

Experimental Experiment Tracking
GPU Enabled SaaS Runners
15.4 - Suggested Reviewers Beta - GitLab's first production ML powered feature
15.4 - More powerful Linux machine types for GitLab SaaS runners
14.5 - Cleaner Python Notebook Diffs - Video demo - Docs
13.9 - Exposing GPU to GitLab Runner - Video Demo - Docs
Internal Feature - Automatic Issue labeling model that is based on GitLab's internal issue tracking and label usage. This was our first small feature experiment to see if data science workloads could bring benefits to existing GitLab features. You can explore this working prototype in the Slack channel: #feed_tanuki-stan

What's Ahead

Potential idea list

1 Year Plan

What We Are Currently Working On

The ModelOps team is actively working to integrate machine learning into GitLab and the following outlines where we are currently investing our efforts:

Model Registry - support in the GitLab Package registry to host ML/AI model versions
Exploring potential partners in this space to quickly integrate with vendors who have existing GitLab customers. Propose a partner or integration here.
Customer interviews as we explore the space to understand both customer personas and areas of potential product opportunities.
Dogfooding We are working across GitLab to leverage existing GitLab features through the lens of data science to use our own features and reimagine how they could better support data science use cases in the future. Learn more about Dogfooding at GitLab.

What We're Not Doing

The following will NOT be a focus over the next 12 months:

Advanced Computing - While we understand that AI models can require advanced computing resources we are not currently working on next-generation compute resources for GitLab Runner like TPUs and Deep Learning hardware. Customers with these use cases can expose compute leveraging self-hosted runners and executors or leverage existing APIs from advanced computing vendors directly.
Non-cloud native data stores - Data can live in many places. However, if it's not in the cloud accessible with modern APIs it will not be a focus for GitLab in the next year. We are focused on cloud-native data stores.

Last Reviewed: 2023-08-17
Last Updated: 2023-08-17