Agile, CRISP-DM and CPMAI Methodologies in AI and ML Projects

The last decade has seen steady evolution in artificial intelligence, machine learning and other cognitive technologies. Irrespective of size, industry and target customer, companies are increasingly investing in projects based on these emerging technologies, for varied reasons. Some businesses focus on building smart devices, which are an amalgamation of three parallel development tracks: hardware, software, and AI/ML models. Others run internal projects focused on enterprise predictive analytics, fraud management, or other process-improvement tasks that add a layer of understanding or a mechanism on top of existing data and applications. Various initiatives are built around interactive user interfaces spread across a plethora of systems and devices. Some recent efforts pursue AI and ML development goals for public or private sector applications that differ from all of these in more substantial ways.

 

For more technology insights, follow me @Asamanyakm

 

Though AI and ML projects differ, their common goal is to apply these exponential technologies and associated approaches, leveraging their evolving capabilities to meet a wide range of important requirements. Existing project management methodologies are based on software or application development, on enterprise architecture and frameworks, or on hardware development. It is quite challenging to keep working with them when faced with the unique lifecycle requirements of AI and ML projects. The pertinent point is that data, not application or software code, forms the basis of AI and ML projects: the models learn from the data, and the data drives the project. What is needed is a project management methodology that accounts for the data-centric requirements of AI while also keeping track of the application-focused uses of the models and the other artifacts produced during an AI lifecycle.

 

This article looks at various aspects of the existing methodologies to determine whether a new methodology needs to be crafted, or whether restructuring current approaches to make them AI-compliant is enough to execute AI and ML projects.

 

Analyzing Agile from an AI Perspective

 

Agile methodology is a practice that promotes continuous iteration of development and testing throughout the software development lifecycle of a project. Development and testing activities run concurrently, unlike in the Waterfall model. Agile software development emphasizes the following core values over those of the traditional models:

 

  • Individuals and interactions over processes and tools

  • Working software over comprehensive documentation

  • Customer collaboration over contract negotiation

  • Responding to change over following a plan

Agile methodologies are widely practiced in the industry for a broad range of application development purposes, and with good reason. Before the pervasive adoption of Agile, many organizations found themselves weighed down by traditional project management methodologies that were more in sync with assembly-line methods of production. The traditional approach meant waiting months or years for a software project to pass through design, development, testing, and deployment. By contrast, the Agile model focuses on tight, short iterations with the goal of rapidly producing a deliverable that meets the immediate needs of the business owner, then iterating continuously as requirements become more granular and refined. Strategists, technocrats, entrepreneurs and industrialists, irrespective of their business domain, believe that Agile methodologies have permanently changed the way organizations build and release functionality in a fast-paced, ever-progressing world.

 

At a recent event organized by PMI, where I presented a paper on “Role of Technology in the Progress of a Nation”, the discussions that followed made me realize that even Agile practitioners are perplexed by the requirements of AI systems. One question caught my attention: what exactly is the deliverable in an AI project? You can consider the machine learning model a deliverable, but it is only an enabler of a deliverable and provides no functionality in and of itself. And if you dig deeper into machine learning models, what exactly is in the model? The model consists of algorithmic code plus training data (if the model is supervised), parameter settings, hyperparameter configuration data, and additional support logic and code that together comprise the model. Indeed, the same algorithm with different training data generates a different model, and a different algorithm with the same training data generates a different model as well.
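The point that a model is the product of both an algorithm and its training data can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the two toy "algorithms" (`fit_mean` and `fit_line`), the two tiny datasets, and their values are invented and do not come from any real project.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Toy algorithm A: ignore x and always predict the average training label."""
    avg = mean(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """Toy algorithm B: ordinary least-squares line through the training points."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

# Two made-up training sets (illustrative values only).
data_1 = ([1, 2, 3, 4], [2, 4, 6, 8])  # labels follow y = 2x
data_2 = ([1, 2, 3, 4], [5, 5, 5, 5])  # labels are constant

line_on_1 = fit_line(*data_1)  # algorithm B trained on data 1
line_on_2 = fit_line(*data_2)  # same algorithm, different data
mean_on_1 = fit_mean(*data_1)  # different algorithm, same data as line_on_1

# Ask all three models the same question.
for name, model in [("line_on_1", line_on_1),
                    ("line_on_2", line_on_2),
                    ("mean_on_1", mean_on_1)]:
    print(name, model(10))
```

The three fitted functions disagree on the same input: `line_on_1` and `line_on_2` share code but not data, while `line_on_1` and `mean_on_1` share data but not code. Neither the algorithm nor the data alone identifies what was built.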

 

So what is delivered as the outcome of an AI project: the algorithm, the training data, the model that aggregates them, the code that uses the model in an application, all of the above, or none of the above? There is no definitive answer. As such, we need to consider and evaluate additional approaches that augment Agile and make it more AI-compliant.

 

Cross Industry Standard Process for Data Mining (CRISP-DM)

 

Before Artificial Intelligence and other emerging technologies became popular, organizations with data-centric requirements also sought out methodologies that suited their project goals. Evolving from roots in data mining and data analytics, some of these methodologies had at their core an iterative cycle focused on data discovery, ingestion, curation, modeling, evaluation, and delivery. One of the earliest is known simply as Knowledge Discovery in Databases (KDD), which refers to the broad process of finding knowledge in data and emphasizes the "high-level" application of data mining methods. It is of great interest to researchers in artificial intelligence, machine learning, pattern recognition, knowledge acquisition for expert systems, statistics, databases and data visualization. However, like waterfall methodologies, KDD is in some ways too rigid or abstract to be used with continuously evolving AI/ML models.

 

The need for a more iterative approach to data mining and analytics initiatives was clear. To address it, a consortium of five vendors developed the Cross Industry Standard Process for Data Mining (CRISP-DM), which applies continuous iteration to the various data-intensive steps of a data mining project.

 

Specifically, the methodology starts with an iterative loop between business understanding and data understanding, then hands off to an iterative loop between data preparation and data modeling. The result passes to an evaluation phase, whose outcome either goes on to deployment or loops back to business understanding. The whole approach is a cyclic, iterative loop that leads to continuous data preparation, modeling, and evaluation.
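The phase flow described above can be written down as a small transition table. The following Python sketch is only one possible reading of that description: the `Phase` enum, the `TRANSITIONS` mapping and the `is_valid_path` helper are hypothetical names introduced here for illustration, not part of any CRISP-DM tooling.

```python
from enum import Enum

class Phase(Enum):
    BUSINESS_UNDERSTANDING = "business understanding"
    DATA_UNDERSTANDING = "data understanding"
    DATA_PREPARATION = "data preparation"
    MODELING = "modeling"
    EVALUATION = "evaluation"
    DEPLOYMENT = "deployment"

# Allowed moves between phases, as read from the description above (an assumption):
# two back-and-forth loops, then evaluation either deploys or returns to the start.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}

def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))

# A project that revisits data preparation once before evaluating and deploying.
example_path = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
]
print(is_valid_path(example_path))
```

A table like this makes the contrast with a linear process visible: a path may revisit earlier phases any number of times, but it cannot jump straight from business understanding to deployment.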

 

However, version 1.0 was completed almost twenty years ago, and a second version, rumored to be on its way around fifteen years ago, never materialized. Companies such as IBM and Microsoft, which have hardware as well as software footprints, iterated on the methodology to produce their own variants, adding detail on further iterative loops between data processing and modeling and more specifics on the artifacts and deliverables produced during the process. Even today, these companies use their modified methodologies primarily in delivering their own premium service engagements or as part of their product-line implementation processes.

 

Organizations have diverse technology needs irrespective of the industries they belong or cater to. Most of them evidently prefer vendor- and platform-agnostic approaches to technology implementation over vendor-centric, proprietary methodologies. The primary challenge is making CRISP-DM work in the context of existing Agile methodologies. From the perspective of Agile, the entire CRISP-DM loop is encapsulated within the development and deployment spheres, but it also touches the business requirements and testing areas of the Agile loop. In fact, when application-focused Agile development is considered alongside data-focused CRISP-DM for AI/ML projects, the two methodologies are entwined in intricate ways.

 

Developing a More Effective AI-Focused Methodology

 

Since project management has simplicity at its roots, organizations need something less complex for handling AI and ML projects. However, complexity increases because the roles in the application-focused Agile groups differ from those in the data-focused methodology groups. In the Agile world, the project manager is at the center, connecting the business and technology development sides. In the data-centric world, the data organization is at the center, connecting the roles of data scientist, data engineer, business analyst, data analyst, and the line of business.

 

The language of communication usually varies too: Agile sprints focus on functions and features, whereas data sprints focus on data sources, data cleansing, data preparation and data models. Since the Agile and data approaches are two sides of the same coin serving a single purpose, we need to integrate them into a cohesive whole that empowers organizations to deliver AI and ML projects with greater reliability, increased flexibility, better customer focus and higher quality.
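One way to picture the two kinds of sprint working from a common starting point is a backlog in which every business requirement carries both application tasks and data tasks. The Python sketch below is purely illustrative: the `Requirement` class, the `plan_sprint` function, the track names and the sample tasks are all hypothetical and not drawn from any particular methodology or tool.

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    """A business requirement that feeds both tracks (hypothetical structure)."""
    name: str
    feature_tasks: list = field(default_factory=list)  # functions and features
    data_tasks: list = field(default_factory=list)     # sources, cleansing, models

def plan_sprint(requirements):
    """Split a shared backlog into an application track and a data track.

    Each task stays tagged with the requirement it came from, so both
    teams can trace their work back to the same business need.
    """
    plan = {"application": [], "data": []}
    for req in requirements:
        plan["application"] += [(req.name, task) for task in req.feature_tasks]
        plan["data"] += [(req.name, task) for task in req.data_tasks]
    return plan

# Made-up backlog for illustration.
backlog = [
    Requirement(
        name="flag suspicious transactions",
        feature_tasks=["alert screen for analysts"],
        data_tasks=["ingest transaction history", "clean merchant codes",
                    "train first fraud model"],
    ),
    Requirement(
        name="explain each alert",
        feature_tasks=["show top contributing factors"],
        data_tasks=["compute feature attributions"],
    ),
]

sprint = plan_sprint(backlog)
for track, items in sprint.items():
    print(track)
    for requirement, task in items:
        print(f"  {task}  <- {requirement}")
```

The design choice worth noting is that neither track owns the requirement: both lists are derived from it, which keeps the feature vocabulary and the data vocabulary tied to one source.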

 

A synthesized methodology would help: one that starts from the same root of business requirements and bifurcates into two concurrent iterative loops, Agile project development and Agile-enabled data methodologies. We can think of this as an Agile CRISP-DM, or perhaps a CRISP-DM-enhanced Agile approach. CRISP-DM is not necessarily the only data methodology we can use here, but it is suitable to a great extent. However, some parts of AI/ML project development are not addressed by either methodology. They are listed below:

 

  • Building conversational applications and models

  • Challenges around segregation in model development and iterative de-segregation

  • Hardware-focused model deployment challenges and iterative loops around that

  • Concurrent AI algorithm evaluation and assembling which imposes additional methodology challenges

Of course, there are approaches and methodologies that fill these gaps with an AI/ML focus. Methodologies such as Cognitive Project Management for AI (CPMAI) make specific enhancements to meet AI-specific requirements, particularly those listed above, and can be implemented in organizations with fully functional Agile teams and data organizations. CPMAI extends the CRISP-DM methodology with AI- and ML-specific documents, processes, and tasks. It also incorporates the latest practices in Agile methodologies and adds DataOps activities that aim to make CPMAI data-first, AI-relevant, highly iterative, and focused on the right tasks for operational success. Instituting something completely new and alien is a sure way to meet resistance, not only from the project management community but also from the business world. Hence, the key is a synthesized approach that delivers the expected results to the organization in parallel and provides a framework for continued iterative development at the lowest possible risk. Ultimately, it is the project's success that matters most.

 
