Data science insights

Reading post: data science project plan

Here's the process we follow when working on a data science project. You'll learn what steps we follow and what process model we base it on.

First, you'll hear a lot said about big data in the same breath as data science. So you may be thinking that you don't have enough data to justify a data science project.

Let's dispel that unhelpful myth right now.

Data science is not only about big data. It's about any data set that may have useful patterns. Learning those patterns leads to actionable insight. And that's what data science is all about. Patterns can and do exist in data sets of all sizes.

With that in mind, let's move on to the plan...

CRISP-DM

CRISP-DM stands for Cross-Industry Standard Process for Data Mining.

It's a long-winded name for an excellent, phased plan. Let's take a look. 

The model has six phases:

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modelling
  5. Evaluation
  6. Deployment

These are the phases we follow when working with our clients.

Business & data understanding

The first two stages are where our data scientists try to define the goals of the project.

They'll do this by assessing the business needs and the data available to the business.

The iterative process starts early. That's because data scientists will switch back and forth between the business needs and the data available.

This phase is where the business question is set and a deep dive into the available data begins. 

That will identify whether the data needed to answer the business question is available. And if not, whether it will be possible to extract the required data.
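As a rough illustration of that availability check, here is a minimal Python sketch. The field names and sample rows are made up for this example, and a real project would take them from the business question. It reports what share of records has a usable value for each field the question depends on.

```python
# Hypothetical fields the business question depends on (an assumption for this sketch).
REQUIRED_FIELDS = ["customer_id", "signup_date", "monthly_spend"]


def availability_report(rows, required_fields):
    """Return, per required field, the share of rows holding a usable value."""
    report = {}
    for field in required_fields:
        # A value counts as usable if it is present and not None or an empty string.
        present = sum(1 for row in rows if row.get(field) not in (None, ""))
        report[field] = present / len(rows) if rows else 0.0
    return report


# Made-up sample records, with the kinds of gaps real data often has.
sample_rows = [
    {"customer_id": 1, "signup_date": "2023-01-04", "monthly_spend": 42.0},
    {"customer_id": 2, "signup_date": "", "monthly_spend": 17.5},
    {"customer_id": 3, "signup_date": "2023-02-11"},
    {"customer_id": 4, "signup_date": "2023-03-02", "monthly_spend": None},
]

report = availability_report(sample_rows, REQUIRED_FIELDS)
for field, share in report.items():
    print(f"{field}: {share:.0%} of rows populated")
```

A field with a low share is a prompt to go back to the business and ask whether the data can be extracted from somewhere else.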

Data preparation

The next phase is data preparation. The purpose here is to make sure there is a data set available for analysis. 

It might be necessary to merge data from many sources. And the data may need cleaning to ensure that analysis will be successful.
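To make merging and cleaning concrete, here is a small sketch in plain Python. The two sources (a CRM export and a billing export), the field names and the values are all invented for illustration. In practice a library such as pandas would usually do this work, but the idea is the same.

```python
def merge_on_key(left_rows, right_rows, key):
    """Join two lists of dicts on a shared key, keeping only rows found in both."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


def clean(rows, numeric_field):
    """Drop rows whose numeric field is missing or cannot be parsed as a number."""
    cleaned = []
    for row in rows:
        try:
            value = float(row[numeric_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows that would break the analysis later
        cleaned.append({**row, numeric_field: value})
    return cleaned


# Made-up records from two hypothetical sources.
crm_rows = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
    {"customer_id": 3, "region": "East"},
]
billing_rows = [
    {"customer_id": 1, "monthly_spend": "42.0"},
    {"customer_id": 2, "monthly_spend": "n/a"},
    {"customer_id": 4, "monthly_spend": "10.0"},
]

merged_rows = merge_on_key(crm_rows, billing_rows, "customer_id")
analysis_rows = clean(merged_rows, "monthly_spend")
print(analysis_rows)
```

Here only customer 1 survives: customers 3 and 4 appear in just one source, and customer 2 has a spend value that is not a number.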

Modelling

This is the technical part. It's where algorithms look for useful patterns in the data. 

It is where machine learning (ML) joins in. Our data scientists will use algorithms to train different models on the given data set. 

The models return the patterns that are useful. And these help to achieve the project goals.
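Here is a toy sketch of what "training different models on the given data set" can look like. It is not the method we use on client projects, only an illustration: two very simple models (a mean baseline and a straight-line fit) are trained on made-up numbers and compared on data held back from training.

```python
from statistics import mean


def fit_mean_model(xs, ys):
    """Baseline model: always predict the average of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_linear_model(xs, ys):
    """Least-squares straight line for a single input: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average size of the model's prediction errors."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


# Made-up data: the first six points train the models, the last two test them.
train_x, train_y = [1, 2, 3, 4, 5, 6], [12, 14, 15, 19, 20, 22]
test_x, test_y = [7, 8], [24, 26]

models = {
    "mean baseline": fit_mean_model(train_x, train_y),
    "linear": fit_linear_model(train_x, train_y),
}
errors = {name: mean_absolute_error(model, test_x, test_y) for name, model in models.items()}
best = min(errors, key=errors.get)
print(errors, "best:", best)
```

The straight-line model wins here because the made-up data has a clear upward trend, which is exactly the kind of pattern a model is meant to pick up.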

Running the models again with new or updated data is also possible. That means decisions and predictive modelling stay up to date.

Evaluation and deployment

These stages focus on how the model fits the business and its processes.

Tests run during the modelling stage focus on the accuracy of the models for the data set. 

The evaluation phase is about making sure the model meets the business objectives. 

You know the model(s) work at this stage. But now it's time to make sure they are delivering the output needed. 
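The difference between "the model works" and "the model delivers what the business needs" can be shown with a short sketch. Everything here is hypothetical: the labels, the predictions and the target are invented, and the business objective is assumed to be "catch at least 80% of the customers who actually leave".

```python
def accuracy(predicted, actual):
    """Share of all cases the model got right."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def recall(predicted, actual, positive=1):
    """Share of the real positive cases that the model flagged."""
    flagged = sum(1 for p, a in zip(predicted, actual) if a == positive and p == positive)
    total = sum(1 for a in actual if a == positive)
    return flagged / total if total else 0.0


# Made-up outcomes: 1 means the customer left, 0 means they stayed.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

# Hypothetical business objective, assumed for this sketch.
BUSINESS_TARGET_RECALL = 0.8

model_accuracy = accuracy(predicted, actual)
model_recall = recall(predicted, actual)
meets_objective = model_recall >= BUSINESS_TARGET_RECALL

print(f"accuracy: {model_accuracy:.0%}, recall: {model_recall:.0%}, meets objective: {meets_objective}")
```

In this example the model is right nine times out of ten, yet it only catches half of the customers who leave. It passes the modelling tests but would not pass evaluation against the assumed objective.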

Deployment into the working processes of the business should be seamless. If the earlier stages have gone to plan, the ML output will answer the business question(s).

Next steps

We will continue to work with you to check that our solution is working for you. 

And we will suggest refinements if it looks like any will help. 

So, all you have to do now is get in touch...