The Flow of a Project

Ben Fuqua
5 min read · May 25, 2021

Understanding the Business Problem

Photo by Jon Flobrant on Unsplash

Hello all! If you haven’t read my first article, take a look at it here:

I’ve thought a lot about this article and how to present it, because it will probably be my most subjective one. I hope to keep it broad enough to relate to all projects but specific enough to help you outline your own. Again, I won’t be going into any technical details quite yet, but I hope to get to that once I finish walking through the flow of a data science project. These articles cover the basic template of a data science project: this one covers the first bullet point below, and subsequent articles will cover the rest. The flow of a data science project goes like this:

  • Understanding the business problem
  • Data acquisition
  • Data preparation
  • Exploratory analysis
  • Data modeling
  • Presentation/Visualization

These steps are all important and should be performed in this order because they build on one another. We cannot start picking out the furniture for a house before we have laid the foundation. Since the business problem is the foundation, we should start there.

The first part of Data Science is to understand the business problem. Do you remember when you were a kid and kept asking your parents the nagging question “WHY?” I sure do. Little did my parents know, I had already developed one of the most important traits of a Data Scientist. The “WHY” is actually an acronym: W.H.Y.

W: Where is this data coming from?

H: How do you hope to leverage the data?

Y: Your expectations and their expectations.

Let’s start with the “W”: where. Knowing where the data comes from can help you understand what kind of analysis you are doing. Is this sales data? Inventory? Marketing? Additionally, knowing where the data is physically located can be important. Is it coming from a database, a .csv file, or data that has been scraped from the web? Keep in mind that we haven’t actually seen the data yet. At this point we are just asking questions to better understand the business problem.

Next, “H”: how. Asking the person who is giving you the data “How do you want to leverage the data?” can help you understand whether there is enough data to accomplish what they want. Think of this step as the blueprint drawn up before constructing a house. The contractor needs to know whether the flooring will be hardwood or carpet, because each requires different tools and different skill sets. Will the house be made of wood, stone, brick, or concrete? Similarly, we need to ask enough questions to figure out what kind of “materials”, or data, we will need to adequately answer the question. Will we need sales data, inventory data, or the CHANGE in sales or inventory? Will the time these things occurred be an important factor? Thinking about these criteria will help you figure out what you need in your data set PRIOR to beginning your analysis.

Finally, “Y”: your expectations and theirs. Setting expectations is very important. Your client may want you to deliver the world to them, but that is unrealistic, so you have to set the scope of the project with them. This also includes the schedule and the resources available to each party. This step matters because some people think Data Scientists are wizards who can make something out of nothing or predict exactly how something will occur. We MUST remind them that even though we do predictive work, and we do sometimes make something out of what APPEARS to be nothing, we aren’t wizards (as cool as that would be). Sometimes we come back and say “Sorry, there is nothing here” or “There just isn’t enough data; we need x more years/months/days’ worth of observations before we can confidently make a prediction”.

Now for an example. When we sit down with a client, we need to ask W.H.Y. for each “insight” or analysis they want us to run. Our client says, “I want you to get insights on why our sales have gone down over the past year.” So we begin to go through the process.

W: Where is this data coming from?

Some example questions could be:

  • How will I have access to this data?
  • Am I only going to have sales data or will other data be available to me?

H: How do you hope to leverage the data?

Some example questions to keep in mind:

  • Why do you think this is happening?
  • Is there anything you want me to look at specifically? Maybe marketing tactics or other factors that are external?
  • Why is this insight important to you?

Y: Your expectations and their expectations.

This one will have more questions because this is where the meat of the conversation happens:

  • When would you like this project to be done?
  • How much will I be getting paid?
  • Will I have a team of people?
  • What do you want to see as the final format? (A Power BI report, an ML model, or a PDF report?)
  • Do you want to be kept up to date on the progress?
  • How can I contact you if I have more questions?
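If it helps to keep track of which questions still need an answer after the meeting, the W.H.Y. questions above can be written down as a small checklist. This is only a rough sketch: the `WhyChecklist` class and its field names are made up for illustration, and the answers filled in below are invented placeholders, not part of any real project.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WhyChecklist:
    """Hypothetical container: each section maps a question to the client's answer."""
    where: Dict[str, str] = field(default_factory=dict)
    how: Dict[str, str] = field(default_factory=dict)
    expectations: Dict[str, str] = field(default_factory=dict)

    def unanswered(self) -> List[str]:
        """Return every question whose answer is still blank, in W, H, Y order."""
        missing: List[str] = []
        for section in (self.where, self.how, self.expectations):
            missing.extend(q for q, answer in section.items() if not answer.strip())
        return missing


# Placeholder answers, invented purely for this example.
checklist = WhyChecklist(
    where={"How will I have access to this data?": "Read-only database login"},
    how={"Why is this insight important to you?": ""},
    expectations={
        "When would you like this project to be done?": "",
        "What do you want to see as the final format?": "Power BI report",
    },
)

for question in checklist.unanswered():
    print("Still need to ask:", question)
```

Anything the checklist still reports as unanswered is a question to bring back to the client before moving on to data acquisition.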

The first two parts aren’t really negotiable, because you can’t change where the data is located and you can’t change the question the client has; you can only make it more specific if necessary. The expectations are flexible, though. To continue with our example, let’s say our client wants an ML model, a full set of descriptive stats, and a Power BI report built in one week, and you are the only person working on the project. You can tell them this time frame is impossible and negotiate other aspects of the project. Salary, who will be working on the project, and the final format can all be negotiated as long as you have a good reason.

Understanding the business problem is very important because without it you don’t know what kind of data you need, what format the data should be in, what you are looking for, or whether you need an ML model at all.

Please keep in mind, though, that if you thought you understood what was going on and then come across a question or a scenario in your exploration that requires you to go back to this step, that is completely normal. Even though the flow above is shown as one continuous sequence, it really should be depicted with each step having an arrow pointing back to all of the steps before it. There is no shame in having to go back a step because you found something new or came across something that you didn’t consider in your original meeting.
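To make that picture a little more concrete, here is a minimal sketch of the flow in which every step can move forward to the next step or back to any earlier one. The stage names come from the list at the top of this article; the function name `allowed_next_stages` is just something I made up for the illustration.

```python
from typing import List

# The six steps of the flow, in the order listed in the article.
STAGES: List[str] = [
    "Understanding the business problem",
    "Data acquisition",
    "Data preparation",
    "Exploratory analysis",
    "Data modeling",
    "Presentation/Visualization",
]


def allowed_next_stages(current: str) -> List[str]:
    """Return every stage you may move to from `current`.

    That is every earlier stage (going back is always allowed)
    plus the single next stage forward, if there is one.
    """
    i = STAGES.index(current)
    earlier = STAGES[:i]            # arrows pointing back
    forward = STAGES[i + 1:i + 2]   # the one arrow pointing ahead
    return earlier + forward


print(allowed_next_stages("Exploratory analysis"))
```

From exploratory analysis, for instance, you can continue to data modeling or return to any of the three steps before it, including the business problem.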

Thank you so much for reading through this article. Later this week or early next week I hope to have my next article out on “Data Acquisition”. Hopefully this has been helpful and your path towards understanding data science is clearer.

