Keep the Clock Running on Your Data Science Project

Summary: How quick wins buy time for breakthroughs

Time is running out on your data science project, and you haven’t made a breakthrough yet. Predictive modeling is fraught with uncertainty, but the business doesn’t understand that, or worse yet, doesn’t care. You need to deliver something, and fast. What options do you have?

They Don’t Call It Data Science for Nothing

Data science work is experimental. Many projects that look feasible during planning will ultimately fail. Maybe the input data is less predictive than anticipated. Or the target variable is inconsistent and poorly defined. Or you can’t get access to the data you hoped for. Projects can fail for reasons beyond your control, due to data issues that no one could have anticipated.

But other projects fail because you run out of time before making that game-changing breakthrough.

How Not to Chase Breakthroughs

You fear that your model will never see the light of day if it isn’t good enough. This makes chasing breakthroughs with complex models tempting.

I fell into this trap recently. My team was trying to predict when insurance claims would close based on a stream of notes and loss information. Attempts to use simple representations of text along with the most recent snapshot of loss information failed to produce an acceptable model. So I tried featurizing the text with BERT vectors and passing the full time series of loss information through a recurrent neural network. It was a dead end. I have no regrets about it, but it burned through over a month of development time.

Incremental Value

Delivering incremental value is a core principle of agile software development. Now, I’ve heard a million arguments against using agile for data science teams. I’ll save my counterarguments for another day, but this aspect of incremental value is the key to solving our little breakthrough-deadline dilemma.

I will share a number of quick wins you can deliver to the business that can buy you valuable time until that breakthrough comes.

Quick Win Scenarios

Deploy a Benchmark Model

The first version of your model does not need to be perfect. A naive or benchmark model is often more valuable to users than nothing. Here are some examples of potentially useful benchmark models you can ship.

Mean or median model — For regression tasks, a naive model that predicts the mean or median all the time may be better than nothing. You could even soup it up by predicting the mean within groups.

Past equals future — For forecasting tasks, consider shipping a simple model that predicts that the past will exactly equal the future. This benchmark can be hard to beat.

Most popular — For recommendation tasks, it’s hard to do better than predicting the most popular items to all users. Consider shipping this benchmark first.

Get creative — Can you think of a benchmark model for your task that might partially solve your customer’s problem?

Of course, it’s important to set expectations with the business if you decide to ship a benchmark model. It’s also important to verify that low customer trust in the first version of the model won’t have long-lasting impacts.
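
To make the benchmark ideas above concrete, here is a minimal sketch in plain Python of three of them: a grouped median for regression, a "past equals future" forecast, and a most-popular recommender. All function names, field names (claim_type, days_open), and sample values are hypothetical and chosen only for illustration; they are not taken from any real project or library.

```python
from collections import Counter, defaultdict
from statistics import median


def fit_group_median(rows, group_key, target_key):
    """Learn a median per group plus a global median to use as a fallback."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row[target_key])
    group_medians = {group: median(values) for group, values in by_group.items()}
    global_median = median(row[target_key] for row in rows)
    return group_medians, global_median


def predict_group_median(group_medians, global_median, group):
    """Predict the group's median, or the global median for an unseen group."""
    return group_medians.get(group, global_median)


def naive_forecast(history, horizon):
    """Past equals future: repeat the last observed value for each future step."""
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * horizon


def most_popular(interactions, k):
    """Recommend the same k most frequently seen items to every user."""
    counts = Counter(item for _user, item in interactions)
    return [item for item, _count in counts.most_common(k)]


# Hypothetical sample data, for illustration only.
claims = [
    {"claim_type": "auto", "days_open": 10},
    {"claim_type": "auto", "days_open": 20},
    {"claim_type": "auto", "days_open": 30},
    {"claim_type": "property", "days_open": 100},
    {"claim_type": "property", "days_open": 200},
]
medians, fallback = fit_group_median(claims, "claim_type", "days_open")
print(predict_group_median(medians, fallback, "auto"))
print(naive_forecast([3, 5, 8], horizon=3))
print(most_popular([("u1", "a"), ("u2", "a"), ("u3", "b")], k=1))
```

Each function is small enough to ship quickly, and the same predictions can later serve as the baseline that any more complex model has to beat.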

Solve a Sub-Problem

The problem you’ve been tasked with ultimately solving may be very high-level or broad in scope. In this case, it’s possible to solve and deliver components of the problem incrementally.

Let’s return to the insurance claim problem mentioned earlier. Predicting when a claim will close is actually quite broad in scope. There are many different types of claims and many different reasons each might stay open longer than expected. For example, a claim where an attorney gets involved will usually take longer than one without attorney involvement. So if we can limit the scope of the problem to predicting the risk of attorney involvement, the modeling effort becomes much more tractable and still provides value to customers.

Do Some Engineering

At the end of the day, you have to show that you and your team are delivering value to customers. If the quick wins above aren’t an option for you, consider demonstrating how much of a team player you are by contributing code or data of any kind to production. The line between data science and software or data engineering can be murky even in the best of times. Use this to your advantage if you think it can buy you enough goodwill and space to finally make that breakthrough.

This might mean picking up some tickets related to deploying models or creating data pipelines. Or it might mean a larger commitment to own business logic that everyone knows should belong to engineering but that they don’t have time to build (I’ve seen this happen at two different companies, so I assume it’s common).

Chase That Breakthrough, Wisely

Now you’ve delivered some quick wins to the product and bought yourself another few months. Go chase that breakthrough, and spend your hard-earned time wisely.

Source: towardsdatascience.com
