Data science projects fail, too. However, that is rarely due to a lack of interest on the part of IT decision-makers and managing directors. […]
Data science is currently the most popular approach to solving business problems. However, flawed projects can cause companies considerable damage.
In fact, data science initiatives – which use scientific methods, processes, algorithms and technology systems to gain insights from structured and unstructured data – can fail in a variety of ways, ultimately wasting money, time and other resources. Read on to learn why data science projects end in failure.
Bad data makes for bad data science. It is therefore crucial to invest sufficient time in ensuring that the data is of high quality. This applies to any analytics project, and data science is no exception.
“Bad or contaminated data makes data science initiatives impossible,” says Neal Riley, CIO at the consulting firm Adaptavist: “You have to make sure that your data is suitable for analysis. If it is not, you are wasting your time.” Using bad data in data science projects leads to models that “deliver strange results and ignore reality,” says Riley.
Data quality can also suffer from distortions or discrepancies in the datasets. “In some companies, several systems are used to run the business,” says Brandon Jones, CIO at the insurer WAEPA. “Long-established companies may even have legacy systems that are still accessed for reference or validation purposes. In many cases, the business changed with each new system, resulting in different processes and different ways of counting metrics within the company.”
This can be one of the main causes of data science failures, says the CIO: results may be inflated by double counting rooted in a changed business process. “To solve this problem, companies need to baseline their data analytics program. That means setting a cut-off date from which the data is validated and all participants commit to a common standard.”
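A minimal sketch, in Python with invented record and field names, of the kind of duplicate check that Jones's cut-off approach implies – flagging records that would be counted twice when a legacy system and its replacement feed the same analysis:

```python
from collections import Counter

def find_duplicates(records, key_fields=("order_id",)):
    """Return keys that occur more than once – a common source of
    double counting when several systems feed the same analysis."""
    keys = [tuple(r[f] for f in key_fields) for r in records]
    return {key for key, count in Counter(keys).items() if count > 1}

# Hypothetical records merged from a legacy system and its successor:
merged = [
    {"order_id": 1001, "source": "legacy",  "amount": 250.0},
    {"order_id": 1002, "source": "legacy",  "amount": 99.0},
    {"order_id": 1001, "source": "new_erp", "amount": 250.0},  # double-counted
]

print(sorted(find_duplicates(merged)))  # → [(1001,)]
```

In practice such a check would run against real system extracts as part of the agreed validation step, before any metric is computed from the combined data.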
How can a data science initiative succeed if the team does not understand the problem it is supposed to solve? Yet data science teams face exactly this situation: “Problem definition is often left to the data scientists, although it really belongs in the business case, which defines both the scope of the work and the potential return on investment,” explains Michael Roytman, chief data scientist at the cybersecurity company Kenna Security.
Business users who want to apply data science have to ask probing questions about the problem they are trying to solve, says Marc Johnson, senior advisor at the consulting firm Impact Advisors: “As with any project, you should take the time to narrow down the scope of the problem in order to identify the right data sources.” The consultant recalls a project that dragged on for two years without a clear direction “because the problem we were trying to solve was vaguely defined.”
Another way for data science to fail is not providing the right data needed to solve a particular problem. Throwing an enormous amount of data at a problem does not help.
“In many places, there is an assumption that large amounts of data lead to insights, which is actually rarely the case,” says Roytman. “Intelligent, tailor-made and often smaller datasets are much more likely to deliver robust and reusable models.”
To benefit from data science, data should ideally come only from relevant sources, Johnson recommends. If data must be collected or purchased from different sources, teams should ensure that changes to the data neither distort the results nor degrade the quality of the overall dataset. They also need to make sure the dataset raises no data protection, legal or ethical issues.
Teams need to be transparent about the data they used to create a particular model. “Data science projects fail when the model is untrustworthy or the solution is incomprehensible,” says Jack McCarthy, CIO of the Justice Department of the US state of New Jersey: “To prevent this, you need to be able to give stakeholders who may not have the technical or statistical background a clear picture of it.”
Data scientists need to explain where the data came from and how it fed into the models' calculations, and they must provide access to all relevant data: “Transparency can be the key to a successful project,” says the CIO.
Sometimes the department that demands insights, or even the data science team itself, is simply not willing to consider results as uncertain, unclear or not meaningful enough for a business application. “It’s an equally acceptable and valuable response to say, ‘The model is not good enough to generate an ROI for the company,'” Roytman says.
The data science team at Kenna Security spent two months developing a model for classifying vulnerabilities, Roytman says. “The model worked and was a solid answer to a problem. But it didn’t work well enough to be valuable to our customers. The accuracy left much to be desired. So we stopped the project, even though we had invested time and achieved a result.”
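A go/no-go gate like the one Roytman describes can be made explicit in code. The threshold values below are invented for illustration – the idea is simply that a model must beat a naive baseline by a meaningful margin before it is worth shipping:

```python
def ship_decision(model_accuracy, baseline_accuracy, min_lift=0.05):
    """Ship only if the model beats the naive baseline by at least
    min_lift – otherwise stopping the project is the honest answer."""
    return model_accuracy >= baseline_accuracy + min_lift

# A model barely above the baseline does not justify the investment:
print(ship_decision(0.62, 0.60))  # → False: stop the project
print(ship_decision(0.75, 0.60))  # → True: worth operationalizing
```

The metric and the margin will differ per use case (accuracy here is a stand-in; precision, recall or a cost model may fit better), but agreeing on the gate up front makes “not good enough for ROI” a legitimate, pre-planned outcome rather than a failure.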
Data science efforts need a champion in the C-suite so that the projects receive sufficient resources and support.
“It helps if it’s the CIO,” Riley says. “Even if CIOs are not the internal champions for data science, they should be responsible for the security of all data involved. But the commitment should go further: I see it as the task of a modern CIO to get the most out of the information collected. All this data can be used intelligently to learn. That allows CIOs to support their organizations across functions.”
A skills gap plagues many areas of IT, and data science is no exception. Many companies lack the specialists needed to sustain data science projects or to get the maximum benefit from them. “Real data scientists are in high demand, hard to come by and expensive,” says Tracy Huitika, CIO of Engineering and Data at automation provider Beanworks. “The position usually requires a PhD in physics or another science, as well as the ability to write code in R and Python.”
According to Johnson, one of the main reasons data science projects fail is a lack of operational talent. “Using a brilliant data scientist to create the model without a plan for operating and continuously improving it as markets and data change is like designing a car and handing the keys to a ten-year-old.”
Companies need to acquire the right skills to maintain the model after it goes into production, either by hiring specialists or by bringing in external experts, the consultant says.
Companies should consider carefully whether, and if so which, data science methods, processes and tools to use, so that the solution fits the problem. “Maybe you don’t need machine learning at all, just a simple regression model,” Riley notes.
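As an illustration of Riley's point: a simple regression can be fitted in a few lines without any ML library. The spend-versus-revenue numbers below are invented; the closed-form ordinary-least-squares fit itself is standard:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b – closed form, no ML stack."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var           # slope
    b = mean_y - a * mean_x  # intercept
    return a, b

# Hypothetical monthly ad spend (units) vs. revenue (units):
spend = [1, 2, 3, 4, 5]
revenue = [2.1, 4.0, 6.2, 7.9, 10.1]
a, b = fit_line(spend, revenue)
print(round(a, 2), round(b, 2))  # → 1.99 0.09
```

If a two-parameter line like this already explains the relationship well enough to act on, a deep-learning pipeline adds cost and opacity without adding value.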
This article is based on an article from our US sister publication CIO.com.
*Bob Violino works as a freelance IT journalist for InfoWorld and Network World in the USA.