Is Historical Data Essential for Data Science?
Historical data is crucial for building a successful data science solution. It provides the foundation for data analytics; without it, data science is impossible. But why do we call it “historical” data? Data points, in essence, represent events in the past. Even real-time data counts as historical, since the event it records has already taken place. Before committing to an analytics solution, business teams usually conduct a cost-benefit analysis to ensure that it has an acceptable ROI (Return on Investment).
But what if there is no metric available to quantify the extent of the business problem? For instance, there may be a general sense, based on sporadic customer complaints, that troubleshooting tickets for a particular home appliance are handled poorly. In such cases, a data scientist might propose a ticket-closure-time prediction ML model based on features such as customer demographics, ticket details, device usage, the customer’s financial status, and competition in the market. The predicted time to resolution can then be communicated to the customer at regular intervals until the ticket is resolved, with the aim of reducing customer dissatisfaction.
However, even if we have all the data needed to train the model, including the features and the target variable (time to resolution), we may not know the current level of dissatisfaction or how much it will fall once the data science solution is in use. Nonetheless, the working belief is that dissatisfaction will decline.
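To make the idea of features and a target variable concrete, here is a minimal sketch of a naive baseline: it predicts time to resolution as the mean of past resolution times for the same issue category. This is not the ML model described above, only a stand-in that shows how historical tickets feed a prediction. The category names, the hours, and the function names are all assumptions invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical tickets: (issue_category, hours_to_resolution).
# These values are invented purely for illustration.
HISTORICAL_TICKETS = [
    ("no_power", 10.0),
    ("no_power", 14.0),
    ("leak", 30.0),
    ("leak", 26.0),
    ("noise", 6.0),
]


def fit_baseline(tickets):
    """Learn the mean resolution time per category, plus an overall fallback."""
    by_category = defaultdict(list)
    for category, hours in tickets:
        by_category[category].append(hours)
    per_category = {cat: mean(vals) for cat, vals in by_category.items()}
    overall = mean([hours for _, hours in tickets])
    return per_category, overall


def predict_hours(model, category):
    """Predict hours to resolution; unseen categories get the overall mean."""
    per_category, overall = model
    return per_category.get(category, overall)


model = fit_baseline(HISTORICAL_TICKETS)
print(predict_hours(model, "no_power"))  # mean of the two "no_power" tickets
print(predict_hours(model, "unknown"))   # falls back to the overall mean
```

A real model would use many more features than a single category, but any such model should at least beat this kind of baseline before it is worth communicating its predictions to customers.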
The success of a data science project depends on the ability to convert the business problem statement into an analytics problem statement. This conversion is the essential prerequisite for starting a data science project. Strictly speaking, there is no standalone analytics problem statement: any attempt to define one ends up describing a solution rather than a problem, which is why it can also be called an “analytics solution statement”.
For example, a business problem statement could be: “The percentage of customers complaining about how troubleshooting tickets are handled is increasing.” The analytics solution statement could be: “Regularly predict and communicate to the customer the time remaining to resolve their ticket.” Note that one business problem can have more than one analytics solution. In such cases, the data science team should work with the business team to determine the best analytics solution to move forward with.
Moreover, it’s essential to consider the expertise and skills required to execute the data science project. This includes machine learning algorithms, statistics, domain experience, presentation, data engineering, data governance, MLOps, and programming. Additionally, it is important to have the necessary software and hardware in line with the requirements of the project.
Understanding the end consumer of the solution is also crucial. Will it be used by the sales team, marketing, ordering, finance, logistics, HR, IT, or the service assurance team? Knowing the end consumer helps determine how the solution will be consumed: as a flat-file report, a full-fledged dashboard in an application, or some other form.
Furthermore, it is important to estimate how widely business users will adopt the solution and how quickly adoption is expected to grow in the months following implementation. Ignoring partial adoption leads to overstated benefit estimates: 100% adoption from the day of implementation is unrealistic.
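The effect of adoption on the quantified benefit can be sketched with simple arithmetic. The example below scales each month's benefit by that month's adoption rate and compares the resulting ROI with the naive figure that assumes full adoption from day one. All numbers, names, and the four-month ramp are hypothetical, chosen only to illustrate the calculation.

```python
def adoption_adjusted_benefit(full_monthly_benefit, adoption_by_month):
    """Sum the monthly benefit, scaling each month by its adoption rate (0 to 1)."""
    for rate in adoption_by_month:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("adoption rates must be between 0 and 1")
    return sum(full_monthly_benefit * rate for rate in adoption_by_month)


def roi(total_benefit, total_cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical figures, invented for illustration only.
FULL_MONTHLY_BENEFIT = 10_000.0          # benefit per month at 100% adoption
ADOPTION_RAMP = [0.25, 0.5, 0.75, 1.0]   # assumed adoption in months 1 to 4
TOTAL_COST = 20_000.0

naive_benefit = FULL_MONTHLY_BENEFIT * len(ADOPTION_RAMP)
adjusted_benefit = adoption_adjusted_benefit(FULL_MONTHLY_BENEFIT, ADOPTION_RAMP)

print(roi(naive_benefit, TOTAL_COST))     # ROI assuming full adoption from day one
print(roi(adjusted_benefit, TOTAL_COST))  # ROI with the assumed adoption ramp
```

With these assumed figures, the naive calculation reports an ROI of 100%, while the adoption-adjusted one reports 25%, which is the kind of gap a cost-benefit analysis needs to surface.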
Finally, it is crucial to know who is funding the solution. The sponsor is expected to take the quality of the solution seriously, since all teams collaborating to solve the business problem are ultimately answerable to the sponsor.
In conclusion, the availability of historical data is essential for data science. The success of a data science project depends on the ability to convert the business problem statement into an analytics solution statement. Additionally, it is crucial to consider the expertise and skills required to execute the project, the software and hardware available, the end consumer of the solution, the estimated adoption of the solution, and the source of funding. Considering these factors will contribute to the success of the data science project.