Data science seems to have taken over as the number-one tool sought after by companies eager to increase sales, productivity and general success. But is the craze around data science truly deserved? And what does a data scientist really do, anyway?
If some were quick to deem the position of data scientist the “sexiest job of the 21st century”, the numbers seemed to prove them right. In 2019, job postings for data scientists on Indeed had risen by 256% compared to 2013. A decade after this claim was made, one may wonder whether this enthusiasm is still warranted. To answer this question, we reached out to Professor Louis-David Benyayer, scientific co-director of ESCP’s MSc in Big Data and Business Analytics.
How can the rise of data science be explained?
Although the tools and techniques used in data science are not all new (some algorithms were developed several decades ago), data science has grown significantly over the last decade. One simple reason is that much more data is available. Think about the data produced by internet users when they use social networks, search engines or digital services such as Uber or Netflix. Most of these services did not exist twenty years ago. It is estimated that around 4 million queries are typed into the Google search engine every minute.
In addition, with digitalisation, many processes moved from paper to digital, which increased the volume of data. When I started working after graduating from ESCP in 2000, I had to send printed résumés to companies. Now, companies receive digital résumés that they can analyse with the help of software.
The other massive source of data is connected devices and machines. Sensors capture and store data in real time in smartphones, cars, planes, washing machines and many other products. When an Airbus plane lands, 1 TB of data is transferred to a cloud server. As this volume of data grew and became available to many different types of organisations, the need for people and processes to analyse it became clear, hence the rise of data science.
What types of businesses are most likely to use data science today?
With digitalisation, nearly every business has the opportunity to leverage data to make better decisions. Websites, social networks, search engines and forums are full of data provided by internet users on what they like, what they dislike, and what they think or expect. This data is useful for defining product offerings and marketing strategies.
With Covid, many businesses that had previously invested little in digital connections with their clients or prospects started doing so, and analysed that new trove of data to make better decisions on pricing, couponing, services and offers. This has been the case for small retail shops and restaurants, for example.
With connected devices, many manufacturing companies are now leveraging the data these devices produce, not only to develop better products but also to offer services alongside their products, such as financing, subscription or usage-based offers. The more data is produced, the more value becomes accessible. Consequently, the more a company or industry has access to vast amounts of data, the more it can benefit from data science.
What exactly does a data scientist do?
Data scientists analyse large datasets to provide actionable insights for business decisions. For example, a data scientist might analyse millions of customer interactions with a brand on social media to identify patterns, cluster clients into groups, predict their likelihood of adopting a new offer and identify specific marketing actions to convert them. They answer questions such as “whom should I target with which value proposition to increase revenue by X%?”. This requires a strong computer science background to handle vast amounts of data, and solid training in statistics and mathematics to build and test models. Domain expertise, that is, knowledge of a specific field, is also valuable for spotting meaningful trends and patterns in large and messy information.
So, the question is: do all companies need a data scientist?
As we described, data scientists are useful when dealing with large datasets, because those cannot be analysed with traditional software. So, companies that don’t have large datasets don’t need data scientists. To caricature slightly, we could say that if the data can be analysed in Excel, the company probably doesn’t need a data scientist.
Then, to reap the benefits of data science, is it enough to hire a team of data scientists?
Well, the answer lies at the two ends of the data value chain: when data is captured and cleaned, and when it is used for decisions. If a company has talented data scientists but not enough quality data to give them, their talent is wasted. That’s why it is necessary to invest significantly in IT and processes to capture, clean and store quality data and make it available.
Then, at the other end of the chain, if a company has very good quality data and talented data scientists but can’t act on the results of the analysis, the value is not captured.
This happened at a retail bank. They had built a very good model to predict which of their current clients were likely to leave for competitors in the coming months. So, they gave the list to the branches and asked them to reach out to these clients. The branches answered that they already had the list, and that the problem was not knowing who would leave but having the right offer to keep them. Prediction, however good, is of no value if you cannot act on it.
Does data science as we know it today have any flaws or weaknesses?
We tend to think of data science as purely objective, but it’s not. It is driven by human decisions. As such, the flaws or weaknesses of data science stem from human flaws or weaknesses in the process. For example, choosing whether to collect certain data is a human decision, and choosing whether to use it is another. In this domain, the debate is very lively regarding biases embedded in the data used to train machine learning algorithms. Researchers and activists have challenged some algorithms for favouring certain groups or treating them differently because the data used to develop the model was skewed toward one group.
What can we predict about the future of data science?
Technical progress is steady, and some of it is impressive. This summer, you may have seen the results of machine learning models such as DALL·E or Midjourney. You can now type a sentence and get a completely original picture created by an algorithm trained on pictures available online. However, such algorithms require a lot of training data, and a lot of electricity to store that data and to build and use the model. Electricity consumption and, more broadly, the environmental footprint of data science will clearly be a challenge in the coming years.