Data science seems to have taken over as the number-one tool sought after by companies eager to increase sales, productivity and general success. But is the craze around data science truly deserved? And what does a data scientist really do, anyway?
If some were quick to call data scientist the “sexiest job of the 21st century”, the numbers seemed to prove them right. In 2019, job postings for data scientists on Indeed had risen by 256% compared to 2013. A decade after this claim was made, one may wonder whether this enthusiasm is still relevant today. To answer this question, we reached out to Professor Louis-David Benyayer, the scientific co-director of ESCP’s MSc in Big Data and Business Analytics.
How can the rise of data science be explained?
Though the tools and techniques used in data science are not all new (some algorithms were discovered several decades ago), data science has grown significantly over the last decade. One simple reason is that much more data is available. Think about the data produced by internet users when they use social networks, search engines or digital services such as Uber or Netflix. None of this existed twenty years ago. It is estimated that around 4 million queries are typed into Google’s search engine every minute.
In addition, with digitalisation, many processes moved from paper to digital, which increased the volume of data. When I started working after graduating from ESCP in 2000, I had to send printed résumés to companies. Now, companies receive digital résumés that they can analyse with software.
The other massive source of data is connected devices and machines. Sensors capture and store data in real time in smartphones, cars, planes, washing machines and many other products. When an Airbus plane lands, 1 TB of data is transferred to a cloud server. As this volume of data grew and became available to many different types of organisations, the need for people and processes to analyse it became clear, hence the rise of data science.
What types of businesses are most likely to use data science today?
With digitalisation, nearly every business has the opportunity to leverage data to make better decisions. Websites, social networks, search engines and forums are full of data provided by internet users about what they like, what they dislike, and what they think or expect. This data is useful for defining product offerings and marketing strategies.
With Covid, many businesses that previously had not invested much in digital connections with their clients or prospects started doing so, and analysed that new trove of data to make better pricing, couponing, service or offer decisions. This has been the case for small retail shops and restaurants, for example.
With connected devices, many manufacturing companies are now leveraging the data these devices produce to develop better products, but also to offer services in addition to their products, for example financing, subscription or usage-based offers. The more data is produced, the more value is accessible. Consequently, the more access a company or industry has to vast amounts of data, the more it can benefit from data science.
What exactly does a data scientist do?
Data scientists analyse large datasets to provide actionable insights for business decisions. For example, a data scientist might analyse millions of customer interactions with a brand on social media to identify patterns, cluster clients into groups, predict their likelihood of adopting a new offer and identify specific marketing actions to convert them. They answer questions such as “who should I target with which value proposition to increase revenue by X%?”. The job requires a good computer science background to manipulate vast amounts of data, and solid training in statistics and maths to build and test models. Domain expertise, that is, knowledge of a specific field, is also valuable for spotting meaningful trends and patterns in the large and messy information available.
So, the question is: do all companies need a data scientist?
As we described, data scientists are useful when working with large datasets, because those can’t be analysed with traditional software. So, companies don’t need data scientists if they don’t have large datasets. To caricature slightly, we could say that if the data can be analysed in Excel, the company probably doesn’t need a data scientist.
Then, to reap the benefits of data science, is it enough to bring in a team of data scientists?
Well, the answer lies at the two ends of the data value chain: when data is captured and cleaned, and when it is used for decisions. If a company has talented data scientists but does not have enough quality data to provide them with, their talent is wasted. That’s why it’s necessary to invest significantly in IT and processes to capture, clean, store and make quality data available.
Then, at the other end of the process, if a company has very good quality data and talented data scientists but can’t act on the results of the analysis, the value is not captured.
This happened with a retail bank. It had built a very good model to predict the likelihood of current clients leaving for competitors in the coming months. So it gave the list to its branches and asked them to reach out to these clients. The branches answered that they already had the list, and that the problem was not knowing who would leave but having the right offer to keep them. Prediction, however good, is of no value if you cannot act on it.
Does data science as we know it today have any flaws or weaknesses?
We tend to think of data science as purely objective, but it’s not. It’s driven by human decisions. As such, the flaws or weaknesses of data science stem from human flaws or weaknesses in the process. For example, choosing whether to collect certain data is a human decision, and choosing whether to use it is another. In this domain, there is a lively debate about the biases embedded in the data used to train machine learning algorithms. Researchers and activists have challenged some algorithms for favouring certain groups or treating them differently because the data used to develop the model was skewed towards one group.
What can we predict about the future of data science?
Technical progress is steady, and some of it is impressive. You may have seen this summer the results of machine learning algorithms such as DALL·E or Midjourney. You can now type a sentence and get a totally original picture created by an algorithm trained on pictures available online. However, such algorithms require a lot of data to be trained, and a lot of electricity to store that data and to build and use the model. Electricity consumption and, more broadly, the environmental footprint of data science are a clear challenge for the coming years.
License and Republishing
The Choice - Republishing rules
We publish under a Creative Commons license with the following characteristics: Attribution/ShareAlike.
- You may not make any changes to the articles published on our site, except for dates, locations (according to the news, if necessary), and your editorial policy. The content must be reproduced and represented by the licensee as published by The Choice, without any cuts, additions, insertions, reductions, alterations or any other modifications. If changes are planned in the text, they must be made in agreement with the author before publication.
- Please make sure to cite the authors of the articles, ideally at the beginning of your republication.
- It is mandatory to cite The Choice and include a link to its homepage or the URL of the article. Insertion of The Choice’s logo is highly recommended.
- The sale of our articles separately, in their entirety or in extracts, is not allowed, but you can publish them on pages that include advertisements.
- Please request permission before republishing any of the images or pictures contained in our articles. Some of them are not available for republishing without authorization and payment. Please check the terms available in the image caption. However, it is possible to remove images or pictures used by The Choice or replace them with your own.
- Systematic and/or complete republication of the articles and content available on The Choice is prohibited.
- Republishing The Choice articles on a site whose access is entirely available by payment or by subscription is prohibited.
- For websites where access to digital content is restricted by a paywall, republication of The Choice articles, in their entirety, must be on the open access portion of those sites.
- The Choice reserves the right to enter into separate written agreements for the republication of its articles, under the non-exclusive Creative Commons licenses and with the permission of the authors. Please contact The Choice if you are interested at contact@the-choice.org.
Individual cases
Extracts: It is recommended that after republishing the first few lines or a paragraph of an article, you indicate "The entire article is available on ESCP’s media, The Choice" with a link to the article.
Citations: Citations of articles written by authors from The Choice should include a link to the URL of the authors’ article.
Translations: Translations may be considered modifications under The Choice's Creative Commons license, therefore these are not permitted without the approval of the article's author.
Modifications: Modifications are not permitted under the Creative Commons license of The Choice. However, authors may be contacted for authorization, prior to any publication, where a modification is planned. Without express consent, The Choice is not bound by any changes made to its content when republished.
Authorized connections / copyright assignment forms: Their use is not necessary as long as the republishing rules of this article are respected.
Print: The Choice articles can be republished according to the rules mentioned above, without the need to include the view counter and links in a printed version.
If you choose this option, please send an image of the republished article to The Choice team so that the author can review it.
Podcasts and videos: Videos and podcasts whose copyrights belong to The Choice are also under a Creative Commons license. Therefore, the same republishing rules apply to them.