Why Data Science in Cybersecurity is the New Sexiest Job of the Century

The Rise and Fall of Data Science

Tobias Faiss
Nov 29, 2022


We all know the famous Harvard Business Review article declaring Data Scientist the sexiest job of the 21st century.

The subsequent rise of Data Analytics, Machine Learning and AI led to an excessive hiring spree:
everybody wanted to wrangle data and extract great insights from their organization's data so they could become the next Google or Amazon.

Now, ten years later, we are facing a harsh reality: the entire Big Tech industry is going through a vale of tears, and most other companies have realized that Machine Learning and AI are not bringing them to the promised land.

So it's time to revisit the original claim: is Data Science still the sexiest job of this century, or is it even sexy at all?

What has happened?

In the last decade we have seen tremendous growth in various areas:

  • Communicating and collaborating over the internet became normal
  • Teaching, learning and educating on demand at a world-class level became possible and popular (just look at Coursera, for example).

Big Tech started hiring data scientists and engineers on a massive scale to fulfill their missions. Given the working conditions, salaries and perks, landing a data role seemed like winning the lottery. Almost every graduate wanted that life. And not just graduates: many professionals from other domains also took advantage of online education offerings to become the next Data Scientist.

On the other side, many organizations started building their data teams and working on their first projects. After a while, it turned out that the fundamental work and preparation hadn't been done. Most companies had no data strategy and lacked data literacy in general. As a result, the information available in these organizations was anything but usable, and since most data teams focused on statistics and model building, they lacked experience in data engineering and data preparation. In addition, the rise of AutoML led to a democratization of AI and ML model building (a fancy term for minimizing manual labor).

To sum up: on one side, there is a continuous flow of new data scientists attracted by the job's "sexy" reputation; on the other, data science can't deliver on its promises and is being challenged by automation and commoditization. Given these developments, one might say that being or becoming a data scientist is a dead end in 2022.

But is this really the case? Not at all.

How will you survive as a Data Scientist?

The real question is whether today's data scientists have the right skills for the right problem in the right environment. And this is where expectation and reality diverge sharply.

Let’s look into it step-by-step:

1) The right skills

The success of a data science project relies heavily on the availability and quality of the data provided. Educational programs covered this part only partially or neglected it entirely, yet it is a key element of succeeding in Data Science.

Source: Hidden Technical Debt in Machine Learning Systems

As the picture shows, deploying a data science project in the real world entails a lot more than just the model itself. The struggle starts before modeling, with sourcing the (right) data and cleansing it thoroughly; this is also why data engineers are currently far more employable than data scientists. The work isn't finished after modeling either: deploying and operating any data or machine learning model requires considerable IT operations skill, specifically DevOps. Furthermore, these models need to be monitored continuously because of effects such as model decay or unexpected behavior in their predictions.
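
To make the monitoring point more concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI). It compares the distribution of a feature (or of model scores) at training time with the distribution seen in production. The equal-width binning, the smoothing constant, the function names `psi` and `drift_alert`, and the 0.2 alert threshold are assumptions for illustration: 0.2 is a widely used rule of thumb, not a standard.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample.

    Bin edges are equal-width over the reference sample's range; live values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant data

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        eps = 1e-6  # smoothing so that empty bins never produce log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected: Sequence[float], actual: Sequence[float],
                threshold: float = 0.2) -> bool:
    # Assumed rule-of-thumb threshold for "significant" drift
    return psi(expected, actual) > threshold


# Hypothetical model scores: at training time vs. after an upward shift
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
print(drift_alert(reference, reference))  # False
print(drift_alert(reference, shifted))    # True
```

In a production setup, a check like this would typically run on a schedule per feature and per model output, with alerts feeding back into a retraining process.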

2) The right problem

The second challenge is addressing the right problems. A generalist Data Scientist may have no relevant experience in a specific business domain, which makes it difficult to assess the relevance of the available data and what it implicitly means. Domain expertise is particularly important in the first phase of a data project, namely data sourcing and cleansing: the more meaningful and reliable data you have, the more value you can eventually create. One might object that statistical metrics can substitute for a lack of domain expertise, but reality shows there is no substitute for solid foundational knowledge of your domain or industry.

3) The right environment

From an execution point of view, being aware of the environment you are in is the most relevant dimension to reflect on.

Domain Expertise is more important than ever

Generalist data scientists with no specific domain knowledge won't survive the next decade, because the ongoing democratization of Data Science skills makes them interchangeable.

Average Data Scientists are solving problems.
Extraordinary Data Scientists figure out which problems to solve.

So here we are. If you want to become unique (and can therefore charge a premium), you need to find a niche where you stand above average. One promising domain is certainly cybersecurity. Hardly a day passes without news of a major security or data breach in the corporate world. These incidents cause severe reputational and financial damage, which is why organizations are paying a lot to mitigate breaches and are also investing in proactive cybersecurity measures.

As a data scientist, this is a great opportunity to apply your analytical skills in the cyber threat intelligence niche, or to start detecting anomalies in network traffic. There are plenty of opportunities to build a great career in this space. More importantly, demand is so high that the shortage of talent and experts will persist for years to come.

So, where to start?

How to get started?

First, you can start with existing datasets and explore them to figure out what might be interesting.

Datasets to start with

Once you have some ideas, go ahead and start building your own projects (there's nothing more valuable than doing your own projects).

Here are some ideas, separated by use case:

Spam detection

  • Logistic Regression
  • Naive Bayes
  • Decision Trees/Random Forest
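
The Naive Bayes option for spam detection fits in a few lines of plain Python. This is a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing; the tokenizer, the class name `NaiveBayesSpamFilter` and the four training messages are made up for illustration. In a real project you would more likely use scikit-learn's `CountVectorizer` together with `MultinomialNB` and train on a properly labeled corpus.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple: lowercase alphanumeric tokens only
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSpamFilter":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus summed log likelihoods (log space avoids underflow)
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical toy training data
train_texts = [
    "win a free prize now",
    "claim your free money",
    "meeting agenda for tomorrow",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]
clf = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(clf.predict("free prize money"))           # spam
print(clf.predict("review the meeting report"))  # ham
```

Naive Bayes is a common first baseline here because it trains in a single pass over the data and handles large vocabularies cheaply; logistic regression or tree ensembles can then be compared against it.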

Malicious URL detection

  • Logistic Regression
  • Random Forest
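
For malicious URL detection with logistic regression, most of the work lies in turning a URL into numeric features. The sketch below uses a handful of hypothetical lexical features (length, digit count, dots in the host name, an `@` sign, a raw IPv4 host) and trains the model with plain batch gradient descent. The feature set, learning rate, epoch count and the six example URLs are all assumptions chosen for illustration, not a vetted detector.

```python
import math
import re
from urllib.parse import urlparse


def url_features(url: str) -> list[float]:
    """Hypothetical lexical features; real systems use far richer ones."""
    host = urlparse(url).hostname or ""
    return [
        1.0,                                     # bias term
        len(url) / 100,                          # scaled URL length
        sum(ch.isdigit() for ch in url) / 10,    # scaled digit count
        float(host.count(".")),                  # number of dots in the host
        1.0 if "@" in url else 0.0,              # '@' can hide the real host
        1.0 if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) else 0.0,  # raw IPv4 host
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log-loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, url: str) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, url_features(url))))


# Hypothetical labeled examples: 0 = benign, 1 = malicious
urls = [
    "https://www.example.com/docs",
    "https://github.com/login",
    "https://en.wikipedia.org/wiki/Phishing",
    "http://198.51.100.23/account/verify123456",
    "http://secure-login.account.update.example.biz@203.0.113.5/x9",
    "http://bank.verify.account.login.example.net/8812737373",
]
labels = [0, 0, 0, 1, 1, 1]
w = train_logistic_regression([url_features(u) for u in urls], labels)
print(predict_proba(w, "http://10.0.0.1/login/account/verify99999") > 0.5)  # True
print(predict_proba(w, "https://www.example.org/about") > 0.5)              # False
```

With scikit-learn, the training loop would usually be replaced by `LogisticRegression`, and the feature extraction is where domain knowledge about phishing and malware distribution pays off.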

Malware Threat Detection

  • Decision Tree / Random Forest
  • Hidden Markov Models
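
Hidden Markov Models fit malware detection when a program's behavior is recorded as a sequence, for example of API calls. One common pattern is to fit one HMM per class and label a new sequence with the model under which it is most likely. The sketch below shows only the scoring half, a scaled forward algorithm. The two-state models, the five-call vocabulary and all probabilities are hypothetical and set by hand; in practice they would be learned from labeled execution traces with Baum-Welch, for example via a library such as hmmlearn.

```python
import math

# Hypothetical vocabulary of observed API calls
API_CALLS = ["open_file", "read_file", "write_file", "connect", "encrypt"]

# Hand-set, hypothetical 2-state models (rows of trans/emit each sum to 1)
MODELS = {
    "benign": dict(
        start=[0.8, 0.2],
        trans=[[0.7, 0.3], [0.4, 0.6]],
        emit=[[0.4, 0.4, 0.15, 0.04, 0.01],
              [0.2, 0.3, 0.3, 0.15, 0.05]],
    ),
    "malicious": dict(
        start=[0.5, 0.5],
        trans=[[0.5, 0.5], [0.5, 0.5]],
        emit=[[0.1, 0.1, 0.2, 0.3, 0.3],
              [0.05, 0.15, 0.4, 0.1, 0.3]],
    ),
}


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM)."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numeric underflow
    return log_lik


def classify(calls):
    obs = [API_CALLS.index(c) for c in calls]
    scores = {name: forward_log_likelihood(obs, **params)
              for name, params in MODELS.items()}
    return max(scores, key=scores.get)


print(classify(["open_file", "read_file", "read_file", "write_file"]))  # benign
print(classify(["connect", "encrypt", "write_file", "encrypt"]))        # malicious
```

The rescaling step matters for realistic traces: multiplying hundreds of probabilities would otherwise underflow to zero, while the sum of log scale factors gives the same log-likelihood safely.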

Anomaly Detection for network traffic

  • k-Means
  • DBSCAN (density-based)
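
DBSCAN suits network anomaly detection because it does not need the number of clusters up front, and it labels points in sparse regions as noise, which can be treated as anomalies. Below is a minimal from-scratch DBSCAN over hypothetical, already-scaled two-dimensional session features (think data volume and session duration). The feature choice, `eps`, `min_pts` and the ten sample points are assumptions. For real traffic you would typically scale features first and use `sklearn.cluster.DBSCAN`, which also labels noise as -1.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes the point itself, as in the standard definition
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return labels


# Hypothetical, pre-scaled (volume, duration) features per session
traffic = [
    (1.0, 0.5), (1.2, 0.6), (0.9, 0.4), (1.1, 0.5), (1.0, 0.7),  # typical sessions
    (5.0, 3.0), (5.2, 3.1), (4.9, 2.9), (5.1, 3.0),              # a second normal pattern
    (20.0, 0.1),                                                 # huge, very short transfer
]
labels = dbscan(traffic, eps=0.5, min_pts=3)
anomalies = [p for p, lab in zip(traffic, labels) if lab == -1]
print(labels)     # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
print(anomalies)  # [(20.0, 0.1)]
```

k-Means can be used similarly by flagging points far from their nearest centroid, but it forces every point into a cluster, whereas DBSCAN's explicit noise label maps directly onto "this session looks unusual".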

Deep Fake Recognition

  • CNN / MesoNet

So, is Data Science in Cybersecurity the new sexiest job of the century? I'm not sure. I personally find it hard to label a job as "sexy", but being a data scientist in the cybersecurity space is probably one of the most promising jobs of the coming years.

About Tobias Faiss

Tobias is a Senior Engineering Manager focusing on applied Leadership, Analytics and Cyber Resilience. He has a track record of 18+ years managing software projects, services and teams in the United States, EMEA and Asia-Pacific. He currently leads several multinational teams in Germany, India, Singapore and Vietnam. He is also the founder of the delta2 edventures project, whose mission is to educate students, IT professionals and executives to build a digitally connected, secure and reliable world, and which provides training for individuals.

Tobias’ latest book is ‘The Art of IT-Management: How to Successfully Lead Your Company Into the Digital Future’. You can also contact him on his personal website tobiasfaiss.com
