Deploy a Successful AI Strategy // Part 1: Data
This article is part one of the six-part series “Deploy a Successful AI Strategy”.
In this first part of the series, I address the linchpin of every AI project: data.
Just as every vehicle engine needs gasoline (or electricity) to run, AI models need data. Only with large volumes of high-quality data can an AI project succeed, which is why managing data is an essential, if not the most essential, part of an AI strategy.
1. Get an overview
As previously explained, the core element of any AI strategy is collecting, preparing and deploying data. Data is the fuel of any AI strategy and should therefore be treated as a strategic asset in a company. However, this is still rarely the case across the business landscape. Companies that have only recently digitized their business processes in particular will have to deal with problems of data organization and data quality. But even many data-driven companies have open flanks here, because decentralization and “best-of-breed” approaches (often a consequence of a stringent focus on “time to market”) have fostered divergent ways of organizing data. Before implementing a data strategy, it therefore makes sense to clean up your own backyard first.
2. Organize your data
Data catalogs are a useful tool for evaluating and categorizing existing data according to its relevance and quality. If a data strategy already exists, it is imperative to reconcile the two according to the MECE principle (mutually exclusive, collectively exhaustive): the AI strategy should introduce little to no redundancy with the existing datasets, but should meaningfully complement and extend them.
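As a minimal sketch, a catalog entry can start as a small record per dataset. The field names and the overlap check below are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One dataset in the data catalog (illustrative fields only)."""
    name: str
    owner: str
    relevance: int        # e.g. 1 (low) to 5 (high) for the planned AI use cases
    quality: int          # e.g. 1 (low) to 5 (high) after profiling
    structured: bool      # True for tabular/relational data
    overlaps_with: list[str] = field(default_factory=list)  # redundant datasets (MECE check)

catalog = [
    CatalogEntry("crm_contacts", "sales", relevance=5, quality=4, structured=True),
    CatalogEntry("support_emails", "service", relevance=3, quality=2, structured=False,
                 overlaps_with=["crm_contacts"]),
]

# Flag entries that violate the "mutually exclusive" part of MECE.
redundant = [entry.name for entry in catalog if entry.overlaps_with]
print(redundant)  # ['support_emails']
```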
Another key quality characteristic of the data catalog is the relative proportion of structured data records.
Typically, companies have far more unstructured than structured records. The challenge with unstructured data is that it requires considerably more cleansing and preparation time; its analysis and provision is therefore many times more time-consuming, resource-intensive and costly. For this reason, it is highly advisable to capture data in as structured a form as possible, starting at the collection stage.
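To illustrate the point with a hypothetical free-text support note: extracting even a few fields at collection time spares much of the cleansing effort later. The note format and field names below are made up for this example:

```python
import re

# Hypothetical free-text note as it might arrive from a support form.
raw_note = "2024-03-12 | customer 4711 complains about late delivery, refund requested"

# Extract a minimal structured record at collection time instead of
# parsing piles of such notes months later.
match = re.match(
    r"(?P<date>\d{4}-\d{2}-\d{2}) \| customer (?P<customer_id>\d+) (?P<text>.+)",
    raw_note,
)
record = {
    "date": match.group("date"),
    "customer_id": int(match.group("customer_id")),
    "text": match.group("text"),
}
print(record)
```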
3. Evaluate your data
Since storage prices dropped enormously over the last decade, fueling the rise of Big Data technologies, it is now very easy to store data in large volumes. Nevertheless, check whether the datasets you collect are actually relevant to you, your customers and your organization. Often, useless data is collected that ends up in a data swamp and may entail compliance risks (e.g. under the GDPR in the EU or the CCPA in the US). Focus on your customers’ objectives and KPIs and reverse-engineer their data requirements to identify and prioritize essential data. The more work you put into data collection and preparation up to this point, the more efficient and cost-effective your AI projects will be down the road. Also consider how data is typically presented to your clientele. It is not uncommon for highly complex datasets to be carefully prepared and modeled, only to be presented to the customer poorly. You may want to align your data preparation with customer preferences.
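One way to make that reverse engineering concrete is to map each customer KPI to the data it requires and then check which collected datasets actually serve a KPI. The KPI and dataset names below are assumptions made up for illustration:

```python
# Hypothetical mapping of customer KPIs to the datasets they require.
kpi_requirements = {
    "churn_rate": {"crm_contacts", "support_tickets"},
    "on_time_delivery": {"orders", "logistics_events"},
}

collected_datasets = {
    "crm_contacts", "support_tickets", "orders",
    "logistics_events", "website_clickstream_raw",
}

# Datasets no KPI needs are candidates for the data swamp (and for a compliance review).
needed = set().union(*kpi_requirements.values())
unused = collected_datasets - needed
missing = needed - collected_datasets

print("unused:", unused)    # {'website_clickstream_raw'}
print("missing:", missing)  # set()
```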
4. Document your data
As touched on earlier, documenting datasets is a critical component that you cannot afford to neglect. Over the last three to five years, sensitivity to data sovereignty and privacy has clearly increased around the world. With the GDPR, the EU has also enacted a powerful instrument that can result in stiff penalties for companies. You should therefore consider such laws in your data strategy right from the start and align data collection and processing accordingly.
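In practice, this can start as a lightweight documentation record per dataset that captures the privacy-relevant facts up front. The fields below are an illustrative assumption, not a legal checklist:

```python
# Illustrative documentation record ("data card") for one dataset.
support_tickets_doc = {
    "name": "support_tickets",
    "source": "service desk export",
    "contains_personal_data": True,
    "legal_basis": "contract fulfilment (GDPR Art. 6(1)(b))",  # assumed basis for this example
    "retention_period_days": 730,
    "data_owner": "customer-service@company.example",
    "last_reviewed": "2024-01-15",
}

# A minimal completeness check before the dataset is released for AI projects.
required_fields = {"legal_basis", "retention_period_days", "data_owner"}
missing = required_fields - support_tickets_doc.keys()
assert not missing, f"documentation incomplete: {missing}"
```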
Key takeaways
- What data sources are available and what is their relevance?
- Which data sources will be relevant in the coming years?
- How can I store unstructured data in the most structured way possible?
- What are the key KPIs of my customers and what data is required for them?
- How is data provided to the organization and other stakeholders?
- How is the documentation of data sets done?