We all know that data is the lifeblood of data analytics, data science and artificial intelligence, so it is of UTMOST importance that we manage it well.
The overarching principles are the highest #dataquality possible, along with #data security, privacy and integrity.
So planning is important, but be realistic: there is no such thing as “perfectly clean” data.
Which areas need planning? Here are a few I can think of.
Collection: Plan to collect the most accurate data possible, taking into account its granularity, a useful definition of each field, and the collection method.
Storage: How to store the data for ease of retrieval (indexing, for example), how to secure it, and how to back it up. Also the schema, and normalization for ease of update and to reduce the chance of duplicates.
Retrieval: How to structure the datasets so that retrieval easily covers all of the routine reporting and most of the foreseeable ad-hoc analysis. Also, who can retrieve and access the data, and with what kind of permission.
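To make the storage and retrieval points a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration only: the customers and orders tables, their columns, the idx_orders_date index, the daily_revenue view and the add_customer helper are hypothetical names, not a recommended design. It shows a normalized schema with a uniqueness rule to keep duplicates out, an index on a column that reports are likely to filter on, and a view that packages one routine report.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, normalized schema (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the orders -> customers link
    conn.executescript(
        """
        -- Each customer is stored once; UNIQUE rejects duplicate emails.
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            email       TEXT NOT NULL UNIQUE
        );
        -- Orders refer to customers by id instead of repeating their details.
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            order_date  TEXT NOT NULL,
            amount      REAL NOT NULL CHECK (amount >= 0)
        );
        -- Index the column that routine reports filter and group on.
        CREATE INDEX idx_orders_date ON orders(order_date);
        -- A view that packages one routine report for easy retrieval.
        CREATE VIEW daily_revenue AS
            SELECT order_date, SUM(amount) AS revenue
            FROM orders
            GROUP BY order_date;
        """
    )
    return conn


def add_customer(conn: sqlite3.Connection, email: str) -> bool:
    """Insert a customer; return False if the email is already stored."""
    try:
        conn.execute("INSERT INTO customers (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_demo_db()
    print(add_customer(db, "a@example.com"))
    print(add_customer(db, "a@example.com"))  # same email again
```

SQLite has no user permissions of its own, so the "who can access what" part is not covered here; in a server database that would typically be handled with roles and grants.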
How successful and valuable your data becomes depends greatly on your planning and on the accompanying data governance processes. Each company's data strategy is tied to its business strategy, revenue model and market model, so it's pretty unique in a way. This means there is a learning curve to climb, which in turn means being prepared to make changes when needed.
TLDR: Start planning your data strategy! :)