To build a robust enterprise data strategy, you must consider the following points: (1) prioritizing business goals, (2) putting architectural issues in perspective with business objectives, and (3) defining the right technology stack. Like so many complex processes, this one is iterative: you will have to document your work and repeat these steps several times until you achieve the intended result, which is point (4) on the list.
(1) Prioritizing business goals
Some organizations start with a particular trending technology or concept in mind. Perhaps they have heard about MapReduce, a method for processing vast amounts of data in parallel, and expect it to magically reduce the amount of data. Or they want to pave the way for new technologies such as Hadoop or Spark in order to become a data-driven business and find the hidden diamonds in their data centers, that is, the kind of valuable information that could create new use cases, improve existing processes or technologies, or even drive sales. This is nearly always easier said than done, since many machine learning algorithms lose their effectiveness on high-dimensional big data due to the curse of dimensionality, which I may explain in an upcoming post.
Sometimes it is very hard to bring new technologies to big enterprises since the IT processes for managing such technologies are not yet defined. IT departments often claim, "the technology is not in our book of standards." But this is an entirely distinct innovation problem that the company may be afflicted with (what I call the company’s inner immune system). Nevertheless, if you start by talking about technology when you want to set up a data strategy, you will probably follow the wrong path.
Instead, start with the big picture of what you’re trying to achieve as an organization. What are your business goals? A car company, for example, could improve the map data in its navigation systems by leveraging the data collected by the car’s built-in camera. Since traffic signs change at a rate of 10 percent per year, newly detected traffic signs could be sent to the car manufacturer’s backend and used to create up-to-date map data, which would be a great unique selling proposition (USP).
After defining such goals and answering the hard questions, you're ready to talk about an initial technology stack. Since new business objectives take some time to become clear, your technology stack may need to be updated along the way. With this in mind, you have to set up an iterative process. Every two months, you should reflect on current business needs, as well as projected needs 18 to 36 months ahead, and match these with the technology stack you have in mind.
Now you are ready to prioritize. Once you have a clear picture of your business objectives, you can sort your projects according to your goals. Then you can be sure that you are concentrating on the most significant problems. Solving these first is crucial, since you will usually be restricted by limited resources, and optimization always involves sacrifices. Solving these issues, however, paves the way for a platform which fulfills your data strategy's needs. Every new project will profit from the experience gained during previous ones, since more capabilities are already in place. Problems can therefore eventually be solved more quickly thanks to synergy effects.
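The prioritization step can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not a prescribed method: the `Project` class, the `prioritize` function, the goal names, the cost figures, and the budget are all hypothetical. Projects are sorted by the rank of the business goal they serve and then selected greedily until the limited resources are used up.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    goal: str   # business goal the project serves (hypothetical labels)
    cost: int   # estimated effort, e.g. in person-months (made-up numbers)


def prioritize(projects, goal_rank, budget):
    """Sort projects by the rank of their goal, then pick greedily within budget."""
    # Lower rank means a more important goal; cheaper projects win ties.
    ranked = sorted(projects, key=lambda p: (goal_rank[p.goal], p.cost))
    selected, remaining = [], budget
    for project in ranked:
        if project.cost <= remaining:
            selected.append(project)
            remaining -= project.cost
    return selected


# Hypothetical goals, ranked by importance to the business.
goal_rank = {
    "up-to-date map data": 1,
    "improve existing processes": 2,
    "drive sales": 3,
}

projects = [
    Project("sales dashboard", "drive sales", 4),
    Project("traffic sign ingestion", "up-to-date map data", 6),
    Project("batch job cleanup", "improve existing processes", 3),
]

selected = prioritize(projects, goal_rank, budget=10)
print([p.name for p in selected])
```

A greedy pass like this is deliberately simple. In practice the ranking would come out of the stakeholder discussions, and the point of the sketch is only that the sort key is the business goal, not the technology.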
(2) Putting architectural issues in perspective with business objectives
Large companies usually have long-term contracts with big IT companies such as Oracle or Microsoft, to mention just a couple. Both of them offer business intelligence (BI) tools such as data warehouse solutions. Data warehouses were state of the art in the 1990s, and nearly every mid-sized to big enterprise has one in place. However, one of the most common mistakes is trying to rebuild or utilize data warehouses for use cases for which they were never intended, for example, real-time stream processing. Data warehouses hardly suit these needs due to their architecture: they rely heavily on virtualization and batch jobs to organize and structure the data. Time-sensitive analytics, however, require not only the CPU power of a host without virtualization overhead, but also full access to the server's data transport bus and input/output (I/O) ports. On a multi-tenant server, users share those facilities with other users. If the analytics task frequently hogs resources, either the other users are shortchanged or the response times of the big data speed layer are slowed or paused.
Understanding the current state of the company’s architecture and its limitations is essential when defining the technology roadmap you need. Based on the status quo and its limitations, you can make a list of the technologies you need to solve your business problems and thereby put architectural issues in perspective with your business objectives.
(3) Defining the right technology stack
If you select and install tools too early, you run the risk of having them sit idle while you work on use cases and business objectives. As long as your requirements have not settled, a freshly installed tool can become obsolete before you use it. Another important point is to forget about any legacy software that you may already have in place. You should resist the urge to rely on legacy software only because your company has licensed it or your IT department is already comfortable with it. Take this as a great opportunity to define the technology you really need.
Another consideration when selecting tools is their longevity. For your investment to stay relevant amid shifting business needs, it’s important to pick tools which fulfill both current and future objectives. To minimize costs, you can start with open source tools for proofs of concept before you invest in commercial software. It is crucial that you instruct your IT department to support the tools you want. If they do not have the skills in place, they can hardly set up support processes and help you during difficult phases such as scaling up.
(4) Document and do it several times
To drive consensus around your data strategy and technology roadmap, you need to have collaborative meetings with the business, technology, and product teams. In such meetings, all stakeholders have to decide jointly on important issues such as mapping business priorities to the technology roadmap.
For documentation purposes, and to allow all stakeholders to work on their specific tasks without conflicting with others or duplicating work, there are frameworks such as TOGAF (The Open Group Architecture Framework). TOGAF offers you a set of tools for organizing and planning complex IT projects. You can use it as an efficient method for discussing and documenting your data strategy. It is typically divided into four layers: Business, Application, Data, and Technology. Your business objectives are documented in the Business layer. Architectural considerations go into the Application and Data layers. The Technology layer should contain all evaluation paths regarding tool selection.
Another common step is to hire a chief data officer (CDO). To perform the role adequately, a CDO needs sufficient reputation and power within the company. The CDO needs a clear view of the business landscape as well as profound knowledge of data science tools and platforms. Additionally, this person must possess strong political and diplomatic skills in order to mediate among different stakeholders. The project can only succeed when all of them are willing to cooperate.
The CDO has to find a way to incentivize all stakeholders to support the optimal solution for the company as a whole, rather than prioritizing their own needs. To give a concrete example: When I set up the car fleet, which I explained in my previous post, a variety of departments at Volkswagen were interested in the data. Nevertheless, none of them agreed to contribute to the costs, since their budgets had already been defined at the beginning of the year. Contributing to the fleet setup would have meant not working on a project which had been planned at the beginning of the year. Additionally, the managers’ bonuses depended on sticking to the budget plan. In my experience, individual departments’ goals do not always go hand in hand with the optimal solution for the company.
After having defined and prioritized your business objectives, sorted your projects, and selected your tools of choice, you're still not finished. As you progress towards your goals, you will continuously recalibrate based on the experience you gain through your projects. It is a kind of natural feedback loop which should yield your data catalog. You can build up such a catalog by documenting all your work on data sets in an IPython notebook.
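As a rough illustration of what such a catalog could look like in a notebook cell, here is a minimal sketch. The `CatalogEntry` fields, the `register` helper, and the example data set are assumptions made for this example, not an established schema. The idea is simply that every project which touches a data set adds to its entry, so the catalog grows with each iteration.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class CatalogEntry:
    name: str          # data set identifier (hypothetical)
    owner: str         # department responsible for the data
    description: str   # what the data set contains
    source: str        # where the data comes from
    projects: list = field(default_factory=list)  # projects that used it


catalog = {}


def register(entry):
    """Add or update an entry, keeping the projects already recorded."""
    existing = catalog.get(entry.name)
    if existing is not None:
        # Merge project lists so earlier work is not lost on update.
        new = [p for p in entry.projects if p not in existing.projects]
        entry.projects = existing.projects + new
    catalog[entry.name] = entry
    return entry


# First project documents the data set it worked with.
register(CatalogEntry(
    name="detected_traffic_signs",
    owner="navigation",
    description="Traffic signs detected by the built-in camera",
    source="car fleet",
    projects=["map update pilot"],
))

# A later project reuses the same data set and refines the description.
register(CatalogEntry(
    name="detected_traffic_signs",
    owner="navigation",
    description="Traffic signs detected by the built-in camera, with position",
    source="car fleet",
    projects=["sign change analysis"],
))

print(json.dumps({k: asdict(v) for k, v in catalog.items()}, indent=2))
```

Keeping the catalog as plain data makes it easy to export from the notebook and share with the stakeholders who were not involved in the original project.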
Agility and communication are the core ingredients of a successful enterprise data strategy. Technology is only a tool to implement the software you need to achieve your business objectives. Clarify your business goals and re-evaluate them continuously. Based on them, you should prioritize your projects. Let all of this lead you to the architecture which best fits your needs. The journey is the reward.