Governance, Open Source

Innovation Ecosystems

With my recent open source work I was motivated to look at the results of an IBM Academy of Technology innovation ecosystems study I was involved in back in 2008.

Innovation ecosystems are multi-organization ventures that seek to collectively innovate to change their industry.   In the study we focused on successful ecosystems and looked for the common patterns that make them work.

We looked at a variety of ecosystem structures, including those with one dominant leader and many smaller partners as well as more peer-to-peer ecosystems.  The case studies came from different geographic regions and industries.

Despite their differences, the success factors of these innovation ecosystems were surprisingly similar:

  • Equity within the ecosystem – mutual benefit being derived by all participants, in line with their contribution.
  • Active engagement from all participants, focused towards a common goal and based on trust.
  • Intellectual Property (IP) policy and other intellectual capital ownership issues negotiated and agreed from the inception of the ecosystem.
  • Adequate resources within each partner to enable them to deliver on their promises.
  • The correct mix of skills across the participants to cover all aspects of the ecosystem’s mission.

Each of these elements needs to be present, first to create the level of trust necessary to collaborate and then to provide the right mix of resources to make progress as a team.

When I look at open source projects, organizations such as the Linux Foundation and the Apache Software Foundation (ASF) help to create the right intellectual property sharing basis for an innovation ecosystem.  They provide basic infrastructure for collaboration and encourage practices that create fair engagement.  Individual projects can then focus on the goals, skills and resources needed to deliver their dreams.

Photo: Stromboli viewed from Lipari



Data Lake, Governance

What do you govern?

Governance is a practice that you apply to “something”.  Just like James Watt’s fly-ball governor for the steam engine, a governance program seeks to keep an engine in balance so it works effectively.  This engine may be a process, organization, or flow of information.  The important point is that the target of what you are governing is clearly defined.

Approaches to governance, particularly around a data lake, vary widely due to the different choices that organizations make in their definition of the engine being managed.  For example, the IT department may see the data lake engine as a collection of technology working together.  The business may see the data lake as part of an innovation engine helping them to create new value from data.  So which is the right engine to govern?  It depends on the objective for the data lake.

A good starting point in defining the governance program for the data lake is to consider the perspective of each of the principal groups of users of the data lake.  Define the engine that each group sees, then think about what mechanisms it would take to balance effort and value from each of these perspectives.

For example, the owner of a system that supplies data to the data lake may be required to maintain the catalog entry for the data coming from their system.  In return, they could get analysis of the quality or consistency of this data that helps them provide a better service to their users.

A data scientist may be restricted in how they work with sensitive data, but in return they get a rich catalog of data to choose from and easy processes to get permission to use the data sets they need.  They may also be given the ability to contribute data and content for the catalog.  The more they contribute, the easier the discovery process becomes.

When the needs of the suppliers are weighed against the needs of the consumers in this way, effort and value stay in balance for everyone involved, creating a sustainable ecosystem.

In addition to designing the governance program to the perspective of the users, it is also necessary to decide who is in control of the data lake: IT or the business.  That choice affects how the data lake is governed.

When IT is in control, normal IT governance can manage many aspects of the data lake.  However, when the business is in control, the mechanisms that operate the data lake, and the classifications that identify the different types of data, need to be abstracted through services and metadata to create a view of the data lake that makes sense to the business and can be modified by them as needed.  This view is then mapped to the actual data and technology through the metadata in the catalog, and the metadata settings are used by the data lake services to drive the behaviour of the data lake.
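To make the idea of metadata driving behaviour a little more concrete, here is a minimal Python sketch.  Everything in it is an assumption made for illustration: the classification names, the POLICY table, the CatalogEntry fields and the storage location are hypothetical and do not come from any particular catalog product.

```python
from dataclasses import dataclass

# Hypothetical business-facing classifications and the handling each requires.
# The business edits this table; the service code below does not change.
POLICY = {
    "public": {"mask": False, "approval_required": False},
    "internal": {"mask": False, "approval_required": True},
    "sensitive": {"mask": True, "approval_required": True},
}


@dataclass
class CatalogEntry:
    data_set: str        # business name for the data set
    location: str        # where the data actually lives (technology view)
    classification: str  # business classification attached as metadata


def access_rules(entry: CatalogEntry) -> dict:
    """Return the handling a data lake service should apply to this data set."""
    # Unknown classifications fall back to the most restrictive handling.
    return POLICY.get(entry.classification, POLICY["sensitive"])


# Illustrative entry; the location is a made-up path.
entry = CatalogEntry("customer_contacts", "/lake/raw/crm/contacts", "sensitive")
rules = access_rules(entry)
```

The point of the sketch is the separation: the business changes the policy table and the classifications in the catalog, while the services only read the resulting settings.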

Once the engines have been defined, the governance program is designed in the normal way:

  • Setting standards for the metadata, formats and best practices for the data lake.
  • Measuring and monitoring adherence to these standards.
  • Taking action as appropriate such as managing exceptions, answering compliance questions and modifying the program based on feedback.
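The three steps above can be sketched as a small loop over catalog entries.  This is only an illustration under stated assumptions: the required metadata fields, the entry layout and the sample data are hypothetical, and a real program would measure far more than missing fields.

```python
# Step 1: a standard. Assumed here to be a set of required metadata fields.
REQUIRED_FIELDS = ("owner", "description", "classification")


def missing_fields(entry: dict) -> list:
    """List the required metadata fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


def adherence_report(catalog: list) -> dict:
    """Step 2: measure adherence and collect the entries that need action."""
    exceptions = {}
    for entry in catalog:
        missing = missing_fields(entry)
        if missing:
            exceptions[entry["name"]] = missing
    compliant = len(catalog) - len(exceptions)
    rate = compliant / len(catalog) if catalog else 1.0
    return {"adherence": rate, "exceptions": exceptions}


# Hypothetical catalog entries for illustration.
catalog = [
    {"name": "orders", "owner": "sales-ops",
     "description": "Daily order extract", "classification": "internal"},
    {"name": "web_logs", "owner": "",
     "description": "Raw clickstream", "classification": ""},
]

# Step 3: the exceptions are the input to whatever action is appropriate.
report = adherence_report(catalog)
```

The exceptions list is where the feedback comes in: a recurring exception may mean the owner needs help, or it may mean the standard itself needs to change.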

I would like to end by emphasizing the importance of feedback in achieving balance and value.  Governance programs must be dynamic and must demonstrate the value that they deliver.  The feedback mechanisms should not be forgotten, as they enable the governance program to stay relevant to the changing needs of the business, which in turn change the nature of the engines we need to govern.

Photo: Ginger Lily, Sao Jorge Island, Azores