The Path To Software Success Starts With A Data Roadmap


This XSeed blog was written by Venture Partner Jeff Thermond and originally appeared on Forbes.

LinkedIn and other companies draw hundreds of millions of users every day in large part thanks to their ability to acquire, transform and leverage many types of data. A data roadmap is a disciplined way to think through how to capture, create and transform data, and it is an increasingly popular technique that helps entrepreneurs mimic this aspect of the LinkedIn success story. I strongly recommend that you study this approach, and I have outlined the five key levels of a data roadmap here.

Unlike a product roadmap, which extends along a horizontal time axis, the data roadmap is defined by a vertical axis, to indicate that data accumulates value as it progresses through what I call the Five Stages of Data Transformation:

1. Extraction

2. Curation

3. Derivation

4. Combination

5. Self Generation

Like Maslow’s Hierarchy of Needs, it starts at the bottom and progresses to the top. Great entrepreneurs exploit each stage for maximum effect. LinkedIn certainly has done this and has been well rewarded.

Extraction is pretty easy to understand. It involves pulling data from user input forms, or using machine learning to pull it from user data, in a structured extraction process. The screens where you build your LinkedIn profile are part of its Extraction layer. It is worth noting that when you type in your employers, LinkedIn also extracts prior knowledge about your company that it gathered from other users and public sources. Extraction is a more comprehensive process than just taking in user-typed input.

It is amazing to me how many ideas I see where the entrepreneur’s thinking stops at this stage and goes no further. For almost any business idea, I think 80% or more of the data’s value is developed after the Extraction stage. If you have a fully developed data roadmap, it’s almost impossible not to see this.

Curation is the stage that corrects transcription errors (assuming human input), resolves ambiguous relationships between entities (especially a problem when using natural language processing), and fixes other tough data accuracy problems. This is a very hard stage to complete with 97%+ accuracy. But with that level of accuracy, your data is trustworthy. Trustworthy data is the foundation for the next three stages of data transformation. Relating back to building your LinkedIn profile, if you try to enter an incorrect date range, it will alert you. If you were to try to claim a ‘Colleague’ connection with someone whose time at a company did not overlap with yours, you would also get an error. Once you have a large and growing body of highly accurate data, you have a tremendous asset.
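To make the two profile checks above concrete, here is a minimal sketch in Python. It is not LinkedIn's implementation; the `Position` record and the `validate_position` and `can_claim_colleague` functions are hypothetical names invented for illustration, and matching companies by lowercased name is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Position:
    """Hypothetical record for one job on a user's profile."""
    company: str
    start: date
    end: Optional[date] = None  # None means the job is current


def validate_position(p: Position, today: date) -> List[str]:
    """Return a list of date-range problems; an empty list means the entry looks valid."""
    errors = []
    if p.start > today:
        errors.append("start date is in the future")
    if p.end is not None and p.end < p.start:
        errors.append("end date is before start date")
    return errors


def overlapped(a: Position, b: Position, today: date) -> bool:
    """True if both positions are at the same company during overlapping dates."""
    # Assumption: company identity is a case-insensitive name match.
    if a.company.strip().lower() != b.company.strip().lower():
        return False
    a_end = a.end or today
    b_end = b.end or today
    return a.start <= b_end and b.start <= a_end


def can_claim_colleague(mine: List[Position], theirs: List[Position], today: date) -> bool:
    """Allow a 'Colleague' claim only if at least one pair of positions overlaps."""
    return any(overlapped(a, b, today) for a in mine for b in theirs)
```

A real curation layer would also have to resolve company names that are spelled differently but refer to the same entity, which is exactly the kind of hard accuracy problem this stage exists to solve.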

Derivation is what happens when your well-curated data fields are combined, which starts creating novel and valuable new data fields. A very important part of LinkedIn data is its knowledge about over 300 million users’ connections. The ‘People You May Know’ section is all about deriving connections from the companies you worked at in your curated profile. LinkedIn not only looks at your connectedness with them, but also their contacts and potential connectedness with you. For LinkedIn, the value is their comprehensiveness. It’s a classic network effect. Your LinkedIn Economic Graph is a great example of derived data.
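As a rough illustration of derivation, the sketch below builds a new field (suggested connections) from two curated ones (employment history and existing connections). It is a toy under stated assumptions, not LinkedIn's algorithm: the `suggest_connections` function and its input shapes are hypothetical, and it suggests anyone who shares a company with you and is not yet connected.

```python
from collections import defaultdict
from itertools import combinations


def suggest_connections(employment, connections):
    """Derive 'people you may know' from shared employers.

    employment:  dict mapping user -> set of company names (curated data)
    connections: set of frozenset({user_a, user_b}) pairs already connected
    Returns a dict mapping user -> set of suggested users.
    """
    # Invert the employment data: company -> users who worked there.
    by_company = defaultdict(set)
    for user, companies in employment.items():
        for company in companies:
            by_company[company].add(user)

    # Every unconnected pair that shares a company becomes a suggestion.
    suggestions = defaultdict(set)
    for users in by_company.values():
        for a, b in combinations(sorted(users), 2):
            if frozenset((a, b)) not in connections:
                suggestions[a].add(b)
                suggestions[b].add(a)
    return dict(suggestions)
```

Neither input contains a suggestion on its own; the new field only exists once the curated fields are combined, which is the point of this stage.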

Combination is the juxtaposition of data from multiple sources to produce additive insights only accessible when all the data is present at once. This is a stage that benefits from thinking about third party data sources and how they might be combined with yours to create something special. LinkedIn clearly links to company web profiles, web links particular to you, third party blogs once you say you’re interested in that company, etc.
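A small sketch of what combination can look like in code, assuming your own data is a list of profile records and the third-party source is a company directory keyed by company name. Both data shapes and the `combine` and `users_per_industry` functions are hypothetical, chosen only to show an insight that neither source holds alone.

```python
from collections import Counter


def combine(profiles, company_directory):
    """Attach third-party company attributes to each of our own profile records."""
    combined = []
    for profile in profiles:
        extra = company_directory.get(profile["company"], {})
        # Prefix third-party keys so they cannot overwrite our own fields.
        enriched = {**profile, **{f"company_{k}": v for k, v in extra.items()}}
        combined.append(enriched)
    return combined


def users_per_industry(combined_rows):
    """An insight only available once both sources are present at the same time."""
    counts = Counter(row.get("company_industry", "unknown") for row in combined_rows)
    return dict(counts)
```

Here the profiles know who works where and the directory knows which industry each company is in; only the combined rows can say how many users work in each industry.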

Self Generation is the last stage and mostly occurs when the data traffic is growing very rapidly. In this transformation, metadata about usage of the data itself starts to become valuable because of the scale of the usage. Two examples of this would be tagging your own behavior (e.g., whom you looked at) as well as tagging popular discussion topics. When you’re at a volume of user data where usage data itself becomes interesting, you know that you’ve created something really valuable.
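To illustrate usage metadata turning into data in its own right, here is a minimal sketch that treats a log of profile views as the input and derives which profiles attract the most attention. The event format (viewer, viewed) and the `most_viewed` function are assumptions made for this example only.

```python
from collections import Counter


def most_viewed(view_events, n=3):
    """Rank profiles by how often they were viewed.

    view_events: iterable of (viewer, viewed) pairs recorded as users browse.
    Returns up to n (profile, view_count) pairs, most viewed first.
    """
    counts = Counter(viewed for _viewer, viewed in view_events)
    return counts.most_common(n)
```

With a handful of events this ranking is noise; it only becomes interesting at the traffic volumes described above, which is why this stage comes last.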

So why should you go to all of this trouble? After all, didn’t I just say that most entrepreneurs don’t jump through these hoops? I think there are three really good reasons:

Firstly, you’ll stand out from the crowd both when you’re raising money from investors and when you’re selling to new prospects. You’ll have a real, unique, well thought-out vision. It’s far more fun to pitch your product with a thoroughly developed data roadmap.

Secondly, it creates a moat. That moat can take several forms. It’s highly likely that what you do in the Curation and Derivation stages is patentable. A data roadmap also gives you a wide canvas on which to paint compelling brands and messaging. It is also quite easy to build really compelling competitive comparisons about the breadth and accuracy of your well-transformed data versus the slapdash approach of a competitor who has not completed the data transformation process.

Thirdly, you’ll only know you’ve gotten the full potential value from your data if you go through all five stages of a data roadmap. Just to pick an example, in the Combination stage, you’re forced to think about public or third party data sources that can be combined with yours to create new insights or data. Explicit use of a data roadmap forces you to think about these sources and seize new opportunities.

Every company is in the data transformation business today. Those that know this and manage the transformation of their data explicitly through all five stages of the data roadmap will have a huge, unfair advantage over those who don’t. Data businesses can be worth phenomenal amounts of money, and they can be a source of meaningful differentiation that lasts for years.

If you haven’t got a data roadmap, get on it now.