Let’s Talk DataOps: Jon Loyens, Co-founder and CPO of data.world

Jarah Euston

Welcome to our new interview series, “Let’s Talk DataOps!”

DataOps (Data Operations) is an emerging function that controls the flow of data from source to value. This includes integrating with data sources, performing transformations on the data, converting formats, and writing or delivering data to its required destination so it can be analyzed, fed into a model, or surfaced for end users. DataOps also encompasses monitoring and governing data flows while ensuring security and scale.

In this new series, we’ll interview leading thinkers in Data Operations to discuss the state of DataOps from their point of view. Learn about what they do, their biggest challenges, and how they use DataOps to drive their businesses.

For our inaugural interview, we spoke with Jon Loyens, the Co-Founder and Chief Product Officer of data.world. As a long-time technology executive in Austin, Jon has seen firsthand the rise of data and analytics as a democratizing force. Previously, he was VP of Engineering for Traveler Products at HomeAway, and before that, a VP of Engineering and Director of the Labs group at Bazaarvoice.

Building the World’s Most Abundant Data Resource

Nexla: You’re the CPO for data.world. Tell us about what you’re building.

Jon Loyens: Our mission is to build the world’s most meaningful, collaborative, and abundant data resource. We are building an environment where people can collaborate on data and bring data together. People haven’t put much thought into the process of data work itself, and they spend too much time on it. We’d like to make that better.

Data work can be kind of a dirty word, and we want to bring the social element back into data work. A lot of knowledge gets lost in tribal communication patterns and Excel sheets. For example, a single CSV file can leave many people asking what certain columns mean, whether in hallway conversations or Slack chats. Building a platform where people can collaborate, with all the knowledge in one place, lets people answer those questions more quickly.

N: Data.world is a powerful platform for anyone who is looking for data or wants to share it. What are some of the biggest uses of this type of data sharing today?

JL: Data.world has been able to bring together people from different disciplines. The Data Democracy community is very active, with public policy experts, lawyers with computer science degrees, and programmers with physics and statistics backgrounds. They come from a variety of places, and they are all trying to apply data-driven decision making and data discovery to have better discourse about the state of our country.

In the public sphere, data.world has been working with local and international journalists, who come together to help one another and collect data for their stories. Data.world applies to smaller groups too. Much like open source software revolutionized the way people approach software engineering, a lot of the techniques we use apply easily to smaller teams, like people working across cubicles in a business.

I want people to stop emailing spreadsheets. In a business context, with any kind of structured data, we instantly give you the ability to query in SQL, combining data from different silos in ad hoc ways. Data.world eliminates emailing spreadsheets around and lets you work more efficiently with your colleagues, whether they are across a cubicle or across the globe.
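Combining structured data from different silos with ad hoc SQL can be illustrated generically. The sketch below is not data.world’s API: it assumes two hypothetical tables, `sales` and `regions`, standing in for spreadsheets kept by two different teams, and uses Python’s built-in `sqlite3` module to join them in memory instead of emailing files back and forth.

```python
import sqlite3


def combine_silos():
    """Join two hypothetical 'silo' tables with an ad hoc SQL query."""
    conn = sqlite3.connect(":memory:")

    # Hypothetical silo 1: a sales spreadsheet maintained by one team.
    conn.execute("CREATE TABLE sales (region_code TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("TX", 120.0), ("CA", 80.0), ("TX", 30.0)],
    )

    # Hypothetical silo 2: a region lookup maintained by another team.
    conn.execute("CREATE TABLE regions (code TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO regions VALUES (?, ?)",
        [("TX", "Texas"), ("CA", "California")],
    )

    # Ad hoc join: total sales per region name, largest first.
    rows = conn.execute(
        """
        SELECT r.name, SUM(s.amount) AS total
        FROM sales AS s
        JOIN regions AS r ON r.code = s.region_code
        GROUP BY r.name
        ORDER BY total DESC
        """
    ).fetchall()
    conn.close()
    return rows
```

Calling `combine_silos()` returns one `(region name, total)` tuple per region; the query, not a copied spreadsheet, is what captures how the two sources were combined.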

Managing and Moving Data

N: How do you think companies will manage so many data sets?
JL: The world has evolved a lot, both in infrastructure and in data analysis. Twenty years ago, it was all about data warehouses, data management processes, master data models for the whole company, giant pipelines, and data cubes. The problem with those is that they result in a very top-down structure. In the modern world, you’re dealing with the three V’s of data (variety, velocity, and volume). With IoT, powerful web analytics, and SQL data spaces, the global reaction to data warehousing and data management has been the data lake. The data lake lets you put anything you want in there, but now the problem is bottom-up: how do you know what’s in there, and what can you use?

Take, for example, a financial institution. How do you govern something like that? I think companies need to adopt a bottom-up approach. We want you to be able to put anything in and add context to it from the bottom up, solving the three-V problem. That respects analysts’ desire to bring in third-party data without losing it in a black hole in the data lake.

N: With so much data, it isn’t always efficient to move it; sometimes you want to provide access and let the data stay where it is. Do you think data inherently wants to move, or should it be neatly organized in a data warehouse where people can access it?

JL: Our platform is built on the idea of federation: the ability to federate queries out across the semantic web. So the answer is kind of both. Data does want to move, in the sense that it is constantly transmitted across the wire and ends up in different places, but moving huge amounts of data becomes inefficient because of bandwidth constraints. The well of data then becomes so deep that it is really hard to use. Either way, you end up with derived data: a subset of data that becomes data in its own right, and that tends to want to move.

Take sports scores, for example. You can generate a log of every play in a game; from that log you can generate player stats and scores, and from a series of games you can generate aggregated stats. How you create and reproduce that derived data matters, because those are the things that want to be transported, moved, and emailed around.
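The sports example describes a chain of derivations. As a minimal sketch (the play-log format, player names, and function names are hypothetical, not taken from data.world), each derived dataset below is computed only from the one before it, so every layer can be regenerated from the raw log:

```python
from collections import Counter, defaultdict

# Hypothetical raw play-by-play log: (game_id, player, points) per scoring play.
PLAYS = [
    ("g1", "alice", 2),
    ("g1", "bob", 3),
    ("g1", "alice", 1),
    ("g2", "alice", 3),
    ("g2", "bob", 2),
]


def player_game_stats(plays):
    """First derivation: points scored by each player in each game."""
    per_game = defaultdict(Counter)
    for game_id, player, points in plays:
        per_game[game_id][player] += points
    return per_game


def game_scores(per_game):
    """Second derivation: total points per game, computed from player stats."""
    return {game_id: sum(stats.values()) for game_id, stats in per_game.items()}


def season_totals(per_game):
    """Third derivation: aggregate player stats across a series of games."""
    totals = Counter()
    for stats in per_game.values():
        totals.update(stats)  # Counter.update adds counts instead of replacing them
    return totals
```

For this log, `game_scores(player_game_stats(PLAYS))` gives `{"g1": 6, "g2": 5}`. Because each layer is a pure function of the previous one, sharing the small raw log plus the derivation steps is enough for someone else to reproduce the stats they receive.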

The Future of DataOps

N: How would you describe Data Operations (DataOps)?
JL: I was previously at HomeAway, which acquired a lot of data from vacation rental websites. Through that, it built a strong team of data operations people. They didn’t work under the name “data operations,” but they were a mix of engineers who made sure all transactional data was consistent and verifiable, and who also did a great job of maintaining global schemas in a democratized way.

The team was extremely valuable, in the same way that DevOps democratized operations. You want to democratize as much as you can, so how do you build platforms that allow people to operate?

N: Data.world is seeking to become the GitHub for data. But we’re curious: what has been your biggest data challenge at data.world so far?

JL: There have been tons of data-oriented product challenges. How do we build GitHub for data while representing something so multidimensional and multifaceted? Data is multidimensional, while code is inherently linear. We had to define that model from scratch.

N: Prediction time! Ten years ago, “the cloud” was born. As you look into your crystal ball, what will be the biggest changes to data in the next 10 years?

JL: Safety, security, and privacy are going to be the biggest challenges over the next 10 years. While we have very successfully moved a lot of compute into the cloud, people are still reticent about putting data there. Between the Target breach a few years ago and now Equifax, it’s clear we need to take security very seriously. Don’t think that because you have your own data within your own four walls, it’s inherently more safe or secure. Over the next 10 years, the cloud needs to get this right, and it’s a big challenge from both a public opinion and an analytics standpoint.
