Data Engineers: The New Kings and Queens of Artificial Intelligence

Jarah Euston

In 2017, the data engineer will be king (or queen).

Historically, data operations responsibilities have been seen as “grunt work,” with little appreciation for their complexities. There is repetition in data operations tasks, but like snowflakes, no two projects are ever exactly the same. Thankless tasks such as integrating APIs, running ETL jobs, and scrubbing data are necessary before any machine learning or AI can happen. Under-investment and a lack of appropriate tools force data engineers to take an ad hoc approach to data operations that invariably requires more maintenance down the line. While not elegant, this arrangement has worked, until now.

Machine learning and AI’s demands for data are such that scaling by adding human engineers alone simply won’t be possible. Companies have invested billions in transformative technologies to store and analyze data, and are investing billions more in building AI capabilities. It’s now time to invest in the technology that feeds this data-hungry machine. Just as most of a rocket’s launch mass is fuel, 90-95% of AI is the data that fuels it.

The Data is About to Hit the Artificially Intelligent Fan

Three key trends are converging to make data operations more important than ever.

  1. The exponential growth of data creation will continue, with new data types emerging from IoT, connected cars, and more. Sensors are being integrated into everything from refrigerators to retail stores.
  2. The ongoing evolution from descriptive analytics to machine learning and AI will accelerate, creating the need for more data integration to feed models. A model is only as good as the data fed into it, and the old adage “garbage in, garbage out” is truer now than ever.
  3. It’s a dirty little secret that unless you’re Google, Facebook, or Amazon, the data needed for sophisticated machine learning models and AI is likely to exist outside of your company. Just think about it: Let’s say you work for an auto insurance company. It sure would be great to get your hands on some connected car data to enhance your models, right? Or let’s say you’re a brand, and you’d like to know how much foot traffic circles your merchandise in stores to better understand the customer journey. In either of these cases, do you produce the data you need? No! As AI advances, more and more data will need to move. And that will place untenable demands on the data operations function, unless it is modernized.

The Data Engineer is King of the DataOps Castle

Enterprises need to elevate data operations to a core function within the IT, business intelligence, or data organization. Investment in tools and systems today to support future data needs is critical if the promise of AI and machine learning is to be realized.

We’ve seen this movie before. Ten years ago, engineers were doing repetitive tasks with JavaScript. The solution wasn’t more JavaScript engineers, but frameworks that automated repetitive tasks and made JavaScript easier to develop. The same thing happened with mobile apps. When the iOS App Store launched in 2008, app developers had to roll their own code, not just for the app but for things like analytics, crash monitoring, and push notifications. Today there are thousands of tools to make an app developer’s life easier.

The time is ripe for investing in these data operations tools at your company. With software in place to help manage data operations, data engineers can focus on building the infrastructure and architectures that your core business requires. With help plowing through most rote tasks, the data operations team’s output (and talent) will be put to far more effective use.

The Pick Axe in the AI Gold Rush

Data operations is the pick axe in the AI gold rush. Without the right data and the equipment to mine it, the promise of AI will go unrealized for many companies. This is especially true for companies in finance, retail, healthcare, and other industries, where value comes from the algorithms and analysis applied to data, not from how that data is accessed and managed. These companies would be wise to work with trusted software partners to build up their data operations teams. The data challenges that come with AI can’t be solved by more job listings. We’re going to need real technology to help our data engineering kings and queens process the next 180 zettabytes.