Nexla is happy to announce support for Apache Parquet, a free and open-source column-oriented file format. With this release, companies can now convert nearly any data into Parquet for highly optimized, cost-effective queries in Amazon Athena and Redshift Spectrum.
Nexla’s revolutionary platform can connect to almost any data source containing CSV, XML, EDI, Avro, JSON or any arbitrary text-delimited data, transform it using an intuitive point-and-click interface, and convert it to Parquet. This allows Nexla users to immediately start querying and gaining insights from their data with Amazon Athena and Redshift Spectrum, without any data engineering effort.
Maximizing the Benefits of Parquet
This has important implications for companies using Amazon Athena or Redshift Spectrum, both of which run queries directly against files stored in Amazon S3. While these services support multiple file formats, using Parquet brings a significant cost and performance benefit. Pricing for both Athena and Redshift Spectrum depends on the amount of data scanned to execute a query. For example, if a query runs across 1TB of CSV files and sums one of 20 columns, the service must scan every file in full and bills the user for the entire 1TB, even though only a fraction of that data was relevant to computing the result.
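To make the arithmetic concrete, here is a minimal Python sketch comparing the bytes scanned in the example above. It assumes all 20 columns are the same size and ignores compression and file metadata, so it is an idealized estimate rather than a measurement; the function names are illustrative only and not part of any Nexla or AWS API.

```python
TB = 10**12  # decimal terabyte, in bytes


def bytes_scanned_row_format(total_bytes: int) -> int:
    # A row-oriented file such as CSV has to be read end to end,
    # so every byte is scanned no matter how many columns the query uses.
    return total_bytes


def bytes_scanned_columnar(total_bytes: int, num_columns: int, columns_used: int) -> int:
    # Simplifying assumption: every column occupies the same number of bytes
    # and nothing is compressed. A columnar reader fetches only the column
    # chunks that the query actually references.
    return total_bytes * columns_used // num_columns


csv_scan = bytes_scanned_row_format(1 * TB)
parquet_scan = bytes_scanned_columnar(1 * TB, num_columns=20, columns_used=1)
print(csv_scan / TB, parquet_scan / TB)  # 1.0 0.05
```

Under these assumptions the columnar layout scans about 50GB instead of 1TB. Real savings vary with column widths, compression, and how many columns a query touches.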
“If the same data was in columnar format such as Parquet, then it only scans the relevant column from the files,” explained Saket Saurabh, CEO & Co-Founder of Nexla. “We are pleased to offer companies the ability to automatically convert data into Parquet, resulting in 45% lower query costs, on average.” Nexla can optimize further by partitioning data appropriately, so that once again the query engine scans only the data relevant to a given query. Nexla’s ability to automatically combine incoming data with disparate structures into a single Parquet format, without custom code, saves time and money.
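Partitioning can be sketched in the same spirit. The example below assumes a Hive-style layout in which each partition lives under a `key=value` folder, a convention Athena and Redshift Spectrum both accept for partitioned tables. The bucket name, object keys, sizes (in MB), and helper functions are all hypothetical, and a real engine prunes partitions using metadata registered in its data catalog rather than by parsing keys as done here.

```python
from typing import Dict, Optional, Set

# Hypothetical objects in S3, mapped to their sizes in MB.
objects: Dict[str, int] = {
    "s3://example-bucket/events/date=2018-01-01/part-0.parquet": 400,
    "s3://example-bucket/events/date=2018-01-02/part-0.parquet": 350,
    "s3://example-bucket/events/date=2018-01-03/part-0.parquet": 500,
}


def partition_value(key: str, column: str) -> Optional[str]:
    # Find the "column=value" path segment and return its value, if present.
    prefix = column + "="
    for segment in key.split("/"):
        if segment.startswith(prefix):
            return segment[len(prefix):]
    return None


def prune(objs: Dict[str, int], column: str, wanted: Set[str]) -> Dict[str, int]:
    # Keep only objects whose partition value satisfies the query's filter;
    # every other object is skipped entirely and never scanned.
    return {k: size for k, size in objs.items() if partition_value(k, column) in wanted}


# A query filtered to a single day touches one partition out of three.
selected = prune(objects, "date", {"2018-01-02"})
print(sum(selected.values()))  # 350
```

Combined with columnar storage, this means a query filtered on the partition column reads only the relevant columns within the relevant partitions.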
Enabling Faster Analytics
Looker, a data analytics platform and Nexla partner, connects directly to Redshift Spectrum and Athena to perform analytics in-database. “Nexla’s ability to convert many data formats into Parquet enables Looker users to run faster, more advanced queries on Athena and Redshift Spectrum,” said Dillon Morrison, Looker’s Data Platform Lead.
“Now, rather than needing a data engineer to maintain ETL scripts and an EMR cluster, data analysts can convert data to Parquet in Nexla’s graphical user interface, meaning a wider audience can leverage the speed of Parquet. In several use-cases, we’ve seen query speed increase at least 25x! We’re thrilled to work with Nexla’s DataOps platform to help our customers drive more advanced business insights with Looker,” Morrison continued.
Parquet conversion is available now in the Nexla inter-company Data Operations platform.
Check out CRN Magazine’s exclusive feature here