CORD-19 research data in different formats – Avro, Parquet, JSONL

We welcome researchers who are leveraging data to fight the Coronavirus. A rich set of data has been made available in the COVID-19 Open Research Dataset (CORD-19), provided by the Allen Institute for AI. We are helping researchers by making this data available in multiple formats. To learn more about the different data formats and their benefits, read our white paper, Introduction to Big Data Formats: Understanding Avro, Parquet, and ORC.

Avro Download: cord19_all_data_in_avro.tgz (517.4 MB)

Parquet Download: cord19_all_data_in_parquet.tgz (485.2 MB)

JSONL Download: cord19_all_data_in_jsonl.tgz (660.1 MB)
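
As a rough starting point for the JSONL download, the sketch below reads a JSONL file one record at a time using only the Python standard library. It assumes the archive has already been extracted and that each line of the file holds one JSON object. The function name read_jsonl and the file path in the usage comment are hypothetical, since the layout inside the archive is not described here.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)


# Example usage (hypothetical path; adjust to wherever the archive was extracted):
# for record in read_jsonl("cord19_all_data_in_jsonl/part-00000.jsonl"):
#     print(sorted(record.keys()))
#     break
```

Reading line by line keeps memory use low, which may matter for a download of this size. The field names in each record are not assumed here; printing the keys of the first record is one way to inspect them.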

Would it be helpful for you to get data in any other format, such as ORC? Email us and let us know at