JSON to Parquet Conversion

Creating Parquet files is now part of the optimization process for improving query performance in Spark. Storing data as Parquet is a useful way to prepare it for querying.

JSON is a popular data format in web apps. NoSQL databases such as MongoDB let developers store data directly in formats such as JSON, which preserves its nested structure. This approach helps optimize both the development and the performance of OLTP apps.

The remaining challenge is converting these JSON files to Parquet files.
