JSON to Parquet Conversion

Creating Parquet files is now a standard part of optimizing query performance in Spark.  Storing data in Parquet is a useful way to prepare it for querying.
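As a rough sketch of what that querying looks like (the dataset path, column names, and the SparkSession entry point are assumptions for illustration, not part of the Kite workflow itself), a Parquet dataset can be read and queried from Spark like this:

// spark-shell (Scala); the HDFS path and column names are hypothetical
val users = spark.read.parquet("hdfs:///datasets/users")
users.createOrReplaceTempView("users")
spark.sql("SELECT name, email FROM users WHERE id > 100").show()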

JSON is a popular format in web applications.  NoSQL databases such as MongoDB let developers store data directly in JSON-like formats that preserve nested structure, which keeps OLTP application development simple and performance high.

The remaining challenge is converting the JSON files into Parquet files.

This is not a unique problem.  It will be a common one for the next generation of OLAP applications built to support the next generation of OLTP platforms.

I think the Kite SDK may provide a solution to this problem.

Create Parquet files

A Parquet dataset can be created using the Kite CLI create command:

kite-dataset create users --schema user.avsc --format parquet
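For illustration, user.avsc could be a minimal Avro record schema like the one below; the record and field names are hypothetical and would need to match your actual data:

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}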

Read JSON files

The JSON files stored in HDFS can be described by generating Avro schema files from them.  We can use the Kite CLI json-schema command:

kite-dataset [-v] json-schema <sample json path> [command options]
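For example, to infer a schema from a sample file and save it to user.avsc (the HDFS path is hypothetical, and the --record-name and -o options are my assumption of the relevant command options; check the CLI help for your version):

kite-dataset json-schema hdfs:/data/users/sample.json --record-name User -o user.avsc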
