hive-dev mailing list archives

From S G
Subject Do you feel a need for schema when querying JSON files in hive?
Date Tue, 11 Apr 2017 09:09:22 GMT

With JsonSerDe, you need to specify a structure (schema) for your tables
in order to query them.

However, since the schema for an object is prone to change (once every few
months is not unexpected), how do you handle that change in your hive/pig

Moreover, since JSON files are not demarcated according to schema, it is
possible that a single JSON file contains data for multiple evolutions of a
schema (for example, 10 objects of ClassAnimal1, 20 of ClassAnimal2, 100 of
ClassAnimal3 and so on, where ClassAnimal1, ClassAnimal2 and ClassAnimal3
represent the schema for ClassAnimal at different times).
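
To make the mixed-schema case concrete, here is a rough Python sketch (not
Hive code) that groups the records of a JSON-lines file by their set of
top-level field names. It assumes one JSON object per line; the function
name group_by_field_set and the sample records are made up purely for
illustration.

```python
import json
from collections import defaultdict


def group_by_field_set(lines):
    """Group JSON-lines records by the sorted tuple of their top-level keys.

    Assumes every non-blank line holds exactly one JSON object.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        fingerprint = tuple(sorted(record))  # field names act as a crude schema id
        groups[fingerprint].append(record)
    return dict(groups)


# Hypothetical sample: three evolutions of ClassAnimal mixed in one file.
sample = [
    '{"name": "cat"}',
    '{"name": "dog", "legs": 4}',
    '{"name": "emu", "legs": 2, "can_fly": false}',
    '{"name": "cow", "legs": 4}',
]

groups = group_by_field_set(sample)
for fields, records in groups.items():
    print(fields, len(records))
```

This only looks at top-level field names, so it would not notice a field
whose type changed between versions or a change inside a nested object.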

For such a JSON file, what is the recommended way of querying?

I know that Avro solves this problem by maintaining a single file for a
single kind of schema, so it will have 3 files for the above case (1 each
for ClassAnimal1, ClassAnimal2 and ClassAnimal3).

But since Avro is binary, hard to debug, and requires a schema repository
(for non-Hive use cases), we were hoping to solve this problem in JSON.

Related questions:
1) Is it even a problem worth solving?
2) How many people use AvroSerDe as compared to JsonSerDe?

