hive-user mailing list archives

From: Pallav Jakhotiya
Subject: Restricting Avro File size/records in Hive
Date: Thu, 25 Aug 2016 07:51:10 GMT

We have data in an ORC-formatted table. We filter certain records from it and then populate an
Avro-format Hive table using an "insert into" statement.

Our use case is to create smaller Avro data files in a Hive table, so that each file can be
passed on to consumers as a Kafka message.
Can we restrict the file size in an Avro-backed Hive table while we execute the "insert into"?

One solution we considered was to use "clustered by", but since the number of records and their
total size are not known beforehand, it is difficult to choose the number of buckets.
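
To make the bucketing difficulty concrete: if the filtered output size were known, the bucket
count would be a simple ceiling division. The sketch below is only an illustration, not something
Hive provides; the function name, the 10 GiB output size, and the 1 MiB per-file target are all
hypothetical, and it assumes rows hash evenly across buckets.

```python
def buckets_needed(total_bytes: int, target_file_bytes: int) -> int:
    """Estimate how many buckets keep each file at or under the target size.

    Hypothetical helper: assumes rows are spread evenly across buckets,
    which real data may not satisfy.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division; always keep at least one bucket.
    return max(1, -(-total_bytes // target_file_bytes))


if __name__ == "__main__":
    # Assumed figures: 10 GiB of filtered output, 1 MiB target per file.
    print(buckets_needed(10 * 1024**3, 1024**2))  # prints 10240
```

The catch is that total_bytes is exactly the value we do not have until the filter has run, while
the bucket count has to be fixed when the table is created.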

Is there anything else we can try to restrict the file size?
