avro-user mailing list archives

From diplomatic Guru <diplomaticg...@gmail.com>
Subject Why Avro file format is larger than CSV?
Date Fri, 19 Sep 2014 10:31:22 GMT
I've been experimenting with a MapReduce job using the CSV and Avro formats.
What I find strange is that the Avro output is larger than the CSV input.

For example, I exported some data as CSV, which came to about 1.6GB. I then
wrote a schema and a MapReduce job that takes that CSV, serializes it to
Avro, and writes the output back to HDFS.

When I checked the file size of the output, it was 2.4GB. I assumed that
the size would be smaller because it converts the data into binary, but I was
wrong. Do you know what the reason is, and can you refer me to some documentation on

I've checked the .avro file and I could see that the header contains the
schema and the rest is data blocks.
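
For reference, the header check can be sketched in Python with only the
standard library. This is a minimal sketch, not the official Avro reader: the
function names (read_long, read_bytes, read_header) are made up for
illustration, and it assumes the file follows the container layout in the
Avro specification (4 magic bytes, a metadata map, then a 16-byte sync
marker). It reads the metadata so that the "avro.schema" and "avro.codec"
entries can be inspected.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    raw = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of stream")
        byte = chunk[0]
        raw |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of this long
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed byte string (also used for map keys)."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of stream")
    return data


def read_header(stream):
    """Return (metadata dict, sync marker) from an Avro container header."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            break
        if count < 0:  # negative count is followed by the block size in bytes
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    sync = stream.read(16)
    return meta, sync


def describe(path):
    """Print the codec and schema stored in the header of the file at path."""
    with open(path, "rb") as f:
        meta, _ = read_header(f)
    # Per the Avro spec, a missing avro.codec entry means the "null" codec.
    print("codec:", meta.get("avro.codec", b"null").decode("utf-8"))
    print("schema:", json.dumps(json.loads(meta["avro.schema"]), indent=2))
```

Calling describe() on one of the job's output files (the path is whatever the
job wrote, copied locally) prints the codec and the schema. A codec of "null"
means the data blocks are stored uncompressed.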
