avro-user mailing list archives

From Sean Busbey <busbey+li...@cloudera.com>
Subject Re:
Date Mon, 17 Mar 2014 20:17:46 GMT
Hi Shaq!

Could you describe your use case in more detail?

Generally, HDFS performs poorly when it has to hold many small files. Could
you perhaps store several datums together in one file? That would reduce both
the relative overhead of the schema and the pressure on the HDFS NameNode.
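
As a rough illustration of why batching helps, here is a minimal sketch that uses only the Python standard library, not the Avro library. The `Event` schema, its fields, and the helper names are hypothetical. The datum encoding follows Avro's binary rules for `long` and `string` (zigzag varint, and a length-prefixed UTF-8 string), but the file size estimate is a simplification that counts only the schema JSON and ignores the magic bytes, sync markers, and block headers of a real container file.

```python
import json

# Hypothetical schema, for illustration only.
SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "label", "type": "string"},
    ],
}


def zigzag_varint(n: int) -> bytes:
    """Encode a 64-bit integer as a zigzag value in base-128 varint form."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes map to small values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_datum(event_id: int, label: str) -> bytes:
    """Encode one Event as its fields in schema order, with no schema attached."""
    raw = label.encode("utf-8")
    return zigzag_varint(event_id) + zigzag_varint(len(raw)) + raw


def schema_overhead_fraction(num_datums: int) -> float:
    """Share of the file taken by the schema when it is written once per file.

    Simplified model: file size = schema JSON + num_datums encoded datums.
    """
    header = len(json.dumps(SCHEMA).encode("utf-8"))
    body = num_datums * len(encode_datum(12345, "click"))
    return header / (header + body)


if __name__ == "__main__":
    for n in (1, 100, 10000):
        print(n, round(schema_overhead_fraction(n), 4))
```

With one datum per file the schema dominates the file size; with many datums per file its share shrinks toward zero, and the NameNode tracks one file instead of thousands.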


On Mon, Mar 17, 2014 at 2:55 PM, Salman Haq <shaq.haq@audaxhealth.com> wrote:

> Hello,
> I'd like to confirm if there is a recommended way to serialize data to a
> file but without the schema being written in the file metadata. Assume a
> reader's schema will be available for deserialization at a later time.
> My use case requires small-sized datum messages to be serialized and
> copied to HDFS. The presence of the schema in the message file adds
> considerable overhead relative to the size of the datum itself.
> Thank you,
> Shaq
