avro-user mailing list archives

From Daniel Rodriguez <df.rodriguez...@gmail.com>
Subject Create Avro from bytes, not by fields
Date Fri, 07 Feb 2014 20:06:29 GMT
Hi all,

Some context: I am not an expert Java programmer, and I am just starting
with Avro/Flume.

I need to transfer Avro files from different servers to HDFS, and I am
trying to use Flume to do it.
On the servers I have a Flume spooldir source (reading the Avro files)
feeding an Avro sink, and on the Hadoop side an Avro source feeding an HDFS
sink, like this:

           servers                      |                  hadoop
spooldir src -> avro sink     -------->       avro src -> hdfs

When the spooldir source deserializes the Avro files, it creates one Flume
event per record, with two fields: 1) the header, which contains the schema;
and 2) the body, which holds the binary Avro record data, not including the
schema or the rest of the container file elements. See the Flume docs:
http://flume.apache.org/FlumeUserGuide.html#avro
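To make "binary Avro record data" concrete: for the User schema shown in the header below, Avro's binary encoding writes only the field values, in schema order. Strings are a length followed by UTF-8 bytes, ints are zig-zag varints, and a union is the zero-based branch index followed by that branch's value. The JDK-only sketch below hand-encodes that one schema following the Avro specification; `UserBinaryEncoder` is a hypothetical helper for illustration, not part of Avro or Flume.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: hand-written Avro binary encoding for the example
 * "User" schema (name: string, favorite_number: [int, null],
 * favorite_color: [string, null]). Only field values are written:
 * no schema, no container-file header.
 */
final class UserBinaryEncoder {

    // Avro ints and longs are zig-zag encoded, then written as a base-128 varint.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = (n << 1) ^ (n >> 63);            // zig-zag: small magnitudes -> small numbers
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Strings: length (as a long), then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Fields in schema order; each union is "branch index, then value".
    static byte[] encode(String name, Integer favoriteNumber, String favoriteColor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        if (favoriteNumber != null) {
            writeLong(out, 0);                    // branch 0: "int"
            writeLong(out, favoriteNumber);
        } else {
            writeLong(out, 1);                    // branch 1: "null" (no value bytes)
        }
        if (favoriteColor != null) {
            writeLong(out, 0);                    // branch 0: "string"
            writeString(out, favoriteColor);
        } else {
            writeLong(out, 1);                    // branch 1: "null"
        }
        return out.toByteArray();
    }
}
```

For example, `encode("Alyssa", 256, null)` yields the 11 bytes `0C 41 6C 79 73 73 61 00 80 04 02`; a body like this is what each Flume event carries.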

So the Avro file that ends up in HDFS has records like this:

{"headers": {"flume.avro.schema.literal":
"{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}"},
"body": {"bytes": "{BYTES}"}}

So now I am trying to write my own serializer, since Flume only includes a
FlumeEvent serializer, which creates Avro files like the one above rather
than files like the original Avro files on the servers.

I am almost there: I get the schema from the header field and the bytes
from the body field.
But now I need to write the Avro file from those bytes, not from field
values. I cannot do r.put("field", "value") because I don't have the
values, just the bytes.
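The bytes do carry all the field values; they only need the schema to be read. With the Avro Java library this is normally `new GenericDatumReader<GenericRecord>(schema).read(null, DecoderFactory.get().binaryDecoder(bytes, null))`, which returns a populated `GenericRecord`. As a JDK-only illustration of the same idea, the sketch below decodes one body for the example User schema by hand. `UserBinaryDecoder` is hypothetical and handles only that one schema.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: decode one binary "User" datum (a Flume event body)
 * back into field values, using only the schema's field order and types.
 */
final class UserBinaryDecoder {
    private final byte[] buf;
    private int pos;

    UserBinaryDecoder(byte[] buf) {
        this.buf = buf;
    }

    // Read a base-128 varint, then undo the zig-zag mapping.
    long readLong() {
        long v = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (v >>> 1) ^ -(v & 1);
    }

    // Strings: length, then that many UTF-8 bytes.
    String readString() {
        int len = (int) readLong();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    // Returns {name, favorite_number, favorite_color}; null for the "null" branch.
    Object[] readUser() {
        String name = readString();
        Integer number = readLong() == 0 ? (int) readLong() : null; // 0 = int, 1 = null
        String color = readLong() == 0 ? readString() : null;       // 0 = string, 1 = null
        return new Object[] {name, number, color};
    }
}
```

Once decoded, the values could be put into `r` field by field, although that round trip is only needed if the record has to be inspected or changed.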

This is the code:

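import java.io.File;
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.util.Utf8;

// TESTFILE: the Avro file written by Flume (one Flume event per record)
File file = TESTFILE;

DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> dataFileReader =
    new DataFileReader<GenericRecord>(file, datumReader);
GenericRecord user = null;
while (dataFileReader.hasNext()) {
    // Each record is a whole Flume event: {"headers": map, "body": bytes}
    user = dataFileReader.next(user);

    // Avro map keys are read back as Utf8, so look the header up with a Utf8 key
    Map<?, ?> headers = (Map<?, ?>) user.get("headers");
    Utf8 schemaHeaderKey = new Utf8("flume.avro.schema.literal");
    String schema = headers.get(schemaHeaderKey).toString();

    // The body holds one binary-encoded datum of the original record
    ByteBuffer body = (ByteBuffer) user.get("body");

    // Writing...
    Schema.Parser parser = new Schema.Parser();
    Schema schemaSimpleWrapper = parser.parse(schema);
    GenericRecord r = new GenericData.Record(schemaSimpleWrapper);

    // NOT SURE WHAT COMES NEXT
}
dataFileReader.close();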
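One possible next step that never decodes the bodies: an Avro container file is a header (magic bytes, a metadata map holding the schema, a 16-byte sync marker) followed by blocks of concatenated binary datums, so the Flume bodies can go into a block as-is. Avro's Java `DataFileWriter` is the usual tool for this; after `create(schema, file)`, its `appendEncoded(ByteBuffer)` method appends an already-encoded datum. The JDK-only sketch below instead writes the container format by hand, following the Avro specification, to show what such a file contains. `RawAvroContainerWriter` is a hypothetical name, and the sketch assumes uncompressed output, a single data block, and that every body was encoded with the header's schema (nothing is validated).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.List;

/**
 * Hypothetical sketch: write an Avro object container file from a schema
 * string and already-encoded datums (e.g. Flume event bodies), without
 * decoding them. Uncompressed ("null" codec), all datums in one block.
 */
final class RawAvroContainerWriter {

    // Zig-zag varint, as used for every Avro long.
    static void writeLong(OutputStream out, long n) throws IOException {
        long v = (n << 1) ^ (n >> 63);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Avro "bytes" and "string": length, then the raw bytes.
    static void writeBytes(OutputStream out, byte[] b) throws IOException {
        writeLong(out, b.length);
        out.write(b);
    }

    static void write(OutputStream out, String schemaJson, List<byte[]> datums)
            throws IOException {
        byte[] sync = new byte[16];
        new SecureRandom().nextBytes(sync);        // random 16-byte sync marker

        // Header: magic "Obj" + version 1, metadata map, sync marker.
        out.write(new byte[] {'O', 'b', 'j', 1});
        writeLong(out, 2);                         // one map block with 2 entries
        writeBytes(out, "avro.schema".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, schemaJson.getBytes(StandardCharsets.UTF_8));
        writeBytes(out, "avro.codec".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, "null".getBytes(StandardCharsets.UTF_8));
        writeLong(out, 0);                         // end of map
        out.write(sync);

        if (datums.isEmpty()) {
            return;                                // header-only file, no data block
        }

        // One data block: object count, byte size, the raw datums, sync marker.
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] d : datums) {
            block.write(d, 0, d.length);
        }
        writeLong(out, datums.size());
        writeLong(out, block.size());
        block.writeTo(out);
        out.write(sync);
    }
}
```

In the loop above, each `body` would need to be copied into its own `byte[]` (for example `byte[] d = new byte[body.remaining()]; body.duplicate().get(d);`) before the next `next(user)` call, since the reader may reuse the buffer. For large inputs, writing several smaller blocks, as `DataFileWriter` does, would avoid holding everything in memory.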

Is it possible to create the Avro files directly from the value bytes?

I appreciate any help.

Thanks,
Daniel
