flume-user mailing list archives

From ed <edor...@gmail.com>
Subject Handling malformed data when using custom AvroEventSerializer and HDFS Sink
Date Wed, 01 Jan 2014 02:34:32 GMT

We are using Flume v1.4 to load JSON formatted log data into HDFS as Avro.
Our Flume setup looks like this:

NXLog ==> (FlumeHTTPSource -> HDFSSink w/ custom EventSerializer)

Right now our custom EventSerializer (which extends
AbstractAvroEventSerializer) takes the JSON input from the HTTPSource and
converts it into an Avro record of the appropriate type for the incoming
log file. This is working great, and we use the serializer to add some
additional "synthetic" fields to the Avro record that don't exist in the
original JSON log data.
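To make the conversion step concrete, here is a minimal self-contained sketch of the pattern described above. The field names and the plain `Map` stand-in for the Avro record are hypothetical; a real serializer would extend `AbstractAvroEventSerializer` and build a record against an Avro schema:

```java
import java.util.HashMap;
import java.util.Map;

public class ConvertSketch {
    // Hypothetical stand-in for the convert step: map parsed JSON fields
    // into an output record, adding a "synthetic" field along the way.
    static Map<String, Object> convert(Map<String, String> jsonFields) {
        Map<String, Object> record = new HashMap<>();
        // Copy fields that exist in the original JSON log line.
        record.put("host", jsonFields.get("host"));
        record.put("message", jsonFields.get("message"));
        // Add a synthetic field that is not present in the source JSON,
        // e.g. the time the event was serialized.
        record.put("ingestTimestampMs", System.currentTimeMillis());
        return record;
    }
}
```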

My question concerns how to handle malformed JSON data (or really any error
inside the custom EventSerializer). It's very likely that as we parse the
JSON there will be records where something is malformed (either the JSON
itself is invalid, or a field has the wrong type, etc.).

For example, a "port" field that should always be an integer might for
some reason contain ASCII text. I'd like to catch these errors in the
EventSerializer and then write out the bad JSON to a log file somewhere
that we can monitor.

What is the best way to do this? Right now, all the logic for catching bad
JSON would be inside the "convert" method of the EventSerializer. Should
the convert method itself throw an exception that will be gracefully
handled upstream, or should I just return null if there was an error?
Would it be appropriate to log errors directly to a database from inside
the EventSerializer convert method, or would this be too slow? What are
the best practices for this type of error handling?
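For reference, this is a hedged sketch of the catch-and-return-null variant of the question, using the "port" example above. The helper name and field layout are hypothetical, and java.util.logging stands in for whatever logging the real serializer would use:

```java
import java.util.Map;
import java.util.logging.Logger;

public class PortParseSketch {
    private static final Logger LOG =
            Logger.getLogger(PortParseSketch.class.getName());

    // Returns the parsed port, or null when the field is malformed,
    // logging the offending raw value so it can be monitored later.
    static Integer parsePort(Map<String, String> jsonFields) {
        String raw = jsonFields.get("port");
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            LOG.warning("Malformed port field, dropping record: " + raw);
            return null;
        }
    }
}
```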

Thank you for any assistance!

Best Regards,

