flume-user mailing list archives

From <chris.jo...@nomura.com>
Subject Invalid characters in event body
Date Fri, 16 Oct 2015 10:48:05 GMT
Hi all,

I’m trying to ingest data that contains what I think are invalid characters, and Flume is
behaving a bit strangely. There’s a single agent with a Spooling Directory source and
an HDFS sink, ingesting CSV files to be queried with Drill. Whenever Flume attempts to ingest
the bad row, it doesn’t log any error; instead it writes a truncated row to HDFS. Drill
then fails to query any data that includes this row, as the truncated row leaves a newline inside a quoted CSV string.
Is there any way to handle this? I wrote a custom interceptor that replaces characters
matching the regex ‘\p{C}’, but that didn’t help.
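For reference, the kind of body cleaning described above might look like the sketch below. It is an assumption about what such an interceptor could do, not the interceptor from this message: BodySanitizer and sanitize are hypothetical names, and registering it through Flume's Interceptor interface is not shown. One detail worth noting is that U+FFFD (the replacement character) has Unicode category So, not C, so a pattern of just ‘\p{C}’ leaves it in place. Also, if the row is already truncated before the interceptor runs, cleaning at that stage cannot bring the missing text back.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper: the body-cleaning step a custom interceptor's
// intercept(Event) method could apply to event.getBody(). Registering it
// with Flume (Interceptor / Interceptor.Builder) is not shown here.
final class BodySanitizer {
    // \p{C} matches "other" characters such as controls (Cc), format (Cf)
    // and private use (Co); note that it also matches tab. U+FFFD is
    // category So, so it is listed explicitly or it would survive.
    private static final Pattern BAD_CHARS = Pattern.compile("[\\p{C}\\uFFFD]");

    static byte[] sanitize(byte[] body) {
        // new String(byte[], UTF_8) turns malformed UTF-8 into U+FFFD,
        // which the pattern then removes along with the other matches.
        String text = new String(body, StandardCharsets.UTF_8);
        return BAD_CHARS.matcher(text).replaceAll("").getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "Field\uFFFD\uFFFD\u0007 5".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(sanitize(raw), StandardCharsets.UTF_8)); // Field 5
    }
}
```

Removing characters silently is a lossy choice; replacing them with a visible placeholder instead would keep the column width obvious when the row is later queried.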

Data in Spooling directory CSV:
"13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 3","Field���򔪌i�@�V%20�C%20�","Field
5"

Data written to HDFS:
"13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 3","Field���

Output of ‘cat -v’, which shows non-printing characters in visible form:
Data in spooling directory:
"13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 3","FieldM-oM-?M-=M-oM-?M-=M-oM-?M-=M-rM-^TM-*M-^LiM-oM-?M-=@M-oM-?M-=V%20M-oM-?M-=C%20M-oM-?M-=","Field
5"
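To read the ‘cat -v’ output: a byte with the high bit set is printed as ‘M-’ followed by its low seven bits, and control characters use caret notation (LF and TAB are left as-is). The sketch below re-implements those rules for illustration (it is not GNU cat's source; CatV is a hypothetical name) and shows that each ‘M-oM-?M-=’ is the three bytes EF BF BD, which is U+FFFD REPLACEMENT CHARACTER encoded in UTF-8, while ‘M-rM-^TM-*M-^L’ is the four bytes F2 94 AA 8C.

```java
import java.nio.charset.StandardCharsets;

// Illustrative re-implementation of the 'cat -v' display rules
// (not GNU cat's source). CatV is a hypothetical name.
final class CatV {
    static String render(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int c = b & 0xFF;
            if (c == '\n' || c == '\t') {   // LF and TAB are printed unchanged
                sb.append((char) c);
                continue;
            }
            if (c >= 0x80) {                // high bit set: "M-" plus the low 7 bits
                sb.append("M-");
                c &= 0x7F;
            }
            if (c < 0x20) {                 // control: caret notation, 0x14 -> ^T
                sb.append('^').append((char) (c + 0x40));
            } else if (c == 0x7F) {         // DEL -> ^?
                sb.append("^?");
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FFFD REPLACEMENT CHARACTER is EF BF BD in UTF-8
        System.out.println(render("\uFFFD".getBytes(StandardCharsets.UTF_8))); // M-oM-?M-=
        // The four bytes that follow the replacement characters in the source row
        byte[] seq = { (byte) 0xF2, (byte) 0x94, (byte) 0xAA, (byte) 0x8C };
        System.out.println(render(seq)); // M-rM-^TM-*M-^L
    }
}
```

So the file on disk already contains literal replacement characters, most likely from an earlier decoding step upstream of Flume, plus one multi-byte sequence after them.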

Data written to HDFS:
"13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 3","FieldM-oM-?M-=M-oM-?M-=M-oM-?M-=
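Comparing the two ‘cat -v’ lines, the HDFS copy stops right after the third EF BF BD, immediately before the bytes F2 94 AA 8C. Decoded as UTF-8, those four bytes form a single code point above U+FFFF, which Java holds as a surrogate pair of two chars. The sketch below (CodePointCheck is a hypothetical name) only confirms that decoding; it does not prove this character causes the truncation, but it suggests checking whether the ingest path handles characters outside the Basic Multilingual Plane correctly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical check: decode the bytes F2 94 AA 8C (cat -v: M-rM-^TM-*M-^L)
// and see how Java represents the result.
final class CodePointCheck {
    static final byte[] SEQ = { (byte) 0xF2, (byte) 0x94, (byte) 0xAA, (byte) 0x8C };

    public static void main(String[] args) {
        String s = new String(SEQ, StandardCharsets.UTF_8);
        int cp = s.codePointAt(0);
        // One code point above U+FFFF, stored as two UTF-16 chars (a surrogate pair)
        System.out.printf("U+%X: %d code point, %d chars%n",
                cp, s.codePointCount(0, s.length()), s.length()); // U+94A8C: 1 code point, 2 chars
        System.out.println(Character.isHighSurrogate(s.charAt(0))
                && Character.isLowSurrogate(s.charAt(1)));          // true
    }
}
```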

Regards,
Chris


