flume-user mailing list archives

From Rani Yaroshinski <rani.yaroshin...@gmail.com>
Subject Re: Invalid characters in event body
Date Fri, 16 Oct 2015 10:59:52 GMT
I think you can try one of the following:
1. Look for non-alphanumeric characters directly in the source file.
2. Check whether the configuration limits the length of a line.
3. Change the serializer to binary and check, or try a custom
serializer with a different EOL character.
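
For points 1 and 2, a rough Python sketch along these lines could scan a
file before it is dropped into the spooling directory. It is only an
illustration: `find_suspect_lines` is a name I made up, and the 2048 default
for `max_len` is my assumption about the default of Flume's
`deserializer.maxLineLength`, so please check it against the documentation
for your version.

```python
import unicodedata


def find_suspect_lines(data: bytes, max_len: int = 2048):
    """Return (line_number, problems) for every line that looks risky.

    max_len is an assumed line-length limit, not a value read from Flume.
    """
    report = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        raw = raw.rstrip(b"\r")  # tolerate CRLF line endings
        problems = []
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append(f"invalid UTF-8 at byte {exc.start}")
            text = raw.decode("utf-8", errors="replace")

        # General category C* = control, format, surrogate, private use,
        # unassigned. Tabs are skipped because they are common in data files.
        bad = sorted({f"U+{ord(ch):04X}" for ch in text
                      if ch != "\t" and unicodedata.category(ch).startswith("C")})
        if bad:
            problems.append("category-C characters: " + ", ".join(bad))

        # U+FFFD is category So, so a category-C check alone will not see it.
        if "\ufffd" in text:
            problems.append("replacement character U+FFFD present")

        if len(text) > max_len:
            problems.append(f"line has {len(text)} characters (limit {max_len})")

        if problems:
            report.append((number, problems))
    return report


if __name__ == "__main__":
    import sys

    # Usage: python scan.py path/to/file.csv
    with open(sys.argv[1], "rb") as handle:
        for number, problems in find_suspect_lines(handle.read()):
            print(f"line {number}: " + "; ".join(problems))
```

Reading the file as bytes and decoding each line separately keeps one bad
line from hiding problems in the rest of the file.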
On 16 Oct 2015 13:48, <chris.johns@nomura.com> wrote:

> Hi all,
>
>
>
> I’m trying to ingest data that contains what I think are invalid
> characters, and Flume is behaving a bit strangely. There’s a single agent
> with a Spooling Directory source and an HDFS sink, ingesting CSV files to
> be queried with Drill. Whenever Flume attempts to ingest the bad row, it
> doesn’t log any error, but instead writes a truncated row to HDFS. Drill
> then fails to query any data including this row, as there is a newline in a
> quoted CSV string. Is there any way to try and handle this? I wrote a
> custom interceptor to replace characters using a regex with ‘\p{C}’, but
> that didn’t help.
>
>
>
> *Data in Spooling directory CSV:*
>
> "13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 3","Field���򔪌i�@�
> V%20�C%20�","Field 5"
>
>
>
> *Data written to HDFS:*
>
> "13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 3","Field���
>
>
>
> *Output of ‘cat -v’, which prints unprintable characters:*
>
> *Data in spooling directory:*
>
> "13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field
> 3","FieldM-oM-?M-=M-oM-?M-=M-oM-?M-=M-rM-^TM-*M-^LiM-oM-?M-=@M-oM-?M-=V%20M-oM-?M-=C%20M-oM-?M-=","Field
> 5"
>
>
>
> *Data written to HDFS:*
>
> "13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field
> 3","FieldM-oM-?M-=M-oM-?M-=M-oM-?M-=
>
>
>
> Regards,
>
> Chris
>
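
To see which bytes are actually in the bad field, the `cat -v` output quoted
above can be turned back into raw bytes. The sketch below is a minimal
decoder for the `M-` and `^` notation; `decode_cat_v` is a hypothetical
helper, and it assumes the input is pure ASCII `cat -v` output whose literal
text contains no `M-` or `^` sequences of its own.

```python
def decode_cat_v(text: str) -> bytes:
    """Convert `cat -v` notation back to bytes (sketch, ASCII input assumed)."""
    out = bytearray()
    i = 0
    while i < len(text):
        high = 0
        # "M-" means the next item has its high bit (0x80) set.
        if text.startswith("M-", i) and i + 2 < len(text):
            high = 0x80
            i += 2
        # "^X" is a control character (X xor 0x40); "^?" is DEL (0x7F).
        if text[i] == "^" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append(high | (0x7F if nxt == "?" else ord(nxt) ^ 0x40))
            i += 2
        else:
            out.append(high | ord(text[i]))
            i += 1
    return bytes(out)


# The garbled part of the field, copied from the `cat -v` output in the message.
sample = ("M-oM-?M-=M-oM-?M-=M-oM-?M-=M-rM-^TM-*M-^Li"
          "M-oM-?M-=@M-oM-?M-=V%20M-oM-?M-=C%20M-oM-?M-=")
raw = decode_cat_v(sample)
print(raw.hex(" "))
print([f"U+{ord(ch):04X}" for ch in raw.decode("utf-8")])
```

Each `M-oM-?M-=` decodes to the bytes `ef bf bd`, which is U+FFFD, the
Unicode replacement character, so the file seems to have been damaged before
Flume read it. U+FFFD is in category So rather than C, which would explain
why a `\p{C}` replacement leaves it in place. The row written to HDFS stops
just before `M-rM-^TM-*M-^L` (`f2 94 aa 8c`), a 4-byte UTF-8 sequence, so it
may be worth testing a file containing only that sequence to see whether it
alone triggers the truncation.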
