From: Israel Ekpo
To: user@flume.apache.org
Date: Tue, 27 Aug 2013 11:53:42 -0400
Subject: Re: Events being cut by flume

The default value for the available memory specified in $FLUME_HOME/bin/flume-ng is very small (20 MB).

So, in your $FLUME_HOME/conf/flume-env.sh file, try increasing your Java memory to a higher number (at most 50% of the available RAM):
JAVA_OPTS="-Xms4096m -Xmx4096m -XX:MaxPermSize=4096m"

Then, in your agent configuration file:
Increase the maximum line length per event (deserializer.maxLineLength) to a much higher number (like 5000).

Also change the output encoding to UTF-8.

Let's make sure that the input encoding matches the encoding of the original event. This can cause problems if it is not the right one.
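
One way to check the encoding of the original file before setting inputCharset is to look at its byte-order mark. The sketch below is only an illustration: `sniff_bom` and `sniff_file` are hypothetical helper names, and a file without a BOM simply returns None, so this does not prove what the encoding is.

```python
import codecs

# Hypothetical helper table: leading byte-order mark -> charset name.
# The UTF-32 marks are checked before UTF-16 because BOM_UTF32_LE
# starts with the same two bytes as BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(head: bytes):
    """Return the charset implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file and report the BOM, if any."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

For a file reported as "UTF-8 Unicode (with BOM)", `sniff_file` would be expected to return "UTF-8", which suggests UTF-8 rather than UTF-16 as the input charset.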

Let's see if these changes make a difference.


Author and Instructor for the Upcoming Book and Lecture Series
Massive Log Data Aggregation, Processing, Searching and Visualization with Open Source Software
http://massivelogdata.com


On 27 August 2013 11:13, ZORAIDA HIDALGO SANCHEZ <zoraida@tid.es> wrote:
Hi Israel,

Thanks for your response. We already checked this. Using :set list in the vi editor, our events look like this:

"line1field1";"line1field2";"line1fieldN"$
"lineNfield1";"lineNfield2";"lineNfieldN"$

There are no event delimiters ($) between the fields of an event.
I have tried forcing the encoding (because I believe these files, which are generated by our customer, are converted from ASCII to UTF-8 with BOM, and they could contain characters that take more bytes than expected):

agent.sources.rpb.inputCharset = UTF-16
agent.sources.rpb.deserializer.maxLineLength = 250
agent.sources.rpb.deserializer.outputCharset = UTF-16

but if i use a maxLineLeng= ht of this size(250) then lot of events are truncated(event the max cha= racters per line are 250):
13/08/27 17:03:34 WARN serialization.LineDes= erializer: Line length exceeds max (250), truncating line!

if I take a look into the gen= erated file, there are unrecognized chacarters:=C2=A0=EF=BF=BD=EF=BF=BD and= events have been cut in a random way(there are lines with only 3 character= s).
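
The two unrecognized characters are U+FFFD, the Unicode replacement character, which a decoder emits for input it cannot decode. As a rough illustration of what a charset mismatch does, the sketch below decodes UTF-8 bytes as UTF-16. The sample record is made up, and big-endian UTF-16 is an assumption chosen only to keep the output deterministic; this is not a reproduction of Flume's behavior.

```python
# Sample record shaped like the events in this thread (field values are made up).
text = '"Marés";"field2"\r\n'
raw = b"\xef\xbb\xbf" + text.encode("utf-8")  # UTF-8 with BOM, CRLF terminator


def decode(data: bytes, charset: str) -> str:
    """Decode bytes, substituting U+FFFD for anything undecodable."""
    return data.decode(charset, errors="replace")


right = decode(raw, "utf-8-sig")  # utf-8-sig strips the BOM
wrong = decode(raw, "utf-16-be")  # each pair of bytes is fused into one code unit

print(repr(right))
print(repr(wrong))
print("newline found:", "\n" in right, "\n" in wrong)
```

In the mis-decoded string the CRLF bytes are absorbed into a single unrelated character, so a line-based reader no longer sees a terminator where the event actually ends.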

I have tried increasing the maxLineLength parameter, but I end up getting a Java heap space exception :(

Again, thanks. Any help will be very much appreciated.


From: Israel Ekpo <israel@aicer.org>
Reply-To: Flume User List <user@flume.apache.org>
Date: Tuesday, 27 August 2013 16:29
To: Flume User List <user@flume.apache.org>
Subject: Re: Events being cut by flume

Hello Zoraida,

What source are your events coming from?

I have a feeling they are coming from the Spooling Directory source and the events contain newline characters (the event delimiter).

If this is the case, you are going to see the events split up whenever the parser encounters the delimiter.
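
The splitting described above can be modeled roughly as follows. This is a simplified sketch and not Flume's code: `split_events` is a hypothetical function, and it assumes that whatever remains of an over-long line is emitted as a further event.

```python
def split_events(text, max_line_length):
    """Split text into events at each newline, cutting any line that
    reaches max_line_length characters (the rest becomes a new event)."""
    events, current = [], []
    for ch in text:
        if ch == "\n":
            # The delimiter always closes the current event.
            events.append("".join(current))
            current = []
            continue
        current.append(ch)
        if len(current) >= max_line_length:
            # Over-long line: emit what has been read so far and start again.
            events.append("".join(current))
            current = []
    if current:
        events.append("".join(current))
    return events


# A newline embedded in a field splits one record into two events.
print(split_events('a;"b\nc";d\n', 100))
# A low limit cuts a record and leaves a short leftover fragment.
print(split_events("abcdefgh\nij\n", 5))
```

Under this model, a record that is longer than the limit in decoded characters leaves behind short fragments as separate events.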




On 27 August 2013 06:20, ZORAIDA HIDALGO SANCHEZ <zoraida@tid.es> wrote:

Hello,

I am having a weird problem while processing events coming from a file with this format:
UTF-8 Unicode (with BOM) English text, with CRLF line terminators

Some of the events in the file contain this text: "Marés". While some events are sent correctly without being cut by Flume, others arrive incomplete. What is more, once one event has been cut, no further events are sent, and we end up with incomplete files on HDFS. We have tried to isolate the problem: using the roll file sink instead of HDFS, removing all the interceptors, etc. However, we still have the same problem. Apparently, the troublesome event does not contain any hidden odd characters, and the files are generated automatically, so we would expect that if one event were malformed, the others would be too.

We would really appreciate any hint that you could give us.

Thanks.





This message is intended exclusively for its addressee. You can consult our policy on sending and receiving email at the link below. We only send and receive email on the basis of the terms set out at:
http://www.tid.es/ES/PAGINAS/disclaimer.aspx



