From: Jonathan Hsieh <jon@cloudera.com>
Date: Thu, 21 Jul 2011 04:48:35 -0700
Subject: Re: HBase Sink
To: Dennis
Cc: flume-user@incubator.apache.org

[Please subscribe to the new flume-user@incubator.apache.org list, bcc flume-user@cloudera.org, cc flume-user@incubator.apache.org]

Dennis,

These empty messages are added by the E2E agent. When using the hbase sink, it should be used with the collector { } wrapper, which handles ack delivery and should remove the empty-body messages.

collector(xxx) { hbase(yyy) }

If you are doing this and the "AckType":"end" messages still make it through, there may be a problem.

Jon.

On Mon, Jul 4, 2011 at 8:10 AM, Dennis wrote:
> Thanks for your help. Right now I've got a working Log4j -> Agent ->
> Collector -> HBase setup running. There are still some things to iron
> out, but it seems to work.
>
> Something I've noticed:
>
> The Log4jAppender just sends the message body and log priority -- no
> further fields like time, date, and log level, which are defined in
> the log4j config.
> Also I've got many empty entries between my valid entries within hbase.
> Notice the last entry, with an empty body. I suspect it is generated by
> the AgentSink.
>
> {"body":"1232 342","timestamp":1309789917614,"pri":"WARN","nanos":
> 1309789917614716000,"host":"10.0.0.30","fields":
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000¿Tð”"}}
> {"body":"1232 343","timestamp":1309789919619,"pri":"WARN","nanos":
> 1309789919619332000,"host":"10.0.0.30","fields":
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000ÈSÀ\u0002"}}
> {"body":"1232 344","timestamp":1309789921623,"pri":"WARN","nanos":
> 1309789921623857000,"host":"10.0.0.30","fields":
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000V7U¡"}}
> {"body":"1232 345","timestamp":1309789923628,"pri":"WARN","nanos":
> 1309789923628266000,"host":"10.0.0.30","fields":
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000!0e7"}}
> {"body":"","timestamp":1309789925101,"pri":"INFO","nanos":
> 4244446597812,"host":"ubuntu","fields":
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"end","AckChecksum":"\u0000\u0000\u00010ÓÍ¦‹"}}
>
> On Jun 28, 10:20 pm, Jonathan Hsieh wrote:
> > Dennis, Himanshu,
> >
> > You'd need to pull in the data (let's say via text or tail sources), parse
> > out parts of the line, and then feed it to an hbase sink.
> >
> > It would roughly look like this:
> >
> > node: tail("file") | regexAll("(\\w+)\\s+(\\w+)", "row", "data")
> > collector(300000) { hbase("table", "%{row}", "cf", "qual", "%{data}") }
> >
> > tail is the source.
> >
> > regexAll (a 0.9.4 feature) would pull out and add a "row" and a "data"
> > attribute from the event's body. If you are using 0.9.3, you'd need to
> > have two regex expressions -- one to pull out the "row" and one for the
> > "data".
> >
> > The hbase sink writes to table "table" using the extracted value of
> > "row" as the row key.
> >
> > Finally, you'd wrap it with a collector constructor so that it can
> > handle retries on failure (if hbase goes down) and acks (if agents
> > potentially go down).
> >
> > Jon.
> >
> > On Sun, Jun 26, 2011 at 12:23 PM, Dennis wrote:
> > > Have you figured out how this is to be accomplished? I'm trying to
> > > pipe it through the RegexAll engine to write everything down into a
> > > separate column.
> > >
> > > On May 4, 8:10 am, Himanshu wrote:
> > > > Hey Jonathan,
> > > >
> > > > Answers to your questions:
> > > > 1. I have just tried it for a particular event for which it happens
> > > > every time; I have described my event below.
> > > > 2. I am not using metadata as the rowKey; I am trying to use part of
> > > > the body as the rowKey and another part as the value.
> > > >
> > > > I will just try to explain my case to you. I am using a text file as
> > > > the source, defined as text("file.text"). The file has two columns
> > > > whose headers are 'row' and 'data', like...
> > > >
> > > > row       data
> > > > 123       abc
> > > > 234       def
> > > >
> > > > and I described the sink as
> > > > hbase("test", "%{row}", "cf", "qual", "%{data}", writeBufferSize="10",
> > > > writeToWal="true")
> > > >
> > > > Here I am trying to insert the 'row' column from the text file as the
> > > > rowKey and the 'data' column as the value for the columnFamily.
> > > > Please suggest how it can be done.
> > > > And one more thing: if I want to use metadata as the rowKey, say
> > > > 'part of date' as you stated earlier, how can I do this?
> > > >
> > > > Thanks & Regards
> > > > Himanshu
> > > >
> > > > On May 4, 10:40 am, Jonathan Hsieh wrote:
> > > >
> > > > > Hey Himanshu,
> > > > >
> > > > > This version of the hbase sink is pretty new, so there might be
> > > > > some rough edges.
> > > > >
> > > > > Some questions:
> > > > >
> > > > > Does this happen for all events or just for some?
> > > > >
> > > > > Do you have a metadata field called "row" and another called "data"
> > > > > in your event? For "row", I would guess that you have some sort of
> > > > > regex extractor adding "%{row}" metadata, or are using the "parts
> > > > > of date" extractors. The value/"%{data}" part I would think
> > > > > normally would be the body and contain the "%{body}" escape
> > > > > sequence.
> > > > >
> > > > > Jon.
> > > > >
> > > > > On Tue, May 3, 2011 at 9:47 PM, Himanshu wrote:
> > > > > > Jonathan,
> > > > > >
> > > > > > Hi, thanks again for your earlier help. Now I am having a problem
> > > > > > with hbaseSink usage. I am trying to write data from a text
> > > > > > source; the text file contains two columns named 'row' and 'data'.
> > > > > > And I described the hbaseSink as hbase("test", "%{row}", "cf",
> > > > > > "qual", "%{data}", writeBufferSize="10", writeToWal="true"). In
> > > > > > hbase I have a table named 'test' and a columnFamily named 'cf'.
> > > > > > When I submit the query, flume logs:
> > > > > > WARN core.Event: Tag row not found
> > > > > > WARN core.Event: Tag data not found
> > > > > >
> > > > > > I think there is a problem in my data format, or maybe in the
> > > > > > query. Please suggest where I am wrong.
> > > > > >
> > > > > > Thanks & Regards
> > > > > > Himanshu
> > > > > >
> > > > > > On May 3, 8:03 pm, Jonathan Hsieh wrote:
> > > > > > > Himanshu,
> > > > > > >
> > > > > > > Currently the nodes and the master both must know about the
> > > > > > > plugin for it to work (the master will reject a configuration
> > > > > > > attempt for sinks it doesn't know about). My guess is that
> > > > > > > you've loaded the plugin on one but not the other.
> > > > > > >
> > > > > > > You can check whether plugins are loaded by checking these web
> > > > > > > pages and seeing if the plugin sinks are listed:
> > > > > > >
> > > > > > > http://<node>:35862/extension.jsp
> > > > > > >
> > > > > > > http://<master>:35871/masterext.jsp
> > > > > > >
> > > > > > > It would be great if you could update the documentation to make
> > > > > > > this easier for folks in the future. We are transitioning to
> > > > > > > maven builds and some of the instructions may be out of date!
> > > > > > >
> > > > > > > Jon.
> > > > > > >
> > > > > > > On Tue, May 3, 2011 at 7:50 AM, Himanshu wrote:
> > > > > > > > Hi,
> > > > > > > > I have seen FLUME-414 and am trying to put data into hbase
> > > > > > > > through flume using this hbaseSink. I am new to Flume; I
> > > > > > > > performed the following steps:
> > > > > > > > 1. I got your latest code from git and built it with mvn package.
> > > > > > > > 2. I copied the hbaseSink jar to the lib directory of the
> > > > > > > > generated distribution package.
> > > > > > > > 3. I added the plugin class name to flume-site.xml.
> > > > > > > > Now when I start the flume master it loads the plugin classes,
> > > > > > > > but when I try to submit a query in the master config it
> > > > > > > > reports an invalid sink for hbaseSink.
> > > > > > > > Please point out where I am wrong. Thanks in advance.
> > > > > > >
> > > > > > > --
> > > > > > > // Jonathan Hsieh (shay)
> > > > > > > // Software Engineer, Cloudera
> > > > > > > // j...@cloudera.com

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
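[Editor's note: the regexAll decorator discussed in this thread splits an event body with a regular expression and stores the capture groups as event attributes, which the hbase sink then references via the %{row} and %{data} escape sequences. As a rough standalone illustration of that extraction step (a sketch only, not Flume's actual implementation; the function name is hypothetical), it behaves like:]

```python
import re

# The pattern from the thread: two whitespace-separated word groups,
# as in regexAll("(\\w+)\\s+(\\w+)", "row", "data").
PATTERN = re.compile(r"(\w+)\s+(\w+)")

def extract_attributes(body):
    """Return {"row": ..., "data": ...} for a matching body, else {}.

    Events for which the attributes are missing would produce the
    "Tag row not found" / "Tag data not found" warnings seen in the
    thread when the sink expands %{row} and %{data}.
    """
    m = PATTERN.search(body)
    if not m:
        return {}
    return {"row": m.group(1), "data": m.group(2)}

# Example: a line like "123   abc" yields row="123", data="abc".
print(extract_attributes("123   abc"))
```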