From: Bart Verwilst
Mailing list: user@flume.apache.org
Date: Thu, 08 Nov 2012 22:34:46 +0100
Subject: Re: Using Python and Flume to store avro data

Would the sink serializer from https://cwiki.apache.org/FLUME/flume-1x-event-serializers.html (avro_event) be the right tool for the job? Probably not, since I won't be able to send the exact Avro schema over the HTTP/JSON link, and the data will need conversion first. I'm not a Java programmer, though, so I think writing my own serializer would be stretching it a bit. :(

 

Maybe I can use Hadoop Streaming to import my Avro data or something... :(

Kind regards,

Bart

 

Hari Shreedharan wrote on 08.11.2012 22:12:

Writing to Avro files depends on how you serialize your data on the sink side, using a serializer. Note that JSON supports only UTF-8/16/32 encoding, so if you want to send binary data you will need to write your own handler for that (you can use the JSON handler as an example) and configure the source to use that handler. Once the data is in Flume, just plug in your own serializer (which can take the byte array from the event and convert it into the schema you want) and write it out.
 
 
Thanks,
Hari
 
-- 
Hari Shreedharan
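The binary-data caveat in Hari's reply can be illustrated with a small, standard-library-only Python sketch. This is not Flume's handler API. It shows one possible workaround: base64-encoding the Avro bytes so they survive a UTF-8 JSON link. That assumes a custom serializer on the sink side (hypothetical, not shown) decodes the body again. The helper names and the `encoding` header are hypothetical.

```python
import base64
import json
from typing import Dict, Optional


def wrap_binary_event(payload: bytes, headers: Optional[Dict[str, str]] = None) -> dict:
    """Wrap arbitrary bytes (e.g. Avro-encoded records) in a JSON-safe event dict.

    Assumption: the body is sent as base64 ASCII text, and a hypothetical
    custom serializer on the Flume sink side decodes it back to bytes.
    """
    return {
        "headers": dict(headers or {}),
        "body": base64.b64encode(payload).decode("ascii"),
    }


def unwrap_binary_event(event: dict) -> bytes:
    """Reverse of wrap_binary_event: recover the original bytes from the body."""
    return base64.b64decode(event["body"])


if __name__ == "__main__":
    raw = b"\x00\xff\xfe\x01"  # 0xFF and 0xFE never occur in valid UTF-8
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        print("not valid UTF-8: cannot be placed in a JSON string as-is")
    event = wrap_binary_event(raw, {"encoding": "base64"})  # hypothetical header
    wire = json.dumps([event])  # what would travel over the HTTP/JSON link
    assert unwrap_binary_event(json.loads(wire)[0]) == raw
    print(wire)
```

Base64 inflates the payload by about a third. Hari's alternative, a custom source handler that accepts binary directly, avoids that overhead but has to be written in Java.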
 

On Thursday, November 8, 2012 at 1:02 PM, Bart Verwilst wrote:

Hi Hari,

 

Just to be absolutely sure: you can write to Avro files by using this? If so, I will try out a snapshot of 1.3 tomorrow and start playing with it. ;)

 

Kind regards,

 

Bart

 

 

Hari Shreedharan wrote on 08.11.2012 20:06:

No, I am talking about: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=bc1928bc2e23293cb20f4bc2693a3bc262f507b3
 
This will be in the next release, which will be out soon.
 
 
Thanks,
Hari
 
-- 
Hari Shreedharan
 

On Thursday, November 8, 2012 at 10:57 AM, Bart Verwilst wrote:

Hi Hari,


Are you talking about ipc.HTTPTransceiver (http://nullege.com/codes/search/avro.ipc.HTTPTransceiver)? This was the class I tried before I noticed it wasn't supported by Flume 1.2 :)

I assume the HTTP/JSON source will also allow Avro data to be received?

 

Kind regards,

Bart

 

Hari Shreedharan wrote on 08.11.2012 19:51:

The next release, Flume 1.3.0, adds support for an HTTP source, which will allow you to send data to Flume via HTTP/JSON (the representation of the data is pluggable, but a JSON representation is the default). You could use this to write data to Flume from Python, which I believe has good HTTP and JSON support.
 
 
Thanks,
Hari
 
-- 
Hari Shreedharan
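Hari's suggestion of posting to the HTTP source from Python can be sketched with the standard library alone. This assumes a Flume 1.3+ agent whose HTTP source uses the default JSON handler. As we understand that handler, it accepts a JSON array of events, each with a `headers` string map and a string `body`. The URL, port, header names, and record fields below are hypothetical. Each record is sent as JSON text, so conversion to Avro would still have to happen on the sink side.

```python
import json
import urllib.request
from typing import Dict, Iterable, List

# Assumption: hypothetical host/port of a Flume agent running an HTTP source
# with the default JSON handler.
FLUME_URL = "http://localhost:5140"


def build_events(records: Iterable[dict], headers: Dict[str, str]) -> List[dict]:
    """Turn Python records into a list of {"headers": ..., "body": ...} events.

    The body must be a string, so each record is serialized to JSON text here.
    """
    return [
        {"headers": dict(headers), "body": json.dumps(record, sort_keys=True)}
        for record in records
    ]


def post_events(events: List[dict], url: str = FLUME_URL, timeout: float = 10.0) -> int:
    """POST one batch of events to the HTTP source and return the HTTP status code."""
    data = json.dumps(events).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # "worker" is a hypothetical header identifying which Python worker sent the batch.
    batch = build_events([{"id": 1, "msg": "hello"}], {"worker": "py-1"})
    print(json.dumps(batch))
    # post_events(batch)  # uncomment once a Flume agent is listening on FLUME_URL
```

Sending one batch per POST, rather than one event per request, keeps HTTP round trips down when several Python workers share a single agent.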
 

On Thursday, November 8, 2012 at 10:45 AM, Bart Verwilst wrote:

Hi,
 
I've been spending quite a few hours trying to push Avro data to Flume so I can store it on HDFS, all from Python.
It seems to be impossible for now, since the only way to push Avro data to Flume is through the deprecated Thrift bindings, which look pretty cumbersome to get working.
I would like to know the best way to import Avro data into Flume with Python. Maybe Flume isn't the right tool and I should use something else? My goal is to have multiple Python workers pushing data to HDFS, with Flume (in this case) consolidating it all into one file there.
 
Any thoughts?
 
Thanks!
 
Bart
 
 
 