Date: Fri, 4 Oct 2013 22:43:55 +0100
Subject: Re: Converting text to avro in Flume
From: Deepak Subhramanian
To: user@flume.apache.org

Thanks Hari.

I specified the fileType. This is what I have. I will try again and let you know.

tier1.sources = httpsrc1
tier1.channels = c1
tier1.sinks = sink1

tier1.sources.httpsrc1.bind = 127.0.0.1
tier1.sources.httpsrc1.type = http
tier1.sources.httpsrc1.port = 9999
tier1.sources.httpsrc1.channels = c1
tier1.sources.httpsrc1.handler = spikes.flume.XMLHandler
tier1.sources.httpsrc1.handler.nickname = HTTPTesting

tier1.channels.c1.type = memory
tier1.channels.c1.capacity = 100

#tier1.sinks.sink1.type = logger
tier1.sinks.sink1.channel = c1

tier1.sinks.sink1.type = hdfs

tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
tier1.sinks.sink1.hdfs.filePrefix = access_log
tier1.sinks.sink1.hdfs.fileSuffix = .avro
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.serializer = avro_event

I also added this later.

tier1.sinks.sink1.hdfs.serializer.appendNewline = true
tier1.sinks.sink1.hdfs.serializer.compressionCodec = snappy

On Fri, Oct 4, 2013 at 4:56 PM, Hari Shreedharan wrote:

> The default data type for HDFS Sink is Sequence file. Set the
> hdfs.fileType to DataStream. See details here:
> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>
> Thanks,
> Hari
>
> On Friday, October 4, 2013 at 6:52 AM, Deepak Subhramanian wrote:
>
> > I tried using the HDFS Sink to generate the Avro file by setting the
> > serializer to avro_event. But it is not generating an Avro file; it
> > generates a sequence file instead. Is it not supposed to generate an
> > Avro file with the default schema? Or do I have to generate the Avro
> > data from text in my HTTPHandler source?
> >
> > "{ \"type\":\"record\", \"name\": \"Event\", \"fields\": [" +
> > " {\"name\": \"headers\", \"type\": { \"type\": \"map\", \"values\": \"string\" } }, " +
> > " {\"name\": \"body\", \"type\": \"bytes\" } ] }");
> >
> > On Thu, Oct 3, 2013 at 3:36 PM, Deepak Subhramanian <
> > deepak.subhramanian@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I want to convert XML files in text to an Avro file and store it in
> > > HDFS. I get the XML files as a POST request. I extended the
> > > HTTPHandler to process the XML POST request. Do I have to convert the
> > > data from text to Avro in the HTTPHandler, or does the Avro Sink or
> > > HDFSSink convert it directly to Avro with some configuration details?
> > > I want to store the entire XML string in an Avro variable.
> > >
> > > Thanks in advance for any inputs.
> > > Deepak Subhramanian

--
Deepak Subhramanian
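
For comparison, here is a sketch of the sink section with the serializer keys moved out of the `hdfs.` namespace. It assumes the property layout the Flume User Guide describes for the HDFS sink in Flume 1.x, where the serializer is configured as `serializer` and `serializer.*` directly under the sink name; if that holds, the `hdfs.serializer` lines in the configuration above would not be picked up and the sink would fall back to its default text serializer. `appendNewline` is left out because, as far as I know, it is an option of the text serializer and not of `avro_event`. This has not been run against the setup in this thread.

```properties
# Sketch only: same sink as in the message above, with serializer keys
# placed directly under the sink (assumed Flume 1.x HDFS sink layout).
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = c1

tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
tier1.sinks.sink1.hdfs.filePrefix = access_log
tier1.sinks.sink1.hdfs.fileSuffix = .avro

# DataStream avoids the default SequenceFile container, as Hari suggested.
tier1.sinks.sink1.hdfs.fileType = DataStream

# Assumption: "serializer", not "hdfs.serializer".
tier1.sinks.sink1.serializer = avro_event
tier1.sinks.sink1.serializer.compressionCodec = snappy
```

With this layout the whole XML string posted to the HTTP source would stay in the event body, and the `avro_event` serializer would wrap headers and body in the default Event schema quoted in the thread, so no Avro conversion should be needed in the handler itself.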