flume-user mailing list archives

From Justin Ryan <jur...@ziprealty.com>
Subject Re: Avro source: could not find schema for event
Date Tue, 08 Mar 2016 21:14:30 GMT
Thanks, Hari – I was looking for something like this.

I am still a bit confused, because when I write producer/consumer code in
Python, my consumer reads data out of Kafka just fine with no information
about the schema.

That said, I need to keep it available in HDFS for producers anyway, so this
will certainly do.

Is there any interaction between Flume and schema registries?
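For anyone reading this in the archives: the URL-based approach Hari describes below might look roughly like the following sink configuration. The agent, source, and sink names here are hypothetical, and this assumes the `flume.avro.schema.url` header must be present on each event (a static interceptor is one way to attach it when every event shares one schema):

```
# Hypothetical sketch: HDFS sink writing Avro via the schema-URL approach.
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder

# The serializer looks for a schema hint header on each event; a static
# interceptor on the source is one way to attach it to all events.
agent.sources.kafkaSource.interceptors = schemaUrl
agent.sources.kafkaSource.interceptors.schemaUrl.type = static
agent.sources.kafkaSource.interceptors.schemaUrl.key = flume.avro.schema.url
agent.sources.kafkaSource.interceptors.schemaUrl.value = hdfs://namenode/schemas/event.avsc
```

The schema path and namenode address are placeholders; any HDFS or HTTP URL reachable from the agent should work.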

From:  Hari Shreedharan <hshreedharan@cloudera.com>
Reply-To:  <user@flume.apache.org>
Date:  Tuesday, March 8, 2016 at 1:11 PM
To:  "user@flume.apache.org" <user@flume.apache.org>
Subject:  Re: Avro source: could not find schema for event

You can use a URL (on HDFS/HTTP) that points to the schema:
https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AvroEventSerializer.java#L70

Use that URL to store your schema for the event, so you don't have to add it
to the event itself.

The Avro schema is embedded only in the files, not in the event data, so we
need to make sure we write to the correct file based on each event's own
schema. avro_event works because we write the events out with a fixed schema
(not the event's own schema).
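One way to see why the event data itself is no help here: raw Avro binary is schema-less. Below is a minimal sketch in pure Python (no Avro library; the encoding rules are hand-rolled for the default Event schema only, and `encode_event` is a name of my own) showing that an encoded event is nothing but varints and raw bytes. Nothing in the payload identifies the schema, which is why the serializer needs an out-of-band hint such as the `flume.avro.schema.literal` or `flume.avro.schema.url` header.

```python
# Sketch: hand-rolled Avro binary encoding for the default Flume Event
# schema {headers: map<string>, body: bytes}. The point is that the
# resulting bytes carry no schema information at all.

def zigzag_varint(n: int) -> bytes:
    """Avro encodes int/long as a zigzag-encoded little-endian varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes -> small values
    out = bytearray()
    while True:
        b = z & 0x7F
        z >>= 7
        if z:
            out.append(b | 0x80)  # high bit set: more bytes follow
        else:
            out.append(b)
            return bytes(out)

def encode_event(headers: dict, body: bytes) -> bytes:
    """Encode one event per Avro binary rules for the Event record."""
    out = bytearray()
    if headers:
        out += zigzag_varint(len(headers))  # one map block with N entries
        for k, v in headers.items():
            for s in (k, v):  # map keys and string values: length + UTF-8
                raw = s.encode("utf-8")
                out += zigzag_varint(len(raw)) + raw
    out += zigzag_varint(0)                  # zero count terminates the map
    out += zigzag_varint(len(body)) + body   # bytes field: length + raw bytes
    return bytes(out)

payload = encode_event({"topic": "test"}, b"hello")
# `payload` is just varints and bytes; no schema, no field names, no magic.
```

A reader handed only `payload` cannot even find the field boundaries without the writer schema, which is exactly the situation the serializer is in for each incoming event.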


Thanks,
Hari

On Tue, Mar 8, 2016 at 1:05 PM, Justin Ryan <juryan@ziprealty.com> wrote:
> Hiya folks, still struggling with this. Is anyone on the list familiar with
> AvroEventSerializer$Builder?
> 
> While I have gotten past my outright failure, I've only done so by adopting a
> fairly inflexible schema, which seems counter to the goal of using Avro.
> Particularly frustrating is that Flume simply needs to pass the existing
> message along, though I understand it likely needs to grok the schema to
> separate messages.  I can't even find Kafka consumer code which is capable of
> being schema-aware.
> 
> From:  Justin Ryan <juryan@ziprealty.com>
> Reply-To:  <user@flume.apache.org>
> Date:  Thursday, March 3, 2016 at 2:08 PM
> To:  <user@flume.apache.org>
> Subject:  Re: Avro source: could not find schema for event
> 
> Update:
> 
> So, I changed my serializer from
> org.apache.flume.sink.hdfs.AvroEventSerializer$Builder to avro_event, and this
> started working.  Well, working-ish: the data is a little funky, but it's
> arriving, being delivered to HDFS, and I can pull a file and examine it
> manually.
> 
> I seem to remember that I had the former based on some things I read about not
> having to specify a schema, since the schema is embedded in the Avro data.
> 
> So I'm confused: it seems that my previous configuration should have worked
> without any special attention to the schema, but I got complaints that the
> schema couldn't be found.
> 
> If anyone could shed a bit of light here, it would be much appreciated.
> 
> From:  Justin Ryan <juryan@ziprealty.com>
> Reply-To:  <user@flume.apache.org>
> Date:  Monday, February 29, 2016 at 2:52 PM
> To:  "user@flume.apache.org" <user@flume.apache.org>
> Subject:  Avro source: could not find schema for event
> 
> Hiya,
> 
> I've got a fairly simple Flume agent pulling events from Kafka and landing
> them in HDFS.  For plain text messages, this works fine.
> 
> I created a topic specifically for testing sending Avro messages through
> Kafka to land in HDFS, which I'm having some trouble with.
> 
> I noted from
> https://thisdataguy.com/2014/07/28/avro-end-to-end-in-hdfs-part-2-flume-setup/
> the example of Flume's default Avro schema[0], which will do for my testing,
> and set up my python-avro producer to send messages with this schema.
> Unfortunately, I still have Flume looping this message in its log:
> 
>   org.apache.flume.FlumeException: Could not find schema for event
> 
> I'm running out of assumptions to rethink/verify here; I would appreciate any
> guidance on what I may be missing.
> 
> Thanks in advance,
> 
> Justin
> 
> [0] {
>  "type": "record",
>  "name": "Event",
>  "fields": [{
>    "name": "headers",
>    "type": {
>      "type": "map",
>      "values": "string"
>    }
>  }, {
>    "name": "body",
>    "type": "bytes"
>  }]
> }
> 
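The default Event schema quoted above is simple enough to sanity-check a producer payload against using nothing but the standard library. A rough structural check (the `conforms` helper is hypothetical, not part of any Avro or Flume API) might look like:

```python
import json

# The default Flume Event schema quoted above, as used by avro_event.
EVENT_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "headers", "type": {"type": "map", "values": "string"}},
    {"name": "body", "type": "bytes"}
  ]
}
""")

def conforms(event: dict) -> bool:
    """Rough structural check: does `event` carry exactly the two fields
    the Event record expects, with plausible Python types?"""
    names = [f["name"] for f in EVENT_SCHEMA["fields"]]
    return (
        sorted(event) == sorted(names)
        and isinstance(event.get("headers"), dict)
        and all(isinstance(k, str) and isinstance(v, str)
                for k, v in event["headers"].items())
        and isinstance(event.get("body"), (bytes, bytearray))
    )
```

This catches the easy mistakes (a missing field, non-string header values) before a real Avro encoder, such as python-avro's, ever sees the record; a proper library validation is still needed for full schema resolution.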



