flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Avro source: could not find schema for event
Date Tue, 08 Mar 2016 21:11:12 GMT
You can use a URL (on HDFS/HTTP) that points to the schema:
https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AvroEventSerializer.java#L70

Store the schema for the event at that URL, so you don't have to add the
schema to the event itself.
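
For example, something like this should work (a minimal sketch; the agent,
source, and sink names are placeholders, and the schema location is just an
example). A static interceptor can stamp the schema URL onto every event
coming off the Kafka source, and the Avro serializer on the HDFS sink reads
it from that header:

# Placeholder names; adjust agent/source/sink to match your configuration.
agent.sources.kafkaSrc.interceptors = schemaUrl
agent.sources.kafkaSrc.interceptors.schemaUrl.type = static
agent.sources.kafkaSrc.interceptors.schemaUrl.key = flume.avro.schema.url
agent.sources.kafkaSrc.interceptors.schemaUrl.value = hdfs://namenode/schemas/event.avsc

# Keep the Avro serializer on the HDFS sink; it resolves the schema from the
# header set above instead of expecting it inside the event.
agent.sinks.hdfsSink.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder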

The Avro schema is only embedded in the output files, not in the event data,
so we need to make sure each event is written to the correct file based on
that event's own schema. avro_event works because we write the events out
with a fixed schema (not the event's own schema).
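
That fixed schema is the Event record quoted at the bottom of your mail (a map
of string headers plus the body as bytes), which is probably why the data
looks a little funky. A quick way to check, assuming the Python avro package
and a file copied out of HDFS (the filename below is made up):

from avro.datafile import DataFileReader
from avro.io import DatumReader

# Read a local copy of a file the avro_event serializer wrote to HDFS.
with open("FlumeData.1457472000000.avro", "rb") as f:
    reader = DataFileReader(f, DatumReader())
    for record in reader:
        # Every record uses the fixed Flume event schema:
        # a map of string headers plus the original message bytes.
        print(record["headers"], record["body"])
    reader.close()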


Thanks,
Hari

On Tue, Mar 8, 2016 at 1:05 PM, Justin Ryan <juryan@ziprealty.com> wrote:

> Hiya folks, I’m still struggling with this; is anyone on the list familiar
> with AvroEventSerializer$Builder?
>
> While I have gotten past my outright failure, I’ve only done so by adopting
> a fairly inflexible schema, which seems counter to the goal of using Avro.
> Particularly frustrating is that Flume simply needs to pass the existing
> message along, though I understand it likely needs to grok the schema in
> order to separate messages.  I can’t even find Kafka consumer code that is
> capable of being schema-aware.
>
> From: Justin Ryan <juryan@ziprealty.com>
> Reply-To: <user@flume.apache.org>
> Date: Thursday, March 3, 2016 at 2:08 PM
> To: <user@flume.apache.org>
> Subject: Re: Avro source: could not find schema for event
>
> Update:
>
> So, I changed my serializer from
> org.apache.flume.sink.hdfs.AvroEventSerializer$Builder to avro_event, and
> this started working.  Well, working-ish: the data is a little funky, but
> it’s arriving, it’s being delivered to HDFS, and I can pull a file and
> examine it manually.
>
> I seem to remember choosing the former based on some things I read about
> not having to specify a schema, since the schema is embedded in the Avro
> data.
>
> So I’m confused: it seems that my previous configuration should have worked
> without any special attention to the schema, but I got complaints that the
> schema couldn’t be found.
>
> If anyone could shed a bit of light here, it would be much appreciated.
>
> From: Justin Ryan <juryan@ziprealty.com>
> Reply-To: <user@flume.apache.org>
> Date: Monday, February 29, 2016 at 2:52 PM
> To: "user@flume.apache.org" <user@flume.apache.org>
> Subject: Avro source: could not find schema for event
>
> Hiya,
>
> I’ve got a fairly simple Flume agent pulling events from Kafka and landing
> them in HDFS.  For plain-text messages, this works fine.
>
> I created a topic specifically for testing the sending of Avro messages
> through Kafka to land in HDFS, and that is what I’m having trouble with.
>
> I noted from
> https://thisdataguy.com/2014/07/28/avro-end-to-end-in-hdfs-part-2-flume-setup/
> the example of Flume’s default Avro schema[0], which will do for my
> testing, and set up my python-avro producer to send messages with this
> schema.  Unfortunately, Flume keeps repeating this message in its log:
>
>   org.apache.flume.FlumeException: Could not find schema for event
>
> I’m running out of assumptions to rethink and verify here; I would
> appreciate any guidance on what I may be missing.
>
> Thanks in advance,
>
> Justin
>
> [0] {
>  "type": "record",
>  "name": "Event",
>  "fields": [{
>    "name": "headers",
>    "type": {
>      "type": "map",
>      "values": "string"
>    }
>  }, {
>    "name": "body",
>    "type": "bytes"
>  }]
> }
>
>
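
For reference, a minimal sketch of the kind of python-avro producer described
above, using the default Flume event schema from [0] (the record contents are
placeholders and the Kafka publish step is omitted). Note that this encodes a
bare Avro datum with no schema attached, which is why a schema still has to be
supplied on the Flume side, e.g. via the schema URL header described at the
top of the thread:

import io

import avro.schema
from avro.io import BinaryEncoder, DatumWriter

# The default Flume event schema quoted in [0].
EVENT_SCHEMA = avro.schema.parse("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "headers", "type": {"type": "map", "values": "string"}},
    {"name": "body", "type": "bytes"}
  ]
}
""")  # some newer releases spell this avro.schema.Parse


def encode_event(headers, body):
    """Serialize one record with the Event schema to raw Avro bytes.

    This is a bare datum (no embedded schema), so the reader has to learn
    the schema some other way.
    """
    buf = io.BytesIO()
    DatumWriter(EVENT_SCHEMA).write({"headers": headers, "body": body},
                                    BinaryEncoder(buf))
    return buf.getvalue()


# Example: publish encode_event({"origin": "test"}, b"hello avro") to the
# Kafka topic with whichever client the producer uses.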
