flume-user mailing list archives

From "Hari Shreedharan" <hshreedha...@cloudera.com>
Subject Re: getting Avro into Flume
Date Thu, 18 Sep 2014 00:27:08 GMT
Yes, to the Avro Source. The RPC client sends it to the Avro Source (unless you use the Thrift source).
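A minimal sketch of what this looks like with the RPC client API from the developer guide, assuming the agent's Avro Source listens on localhost:41414 (both hypothetical here) and that `avroBytes` already holds an Avro-binary-encoded record:

```java
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class AvroBytesSender {
    public static void main(String[] args) throws EventDeliveryException {
        // Hypothetical agent address; point this at your Avro Source.
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
        try {
            // Assumed to hold a record already encoded as Avro binary.
            byte[] avroBytes = new byte[0]; // placeholder
            // The event body is an opaque byte[]; Flume does not reinterpret it.
            Event event = EventBuilder.withBody(avroBytes);
            client.append(event); // delivers the event to the Avro Source
        } finally {
            client.close();
        }
    }
}
```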


On Wed, Sep 17, 2014 at 5:26 PM, zzz <squiggly101@gmail.com> wrote:

> Thanks for the quick reply, Hari.
> When you say send data to Flume using the RPC client API, do you mean send
> it to the Avro Source? If not, which source? That is what I am currently
> trying to do. I wasn't sure whether encoding Avro data as byte[] and
> sending it to the Avro Source was a valid approach, but from what you are
> saying there is a way for sinks (at least the HDFS sink) to recognize the
> encoded Avro data. I hope the Solr sink can be made similarly aware.
> Would encoding the Avro data as byte[] and sending it to Flume via the
> HTTP interface also work?
> I was actually having trouble converting an Avro object to a byte[] in
> the first place, but I will try that again.
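One common way to get a byte[] from an Avro record is Avro's `BinaryEncoder` with a `GenericDatumWriter` (this produces raw Avro binary with no container-file header; the record and schema here are placeholders, not from the thread):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroToBytes {
    // Serialize a single GenericRecord to Avro binary (no container header).
    static byte[] toBytes(GenericRecord record, Schema schema) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(record, encoder);
        encoder.flush(); // push buffered bytes into the stream
        return out.toByteArray();
    }
}
```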
> On Thu, Sep 18, 2014 at 10:16 AM, Hari Shreedharan <
> hshreedharan@cloudera.com> wrote:
>> No, the Avro Source is an RPC source. To send data to Flume, use the RPC
>> client API (https://flume.apache.org/FlumeDeveloperGuide.html#client).
>> Just encode your Avro data as byte[] and use the AVRO_EVENT serializer
>> when writing to HDFS.
>> Thanks,
>> Hari
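On the HDFS side, the AVRO_EVENT serializer is selected in the sink configuration; a rough fragment, assuming an agent named `agent` and a sink named `hdfsSink` (both hypothetical), might look like:

```
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.serializer = avro_event
```

`hdfs.fileType = DataStream` matters here: the default SequenceFile format would wrap the serializer's output rather than writing plain Avro container files.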
>> On Wed, Sep 17, 2014 at 5:13 PM, zzz <squiggly101@gmail.com> wrote:
>>> I am using Cloudera CDH 5.1 and running a Flume agent configured by
>>> Cloudera manager.
>>> I would like to send Avro data to Flume, and I was assuming the Avro
>>> Source would be the appropriate method to send data in this way.
>>> However, the examples of Java clients that send data via the Avro Source
>>> send simple strings, not Avro objects to be serialized, e.g. the example
>>> here: https://flume.apache.org/FlumeDeveloperGuide.html
>>> And the examples of Avro serialization all seem to be about serializing
>>> to disk.
>>> In my use case, I am basically receiving a real-time stream of JSON
>>> documents, which I am able to convert to Avro objects, and would like to
>>> put them into Flume. I would then like to be able to index this Avro data
>>> in Solr via the Solr sink, and convert it to Parquet format in HDFS using
>>> the HDFS sink.
>>> Is this possible, or am I going about this the wrong way?