avro-user mailing list archives

From Lan Jiang <ljia...@gmail.com>
Subject Re: Converting Protobuf object to Avro
Date Mon, 24 Aug 2015 23:59:04 GMT
Yes. I got it working using the ProtobufData class. I was just about to write to the mailing list. Thanks!

Sent from my iPhone

> On Aug 24, 2015, at 6:47 PM, William Briggs <wrbriggs@gmail.com> wrote:
> 
> Have you looked at the ProtobufData utility class? The getSchema method might do the trick for you: http://avro.apache.org/docs/1.6.1/api/java/org/apache/avro/protobuf/ProtobufData.html#getSchema(java.lang.Class)
> 
> 
>> On Mon, Aug 24, 2015, 4:56 PM Lan Jiang <ljiang2@gmail.com> wrote:
>> Sean,
>> 
>> Thanks for the reply.
>> 
>> Your suggestion kind of makes sense. The default example wraps a GenericDatumWriter with a DataFileWriter, then calls the create/append/close methods on the DataFileWriter in sequence to write out the container file.
>> 
>> Now my problem with using ProtobufDatumWriter in a similar fashion is that I do not have an Avro schema object to pass to dataFileWriter.create(schema, file). As I understand it, protobuf-avro should have a way to convert the protobuf schema to an Avro schema for you automatically, but I have not found any utility class that does the schema conversion. Correct me if I am wrong.
>> 
>> Lan
>> 
>> 
>> 
>>> On Aug 24, 2015, at 3:14 PM, Sean Busbey <busbey@cloudera.com> wrote:
>>> 
>>> Hiya Lan!
>>> 
>>> You need to use a container file instead of just writing via the datum writer yourself.
>>> 
>>> Take a look at the "Getting Started (Java)" section on serialization[1]. The example there uses the GenericDatumWriter, but you ought to be able to switch it out for your ProtobufDatumWriter.
>>> 
>>> 
>>> 
>>> 
>>> [1]: http://avro.apache.org/docs/1.7.7/gettingstartedjava.html#Serializing-N101DE
>>> 
>>>> On Mon, Aug 24, 2015 at 12:54 PM, Lan Jiang <ljiang2@gmail.com> wrote:
>>>> Hi, there
>>>> 
>>>> I am trying to convert a protobuf object to Avro. I am using the following code:
>>>> 
>>>> //myProto object is deserialized using google protobuf API
>>>> ProtobufDatumWriter<MyProto> pbWriter = new ProtobufDatumWriter<MyProto>(MyProto.class);
>>>> FileOutputStream fo = new FileOutputStream(args[0]);
>>>> Encoder e = EncoderFactory.get().binaryEncoder(fo, null);
>>>> pbWriter.write(myProto, e);
>>>> fo.flush();
>>>> 
>>>> The Avro file was created successfully. If I cat the file, I can see the data in it. However, when I tried to use avro-tools to get the schema or meta info for the saved Avro file, it says:
>>>> 
>>>> Exception in thread "main" java.io.IOException: Not a data file.
>>>> 	at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>>>> 	at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>>>> 	at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)
>>>> 
>>>> Looking at the Avro source code, the error means that the first 4 bytes of the file do not match the 4 MAGIC bytes. I am trying to see if I have done anything wrong.
>>>> 
>>>> Appreciate any help you can give me.
>>>> 
>>>> Lan
>>> 
>>> 
>>> 
>>> -- 
>>> Sean
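
For reference, the "Not a data file" error discussed in this thread comes down to a header check: an Avro object container file begins with the four bytes 'O', 'b', 'j', 0x01, while raw output from a binary encoder has no such header. Below is a minimal sketch of that check using only the Java standard library. The class name AvroMagicCheck and its methods are hypothetical and are not part of the Avro API; this only approximates what the reader does before it gives up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Avro: tests for the container-file magic bytes.
class AvroMagicCheck {
    // Avro object container files start with 'O', 'b', 'j', followed by the byte 1.
    static final byte[] MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', (byte) 1};

    // True if the given bytes begin with the container-file magic.
    static boolean hasAvroMagic(byte[] header) {
        return header.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    // Reads up to four bytes from the file and compares them with MAGIC.
    static boolean isContainerFile(Path path) throws IOException {
        byte[] header = new byte[MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than the magic
                }
                read += n;
            }
        }
        return read == MAGIC.length && hasAvroMagic(header);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java AvroMagicCheck file1 [file2 ...]
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(p + ": " + (isContainerFile(p) ? "has Avro magic" : "no Avro magic"));
        }
    }
}
```

A file written directly through a datum writer and a binary encoder, as in the original question, would be expected to fail this check, whereas one written through DataFileWriter should pass it.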
