avro-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: State of the C++ vs Java implementations
Date Thu, 14 Aug 2014 18:56:48 GMT
Does the C++ implementation track the Java development closely?  I'm seeing discussion of a
new Decimal encoding in the mailing list, and it would be bad for us to commit to the C++
Avro, and then find that our customers have created Avro files (using Java, MapReduce, etc)
that we can't read.  We don't have control over what files we encounter, and it is desirable
for our product to read whatever a customer throws at it, within reason.

From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Thursday, August 14, 2014 9:46 AM
To: user@avro.apache.org; Steve.Roehrs@rlmgroup.com.au
Subject: RE: State of the C++ vs Java implementations


Thanks so much for the reply.  I hope that I can inconvenience you for a little more guidance.
We want to read and write Avro data files whose schema is not known until run-time, when
we read the file metadata and transform that into our own internal record structure.  So we
are not mapping to a C++ struct/class with defined compile-time members.  We just want to
loop over the records and columns in the data file, transforming them serially.  Can this
be done without incurring the performance penalty of GenericDatum that you speak of?

Different question: do you know if the full complement of compression codecs is available
in C++?  We don't need "everything possible", but we want to be able to read 99.9% of files
that we are likely to encounter in practice.
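
One way to approach the run-time schema and codec questions above is to inspect the container file header before deciding how to read the file. The sketch below is not part of the Avro C++ library. It is a standalone illustration that assumes the object container file layout described in the Avro specification: a 4-byte magic (`Obj` followed by the byte 1), then a metadata map whose `avro.schema` and `avro.codec` entries describe the file. The function names `readLong`, `readBytes`, `readAvroFileMetadata` and `codecOf` are hypothetical, and the 64 MiB length cap is an arbitrary safety limit chosen for this example.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Decode one Avro "long": a variable-length, zigzag-encoded integer.
std::int64_t readLong(std::istream& in) {
    std::uint64_t raw = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        int c = in.get();
        if (c == std::istream::traits_type::eof())
            throw std::runtime_error("unexpected end of input in varint");
        raw |= static_cast<std::uint64_t>(c & 0x7F) << shift;
        if ((c & 0x80) == 0) {
            // Undo zigzag: even values are non-negative, odd values negative.
            return static_cast<std::int64_t>((raw >> 1) ^ (0 - (raw & 1)));
        }
    }
    throw std::runtime_error("varint too long");
}

// Decode Avro "bytes" or "string": a long length followed by that many bytes.
std::string readBytes(std::istream& in) {
    const std::int64_t kMaxLen = 64LL * 1024 * 1024;  // arbitrary cap (assumption)
    std::int64_t len = readLong(in);
    if (len < 0 || len > kMaxLen)
        throw std::runtime_error("implausible length in header");
    std::string out(static_cast<std::size_t>(len), '\0');
    in.read(&out[0], static_cast<std::streamsize>(len));
    if (in.gcount() != static_cast<std::streamsize>(len))
        throw std::runtime_error("unexpected end of input in bytes");
    return out;
}

// Read the magic and the metadata map from the start of a container file.
// The stream is left positioned at the 16-byte sync marker.
std::map<std::string, std::string> readAvroFileMetadata(std::istream& in) {
    char magic[4];
    in.read(magic, 4);
    if (in.gcount() != 4 || magic[0] != 'O' || magic[1] != 'b' ||
        magic[2] != 'j' || magic[3] != 1)
        throw std::runtime_error("not an Avro object container file");

    std::map<std::string, std::string> meta;
    for (;;) {
        std::int64_t count = readLong(in);
        if (count == 0) break;   // a zero-count block terminates the map
        if (count < 0) {         // negative count: a byte size follows
            count = -count;
            readLong(in);        // block size in bytes, not needed here
        }
        for (std::int64_t i = 0; i < count; ++i) {
            std::string key = readBytes(in);
            meta[key] = readBytes(in);
        }
    }
    return meta;
}

// The specification treats a missing avro.codec entry as the "null" codec.
std::string codecOf(const std::map<std::string, std::string>& meta) {
    auto it = meta.find("avro.codec");
    return it == meta.end() ? std::string("null") : it->second;
}
```

To use it, open the file with `std::ifstream` in `std::ios::binary` mode, pass the stream to `readAvroFileMetadata`, and compare `codecOf(meta)` against the codecs your Avro build supports before handing the file to a reader. The `avro.schema` entry holds the writer's schema as JSON text, which can then be mapped to an internal record structure at run-time.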


From: Steve Roehrs [mailto:Steve.Roehrs@rlmgroup.com.au]
Sent: Sunday, August 10, 2014 11:25 PM
To: user@avro.apache.org
Subject: RE: State of the C++ vs Java implementations

Hi John

You can definitely read and write Avro data files using C++.  The DataFileWriter and DataFileReader
classes are what you need.

The README is severely out of date.

I can't comment on the relative performance of the Java/C++ APIs - we used the C++ API for
our application, but for performance reasons we don't use the GenericDatum class, as it does
have poor performance for our particular mix of data.  I don't know if the Java API fares
any better in this regard.


Steve Roehrs
Senior Software Engineer | Lockheed Martin

| p: +61 8 7389 4525    | m: +61 4 3891 5622     | f: +61 8 7389 4551
| w: www.rlmgroup.com.au | e: Steve.Roehrs@rlmgroup.com.au
| Company address: 82-86 Woomera Ave, Edinburgh, SA 5111
From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Wednesday, August 06, 2014 6:28 AM
To: user@avro.apache.org
Subject: State of the C++ vs Java implementations


I would like to read and write Avro files (such as those manipulated by MapReduce applications)
from a C++ program.  While there are higher-level wrappers (such as Hive), I am interested
in reading/writing the files directly.  There are both C++ and Java library implementations;
however, in the C++ API README I see "And the file and rpc containers are not yet implemented."
Does this mean that I can't read and write Avro files using the C++ library?

We have a very good C++/JNI wrapper generator, so using the Java library is not terribly difficult.
Given that, which interface would you recommend?  Does the C++ interface (assuming it works)
have significant performance advantages?

