avro-user mailing list archives

From Aaron Kimball <akimbal...@gmail.com>
Subject Re: Is Avro/Trevni strictly read-only?
Date Thu, 31 Jan 2013 02:32:40 GMT
Hi Russell,

Great question.  Kiji is more strongly typed than systems like MongoDB.
While your schema can evolve (using Avro evolution) without structurally
updating existing data, you still need to specify your Avro schemas in a
data dictionary. It's challenging to author systems in Java (as is typical
of HBase/HDFS/MapReduce-facing applications) without some strong typing in
the persistence layer. You wind up reading a lot of other people's code to
figure out what types were written -- assuming you can find the code (or
the HBase columns) in the first place.
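To illustrate the kind of evolution Avro supports: a reader schema can add a field to a writer's older schema as long as the new field declares a default. A minimal sketch (record and field names are made up for the example):

```json
{"type": "record", "name": "User",
 "doc": "version 1 -- what was written",
 "fields": [
   {"name": "id", "type": "long"},
   {"name": "email", "type": "string"}
 ]}

{"type": "record", "name": "User",
 "doc": "version 2 -- the reader schema; 'country' is filled from its default for old records",
 "fields": [
   {"name": "id", "type": "long"},
   {"name": "email", "type": "string"},
   {"name": "country", "type": "string", "default": "unknown"}
 ]}
```

This is why you can evolve the schema without rewriting existing data -- resolution happens at read time.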

You can create table schemas either "manually" by filling out a JSON /
Avro-based table layout specification, or you can use the DDL shell which
lets you CREATE TABLE, ALTER TABLE, etc. in a pretty quick way. Once the
table's set up, you can write to it. I think the DDL shell included with
the bento box makes this a reasonably low-overhead process.
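For a feel of what the DDL shell looks like, here's a rough sketch of a CREATE TABLE (the table, family, and column names are invented; check the KijiSchema DDL reference for the exact syntax):

```sql
CREATE TABLE users WITH DESCRIPTION 'user profile table'
ROW KEY FORMAT HASHED
WITH LOCALITY GROUP default (
  FAMILY info WITH DESCRIPTION 'basic user fields' (
    name "string",
    email "string"
  )
);
```

The column types ("string" here) are Avro schemas under the hood, so the same evolution rules apply if you ALTER TABLE later.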

We don't currently have any Pig integration. We've made some initial
proof-of-concept progress on a StorageHandler that lets Hive query Kiji,
but it's not in a ready state yet. Someone (you? :) could write a Pig
integration; I believe Pig already supports Avro. It could even analyze the
first output tuple and use that to infer types/column names, then invoke the
DDL to set up a result table with the appropriate schema.
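For the Avro side of that, something like piggybank's AvroStorage should already get Pig reading and writing Avro files (paths and jar name here are illustrative; you'd also need the Avro jars on the classpath):

```pig
REGISTER piggybank.jar;

-- Load Avro records; AvroStorage maps the Avro schema onto a Pig schema
events = LOAD '/data/events.avro'
         USING org.apache.pig.piggybank.storage.avro.AvroStorage();

-- Write the relation back out as Avro
STORE events INTO '/data/out'
      USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```

The missing piece would be the glue that takes the Pig schema of the output relation and turns it into a Kiji table layout via the DDL.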

Sorry I don't have a "magic wand" answer for you -- for the use cases we
target, these sorts of setup costs often pay off in the long run, so that's
the case we've optimized the design around. Let me know if there's anything
else I can help with.
- Aaron

On Wed, Jan 30, 2013 at 5:48 PM, Russell Jurney <russell.jurney@gmail.com> wrote:

> Aaron - is there a way to create a Kiji table from Pig? I'm in the habit
> of not specifying schemas with Voldemort and MongoDB, just storing a Pig
> relation and the schema is set in the store. If I can arrange that somehow,
> I'm all over Kiji. Panthera is a fork :/
> On Wed, Jan 30, 2013 at 3:20 PM, Aaron Kimball <akimball83@gmail.com> wrote:
>> Hi ccleve,
>> I'd definitely urge you to try out Kiji -- we who work on it think it's a
>> pretty good fit for this specific use case. If you've got further questions
>> about Kiji and how to use it, please send them to me, or ask the kiji user
>> mailing list: http://www.kiji.org/getinvolved#Mailing_Lists
>> cheers,
>> - Aaron
>> On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <cutting@apache.org> wrote:
>>> Avro and Trevni files do not support record update or delete.
>>> For large changing datasets you might use Kiji (http://www.kiji.org/)
>>> to store Avro data in HBase.
>>> Doug
>>> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <ccleve.tech@gmail.com> wrote:
>>> > I've gone through the documentation, but haven't been able to get a
>>> > definite answer: is Avro, or specifically Trevni, only for read-only data?
>>> >
>>> > Is it possible to update or delete records?
>>> >
>>> > If records can be deleted, is there any code that will merge row sets
>>> > to get rid of the unused space?
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
