incubator-cassandra-dev mailing list archives

From Jeremiah Jordan <JEREMIAH.JORDAN@morningstar.com>
Subject RE: Document storage
Date Thu, 29 Mar 2012 16:42:36 GMT
But it isn't special-case logic.  The current AbstractType machinery, and the
indexing built on AbstractTypes, would for the most part already support this.
Someone just has to write the code for a JSONType or ProtoBuffType.
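
For illustration, the contract such a type has to satisfy is small (a rough
Python sketch; the real thing would be a Java subclass of AbstractType, and
these method names are made up, not Cassandra's actual API):

    import json

    class JSONType:
        # Sketch only: mirrors the validate/render duties a comparator has.
        def validate(self, raw_bytes):
            # Reject anything that isn't valid UTF-8 JSON.
            try:
                json.loads(raw_bytes.decode("utf-8"))
            except ValueError:
                raise ValueError("not a valid JSON value")

        def get_string(self, raw_bytes):
            # Human-readable rendering of the stored bytes.
            return raw_bytes.decode("utf-8")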

The problem isn't writing the code to break objects up; the problem is
encode/decode time.  Thrift encode/decode is already a significant portion of
the write time, and adding object-to-column encode/decode on top of that makes
it even longer.  For a read-heavy load that wants to serve the JSON/Proto
object straight to clients, an increase in write time to parse/index the blob
is probably acceptable, so that you don't have to pay the reassembly penalty
every time you hit the database for that object.
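
To make the tradeoff concrete, the two write paths look roughly like this
(Python; "cf" and insert() are a made-up client API, not a real driver):

    import json

    def write_as_blob(cf, row_key, obj):
        # One encode at write time; reads get the whole object back in one piece.
        cf.insert(row_key, {"doc": json.dumps(obj)})

    def write_as_columns(cf, row_key, obj):
        # Parse and fan out at write time: slower writes, but reads can fetch
        # or index individual fields without reassembling the object.
        cf.insert(row_key, {field: json.dumps(value)
                            for field, value in obj.items()})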

But once we get multi-range slicing, I think the break-it-into-multiple-columns
approach will be best for most people in the average case.  That is the other
problem I have with breaking objects into columns right now: either I use Super
Columns and lose the ability to index (so why did I break them up?), or I can't
get multiple objects at once without pulling a huge slice from the start of o1
to the end of o5 and then throwing away the majority of the data I pulled back
that doesn't belong to o1 or o5.
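
Concretely, with objects packed into one wide row as prefixed columns
("o1:name", "o1:price", "o2:name", ...), fetching just o1 and o5 today looks
like this (made-up client API again):

    def fetch_objects(cf, row_key, wanted=("o1", "o5")):
        # One wide slice from the start of o1 to the end of o5; everything
        # in o2..o4 comes back over the wire only to be thrown away here.
        columns = cf.get_slice(row_key, start="o1:", finish="o5:\xff")
        return {name: value for name, value in columns.items()
                if name.split(":", 1)[0] in set(wanted)}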

-Jeremiah

________________________________________
From: Jonathan Ellis [jbellis@gmail.com]
Sent: Thursday, March 29, 2012 11:23 AM
To: dev@cassandra.apache.org
Subject: Re: Document storage

On Thu, Mar 29, 2012 at 9:57 AM, Jeremiah Jordan
<JEREMIAH.JORDAN@morningstar.com> wrote:
> It's not clear what 3647 actually is; there is no code attached, and no
> real example in it.
>
> Aside from that, the reason this would be useful to me (if we could get
> indexing of attributes working) is that I already have my data in
> JSON/Thrift/ProtoBuff. Depending on how large the data is, it isn't
> trivial to break it up into columns to insert and reassemble it to read.

I don't understand the problem.  Assuming Cassandra support for maps
and lists, I could write a Python module that takes json (or thrift,
or protobuf) objects and splits them into Cassandra rows by fields in
a couple hours.  I'm pretty sure this is essentially what Brian's REST
api for Cassandra does now.
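
Something like this, say (a minimal sketch: nested objects flatten to dotted
column names, and lists are just stored as leaf values):

    import json

    def json_to_columns(doc, prefix=""):
        # Flatten a (possibly nested) dict into column name -> value pairs.
        columns = {}
        for key, value in doc.items():
            name = "%s.%s" % (prefix, key) if prefix else key
            if isinstance(value, dict):
                columns.update(json_to_columns(value, name))
            else:
                columns[name] = json.dumps(value)
        return columns

    def columns_to_json(columns):
        # Invert json_to_columns: rebuild the nested object from dotted names.
        doc = {}
        for name, raw in columns.items():
            parts = name.split(".")
            node = doc
            for part in parts[:-1]:
                node = node.setdefault(part, {})
            node[parts[-1]] = json.loads(raw)
        return doc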

I think this is a much better approach, because it gives you the
ability to update or retrieve just parts of objects efficiently,
rather than making column values opaque blobs with a bunch of
special-case logic to introspect them.  That feels like a big step
backwards to me.
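
With the field-per-column layout, a partial update or read is a
single-column operation (sketch; cf.insert/cf.get stand in for a real
client):

    import json

    def update_field(cf, row_key, name, value):
        # Rewrite one column instead of round-tripping the whole document.
        cf.insert(row_key, {name: json.dumps(value)})

    def read_field(cf, row_key, name):
        # Fetch a single column instead of the entire blob.
        return json.loads(cf.get(row_key, columns=[name])[name])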

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
