cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-265) Large object support
Date Thu, 02 Jul 2009 13:53:47 GMT


Jonathan Ellis commented on CASSANDRA-265:

Stu Hood points out that we'd want to store a hash of the file inline in the SSTable as part
of the lob pointer to make repair checks more efficient.

We'd also want to make sure that the key is part of the lob filename on the fs so that when
moving data to another node we don't have to do deep inspection of the sstable contents.
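
A minimal sketch of what such a lob pointer and filename scheme might look like.
All names here (LobPointerSketch, LobPointer, lobFileName, keyFromFileName) are
hypothetical, SHA-256 is only one possible choice of hash, and the filename
layout is an assumption rather than anything settled in this ticket. A real
implementation would compute the hash while streaming instead of from an
in-memory byte array.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class LobPointerSketch {
    /** Hypothetical pointer stored inline in the SSTable instead of the lob bytes. */
    static final class LobPointer {
        final String fileName;    // lob file on the local fs
        final byte[] contentHash; // hash of the lob contents, compared during repair
        final long length;        // lob size in bytes

        LobPointer(String fileName, byte[] contentHash, long length) {
            this.fileName = fileName;
            this.contentHash = contentHash;
            this.length = length;
        }
    }

    /** Hex-encode so that arbitrary keys are safe to use in a filename. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Assumed layout: <hex of row key>-<timestamp>.lob */
    static String lobFileName(String rowKey, long timestamp) {
        return hex(rowKey.getBytes(StandardCharsets.UTF_8)) + "-" + timestamp + ".lob";
    }

    /** Recover the row key from the filename alone, without opening any sstable. */
    static String keyFromFileName(String fileName) {
        String keyHex = fileName.substring(0, fileName.indexOf('-'));
        byte[] out = new byte[keyHex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(keyHex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    static byte[] sha256(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Repair check: two replicas agree if the hashes match; the lob files are never read. */
    static boolean sameContent(LobPointer a, LobPointer b) {
        return MessageDigest.isEqual(a.contentHash, b.contentHash);
    }
}
```

With this layout, deciding which lob files belong to a key range being moved
only needs a directory listing plus keyFromFileName on each entry.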

> Large object support
> --------------------
>                 Key: CASSANDRA-265
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
> The standard answer since forever has been "cassandra is a bad fit for large objects."
> But I think it doesn't have to be that way.  With a few simplifying assumptions we can
> make this doable.
> First, screw Thrift.  There is no way to specify a stream of bytes cross-platform.  You
> can't mix raw sockets into Thrift very easily (?) so screw it.  Make it an internal-only API
> to start with, like the much-vaunted and much-feared BinaryVerbHandler.
> Second, forget about writing multiple lobs at once.  You insert one lob at a time, to
> a specific column.
> With Thrift out of the equation we are not out of the woods.  MessagingService also assumes
> that Messages will be memory resident and not streamed.  One approach to fix this would be
> to have a StreamingMessage class that consists of a message id (that would be paired w/ origination
> endpoint to make it unique) and a size.  The VerbHandler would keep a Map of incomplete StreamingMessages
> around until the full size was read.  Then they could be disposed of.
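
The bookkeeping described in the paragraph above could be sketched roughly as
follows. Only the StreamingMessage name comes from the description; StreamId,
IncompleteStreams, begin and onChunk are hypothetical names, and none of this
is wired into the real MessagingService or VerbHandler interfaces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class StreamingSketch {
    /** Message id paired with the originating endpoint so that it is unique. */
    static final class StreamId {
        final String origin;
        final long messageId;

        StreamId(String origin, long messageId) {
            this.origin = origin;
            this.messageId = messageId;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof StreamId)) return false;
            StreamId other = (StreamId) o;
            return messageId == other.messageId && origin.equals(other.origin);
        }

        @Override public int hashCode() {
            return Objects.hash(origin, messageId);
        }
    }

    /** An id plus the announced size; received counts the bytes seen so far. */
    static final class StreamingMessage {
        final StreamId id;
        final long size;
        long received;

        StreamingMessage(StreamId id, long size) {
            this.id = id;
            this.size = size;
        }
    }

    /** What the handler would keep: a Map of streams that are not yet complete. */
    static final class IncompleteStreams {
        private final Map<StreamId, StreamingMessage> pending = new HashMap<>();

        void begin(String origin, long messageId, long size) {
            StreamId id = new StreamId(origin, messageId);
            pending.put(id, new StreamingMessage(id, size));
        }

        /** Returns true once the full size has been read; the entry is then disposed of. */
        boolean onChunk(String origin, long messageId, int chunkLength) {
            StreamId id = new StreamId(origin, messageId);
            StreamingMessage m = pending.get(id);
            if (m == null) throw new IllegalStateException("unknown stream");
            m.received += chunkLength;
            if (m.received > m.size) throw new IllegalStateException("stream overran its announced size");
            if (m.received == m.size) {
                pending.remove(id);
                return true;
            }
            return false;
        }

        int pendingCount() {
            return pending.size();
        }
    }
}
```

A real handler would also need to expire entries whose sender died mid-stream,
which this sketch leaves out.
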
> So a LargeObjectCommand would be basically just the command id and the payload, the streamed
> lob.  And we would handle it by streaming it directly to a file.  When the stream was complete,
> we would do a write to the standard commitlog/memtable with a pointer to that lob file.  That
> would then be flushed normally to the sstable.  (This would require adding another boolean
> to Column serialization, whether the value is really a lob pointer.  We could combine this
> with the existing bool into a single byte and have room for a couple more flags, without taking
> extra space.)
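
The flag-byte idea in the parenthetical above might look something like this.
The names (ColumnFlagsSketch, DELETED, LOB_POINTER, pack) are hypothetical, and
the existing boolean is assumed here to be a deletion marker. Since a Java
DataOutput writes a boolean as one byte, replacing it with a flag byte leaves
the serialized size unchanged.

```java
final class ColumnFlagsSketch {
    static final int DELETED = 0x01;     // the existing boolean (assumed: deletion marker)
    static final int LOB_POINTER = 0x02; // new: the value is a lob pointer, not the data itself
    // bits 0x04 and up stay free for future flags

    /** Combine both booleans into the single byte that is serialized. */
    static byte pack(boolean deleted, boolean lobPointer) {
        int flags = 0;
        if (deleted) flags |= DELETED;
        if (lobPointer) flags |= LOB_POINTER;
        return (byte) flags;
    }

    static boolean isDeleted(byte flags) {
        return (flags & DELETED) != 0;
    }

    static boolean isLobPointer(byte flags) {
        return (flags & LOB_POINTER) != 0;
    }
}
```

Because DELETED occupies the lowest bit, a column written with the old boolean
encoding (0 or 1) would still be read back correctly as a non-lob column.
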
> So lobs would never appear directly in the commitlog, and we would never have to rewrite
> them multiple times during compaction; just the pointers would get merged, but the lob files
> themselves would not have to be touched.  (Except to remove them when a compaction shows that
> an older version is no longer needed.)
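
As a rough illustration of the compaction behavior described above: merging
versions of a lob column only compares the small pointers, and the result
names the lob files that have become removable. PointerColumn, MergeResult and
merge are hypothetical names, resolution by highest timestamp is an
assumption, and tombstones are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class LobCompactionSketch {
    /** A column whose value is a lob pointer; only this small record is ever rewritten. */
    static final class PointerColumn {
        final String lobFileName;
        final long timestamp;

        PointerColumn(String lobFileName, long timestamp) {
            this.lobFileName = lobFileName;
            this.timestamp = timestamp;
        }
    }

    static final class MergeResult {
        final PointerColumn winner;
        final List<String> obsoleteFiles; // lob files the caller may now delete

        MergeResult(PointerColumn winner, List<String> obsoleteFiles) {
            this.winner = winner;
            this.obsoleteFiles = obsoleteFiles;
        }
    }

    /** Keep the newest pointer; report the files of older versions without opening them. */
    static MergeResult merge(List<PointerColumn> versions) {
        if (versions.isEmpty()) throw new IllegalArgumentException("no versions to merge");
        PointerColumn winner = versions.get(0);
        for (PointerColumn c : versions) {
            if (c.timestamp > winner.timestamp) winner = c;
        }
        List<String> obsolete = new ArrayList<>();
        for (PointerColumn c : versions) {
            if (c != winner && !c.lobFileName.equals(winner.lobFileName)) {
                obsolete.add(c.lobFileName);
            }
        }
        return new MergeResult(winner, obsolete);
    }
}
```

The lob contents are never read or copied here; the only lob I/O is the
eventual delete of each name in obsoleteFiles.
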
> Then of course we'd need a corresponding ReadLargeObject command.  So the basics are
> Read Repair and Hinted Handoff would add a few more wrinkles but nothing fundamentally
> Thoughts?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
