incubator-cassandra-user mailing list archives

From "Stu Hood" <stu.h...@rackspace.com>
Subject Re: Is SuperColumn necessary?
Date Mon, 10 May 2010 16:44:56 GMT
I think that it is 100% ideal: it's what I've been working on implementing in #674, #847 and
#998. I'm hoping to post a large patchset and docs this week, and I'm aiming to get it committed
for 0.8.

The work I've been doing doesn't touch the user interface: it only deals with the internal
changes necessary to make this type of storage possible.


-----Original Message-----
From: "Mike Malone" <mike@simplegeo.com>
Sent: Monday, May 10, 2010 11:37am
To: user@cassandra.apache.org
Subject: Re: Is SuperColumn necessary?

Maybe... but honestly, it doesn't affect the architecture or interface at
all. I'm more interested in thinking about how the system should work than
what things are called. Naming things is important, but that can happen
later.

Does anyone have any thoughts or comments on the architecture I suggested
earlier?

Mike

On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zsongbo@gmail.com> wrote:

> Yes, the term "column" here is not appropriate.
> Maybe we need not create new terms; in Google's Bigtable, the term
> "qualifier" is a good one.
>
>
> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <david@lookin2.com> wrote:
>
>> That would be a good time to get rid of the confusing "column" term, which
>> incorrectly suggests a two-dimensional tabular structure.
>>
>> Suggestions:
>>
>> 1. A hypercube (or hypocube, if only two dimensions): replace "key" and
>> "column" with "1st dimension", "2nd dimension", etc.
>>
>> 2. A file system: replace "key" and "column" with "directory" and
>> "subdirectory"
>>
>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose value
>> is the set of keys, whose value is the set of supercolumns of the key, whose
>> value is the set of columns for the supercolumn, etc.
>>
>> 4. Etc.
>>
>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <mike@simplegeo.com> wrote:
>>
>>> Nice, Ed, we're doing something very similar but less generic.
>>>
>>> Now replace all of the various methods for querying with a simple query
>>> interface that takes a Predicate, allow the user to specify (in
>>> storage-conf) which levels of the nested Columns should be indexed, and
>>> completely remove Comparators and have people subclass Column / implement
>>> IColumn and we'd really be on to something ;).
>>>
>>> Mock storage-conf.xml:
>>>   <Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True" Type="UTF8">
>>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True" Type="UTF8">
>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>>>         <Column Name="ThingThatsNowColumnName" Indexed="True" Type="ASCII">
>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>>>         </Column>
>>>       </Column>
>>>     </Column>
>>>   </Column>
>>>
>>> Thrift:
>>>   struct NamePredicate {
>>>     1: required list<binary> column_names,
>>>   }
>>>   struct SlicePredicate {
>>>     1: required binary start,
>>>     2: required binary end,
>>>   }
>>>   struct CountPredicate {
>>>     1: required Predicate predicate,
>>>     2: required i32 count=100,
>>>   }
>>>   struct AndPredicate {
>>>     1: required Predicate left,
>>>     2: required Predicate right,
>>>   }
>>>   struct SubColumnsPredicate {
>>>     1: required Predicate columns,
>>>     2: required Predicate subcolumns,
>>>   }
>>>   ... OrPredicate, OtherUsefulPredicates ...
>>>   query(predicate, count, consistency_level)
>>>   # Count here would be the total count of leaf values returned, whereas
>>>   # CountPredicate specifies a column count for a particular sub-slice.
>>>
>>> Not fully baked... but I think this could really simplify stuff and make
>>> it more flexible. Downside is it may give people enough rope to hang
>>> themselves, but at least the predicate stuff is easily distributable.
>>>
>>> I'm thinking I'll play around with implementing some of this stuff myself
>>> if I have any free time in the near future.
>>>
>>> Mike
>>>
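To make the predicate idea above a bit more concrete, here is a minimal Python sketch of how composable predicates might be evaluated against a set of column names. It is only an illustration under stated assumptions, not anything from the proposal or from Cassandra itself: the `select` function is hypothetical, slice bounds are assumed to be inclusive, columns are modelled as a plain list of byte-string names, and SubColumnsPredicate is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class NamePredicate:
    column_names: List[bytes]


@dataclass
class SlicePredicate:
    start: bytes
    end: bytes


@dataclass
class AndPredicate:
    left: "Predicate"
    right: "Predicate"


@dataclass
class CountPredicate:
    predicate: "Predicate"
    count: int = 100


Predicate = Union[NamePredicate, SlicePredicate, AndPredicate, CountPredicate]


def select(predicate: Predicate, names: List[bytes]) -> List[bytes]:
    """Hypothetical evaluator: return the matching names in sorted order."""
    ordered = sorted(names)
    if isinstance(predicate, NamePredicate):
        wanted = set(predicate.column_names)
        return [n for n in ordered if n in wanted]
    if isinstance(predicate, SlicePredicate):
        # Assumption: both bounds are inclusive.
        return [n for n in ordered if predicate.start <= n <= predicate.end]
    if isinstance(predicate, AndPredicate):
        right = set(select(predicate.right, names))
        return [n for n in select(predicate.left, names) if n in right]
    if isinstance(predicate, CountPredicate):
        # Limit the inner result to the first `count` names.
        return select(predicate.predicate, names)[: predicate.count]
    raise TypeError(f"unsupported predicate: {predicate!r}")


names = [b"a", b"b", b"c", b"d", b"e"]
query = CountPredicate(
    AndPredicate(SlicePredicate(b"b", b"e"), NamePredicate([b"a", b"c", b"d", b"e"])),
    count=2,
)
result = select(query, names)
```

Because each predicate is a small value object, a query like this could in principle be serialized and shipped to whichever node holds the data, which is presumably what "easily distributable" is getting at.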
>>>
>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>
>>>> Very interesting, thanks!
>>>>
>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <ed@anuff.com> wrote:
>>>> > Follow-up from last week's discussion: I've been playing around with a
>>>> > simple column comparator for composite column names that I put up on
>>>> > GitHub. I'd be interested to hear what people think of this approach.
>>>> >
>>>> > http://github.com/edanuff/CassandraCompositeType
>>>> >
>>>> > Ed
>>>> >
>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <ed@anuff.com> wrote:
>>>> >>
>>>> >> It might make sense to create a CompositeType subclass of AbstractType
>>>> >> for the purpose of constructing and comparing these types of "composite"
>>>> >> column names, so that you could more easily do that sort of thing rather
>>>> >> than having to concatenate into one big string.
>>>> >>
>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone <mike@simplegeo.com> wrote:
>>>> >>>
>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed out
>>>> >>> to me at the Cassandra meetup - I think it was Eric Florenzano) is that
>>>> >>> you can use different comparator types for the Super/SubColumns, I
>>>> >>> guess..? But you should be able to do the same thing by creating your
>>>> >>> own Column comparator. I guess my point is that SuperColumns are mostly
>>>> >>> a convenience mechanism, as far as I can tell.
>>>> >>> Mike
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of Riptano, the source for professional Cassandra support
>>>> http://riptano.com
>>>>
>>>
>>>
>>
>
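The composite column name idea that runs through this thread (comparing names component by component rather than concatenating them into one big string) can be sketched roughly as follows. This is a Python illustration under assumptions of its own, not the code from the CassandraCompositeType repository: the 2-byte length-prefixed layout and the function names `encode_composite`, `decode_composite` and `compare_composite` are all hypothetical.

```python
import struct
from functools import cmp_to_key
from typing import Tuple


def encode_composite(*components: bytes) -> bytes:
    """Pack each component as a 2-byte big-endian length plus its raw bytes.

    The layout is an assumption made for this sketch.
    """
    out = bytearray()
    for part in components:
        out += struct.pack(">H", len(part)) + part
    return bytes(out)


def decode_composite(name: bytes) -> Tuple[bytes, ...]:
    """Split an encoded name back into its components."""
    parts = []
    pos = 0
    while pos < len(name):
        (length,) = struct.unpack_from(">H", name, pos)
        pos += 2
        parts.append(name[pos:pos + length])
        pos += length
    return tuple(parts)


def compare_composite(a: bytes, b: bytes) -> int:
    """Compare component by component; a shorter name sorts before its extensions.

    Comparing the encoded bytes directly would sort on the length prefix
    first, so the comparator decodes before comparing.
    """
    pa, pb = decode_composite(a), decode_composite(b)
    return (pa > pb) - (pa < pb)


names = [
    encode_composite(b"user2", b"a"),
    encode_composite(b"user10", b"z"),
    encode_composite(b"user2"),
]
ordered = sorted(names, key=cmp_to_key(compare_composite))
decoded = [decode_composite(n) for n in ordered]
```

Length-prefixing keeps component boundaries unambiguous: with plain concatenation, ("ab", "c") and ("a", "bc") would both become "abc", whereas here they encode to different names.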


