incubator-cassandra-user mailing list archives

From Mike Malone <m...@simplegeo.com>
Subject Re: Is SuperColumn necessary?
Date Tue, 11 May 2010 02:22:56 GMT
>
> Mike just suggested concatenating the comment id with each of the comment
> field names so that the above data can be stored in a normal column family.
> It looks fine, except that I'm not sure whether time sorting on comments
> still works.
>

In the case of time, you can just use lexicographically sortable strings to
represent your timestamps (e.g., RFC 3339, all in UTC and with a fixed number
of fractional digits). You're right, I don't think TimeUUID sorts that way.
For more complicated things (e.g., TimeUUIDs, or packed numerics that you
don't want to zero-pad) you'd have to implement a custom comparator. So the
"convenience" mechanisms that would have to be implemented (and that Stu and
Ed have, in fact, pretty much already implemented) would take care of
concatenating the column names and doing the chained comparisons for you.
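
As a rough sketch of both cases (Python for brevity; a real Cassandra
comparator is a Java class on the server, and the separator, helper names,
and sample values below are all made up for illustration, not taken from
anyone's implementation):

```python
import uuid
from datetime import datetime, timezone
from functools import cmp_to_key

SEP = "|"  # hypothetical separator; must never appear inside a component


def rfc3339(dt):
    """Format as UTC with fixed-width microseconds so string order matches time order."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def column_name(posted_at, comment_id, field):
    """Flatten comment-id {commenter, text, ...} into one sortable column name."""
    return SEP.join([rfc3339(posted_at), comment_id, field])


def make_timeuuid(ticks, node=0x010203040506):
    """Build a version 1 UUID from a count of 100 ns ticks (illustration only)."""
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # set version bits to 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, node))


def compare_names(a, b):
    """Chained comparison of (TimeUUID, field) names: embedded time, then raw bytes, then field."""
    key_a = (a[0].time, a[0].bytes, a[1])
    key_b = (b[0].time, b[0].bytes, b[1])
    return (key_a > key_b) - (key_a < key_b)


# Case 1: RFC 3339 names need no comparator; plain string order is time order.
older = datetime(2010, 5, 10, 16, 31, tzinfo=timezone.utc)
newer = datetime(2010, 5, 11, 2, 22, 56, tzinfo=timezone.utc)
names = [
    column_name(newer, "c2", "text"),
    column_name(older, "c1", "text"),
    column_name(newer, "c2", "commenter"),
    column_name(older, "c1", "commenter"),
]
sorted_names = sorted(names)

# Case 2: TimeUUIDs lead with the low 32 bits of the timestamp, so their text
# form does not sort by time and a custom comparator is needed.
early = make_timeuuid(0x0FFFFFFFF)
late = make_timeuuid(0x100000000)
by_text = sorted([late, early], key=str)
by_time = sorted([(late, "text"), (early, "text")], key=cmp_to_key(compare_names))
```

Sorting `names` as plain strings keeps each comment's fields together and puts
the older comment first. Sorting the two TimeUUIDs by their text form puts
`late` ahead of `early`; only the chained comparator restores time order.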

Mike


>
>
> On Mon, May 10, 2010 at 5:36 PM, William Ashley <washley@gmail.com> wrote:
>
>> I'm having a difficult time understanding your syntax. Could you provide
>> an example with actual data?
>>
>> On May 10, 2010, at 5:25 PM, AJ Chen wrote:
>>
>> your suggestion works for fixed supercolumn name. the blog example now
>> becomes:
>> { blog-id {name, title, ...}
>>   blog-id-comments {time:commenter}
>> }
>>
>> what about supercolumn names that are not fixed? for example, I want to
>> store comment's details with the blog like this:
>> { blog-id { blog { name, title, ...}
>>             comments {comment-id:commenter}
>>             comment-id {commenter, time, text, ...}
>>           }
>> }
>>
>> a comment-id is generated on-the-fly when the comment is made.  how do you
>> flatten the comment-id supercolumn to normal column?  just for brain
>> exercise, not meant to pick on you.
>>
>> thanks,
>> -aj
>>
>>
>>
>> On Mon, May 10, 2010 at 4:39 PM, William Ashley <washley@gmail.com> wrote:
>>
>>> If you're storing your super column under a fixed name, you could just
>>> concatenate that name with the row key and use normal columns. Then you get
>>> your paging and sorting the way you want it.
>>>
>>>
>>> On May 10, 2010, at 4:31 PM, AJ Chen wrote:
>>>
>>> supercolumn is good for modeling profile type of data. simple example is
>>> blog:
>>> blog { blog {author,  title, ...}
>>>          comments   {time: commenter}  //sort by TimeUUID
>>> }
>>> when retrieving a blog, you get all the comments sorted by time already.
>>> without supercolumn, you would need to concatenate multiple comment times
>>> together as you suggested.
>>>
>>> requiring users to concatenate data fields together is not only an extra
>>> burden on the user but also a less clean design.  there will be cases where
>>> a list property of profile data is a long list (say, a million items). in
>>> such cases, the user wants to insert or delete an item in that list
>>> directly because it's more efficient.  Retrieving the whole list, updating
>>> it, concatenating again, and then putting it back into the datastore is
>>> awkward and less efficient.
>>>
>>> -aj
>>>
>>>
>>> On Mon, May 10, 2010 at 2:20 PM, Mike Malone <mike@simplegeo.com> wrote:
>>>
>>>> On Mon, May 10, 2010 at 1:38 PM, AJ Chen <ajchen@web2express.org> wrote:
>>>>
>>>>> Could someone confirm this discussion is not about abandoning
>>>>> supercolumn family? I have found modeling data with supercolumn family is
>>>>> actually an advantage of Cassandra compared to relational databases. Hope
>>>>> you are not going to drop this important concept.  How it's implemented
>>>>> internally is a different matter.
>>>>>
>>>>
>>>> SuperColumns are useful as a convenience mechanism. That's pretty much
>>>> it. There's _nothing_ (as far as I can tell) that you can do with
>>>> SuperColumns that you can't do by manually concatenating key names with a
>>>> separator on the client side and implementing a custom comparator on the
>>>> server (as ugly as that is).
>>>>
>>>> This discussion is about getting rid of SuperColumns and adding a more
>>>> generic mechanism that will actually be useful and interesting and will
>>>> continue to be convenient for the types of use cases for which people use
>>>> SuperColumns.
>>>>
>>>> If there's a particular use case that you feel you can only implement
>>>> with SuperColumns, please share! I honestly can't think of any.
>>>>
>>>> Mike
>>>>
>>>>
>>>>> On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook <jshook@gmail.com> wrote:
>>>>>
>>>>>> Agreed
>>>>>>
>>>>>> On Mon, May 10, 2010 at 12:01 PM, Mike Malone <mike@simplegeo.com> wrote:
>>>>>> > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <jshook@gmail.com> wrote:
>>>>>> >>
>>>>>> >> I have to disagree about the naming of things. The name of something
>>>>>> >> isn't just a literal identifier. It affects the way people think about
>>>>>> >> it. For new users, the whole naming thing has been a persistent
>>>>>> >> barrier.
>>>>>> >
>>>>>> > I'm saying we shouldn't be worried too much about coming up with names and
>>>>>> > analogies until we've decided what it is we're naming.
>>>>>> >
>>>>>> >>
>>>>>> >> As for your suggestions, I'm all for simplifying or generalizing the
>>>>>> >> "how it works" part down to a more generalized set of operations. I'm
>>>>>> >> not sure it's a good idea to require users to think in terms of building
>>>>>> >> up a fluffy query structure just to thread it through a needle of an
>>>>>> >> API, even for the simplest of queries. At some point, the level of
>>>>>> >> generic boilerplate takes away from the semantic hand rails that
>>>>>> >> developers like. So I guess I'm suggesting that "how it works" and
>>>>>> >> "how we use it" are not always exactly the same. At least they should
>>>>>> >> both hinge on a common conceptual model, which is where the naming
>>>>>> >> becomes an important anchoring point.
>>>>>> >
>>>>>> > If things are done properly, client libraries could expose simplified query
>>>>>> > interfaces without much effort. Most ORMs these days work by building a
>>>>>> > propositional directed acyclic graph that's serialized to SQL. This would
>>>>>> > work the same way, but it wouldn't be converted into a 4GL.
>>>>>> > Mike
>>>>>> >
>>>>>> >>
>>>>>> >> Jonathan
>>>>>> >>
>>>>>> >> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <mike@simplegeo.com> wrote:
>>>>>> >> > Maybe... but honestly, it doesn't affect the architecture or interface
>>>>>> >> > at all. I'm more interested in thinking about how the system should work
>>>>>> >> > than what things are called. Naming things is important, but that can
>>>>>> >> > happen later.
>>>>>> >> > Does anyone have any thoughts or comments on the architecture I
>>>>>> >> > suggested earlier?
>>>>>> >> >
>>>>>> >> > Mike
>>>>>> >> >
>>>>>> >> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zsongbo@gmail.com> wrote:
>>>>>> >> >>
>>>>>> >> >> Yes, the term "column" is not appropriate here.
>>>>>> >> >> Maybe we don't need to create new terms; in Google's Bigtable, the term
>>>>>> >> >> "qualifier" is a good one.
>>>>>> >> >>
>>>>>> >> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <david@lookin2.com> wrote:
>>>>>> >> >>>
>>>>>> >> >>> That would be a good time to get rid of the confusing "column" term,
>>>>>> >> >>> which incorrectly suggests a two-dimensional tabular structure.
>>>>>> >> >>>
>>>>>> >> >>> Suggestions:
>>>>>> >> >>>
>>>>>> >> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key" and
>>>>>> >> >>> "column" with "1st dimension", "2nd dimension", etc.
>>>>>> >> >>>
>>>>>> >> >>> 2. A file system: replace "key" and "column" with "directory" and
>>>>>> >> >>> "subdirectory"
>>>>>> >> >>>
>>>>>> >> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose value
>>>>>> >> >>> is the set of keys, whose value is the set of supercolumns of the key,
>>>>>> >> >>> whose value is the set of columns for the supercolumn, etc.
>>>>>> >> >>>
>>>>>> >> >>> 4. Etc.
>>>>>> >> >>>
>>>>>> >> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <mike@simplegeo.com> wrote:
>>>>>> >> >>>>
>>>>>> >> >>>> Nice, Ed, we're doing something very similar but less generic.
>>>>>> >> >>>> Now replace all of the various methods for querying with a simple query
>>>>>> >> >>>> interface that takes a Predicate, allow the user to specify (in
>>>>>> >> >>>> storage-conf) which levels of the nested Columns should be indexed, and
>>>>>> >> >>>> completely remove Comparators and have people subclass Column / implement
>>>>>> >> >>>> IColumn and we'd really be on to something ;).
>>>>>> >> >>>> Mock storage-conf.xml:
>>>>>> >> >>>>   <Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True" Type="UTF8">
>>>>>> >> >>>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True" Type="UTF8">
>>>>>> >> >>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>>>>>> >> >>>>         <Column Name="ThingThatsNowColumnName" Indexed="True" Type="ASCII">
>>>>>> >> >>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>>>>>> >> >>>>         </Column>
>>>>>> >> >>>>       </Column>
>>>>>> >> >>>>     </Column>
>>>>>> >> >>>>   </Column>
>>>>>> >> >>>> Thrift:
>>>>>> >> >>>>   struct NamePredicate {
>>>>>> >> >>>>     1: required list<binary> column_names,
>>>>>> >> >>>>   }
>>>>>> >> >>>>   struct SlicePredicate {
>>>>>> >> >>>>     1: required binary start,
>>>>>> >> >>>>     2: required binary end,
>>>>>> >> >>>>   }
>>>>>> >> >>>>   struct CountPredicate {
>>>>>> >> >>>>     1: required struct predicate,
>>>>>> >> >>>>     2: required i32 count=100,
>>>>>> >> >>>>   }
>>>>>> >> >>>>   struct AndPredicate {
>>>>>> >> >>>>     1: required Predicate left,
>>>>>> >> >>>>     2: required Predicate right,
>>>>>> >> >>>>   }
>>>>>> >> >>>>   struct SubColumnsPredicate {
>>>>>> >> >>>>     1: required Predicate columns,
>>>>>> >> >>>>     2: required Predicate subcolumns,
>>>>>> >> >>>>   }
>>>>>> >> >>>>   ... OrPredicate, OtherUsefulPredicates ...
>>>>>> >> >>>>   query(predicate, count, consistency_level)
>>>>>> >> >>>>   # Count here would be the total count of leaf values returned, whereas
>>>>>> >> >>>>   # CountPredicate specifies a column count for a particular sub-slice.
>>>>>> >> >>>> Not fully baked... but I think this could really simplify stuff and make
>>>>>> >> >>>> it more flexible. Downside is it may give people enough rope to hang
>>>>>> >> >>>> themselves, but at least the predicate stuff is easily distributable.
>>>>>> >> >>>> I'm thinking I'll play around with implementing some of this stuff
>>>>>> >> >>>> myself if I have any free time in the near future.
>>>>>> >> >>>> Mike
>>>>>> >> >>>>
>>>>>> >> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>> >> >>>>>
>>>>>> >> >>>>> Very interesting, thanks!
>>>>>> >> >>>>>
>>>>>> >> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <ed@anuff.com> wrote:
>>>>>> >> >>>>> > Follow-up from last week's discussion: I've been playing around with a
>>>>>> >> >>>>> > simple column comparator for composite column names that I put up on
>>>>>> >> >>>>> > github.  I'd be interested to hear what people think of this approach.
>>>>>> >> >>>>> >
>>>>>> >> >>>>> > http://github.com/edanuff/CassandraCompositeType
>>>>>> >> >>>>> >
>>>>>> >> >>>>> > Ed
>>>>>> >> >>>>> >
>>>>>> >> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <ed@anuff.com> wrote:
>>>>>> >> >>>>> >>
>>>>>> >> >>>>> >> It might make sense to create a CompositeType subclass of
>>>>>> >> >>>>> >> AbstractType for the purpose of constructing and comparing these types of
>>>>>> >> >>>>> >> "composite" column names, so that you could more easily do that sort of
>>>>>> >> >>>>> >> thing rather than having to concatenate into one big string.
>>>>>> >> >>>>> >>
>>>>>> >> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone <mike@simplegeo.com> wrote:
>>>>>> >> >>>>> >>>
>>>>>> >> >>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed out to
>>>>>> >> >>>>> >>> me at the Cassandra meetup - I think it was Eric Florenzano) is that you
>>>>>> >> >>>>> >>> can use different comparator types for the Super/SubColumns, I guess..?
>>>>>> >> >>>>> >>> But you should be able to do the same thing by creating your own Column
>>>>>> >> >>>>> >>> comparator.
>>>>>> >> >>>>> >>> I guess my point is that SuperColumns are mostly a convenience
>>>>>> >> >>>>> >>> mechanism, as far as I can tell.
>>>>>> >> >>>>> >>> Mike
>>>>>> >> >>>>> >
>>>>>> >> >>>>> >
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>> --
>>>>>> >> >>>>> Jonathan Ellis
>>>>>> >> >>>>> Project Chair, Apache Cassandra
>>>>>> >> >>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>>> >> >>>>> http://riptano.com
>>>>>> >> >>>>
>>>>>> >> >>>
>>>>>> >> >>
>>>>>> >> >
>>>>>> >> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> AJ Chen, PhD
>>>>> Chair, Semantic Web SIG, sdforum.org
>>>>> http://web2express.org
>>>>> twitter @web2express
>>>>> Palo Alto, CA, USA
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
