cassandra-user mailing list archives

From AJ Chen <ajc...@web2express.org>
Subject Re: Is SuperColumn necessary?
Date Tue, 11 May 2010 00:25:45 GMT
Your suggestion works for a fixed supercolumn name. The blog example now
becomes:
{ blog-id {name, title, ...}
  blog-id-comments {time:commenter}
}
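
A minimal sketch of that flattening, assuming a column family can be modeled
as a plain dict of row key -> {column name: value}. The function name, the
"-" separator, and the choice to keep the "blog" supercolumn under the bare
row key are illustrative assumptions, not Cassandra API:

```python
# Hypothetical in-memory model, not a Cassandra client call.
SEPARATOR = "-"  # assumed separator; must not appear in blog ids


def flatten_fixed_supercolumns(super_rows, default_name="blog"):
    """Turn {row_key: {supercolumn: {column: value}}} into plain rows.

    The supercolumn called `default_name` keeps the bare row key; every
    other fixed supercolumn name is appended to the row key.
    """
    flat = {}
    for row_key, supercolumns in super_rows.items():
        for sc_name, columns in supercolumns.items():
            if sc_name == default_name:
                flat_key = row_key
            else:
                flat_key = f"{row_key}{SEPARATOR}{sc_name}"
            flat[flat_key] = dict(columns)
    return flat
```

Each flattened row then holds only normal columns, so it can be paged and
sorted like any other row.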

What about supercolumn names that are not fixed? For example, I want to
store each comment's details with the blog like this:
{ blog-id { blog { name, title, ...}
            comments {comment-id:commenter}
            comment-id {commenter, time, text, ...}
          }
}

A comment-id is generated on the fly when the comment is made. How do you
flatten the comment-id supercolumn into normal columns? This is just a brain
exercise, not meant to pick on you.
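
For reference, one possible reading of the "concatenate key names with a
separator" idea from the quoted messages, sketched with plain dicts. The
":" separator, the "blog"/"comment" prefixes, and both function names are
assumptions made for illustration; a real column family would also need a
comparator that orders the composite names as intended:

```python
SEP = ":"  # assumed separator; must not occur in comment ids or field names


def flatten_comments(blog_fields, comments):
    """Flatten blog fields and per-comment details into one row of columns.

    blog_fields: {field: value}
    comments:    {comment_id: {field: value}}
    """
    row = {f"blog{SEP}{field}": value for field, value in blog_fields.items()}
    for comment_id, fields in comments.items():
        for field, value in fields.items():
            # Composite column name: prefix, dynamic comment id, field name.
            row[f"comment{SEP}{comment_id}{SEP}{field}"] = value
    return row


def slice_prefix(row, prefix):
    """Emulate a column slice: columns whose name starts with `prefix`,
    returned in sorted name order."""
    return [(name, row[name]) for name in sorted(row) if name.startswith(prefix)]
```

With this layout, slice_prefix(row, "comment:<id>:") returns the details of
a single comment, and a new comment-id only adds columns to the row.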

thanks,
-aj



On Mon, May 10, 2010 at 4:39 PM, William Ashley <washley@gmail.com> wrote:

> If you're storing your super column under a fixed name, you could just
> concatenate that name with the row key and use normal columns. Then you get
> your paging and sorting the way you want it.
>
>
> On May 10, 2010, at 4:31 PM, AJ Chen wrote:
>
> Supercolumns are good for modeling profile-type data. A simple example is
> a blog:
> blog { blog {author,  title, ...}
>          comments   {time: commenter}  //sort by TimeUUID
> }
> When retrieving a blog, you get all the comments already sorted by time.
> Without supercolumns, you would need to concatenate multiple comment times
> together as you suggested.
>
> Requiring users to concatenate data fields together is not only an extra
> burden on them but also a less clean design. There will be cases where a
> list property of the profile data is a long list (say a million items). In
> such cases, users want to be able to directly insert/delete an item in that
> list because it's more efficient. Retrieving the whole list, updating it,
> concatenating again, and then putting it back into the datastore is awkward
> and less efficient.
>
> -aj
>
>
> On Mon, May 10, 2010 at 2:20 PM, Mike Malone <mike@simplegeo.com> wrote:
>
>> On Mon, May 10, 2010 at 1:38 PM, AJ Chen <ajchen@web2express.org> wrote:
>>
>>> Could someone confirm this discussion is not about abandoning the
>>> supercolumn family? I have found that modeling data with supercolumn
>>> families is actually an advantage of Cassandra compared to relational
>>> databases. I hope you are not going to drop this important concept. How
>>> it's implemented internally is a different matter.
>>>
>>
>> SuperColumns are useful as a convenience mechanism. That's pretty much it.
>> There's _nothing_ (as far as I can tell) that you can do with SuperColumns
>> that you can't do by manually concatenating key names with a separator on
>> the client side and implementing a custom comparator on the server (as ugly
>> as that is).
>>
>> This discussion is about getting rid of SuperColumns and adding a more
>> generic mechanism that will actually be useful and interesting and will
>> continue to be convenient for the types of use cases for which people use
>> SuperColumns.
>>
>> If there's a particular use case that you feel you can only implement with
>> SuperColumns, please share! I honestly can't think of any.
>>
>> Mike
>>
>>
>>> On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook <jshook@gmail.com> wrote:
>>>
>>>> Agreed
>>>>
>>>> On Mon, May 10, 2010 at 12:01 PM, Mike Malone <mike@simplegeo.com>
>>>> wrote:
>>>> > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <jshook@gmail.com> wrote:
>>>> >>
>>>> >> I have to disagree about the naming of things. The name of something
>>>> >> isn't just a literal identifier. It affects the way people think about
>>>> >> it. For new users, the whole naming thing has been a persistent
>>>> >> barrier.
>>>> >
>>>> > I'm saying we shouldn't be worried too much about coming up with names
>>>> > and analogies until we've decided what it is we're naming.
>>>> >
>>>> >>
>>>> >> As for your suggestions, I'm all for simplifying or generalizing the
>>>> >> "how it works" part down to a more generalized set of operations. I'm
>>>> >> not sure it's a good idea to require users to think in terms of building
>>>> >> up a fluffy query structure just to thread it through a needle of an
>>>> >> API, even for the simplest of queries. At some point, the level of
>>>> >> generic boilerplate takes away from the semantic hand rails that
>>>> >> developers like. So I guess I'm suggesting that "how it works" and
>>>> >> "how we use it" are not always exactly the same. At least they should
>>>> >> both hinge on a common conceptual model, which is where the naming
>>>> >> becomes an important anchoring point.
>>>> >
>>>> > If things are done properly, client libraries could expose simplified
>>>> > query interfaces without much effort. Most ORMs these days work by
>>>> > building a propositional directed acyclic graph that's serialized to
>>>> > SQL. This would work the same way, but it wouldn't be converted into a
>>>> > 4GL.
>>>> > Mike
>>>> >
>>>> >>
>>>> >> Jonathan
>>>> >>
>>>> >> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <mike@simplegeo.com> wrote:
>>>> >> > Maybe... but honestly, it doesn't affect the architecture or interface
>>>> >> > at all. I'm more interested in thinking about how the system should
>>>> >> > work than what things are called. Naming things is important, but that
>>>> >> > can happen later.
>>>> >> > Does anyone have any thoughts or comments on the architecture I
>>>> >> > suggested earlier?
>>>> >> >
>>>> >> > Mike
>>>> >> >
>>>> >> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zsongbo@gmail.com> wrote:
>>>> >> >>
>>>> >> >> Yes, the "column" here is not appropriate.
>>>> >> >> Maybe we need not create new terms; in Google's Bigtable, the term
>>>> >> >> "qualifier" is a good one.
>>>> >> >>
>>>> >> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <david@lookin2.com> wrote:
>>>> >> >>>
>>>> >> >>> That would be a good time to get rid of the confusing "column" term,
>>>> >> >>> which incorrectly suggests a two-dimensional tabular structure.
>>>> >> >>>
>>>> >> >>> Suggestions:
>>>> >> >>>
>>>> >> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key"
>>>> >> >>> and "column" with "1st dimension", "2nd dimension", etc.
>>>> >> >>>
>>>> >> >>> 2. A file system: replace "key" and "column" with "directory" and
>>>> >> >>> "subdirectory"
>>>> >> >>>
>>>> >> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose
>>>> >> >>> value is the set of keys, whose value is the set of supercolumns of
>>>> >> >>> the key, whose value is the set of columns for the supercolumn, etc.
>>>> >> >>>
>>>> >> >>> 4. Etc.
>>>> >> >>>
>>>> >> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <mike@simplegeo.com>
>>>> >> >>> wrote:
>>>> >> >>>>
>>>> >> >>>> Nice, Ed, we're doing something very similar but less generic.
>>>> >> >>>> Now replace all of the various methods for querying with a simple
>>>> >> >>>> query interface that takes a Predicate, allow the user to specify (in
>>>> >> >>>> storage-conf) which levels of the nested Columns should be indexed,
>>>> >> >>>> and completely remove Comparators and have people subclass Column /
>>>> >> >>>> implement IColumn and we'd really be on to something ;).
>>>> >> >>>> Mock storage-conf.xml:
>>>> >> >>>>   <Column Name="ThingThatsNowKey" Indexed="True"
>>>> >> >>>>           ClusterPartitioned="True" Type="UTF8">
>>>> >> >>>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True"
>>>> >> >>>>             Type="UTF8">
>>>> >> >>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>>>> >> >>>>         <Column Name="ThingThatsNowColumnName" Indexed="True"
>>>> >> >>>>                 Type="ASCII">
>>>> >> >>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>>>> >> >>>>         </Column>
>>>> >> >>>>       </Column>
>>>> >> >>>>     </Column>
>>>> >> >>>>   </Column>
>>>> >> >>>> Thrift:
>>>> >> >>>>   struct NamePredicate {
>>>> >> >>>>     1: required list<binary> column_names,
>>>> >> >>>>   }
>>>> >> >>>>   struct SlicePredicate {
>>>> >> >>>>     1: required binary start,
>>>> >> >>>>     2: required binary end,
>>>> >> >>>>   }
>>>> >> >>>>   struct CountPredicate {
>>>> >> >>>>     1: required struct predicate,
>>>> >> >>>>     2: required i32 count=100,
>>>> >> >>>>   }
>>>> >> >>>>   struct AndPredicate {
>>>> >> >>>>     1: required Predicate left,
>>>> >> >>>>     2: required Predicate right,
>>>> >> >>>>   }
>>>> >> >>>>   struct SubColumnsPredicate {
>>>> >> >>>>     1: required Predicate columns,
>>>> >> >>>>     2: required Predicate subcolumns,
>>>> >> >>>>   }
>>>> >> >>>>   ... OrPredicate, OtherUsefulPredicates ...
>>>> >> >>>>   query(predicate, count, consistency_level) # Count here would be
>>>> >> >>>> total count of leaf values returned, whereas CountPredicate specifies
>>>> >> >>>> a column count for a particular sub-slice.
>>>> >> >>>> Not fully baked... but I think this could really simplify stuff and
>>>> >> >>>> make it more flexible. Downside is it may give people enough rope to
>>>> >> >>>> hang themselves, but at least the predicate stuff is easily
>>>> >> >>>> distributable.
>>>> >> >>>> I'm thinking I'll play around with implementing some of this stuff
>>>> >> >>>> myself if I have any free time in the near future.
>>>> >> >>>> Mike
>>>> >> >>>>
>>>> >> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>> >> >>>>>
>>>> >> >>>>> Very interesting, thanks!
>>>> >> >>>>>
>>>> >> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <ed@anuff.com> wrote:
>>>> >> >>>>> > Follow-up from last week's discussion, I've been playing around
>>>> >> >>>>> > with a simple column comparator for composite column names that I
>>>> >> >>>>> > put up on github.  I'd be interested to hear what people think of
>>>> >> >>>>> > this approach.
>>>> >> >>>>> >
>>>> >> >>>>> > http://github.com/edanuff/CassandraCompositeType
>>>> >> >>>>> >
>>>> >> >>>>> > Ed
>>>> >> >>>>> >
>>>> >> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <ed@anuff.com> wrote:
>>>> >> >>>>> >>
>>>> >> >>>>> >> It might make sense to create a CompositeType subclass of
>>>> >> >>>>> >> AbstractType for the purpose of constructing and comparing these
>>>> >> >>>>> >> types of "composite" column names so that you could more easily do
>>>> >> >>>>> >> that sort of thing rather than having to concatenate into one big
>>>> >> >>>>> >> string.
>>>> >> >>>>> >>
>>>> >> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone <mike@simplegeo.com> wrote:
>>>> >> >>>>> >>>
>>>> >> >>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed
>>>> >> >>>>> >>> out to me at the Cassandra meetup - I think it was Eric Florenzano)
>>>> >> >>>>> >>> is that you can use different comparator types for the
>>>> >> >>>>> >>> Super/SubColumns, I guess..? But you should be able to do the same
>>>> >> >>>>> >>> thing by creating your own Column comparator.
>>>> >> >>>>> >>> I guess my point is that SuperColumns are mostly a convenience
>>>> >> >>>>> >>> mechanism, as far as I can tell.
>>>> >> >>>>> >>> Mike
>>>> >> >>>>> >
>>>> >> >>>>> >
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>> --
>>>> >> >>>>> Jonathan Ellis
>>>> >> >>>>> Project Chair, Apache Cassandra
>>>> >> >>>>> co-founder of Riptano, the source for professional Cassandra support
>>>> >> >>>>> http://riptano.com
>>>> >> >>>>
>>>> >> >>>
>>>> >> >>
>>>> >> >
>>>> >> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> AJ Chen, PhD
>>> Chair, Semantic Web SIG, sdforum.org
>>> http://web2express.org
>>> twitter @web2express
>>> Palo Alto, CA, USA
>>>
>>
>>
>
>
> --
> AJ Chen, PhD
> Chair, Semantic Web SIG, sdforum.org
> http://web2express.org
> twitter @web2express
> Palo Alto, CA, USA
>
>
>


-- 
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA
