cassandra-user mailing list archives

From Mike Malone <m...@simplegeo.com>
Subject Re: Is SuperColumn necessary?
Date Mon, 10 May 2010 17:01:38 GMT
On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <jshook@gmail.com> wrote:

> I have to disagree about the naming of things. The name of something
> isn't just a literal identifier. It affects the way people think about
> it. For new users, the whole naming thing has been a persistent
> barrier.
>

I'm saying we shouldn't be worried too much about coming up with names and
analogies until we've decided what it is we're naming.


> As for your suggestions, I'm all for simplifying or generalizing the
> "how it works" part down to a more generalized set of operations. I'm
> not sure it's a good idea to require users to think in terms of building
> up a fluffy query structure just to thread it through a needle of an
> API, even for the simplest of queries. At some point, the level of
> generic boilerplate takes away from the semantic hand rails that
> developers like. So I guess I'm suggesting that "how it works" and
> "how we use it" are not always exactly the same. At least they should
> both hinge on a common conceptual model, which is where the naming
> becomes an important anchoring point.
>

If things are done properly, client libraries could expose simplified query
interfaces without much effort. Most ORMs these days work by building a
propositional directed acyclic graph that's serialized to SQL. This would
work the same way, but it wouldn't be converted into a 4GL.
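
As a rough illustration of how a client library might wrap such a predicate
structure, here is a minimal Python sketch. The predicate names
(NamePredicate, SlicePredicate, AndPredicate) come from the Thrift mock-up
quoted below; everything else is an assumption made for the example and not
an existing Cassandra API: the helper names names_in_range and select, the
inclusive slice bounds, and the in-memory dict standing in for a row.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class NamePredicate:
    column_names: Tuple[bytes, ...]


@dataclass(frozen=True)
class SlicePredicate:
    start: bytes
    end: bytes


@dataclass(frozen=True)
class AndPredicate:
    left: "Predicate"
    right: "Predicate"


# The predicate tree: leaves are Name/Slice predicates, inner nodes are And.
Predicate = Union[NamePredicate, SlicePredicate, AndPredicate]


def matches(pred: Predicate, name: bytes) -> bool:
    """Walk the predicate tree and test a single column name."""
    if isinstance(pred, NamePredicate):
        return name in pred.column_names
    if isinstance(pred, SlicePredicate):
        # Assumption: both slice bounds are inclusive.
        return pred.start <= name <= pred.end
    if isinstance(pred, AndPredicate):
        return matches(pred.left, name) and matches(pred.right, name)
    raise TypeError(f"unknown predicate: {pred!r}")


def names_in_range(names: List[bytes], start: bytes, end: bytes) -> Predicate:
    """Hypothetical convenience wrapper a client library could expose."""
    return AndPredicate(NamePredicate(tuple(names)), SlicePredicate(start, end))


def select(row: Dict[bytes, int], pred: Predicate) -> List[Tuple[bytes, int]]:
    """Return matching (name, value) pairs in column-name order."""
    return [(name, row[name]) for name in sorted(row) if matches(pred, name)]


row = {b"a": 1, b"b": 2, b"c": 3, b"d": 4}
print(select(row, names_in_range([b"a", b"c", b"d"], b"b", b"d")))
```

The caller only sees names_in_range; the tree it builds is what would be
serialized and sent to the server instead of being turned into a query string.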

Mike


>
> Jonathan
>
> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <mike@simplegeo.com> wrote:
> > Maybe... but honestly, it doesn't affect the architecture or interface at
> > all. I'm more interested in thinking about how the system should work than
> > what things are called. Naming things is important, but that can happen
> > later.
> > Does anyone have any thoughts or comments on the architecture I suggested
> > earlier?
> >
> > Mike
> >
> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zsongbo@gmail.com> wrote:
> >>
> >> Yes, the term "column" is not appropriate here.
> >> Maybe we don't need to create new terms; in Google's Bigtable, the term
> >> "qualifier" is a good one.
> >>
> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <david@lookin2.com> wrote:
> >>>
> >>> That would be a good time to get rid of the confusing "column" term,
> >>> which incorrectly suggests a two-dimensional tabular structure.
> >>>
> >>> Suggestions:
> >>>
> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key" and
> >>> "column" with "1st dimension", "2nd dimension", etc.
> >>>
> >>> 2. A file system: replace "key" and "column" with "directory" and
> >>> "subdirectory"
> >>>
> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose value
> >>> is the set of keys, whose value is the set of supercolumns of the key, whose
> >>> value is the set of columns for the supercolumn, etc.
> >>>
> >>> 4. Etc.
> >>>
> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <mike@simplegeo.com> wrote:
> >>>>
> >>>> Nice, Ed, we're doing something very similar but less generic.
> >>>> Now replace all of the various methods for querying with a simple query
> >>>> interface that takes a Predicate, allow the user to specify (in
> >>>> storage-conf) which levels of the nested Columns should be indexed, and
> >>>> completely remove Comparators and have people subclass Column / implement
> >>>> IColumn and we'd really be on to something ;).
> >>>> Mock storage-conf.xml:
> >>>>   <Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True" Type="UTF8">
> >>>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True" Type="UTF8">
> >>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
> >>>>         <Column Name="ThingThatsNowColumnName" Indexed="True" Type="ASCII">
> >>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
> >>>>         </Column>
> >>>>       </Column>
> >>>>     </Column>
> >>>>   </Column>
> >>>> Thrift:
> >>>>   struct NamePredicate {
> >>>>     1: required list<binary> column_names,
> >>>>   }
> >>>>   struct SlicePredicate {
> >>>>     1: required binary start,
> >>>>     2: required binary end,
> >>>>   }
> >>>>   struct CountPredicate {
> >>>>     1: required Predicate predicate,
> >>>>     2: required i32 count=100,
> >>>>   }
> >>>>   struct AndPredicate {
> >>>>     1: required Predicate left,
> >>>>     2: required Predicate right,
> >>>>   }
> >>>>   struct SubColumnsPredicate {
> >>>>     1: required Predicate columns,
> >>>>     2: required Predicate subcolumns,
> >>>>   }
> >>>>   ... OrPredicate, OtherUsefulPredicates ...
> >>>>   query(predicate, count, consistency_level) # Count here would be total
> >>>> count of leaf values returned, whereas CountPredicate specifies a column
> >>>> count for a particular sub-slice.
> >>>> Not fully baked... but I think this could really simplify stuff and make
> >>>> it more flexible. Downside is it may give people enough rope to hang
> >>>> themselves, but at least the predicate stuff is easily distributable.
> >>>> I'm thinking I'll play around with implementing some of this stuff
> >>>> myself if I have any free time in the near future.
> >>>> Mike
> >>>>
> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>>>>
> >>>>> Very interesting, thanks!
> >>>>>
> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <ed@anuff.com> wrote:
> >>>>> > Follow-up from last week's discussion: I've been playing around with a
> >>>>> > simple column comparator for composite column names that I put up on
> >>>>> > github. I'd be interested to hear what people think of this approach.
> >>>>> >
> >>>>> > http://github.com/edanuff/CassandraCompositeType
> >>>>> >
> >>>>> > Ed
> >>>>> >
> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <ed@anuff.com> wrote:
> >>>>> >>
> >>>>> >> It might make sense to create a CompositeType subclass of
> >>>>> >> AbstractType for the purpose of constructing and comparing these types
> >>>>> >> of "composite" column names, so that you could more easily do that sort
> >>>>> >> of thing rather than having to concatenate into one big string.
> >>>>> >>
> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone <mike@simplegeo.com> wrote:
> >>>>> >>>
> >>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed out
> >>>>> >>> to me at the Cassandra meetup - I think it was Eric Florenzano) is that
> >>>>> >>> you can use different comparator types for the Super/SubColumns, I
> >>>>> >>> guess..? But you should be able to do the same thing by creating your
> >>>>> >>> own Column comparator. I guess my point is that SuperColumns are mostly
> >>>>> >>> a convenience mechanism, as far as I can tell.
> >>>>> >>> Mike
> >>>>> >
> >>>>> >
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Jonathan Ellis
> >>>>> Project Chair, Apache Cassandra
> >>>>> co-founder of Riptano, the source for professional Cassandra support
> >>>>> http://riptano.com
> >>>>
> >>>
> >>
> >
> >
>
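
The composite column name idea at the bottom of the quoted thread (compare
the parts of a name separately rather than concatenating them into one big
string) can be illustrated with a small Python sketch. This is not the
CassandraCompositeType code linked above; the tuple representation, the ":"
separator, and the function name compare_composite are assumptions chosen
only to show why the two orderings differ.

```python
from functools import cmp_to_key


def compare_composite(a, b):
    """Compare two composite names component by component.

    Each component is compared with its own natural ordering (strings as
    strings, integers as integers). A name that is a prefix of another
    sorts first.
    """
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return (len(a) > len(b)) - (len(a) < len(b))


names = [("user", 10), ("user", 9), ("item", 2)]

# Component-wise ordering keeps the integer part numeric.
by_components = sorted(names, key=cmp_to_key(compare_composite))

# Concatenating into one string compares the integer part as text.
by_string = sorted(names, key=lambda n: ":".join(str(part) for part in n))

print(by_components)
print(by_string)
```

With the concatenated form, "user:10" sorts before "user:9" because the
comparison is character by character, which is the kind of surprise a
dedicated composite comparator avoids.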
