cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avinash Lakshman <avinash.laksh...@gmail.com>
Subject Re: secondary index support in Cassandra
Date Wed, 25 Mar 2009 03:34:07 GMT
Prashant had many other points out there whereby you don't need your second
approach. I guess that is what I was referring to.
Avinash

On Tue, Mar 24, 2009 at 8:19 PM, Jun Rao <junrao@almaden.ibm.com> wrote:

> Prashant,
>
> I forgot about another point you mentioned.
>
> In the new approach, carving out a chunk of data by hash (needed for node
> removal/addition) may not be efficient. In the worse case, we have to make a
> full scan of the data. It is possible to make it more efficient by following
> the strategy that you guys implemented when using the random hash function:
> prefixing each index key with the hash value. On the other hand, I am
> wondering if we really need to worry about the performance on
> adding/removing nodes. These are infrequent events and are also
> non-blocking.
>
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>
> junrao@almaden.ibm.com
>
> [image: Inactive hide details for Prashant Malik <pmalik@gmail.com>]
> Prashant Malik <pmalik@gmail.com>
>
>
>
>     *Prashant Malik <pmalik@gmail.com>*
>
>             03/24/2009 11:34 AM
>              Please respond to
>             cassandra-dev@incubator.apache.org
>
>
>
> To
>
> cassandra-dev@incubator.apache.org
> cc
>
>
> Subject
>
> Re: secondary index support in Cassandra
> Some questions Iline
>
> On Tue, Mar 24, 2009 at 10:21 AM, Jun Rao <junrao@almaden.ibm.com> wrote:
>
> >
> >
> > We have an application that has groups and entities. A group has many
> > entities and an entity has a bunch of (attribute, value) pairs. A common
> > access pattern is to select some number of entities within a group with
> > attribute X equals to x and ordered by attribute Y. For efficiency, we
> want
> > to build a secondary index for each group and collocate a group and its
> > secondary index on the same node. Our current approach is to map a group
> to
> > a row in Cassandra and each entity to a column in a column family (CF).
> > Within the same row, we use a separate CF (ordered by name) to implement
> a
> > secondary index, say on attribute X and Y. In this family, each column
> name
> > has the form of X:x:Y:y:entityID. We extended the get_slice() function so
> > that it can get a slice of columns starting from a given column. The
> > extended function uses the column-level index to locate the starting
> column
> > quickly. (We'd be happy to contribute this extension back to Cassandra if
> > people find this useful). Using the extended get_slice(), we were able to
> > access the entities through the simulated secondary index.
> >
> > We see a couple of problems with the current approach. First, our
> > application has to maintain the index. This is inefficient and could
> leave
> > the index inconsistent when failure occurs.  Second, mapping each entity
> to
> > a column may not be a good idea. Often, there is some sort of locking for
> > each row access. Putting many entities per row limits concurrency. Today,
> > in Cassandra, a full row is deserialized into memory during compaction.
> > This limits the number of entities that can be put in a single row. Also,
> > intuitively, an entity is more naturally represented as a row with
> > attributes stored as columns.
>
>
> Prashant
>
>   1. Application can send the index  and the entityId update in the same
> write , a write per row is always atomic given that teh index and the data
> have teh same key in the above
>       case the index will not be out of sync.
>    2. The maintainance of index by app can be moved into cassandra and I
> agree with fact that you can add support for it by a built in special CF
> which you have to do in either of the approaches you are taking.
>        Infact in the first approach that you are taking it will be easier
> to move the indexes in case of adding nodes to the cluster and when files
> are split and data is sent over. In the second approach this
>        process could get comlicated.
>    3. There is no locking for row access in cassandra.
>    4. Compactions can be opmtimized for name sorted columns , this is one
> of the workitems we have where we do not deserialize the entire row in
> compactiopn but only do it in slices , this is easily achievable for
>        name sorted columns.
>    5. The entity can still be represented naturally as a supercolumn where
> each of the super column name is the entity Id and each of the columns in
> the supercolumn are attribute value pairs.
>    6.  How many entities per  groupid are we talking about here ? why do
> you feel concurrency is limited by entities per row ?
>
>
> >
> > To address the above problems, we are thinking of the following new
> > implementation. Each entity is mapped to a row in Cassandra and uses a
> > two-part key (groupID, entityID). We use the groupID to hash an entity to
> a
> > node. This way, all entities for a group will be collocated in the same
> > node. We then define a special CF to serve as the secondary index. In the
> > definition, we specify what entity attributes need to be indexed  and in
> > what order. Within a node, this special CF will index all rows stored
> > locally. Every time we insert a new entity, the server automatically
> > extracts the index key based on the index definition (for example, the
> > index key can be of the form "hash(rowkey):attribute1:attribute2:rowkey)
> > and add the index entry to the special CF. We can then access the
> entities
> > using an extended version of the query language in Cassandra. For
> example,
> > if we issue the following query and there is an index defined by
> > (attributeX, attributeY), the query can be evaluated using the index in
> the
> > special CF. (Note that AppEngine supports this flavor of queries.)
> >
> > select attributeZ
> > from ROWS(HASH = hash(groupID))
> > where attributeX="x"
> > order by attributeY desc
> > limit 50
> >
> > We are in the middle of prototyping this approach. We'd like to hear if
> > other people are interested in this too or if people think there are
> better
> > alternatives.
> >
>
>
> Prashant
>
>   1. In this approach I see a number of problems during addition , removal
> and moving of  new nodes. These problems are not impossible to solve but
> just harder and inefficient with the above approach.
>   2. What are the wins of this approach over the other is not clear to me.
> could you please highlight those.
>
>
> >
> > Jun
> > IBM Almaden Research Center
> > K55/B1, 650 Harry Road, San Jose, CA  95120-6099
> >
> > junrao@almaden.ibm.com
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message