incubator-cassandra-user mailing list archives

From Kevin Wiggen <kwig...@xythos.com>
Subject Newbie ? with get_range_slices
Date Mon, 12 Apr 2010 00:12:29 GMT

I have spent the last few days playing with Cassandra and I have attempted to create a simple
"Java->Thrift->Cassandra" Discussion Group Server (because the world needs another one)
to teach myself the data model and try everything out.

With all the great blog posts on Cassandra out there, I am now able to read/write/delete/modify
a nested discussion server.  YEA!!!

I decided to have two simple ColumnFamilies.

The first is called Posts:

Post = {
    '7561a442-24e2-11df-8924-001ff3591711': {                    //UUID
        'id': '7561a442-24e2-11df-8924-001ff3591711',            //ID == UUID
        'parent_id': '89da3178-24e2-11df-8924-001ff3591711',     //Parent Post UUID
        'author': 'a4a70900-24e1-11df-8924-001ff3591711',        //User's UUID
        'subject': 'This is a forum post',                       //Subject
        'body': 'Forum post body. This is awesome!',             //Body
        '_ts': '89da3178-24e2-11df-8924-001ff3596713',           //TimeUUID
    },
}

Here the key is a simple UUID and the columns hold the fields of a Forum, Post, or Reply.  A Forum
has a hardcoded parent UUID which I store in Java, while Posts and Replies are tied to their
parent posts/forums/etc. by the parent_id column.  I sort by UTF8Type, but it doesn't really matter
in this case, since I always drive into this map by the key and always get all columns (six of them).
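To make the Posts layout concrete, here is a minimal in-memory sketch of one Posts row: the row key
is the post UUID and the six columns are name/value pairs. `PostsModel`, `putPost`, and `getPost`
are hypothetical names, and the nested `TreeMap` only approximates a ColumnFamily with a UTF8Type
comparator (Java String order matches UTF-8 byte order for ASCII names); this is not the Thrift
client API.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory stand-in for the Posts ColumnFamily:
// row key (post UUID) -> (column name -> value), columns kept sorted by name.
class PostsModel {
    final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Writes the six columns of one post under its own UUID as the row key.
    void putPost(UUID id, UUID parentId, UUID author,
                 String subject, String body, UUID ts) {
        SortedMap<String, String> cols = new TreeMap<>();
        cols.put("id", id.toString());
        cols.put("parent_id", parentId.toString());
        cols.put("author", author.toString());
        cols.put("subject", subject);
        cols.put("body", body);
        cols.put("_ts", ts.toString());
        rows.put(id.toString(), cols);
    }

    // Reads a whole row by key; an unknown key yields an empty map.
    SortedMap<String, String> getPost(String key) {
        return rows.getOrDefault(key, new TreeMap<>());
    }
}
```

Note that `getPost` returns an empty map for a key it has never seen, which mirrors what
`get_slice` returns for a row that does not exist.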

All queries drive into the second ColumnFamily, called Threads:

Thread = {
    '7561a442-24e2-11df-8924-001ff3591711': {                    //Parent thread UUID
        //TimeUUID column name (timestamp of post) -> post UUID value
        '89da3178-24e2-11df-8924-001ff3596713': '7561a442-24e2-11df-8924-001ff3591711',
    },
}

Given a parent UUID, I can drive into Threads, which gives me the list of Posts/Replies at
that level sorted by TimeUUID.  The column name is the post's TimeUUID and the value is the
post's UUID; this ColumnFamily is sorted by TimeUUID.

Thus I can walk the tree (of any depth) of Forum/Post/Replies with the Thread table.
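A rough sketch of that tree walk, using an in-memory stand-in for Threads (the class and method
names are hypothetical). The comparator orders version-1 UUIDs by their embedded timestamp via
`java.util.UUID.timestamp()`, which approximates TimeUUIDType; breaking ties with `UUID.compareTo`
is an assumption of the sketch, not Cassandra's exact byte comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory stand-in for the Threads ColumnFamily:
// parent UUID (row key) -> (post TimeUUID column name -> post UUID value).
class ThreadsModel {
    // Orders version-1 UUIDs by embedded timestamp (throws for non-time UUIDs).
    static final Comparator<UUID> TIME_ORDER =
        Comparator.comparingLong(UUID::timestamp).thenComparing(Comparator.naturalOrder());

    final Map<UUID, TreeMap<UUID, UUID>> rows = new HashMap<>();

    // Adds one column to the parent's row: post time -> post id.
    void addChild(UUID parent, UUID postTime, UUID post) {
        rows.computeIfAbsent(parent, k -> new TreeMap<>(TIME_ORDER)).put(postTime, post);
    }

    // Children of one parent, oldest first (a single row read by key).
    List<UUID> children(UUID parent) {
        TreeMap<UUID, UUID> row = rows.get(parent);
        if (row == null) return new ArrayList<>();
        return new ArrayList<>(row.values());
    }

    // Depth-first walk below `root`; each post is indented two spaces per level.
    void walk(UUID root, int depth, List<String> out) {
        for (UUID child : children(root)) {
            out.add("  ".repeat(depth) + child);
            walk(child, depth + 1, out);
        }
    }
}
```

Each level costs one row read by key, so the walk issues one query per post that has replies.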

I have all of this working on a single Cassandra node, and it works great.  Inserts go to both
ColumnFamilies, while deletes use the Threads ColumnFamily to recursively delete all child posts,
the column under the parent's key in Threads, and all associated data in Posts.
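The insert and recursive-delete steps might look like this in outline. This is an in-memory sketch
with hypothetical names: plain maps replace the Posts and Threads ColumnFamilies, and a `long`
stands in for the TimeUUID column name to keep it short. It deletes descendants before the post
itself, then uses the post's parent_id column to find and remove its entry in the parent's Threads row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory sketch of writing to both ColumnFamilies and of a
// recursive delete driven by Threads. Plain maps stand in for Thrift calls.
class DiscussionStore {
    // Posts: post UUID -> (column name -> value)
    final Map<UUID, Map<String, String>> posts = new HashMap<>();
    // Threads: parent UUID -> (post time -> post UUID); a long replaces the TimeUUID.
    final Map<UUID, TreeMap<Long, UUID>> threads = new HashMap<>();

    void insert(UUID parent, UUID post, long time, String subject) {
        Map<String, String> cols = new HashMap<>();
        cols.put("id", post.toString());
        cols.put("parent_id", parent.toString());
        cols.put("subject", subject);
        posts.put(post, cols);                                                  // write 1: Posts row
        threads.computeIfAbsent(parent, k -> new TreeMap<>()).put(time, post);  // write 2: Threads column
    }

    // Deletes a post, all of its descendants, and its column in the parent's Threads row.
    void delete(UUID post) {
        TreeMap<Long, UUID> kids = threads.get(post);
        if (kids != null) {
            // Iterate over a copy because each recursive call edits this row.
            for (UUID child : new ArrayList<>(kids.values())) {
                delete(child);
            }
        }
        threads.remove(post);                     // the post's own Threads row
        Map<String, String> cols = posts.remove(post);
        if (cols != null) {
            UUID parent = UUID.fromString(cols.get("parent_id"));
            TreeMap<Long, UUID> siblings = threads.get(parent);
            if (siblings != null) {
                siblings.values().remove(post);   // drop the column pointing at this post
            }
        }
    }
}
```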

Any comments so far on whether this is a good or terrible data model are welcome.  :)

My question comes from the fact that during this process I have written/read/deleted many
"key->Columns" in these ColumnFamilies (many of which failed halfway through), so I decided
to write a "clean" script to remove all data from these ColumnFamilies (much like a TRUNCATE
TABLE command in SQL).

Using the following Java code:

      // get only the ID column for each key we find
      List<byte[]> l_columns = new ArrayList<byte[]>();
      l_columns.add(Transcoder.encode(ID));
      SlicePredicate l_slicePredicate = new SlicePredicate();
      l_slicePredicate.setColumn_names(l_columns);
      // get 100 keys at a time; empty start and end keys cover the whole key range
      KeyRange keyRange = new KeyRange(100);
      keyRange.setStart_key("");
      keyRange.setEnd_key("");

      List<KeySlice> l_keySlices = p_context.getClient().get_range_slices("Discussions",
          new ColumnParent("Posts"), l_slicePredicate, keyRange, ConsistencyLevel.ONE);

I get ALL of the keys I have ever written to the server, and most of them have no columns
associated with them.  In fact, if I query one of those keys with

      SlicePredicate l_slicePredicate = new SlicePredicate();
      SliceRange l_sliceRange = new SliceRange();
      l_sliceRange.setStart(new byte[] {});
      l_sliceRange.setFinish(new byte[] {});
      l_slicePredicate.setSlice_range(l_sliceRange);
      List<ColumnOrSuperColumn> l_result =
          p_context.getClient().get_slice("Discussions", <KEY FROM GET_RANGE_SLICES>,
              new ColumnParent("Posts"), l_slicePredicate, ConsistencyLevel.ONE);

it returns an empty list (the same as when I give it a key it has never seen).
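The range query above asks for at most 100 keys.  A "clean" script needs to page through every
key; a common pattern with get_range_slices is to pass the last key of one page as the start_key
of the next page and skip that key, because start_key is inclusive.  The sketch below simulates that
loop in memory (`RangePager` and `rangeSlice` are hypothetical stand-ins for the Thrift call).  It
assumes keys come back in sorted key order, as with an order-preserving partitioner; with
RandomPartitioner they come back in token order instead, which the sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical simulation of paging with an inclusive start key.
class RangePager {
    // Simulated ColumnFamily; a row may exist with no columns (a deleted row).
    final NavigableMap<String, Map<String, String>> rows = new TreeMap<>();

    // Up to `count` keys >= startKey in key order, empty rows included.
    List<String> rangeSlice(String startKey, int count) {
        List<String> page = new ArrayList<>();
        for (String key : rows.tailMap(startKey, true).keySet()) {
            if (page.size() == count) break;
            page.add(key);
        }
        return page;
    }

    // Collects every key, `pageSize` keys per request.
    List<String> allKeys(int pageSize) {
        // With pageSize 1 each page would be just the repeated start key, forever.
        if (pageSize < 2) throw new IllegalArgumentException("pageSize must be >= 2");
        List<String> result = new ArrayList<>();
        String start = "";          // empty start key: begin at the first key
        boolean first = true;
        while (true) {
            List<String> page = rangeSlice(start, pageSize);
            for (String key : page) {
                // Skip the start key repeated from the end of the previous page.
                if (!first && key.equals(start)) continue;
                result.add(key);
            }
            if (page.size() < pageSize) return result;   // short page: nothing left
            start = page.get(page.size() - 1);
            first = false;
        }
    }
}
```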

It is OK with me if get_range_slices returns keys with no columns, although it makes the behavior
a little harder to explain to others.  Is there garbage collection that will clean these out in
the future?  However, I am stuck on how to simply truncate the table without looping through every
key, looking for one that still has a column associated with it, and then deleting that key->value.
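Keys with no columns are commonly called range ghosts: a deleted row keeps appearing in range
scans with an empty column list until compaction purges its tombstone, which happens only after
the ColumnFamily's GCGraceSeconds (864000 seconds, i.e. 10 days, by default) has passed.  The
in-memory sketch below (hypothetical class and methods, not the Thrift API) models that behavior
and one way to write the clean step: delete every key the scan returns, since deleting a row that
is already a ghost just writes another tombstone.  `compact()` is only a stand-in for what
compaction eventually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a ColumnFamily in which a deleted row keeps showing
// up in range scans, with zero columns, until compaction purges it.
class GhostAwareTable {
    final NavigableMap<String, Map<String, String>> rows = new TreeMap<>();

    void insert(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
    }

    // Row delete: the key stays visible to range scans, but with no columns.
    void remove(String key) {
        Map<String, String> row = rows.get(key);
        if (row != null) row.clear();
    }

    // Range scan over every key, ghosts included (like get_range_slices).
    List<String> scanKeys() {
        return new ArrayList<>(rows.keySet());
    }

    // Keys that still have at least one column; ghosts are filtered out.
    List<String> liveKeys() {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            if (!e.getValue().isEmpty()) live.add(e.getKey());
        }
        return live;
    }

    // "Truncate": delete every scanned key; no per-key column check is needed.
    void clean() {
        for (String key : scanKeys()) remove(key);
    }

    // Stand-in for compaction after GCGraceSeconds: ghosts finally disappear.
    void compact() {
        rows.values().removeIf(Map::isEmpty);
    }
}
```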

It is possible I am not deleting correctly as well.  For that I simply do:

p_context.getClient().remove("Discussions", p_postUUID.toString(),
                             new ColumnPath("Posts"), l_rightNow,
                             ConsistencyLevel.ALL);
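A ColumnPath that names only the ColumnFamily makes this a whole-row delete, and the `l_rightNow`
timestamp is compared against each column's write timestamp: columns written at or before the
tombstone's timestamp disappear, later writes survive.  The sketch below is a hypothetical
in-memory model of that rule, not the Thrift API.  One thing worth checking, as an assumption
about how `l_rightNow` is computed: it should use the same unit (commonly microseconds) as the
inserts, otherwise a smaller millisecond value can make the delete remove nothing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of timestamp-based row deletion: a row tombstone written
// at time T hides every column whose write timestamp is <= T.
class TimestampedRow {
    // column name -> newest write timestamp seen for that column
    private final Map<String, Long> columnTimestamps = new HashMap<>();
    // newest row tombstone; Long.MIN_VALUE means "never deleted"
    private long deletedAt = Long.MIN_VALUE;

    // Last write wins: keep the larger of the old and new timestamps.
    void insert(String column, long timestamp) {
        columnTimestamps.merge(column, timestamp, Long::max);
    }

    // Whole-row delete, as with a ColumnPath naming only the ColumnFamily.
    void remove(long timestamp) {
        deletedAt = Math.max(deletedAt, timestamp);
    }

    // Number of columns a read would return: those written after the tombstone.
    int visibleColumns() {
        int n = 0;
        for (long ts : columnTimestamps.values()) {
            if (ts > deletedAt) n++;
        }
        return n;
    }
}
```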

I am just trying to understand what I am getting and compare it against what I expected.  I am
also still trying to write a simple "clean" command.

If you read this far, thanks!  If you can add some clarity, it would help me.  I have tried
to find the answer in the archives and in blog posts, but I didn't see anything.

Thanks,
Kevin
