Mailing-List: contact cassandra-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: cassandra-user@incubator.apache.org
Message-ID: <4A6EC441.8030208@mollenhour.com>
Date: Tue, 28 Jul 2009 05:26:25 -0400
From: Colin Mollenhour <colin@mollenhour.com>
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: cassandra-user@incubator.apache.org
Subject: Greetings!
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi all, I am new to the Cassandra scene. I have watched presentations,
read papers and articles, run the server with some basic usage, digested
the Thrift interface and absorbed as much info as possible without
frying my brain with all of this stuff. I am working on a web app that
will have some social networking aspects as well as some other features
that involve lots of "event" records and while I have a good enough
understanding to do some damage, I don't feel comfortable writing an app
just yet. I actually started the app with a PHP framework and a MySQL
schema that isn't too complex and have started distilling it into a
Cassandra schema as best I can but this is where I am getting stuck. I'm
not sure if I'm trying to fit a square peg into a round hole or if I am
just not lining it up right so perhaps you can help me?

I've been going off of the "twitter" examples (Evan Weaver, Eric
Florenzano) as my point of reference but have a few questions about
specifics. Background:
I have, for the most part, "users", "journals", and "events".
Events have one of several types (variable) and are either a start-end
range or a single point in time and have various metadata.
Journals have multiple events plus various metadata.
In the lifetime of a journal I am estimating it will accrue 20k-60k events.
Users have multiple journals and can share access to journals with other
users.
Users will own <10 journals but some users might share access to more
than that at once.
I'd like it to scale to as many users as we can get to sign up,
potentially very very many, hence my interest in Cassandra :)

I need to be able to fetch all or latest events with the following
"queries":
-A specific journal
-All of a user's journals
-A specific event type
-A specific event type for a specific journal
-A specific event type for all of a user's journals

After much deliberation in trying to figure out how to do the above
without having to loop through many many queries here is the schema I
arrived at:
http://bit.ly/6Hj9I
If I am correct in my thinking, all of the above cases can be retrieved
in one or two steps with the maximum number of queries being determined
by the number of journals in question.

Am I wrong to try to reduce the number of indexes and round-trips to the
database by modeling this way?

Some more general questions:
My model assumes the use of get_slice_by_names with a potentially large
number of keys. Is that OK?
Cassandra lacks transactions and increment methods. Is there a way to
generate unique user ids with just Cassandra as the authority that I am
missing?
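One way to sidestep the missing counter, sketched here as an assumption and not as anything Cassandra itself provides, is to generate ids on the client with random UUIDs so that no central authority is needed. The function name new_user_id is made up for this example.

```python
import uuid


def new_user_id():
    """Return a random 128-bit id as a 32-character hex string."""
    # uuid4 draws from the OS random source; collisions are possible in
    # theory but vanishingly unlikely, so no coordination step is needed.
    return uuid.uuid4().hex
```

The trade-off is that such ids are long and not sequential, so they say nothing about signup order.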
Is it silly to use short column names for the sake of performance or
storage efficiency? E.g. uid instead of user_id. I like verbose names...
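For a rough sense of scale, assuming the column name is stored alongside every column value (my understanding of the storage format, not something I have measured), the saving is just the difference in name length times the number of columns. The helper below is hypothetical.

```python
def name_bytes_saved(long_name, short_name, column_count):
    """Bytes saved by using the shorter column name in every column."""
    per_column = len(long_name.encode("utf-8")) - len(short_name.encode("utf-8"))
    return per_column * column_count
```

For example, name_bytes_saved("user_id", "uid", 60000) gives 240000, i.e. about 240 KB across 60k columns, before any compression or per-column overhead is considered.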

Thanks!
Colin