incubator-cassandra-user mailing list archives

From William Oberman
Subject hadoop results
Date Wed, 29 Jun 2011 18:35:27 GMT
I'll start with my question: given a CF with comparator TimeUUIDType, what
is the most efficient way to get the greatest column's value?

Context: I've been running cassandra for a couple of months now, so
obviously it's time to start layering more on top :-)  In my test
environment, I managed to get pig/hadoop running, and developed a few
scripts to collect metrics I've been missing since I switched from MySQL to
cassandra (including the ever-useful "select count(*) from table").

I was hoping to dump the results of this processing back into cassandra for
use in other tools/processes.  My initial thought was: new CF called "stats"
with comparator TimeUUIDType.  The basic idea being I'd store:
stat_name -> time stat was computed (as UUID) -> value
That way I can also see a historical perspective of any given stat for
auditing (and for cumulative stats to see trends).  The stat_name itself is
a URI that is composed of "what" and any constraints on the "what"
(including an optional time range, if the stat supports it).  E.g.
ClassOfSomething/ID/MetricName/OptionalTimeRange (or something, still
deciding on the format of the URI).  But, right now, the only way I know to
get the "current" stat value would be to iterate over all columns (the
TimeUUIDs) and then return the last one.
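
To make the layout and the "iterate and take the last one" lookup concrete, here is a minimal sketch in Python. It uses a plain in-memory dict as a stand-in for the "stats" CF and does not talk to cassandra at all; the helper names (timeuuid_at, make_stat_name, record_stat, current_stat) and the example URI parts are hypothetical, and ordering columns by the UUID's embedded timestamp is only an approximation of how TimeUUIDType sorts.

```python
import uuid


def timeuuid_at(ts, node=0, clock_seq=0):
    """Build a version 1 (time-based) UUID from a 60-bit timestamp.

    Hypothetical helper for illustration; in practice uuid.uuid1() would
    generate the column name at the moment the stat is computed.
    """
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # set version 1
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def make_stat_name(what_class, item_id, metric, time_range=None):
    """Compose a row key like ClassOfSomething/ID/MetricName/OptionalTimeRange."""
    parts = [what_class, item_id, metric]
    if time_range:
        parts.append(time_range)
    return "/".join(parts)


def record_stat(stats, stat_name, value, column=None):
    """Store stat_name -> TimeUUID -> value in the in-memory stand-in."""
    if column is None:
        column = uuid.uuid1()
    stats.setdefault(stat_name, {})[column] = value
    return column


def current_stat(stats, stat_name):
    """Naive lookup: walk every column in time order and keep the last value."""
    latest = None
    columns = stats.get(stat_name, {})
    for column in sorted(columns, key=lambda u: u.time):
        latest = columns[column]
    return latest
```

The cost of current_stat grows with the number of historical values kept per stat, which is the inefficiency the question is about.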

Thanks for any tips,

