incubator-cassandra-user mailing list archives

From William Oberman <ober...@civicscience.com>
Subject hadoop results
Date Wed, 29 Jun 2011 18:35:27 GMT
I'll start with my question: given a CF with comparator TimeUUIDType, what
is the most efficient way to get the greatest column's value?

Context: I've been running Cassandra for a couple of months now, so
obviously it's time to start layering more on top :-)  In my test
environment, I managed to get Pig/Hadoop running, and developed a few
scripts to collect metrics I've been missing since I switched from MySQL to
Cassandra (including the ever-useful "select count(*) from table"
equivalent).

I was hoping to dump the results of this processing back into Cassandra for
use in other tools/processes.  My initial thought was a new CF called "stats"
with comparator TimeUUIDType.  The basic idea is that I'd store:
stat_name -> time stat was computed (as UUID) -> value
That way I can also keep a history of any given stat for auditing (and, for
cumulative stats, to see trends).  The stat_name itself is a URI composed of
the "what" and any constraints on the "what" (including an optional time
range, if the stat supports it), e.g.
ClassOfSomething/ID/MetricName/OptionalTimeRange (or something like that; I'm
still deciding on the format of the URI).  But right now, the only way I know
to get the "current" stat value is to iterate over all the columns (the
TimeUUIDs) and return the last one.
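
To make the layout concrete, here is a minimal sketch in plain Python with no
Cassandra client involved.  The helper names (time_uuid, column_slice,
latest_value) and the sample values are hypothetical, and the in-memory sort
only stands in for the ordering a TimeUUIDType comparator would maintain.  The
"reverse, count 1" read mirrors the `reversed` and `count` fields of the
Thrift SliceRange, which may be one way to avoid walking every column; treat
that as an assumption to verify, not a confirmed answer.

```python
import uuid


def time_uuid(ticks):
    """Build a version-1 (time-based) UUID from a raw 60-bit timestamp.

    Hypothetical helper for deterministic examples; node and clock
    sequence are fixed to zero.
    """
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi = (ticks >> 48) & 0x0FFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi, 0, 0, 0), version=1)


def column_slice(row, count, reverse=False):
    """Return up to `count` (column_name, value) pairs in time order.

    `row` is a dict mapping TimeUUID column names to values. Sorting on
    the UUID timestamp imitates a TimeUUIDType comparator.
    """
    ordered = sorted(row.items(), key=lambda kv: kv[0].time, reverse=reverse)
    return ordered[:count]


def latest_value(row):
    """Value of the greatest (most recent) column, or None for an empty row."""
    columns = column_slice(row, 1, reverse=True)
    return columns[0][1] if columns else None


# stat_name -> time stat was computed (as UUID) -> value
stats = {
    "ClassOfSomething/ID/MetricName": {
        time_uuid(100): 1200,
        time_uuid(300): 1250,
        time_uuid(200): 1225,
    }
}

print(latest_value(stats["ClassOfSomething/ID/MetricName"]))  # 1250
```

The same column_slice call without reverse=True returns the history in
chronological order, which is the auditing/trend view described above.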

Thanks for any tips,

will
