cassandra-user mailing list archives

From Kevin Burton <>
Subject simple map / table scans without hadoop?
Date Sat, 27 Sep 2014 04:08:10 GMT
I have a requirement to periodically run full table scans on our data.
It’s mostly for repair tasks or making bulk UPDATEs… but I’d prefer to do
it in Java because all I need is something fairly trivial.

Pig, Hadoop, etc. are overkill for this.  I don’t want or need a
whole Hadoop or HDFS setup just to do it.

For example: do a full table scan and, if a field matches a regex, set
another column based on that value.
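
To make that concrete, here is a rough sketch of just the match-and-derive step, with no Cassandra driver involved. Everything in it is hypothetical: the class name, the idea that each row is a key plus one text field, and the assumption that the regex has one capture group whose value becomes the new column. In a real scan the rows would come from a paged SELECT and each entry in the result would turn into an UPDATE.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexColumnUpdate {
    /**
     * Given rows as (key -> field value), return (key -> new column value)
     * for every row whose field matches the pattern. The new value is taken
     * from capture group 1, so the pattern must define at least one group.
     */
    public static Map<String, String> plan(Map<String, String> rows, Pattern pattern) {
        Map<String, String> updates = new LinkedHashMap<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            String value = row.getValue();
            if (value == null) {
                continue; // nothing to match against
            }
            Matcher m = pattern.matcher(value);
            if (m.find()) {
                // Hypothetical rule: the captured text is what gets written
                // to the other column for this key.
                updates.put(row.getKey(), m.group(1));
            }
        }
        return updates;
    }
}
```

Keeping this step as a pure function means it can be checked without a cluster, and only the matching rows ever need to be written back.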

Seems like this wouldn’t be too hard.  Just write a daemon that looks at
the key distribution and runs a scan on the data closest to it.  It would
be ideal if it were a separate daemon, so that you couldn’t accidentally
read all that data into memory and OOM the Cassandra daemon.
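
One way to sketch the scan side is to cut the token ring into slices and scan each slice with a token() range query. This is a minimal sketch under some assumptions: it assumes Murmur3Partitioner (tokens are signed 64-bit values from -2^63 to 2^63-1), the class and method names are made up, and the table, key, and column names passed to scanQuery are placeholders. It does not map slices to the nodes that own them, so the "closest data" part is left out.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenRangeSplitter {
    /** Inclusive token range [start, end]. */
    public static final class Range {
        public final long start;
        public final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Split the full signed 64-bit token space into n contiguous ranges. */
    public static List<Range> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger total = BigInteger.ONE.shiftLeft(64); // 2^64 possible tokens
        BigInteger parts = BigInteger.valueOf(n);
        List<Range> ranges = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // BigInteger avoids overflow when computing slice boundaries.
            BigInteger start = min.add(total.multiply(BigInteger.valueOf(i)).divide(parts));
            BigInteger end = min.add(total.multiply(BigInteger.valueOf(i + 1L)).divide(parts))
                    .subtract(BigInteger.ONE);
            ranges.add(new Range(start.longValueExact(), end.longValueExact()));
        }
        return ranges;
    }

    /** Build the CQL text for scanning one slice; bind start and end as the two parameters. */
    public static String scanQuery(String table, String key, String columns) {
        return "SELECT " + columns + " FROM " + table
                + " WHERE token(" + key + ") >= ? AND token(" + key + ") <= ?";
    }
}
```

For example, split(4) yields four slices starting at -2^63, -2^62, 0 and 2^62. Scanning one slice at a time with a modest page size keeps the client's memory bounded, and because it runs in its own JVM, a mistake there cannot OOM the Cassandra daemon.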

Does this already exist?

