incubator-cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject useful little way to run locally with (pig|hive) && cassandra
Date Wed, 15 Jun 2011 17:35:42 GMT
We started doing this recently and thought it might be useful to others.

Pig (and Hive) have a sample function that lets you pull a random sample of data from your data store.

In Pig it looks something like this:
mysample = SAMPLE myrelation 0.01;

One possible use for this with Pig and Cassandra is to solve the conundrum of testing locally.
We had wondered how to do that, so we decided to sample a column family (or set of CFs),
store the sample in HDFS (or CFS), download it locally, and then import it into a local
Cassandra node. That gives you real data to test against with Pig/Hive or for other purposes.
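
For anyone who wants to try it, the steps might look roughly like the sketch below. This is only an illustration, not our exact scripts: the keyspace and column family names (MyKeyspace, MyCF) and the paths are made up, it assumes the Cassandra Pig jars are already on Pig's classpath, and it uses BinStorage as the intermediate format on the assumption that it round-trips the (key, columns) tuples cleanly. Depending on your versions you may need to declare a schema on the second LOAD.

```pig
-- Step 1 (on the cluster): sample a column family and write the sample to HDFS.
-- MyKeyspace, MyCF and /tmp/mycf_sample are hypothetical names.
rows = LOAD 'cassandra://MyKeyspace/MyCF'
       USING org.apache.cassandra.hadoop.pig.CassandraStorage();
mysample = SAMPLE rows 0.01;  -- keeps roughly 1% of the rows
STORE mysample INTO '/tmp/mycf_sample' USING BinStorage();

-- Step 2 (from your laptop, outside Pig): copy the sample down, e.g.
--   hadoop fs -copyToLocal /tmp/mycf_sample ./mycf_sample

-- Step 3 (separate script, Pig in local mode against your local Cassandra node):
-- assumes the same keyspace and column family already exist locally.
sampled = LOAD 'mycf_sample' USING BinStorage();
STORE sampled INTO 'cassandra://MyKeyspace/MyCF'
      USING org.apache.cassandra.hadoop.pig.CassandraStorage();
```

SAMPLE is probabilistic, so the number of rows you get back will vary from run to run; adjust the fraction until the result is small enough to carry around.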

That way, when you're flying out to the Hadoop Summit or the Cassandra SF event, you can play
with real data :).

Maybe others have been doing this for years, but if not, we're finding it handy.

Jeremy