cassandra-user mailing list archives

From: Srinivasa T N <>
Subject: Re: Storing large files for later processing through hadoop
Date: Fri, 02 Jan 2015 16:53:39 GMT
On Fri, Jan 2, 2015 at 5:54 PM, mck <> wrote:

> You could manually chunk them down to 64Mb pieces.

Can this split and combine be done automatically by cassandra when
inserting/fetching the file, without the application being bothered about it?
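
For illustration, the manual chunking would look roughly like the sketch
below (Python driver; the keyspace "store", the "files" table and its schema
are only assumptions, not something the thread defines):

# A minimal sketch, assuming a table like:
#   CREATE TABLE files (file_id text, chunk_no int, data blob,
#                       PRIMARY KEY (file_id, chunk_no));
from cassandra.cluster import Cluster

CHUNK = 64 * 1024 * 1024  # 64 Mb pieces, as suggested above

session = Cluster(['127.0.0.1']).connect('store')
insert = session.prepare(
    "INSERT INTO files (file_id, chunk_no, data) VALUES (?, ?, ?)")

def put_file(path, file_id):
    # The application does the splitting itself; the driver/cassandra
    # will not chunk a large blob for you.
    with open(path, 'rb') as f:
        for n, piece in enumerate(iter(lambda: f.read(CHUNK), b'')):
            session.execute(insert, (file_id, n, piece))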

> > 2) Can I replace HDFS with Cassandra so that I don't have to sync/fetch
> > the file from cassandra to HDFS when I want to process it in hadoop
> > cluster?
>
> We keep HDFS as a volatile filesystem simply for hadoop internals. No
> need for backups of it, no need to upgrade data, and we're free to wipe
> it whenever hadoop has been stopped.
>
> ~mck

Since the hadoop MR streaming job requires the file it processes to be
present in HDFS, I was wondering whether the job can read it directly from
cassandra, instead of me manually fetching it and placing it in an HDFS
directory before submitting the hadoop job.
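
For context, that manual fetch-and-place step looks roughly like the sketch
below (reusing the hypothetical schema from the earlier sketch; this is the
part I would like to avoid):

import subprocess
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('store')

def fetch_to_hdfs(file_id, local_path, hdfs_dir):
    # Pull the chunks back out in order and reassemble the file locally.
    rows = session.execute(
        "SELECT data FROM files WHERE file_id = %s ORDER BY chunk_no",
        (file_id,))
    with open(local_path, 'wb') as out:
        for row in rows:
            out.write(row.data)
    # Copy the reassembled file into HDFS for the streaming job.
    subprocess.check_call(['hadoop', 'fs', '-put', '-f', local_path, hdfs_dir])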

> > There was a datastax project before in being able to replace HDFS with
> > Cassandra, but I don't think it's alive anymore.

I think you are referring to the Brisk project (
but I don't know its current status.

Can I use it for the task at hand?
