incubator-cassandra-dev mailing list archives

From Jonathan Ellis
Subject Re: reading sstables stored in hdfs
Date Sat, 23 Mar 2013 23:14:27 GMT
For the gory details you're going to need to explore SSTableReader
and/or SSTableWriter.

On Sat, Mar 23, 2013 at 7:01 PM, Amit Kumar wrote:
> We don't want to set up a parallel workflow for analytics, for which
> we use Hadoop. It would be trivial to copy the new sstables that get
> created to HDFS periodically and then have mappers read the
> sstables in parallel. Going through Thrift is an option, but an
> inefficient one, and one that impacts production Cassandra.
> Amit
> On Sat, Mar 23, 2013 at 2:40 PM, Michael Kjellman wrote:
>> Just curious, why would you want to store sstables in HDFS?
>> On 3/23/13 12:43 PM, "Amit Kumar" wrote:
>>>I am starting some work on an input format that would let us read
>>>sstables stored in HDFS, and I wonder if anyone has worked on something
>>>similar before. I did come across
>>>However, it's not open sourced/available yet.
>>>I am writing for a sanity check before I go too deep into this.
>>>I have a few questions, and I am hoping someone here will be able to help.
>>>So far, I have been able to read sstables stored on the local file
>>>system using the SSTableScanner and the SSTableReader. I am wondering
>>>what would be a good way to proceed: having a custom implementation of
>>>RandomAccessFile (like the RandomAccessReader and the
>>>CompressedRandomAccessReader) that would use Hadoop's File System
>>>I did search, but could have missed it: is there some documentation
>>>on the binary format of the data, index, and stats files? That might
>>>make it simpler for me to prototype without having to go through the
>>>Cassandra internals. I am currently working off our production
>>>deployment, which is 1.1.0.
>>>Any guidance you can give is welcome (I am new to Cassandra internals).
>>>Many thanks

Jonathan Ellis
Project Chair, Apache Cassandra
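
The approach raised in the quoted question (a RandomAccessFile-like reader that could sit on top of Hadoop's file system) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Cassandra's or Hadoop's actual code: `SeekableInput`, `LocalSeekableInput`, and `SeekableInputs` are hypothetical names invented here. Only the local-file backend is shown, so the block runs with the JDK alone. An HDFS backend would presumably implement the same interface by wrapping the `FSDataInputStream` returned by Hadoop's `FileSystem.open(Path)`, which offers `seek(long)` and `getPos()`.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical abstraction: the seek/read operations a file-format reader needs. */
interface SeekableInput extends Closeable {
    void seek(long position) throws IOException;
    long position() throws IOException;
    long length() throws IOException;
    void readFully(byte[] buffer, int offset, int count) throws IOException;
}

/** Local-file backend built on java.io.RandomAccessFile (read-only). */
final class LocalSeekableInput implements SeekableInput {
    private final RandomAccessFile file;

    LocalSeekableInput(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    public void seek(long position) throws IOException { file.seek(position); }
    public long position() throws IOException { return file.getFilePointer(); }
    public long length() throws IOException { return file.length(); }

    public void readFully(byte[] buffer, int offset, int count) throws IOException {
        file.readFully(buffer, offset, count);
    }

    public void close() throws IOException { file.close(); }
}

/** Helper that works against any backend, local or (hypothetically) HDFS. */
final class SeekableInputs {
    private SeekableInputs() {}

    /** Reads count bytes starting at an absolute offset, such as one taken from an index file. */
    static byte[] readAt(SeekableInput in, long offset, int count) throws IOException {
        if (offset < 0 || count < 0 || offset + count > in.length()) {
            throw new EOFException("range " + offset + "+" + count + " is outside the file");
        }
        in.seek(offset);
        byte[] out = new byte[count];
        in.readFully(out, 0, count);
        return out;
    }
}
```

Code written against `SeekableInput` does not need to know where the bytes live, so a prototype can be exercised on local files first and pointed at HDFS later by adding a second implementation of the interface.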
