incubator-cassandra-dev mailing list archives

From Amit Kumar <>
Subject Re: reading sstables stored in hdfs
Date Sat, 23 Mar 2013 23:17:01 GMT
Thanks Jonathan. I have been spending time with them to get to know
them better. Is there any documentation on the on-disk file format of the
data, index, and stats files?


On Sat, Mar 23, 2013 at 5:14 PM, Jonathan Ellis <> wrote:
> For the gory details you're going to need to explore SSTableReader
> and/or SSTableWriter.
> On Sat, Mar 23, 2013 at 7:01 PM, Amit Kumar <> wrote:
>> We don't want to set up a parallel workflow for analytics. We already
>> use Hadoop for that, and it would be trivial to periodically copy newly
>> created sstables to HDFS and then have mappers read the
>> sstables in parallel. Going through Thrift is an option, but an
>> inefficient one, and one that impacts production Cassandra.
>> Amit
>> On Sat, Mar 23, 2013 at 2:40 PM, Michael Kjellman
>> <> wrote:
>>> Just curious, why would you want to store sstables in HDFS?
>>> On 3/23/13 12:43 PM, "Amit Kumar" <> wrote:
>>>>I am starting some work on an input format that would let us read
>>>>sstables stored in HDFS, and I wonder if anyone has worked on something
>>>>similar before. I did come across
>>>>However it's not open sourced/available yet.
>>>>I am writing for a sanity check before I go too deep into this.
>>>>I have a few questions and am hoping someone here will be able to help.
>>>>So far, I have been able to read sstables stored on the local file
>>>>system using the SSTableScanner and the SSTableReader. I am wondering
>>>>what would be a good way to proceed: having a custom implementation of
>>>>RandomAccessFile (like RandomAccessReader and
>>>>CompressedRandomAccessReader) that would use Hadoop's File System
>>>>I did search, but could have missed it: is there some documentation
>>>>on the binary format of the data, index, and stats files? That might
>>>>make it simpler for me to prototype without having to go through the
>>>>Cassandra internals. I am currently working off our production
>>>>deployment, which is 1.1.0.
>>>>Any guidance you can give would be appreciated (I am new to Cassandra internals).
>>>>Many thanks
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder,
> @spyced
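
The idea raised in the original question, a random-access reader whose backing store is not necessarily the local disk, might be sketched roughly as below. This is only an illustration under stated assumptions: `SeekableSource`, `LocalFileSource`, and `SourceReader` are hypothetical names that exist in neither Cassandra nor Hadoop, and the big-endian byte order is chosen for the example rather than taken from the sstable format. An HDFS-backed variant would presumably implement the same interface by delegating to the stream returned by Hadoop's `FileSystem.open(Path)`, which supports `seek`, `getPos`, and `read`.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical abstraction (not a Cassandra or Hadoop type): the minimum a
// reader needs from its storage in order to seek and read bytes.
interface SeekableSource extends Closeable {
    void seek(long position) throws IOException;
    long position() throws IOException;
    // Returns the number of bytes read, or -1 at end of data.
    int read(byte[] buffer, int offset, int count) throws IOException;
}

// Local-disk implementation backed by java.io.RandomAccessFile.
// An HDFS-backed variant would implement the same methods on top of the
// stream returned by Hadoop's FileSystem.open(Path).
final class LocalFileSource implements SeekableSource {
    private final RandomAccessFile file;

    LocalFileSource(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    @Override public void seek(long position) throws IOException { file.seek(position); }
    @Override public long position() throws IOException { return file.getFilePointer(); }
    @Override public int read(byte[] buffer, int offset, int count) throws IOException {
        return file.read(buffer, offset, count);
    }
    @Override public void close() throws IOException { file.close(); }
}

// Storage-agnostic helper: reads values through any SeekableSource.
final class SourceReader {
    private final SeekableSource source;

    SourceReader(SeekableSource source) { this.source = source; }

    // Fills the whole buffer, looping because a single read may return fewer bytes.
    void readFully(byte[] buffer) throws IOException {
        int done = 0;
        while (done < buffer.length) {
            int n = source.read(buffer, done, buffer.length - done);
            if (n < 0) throw new EOFException("unexpected end of source");
            done += n;
        }
    }

    // Reads a 64-bit value at an absolute offset (big-endian assumed for illustration).
    long readLongAt(long offset) throws IOException {
        source.seek(offset);
        byte[] raw = new byte[8];
        readFully(raw);
        long value = 0;
        for (byte b : raw) value = (value << 8) | (b & 0xFF);
        return value;
    }
}
```

Keeping the parsing logic in `SourceReader` and the storage access behind `SeekableSource` means the same reading code could be exercised against local files first and pointed at another file system later.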
