incubator-cassandra-dev mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: reading sstables stored in hdfs
Date Sat, 23 Mar 2013 23:14:27 GMT
For the gory details you're going to need to explore SSTableReader
and/or SSTableWriter.
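
To illustrate the wrapper idea raised in the quoted message below, here
is a minimal sketch of one way to isolate the random-access calls behind
a small interface so that the backing store can be swapped. The names
SeekableSource, LocalFileSource, and SourceReads are hypothetical and are
not Cassandra or Hadoop classes. Only the local-disk variant is shown. An
HDFS-backed variant is an assumption: it could presumably wrap the
FSDataInputStream returned by FileSystem.open(Path), which provides
seek(long), getPos(), and read(byte[], int, int), and take the length
from FileSystem.getFileStatus(path).getLen().

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical seam: the minimal random-access operations a reader needs. */
interface SeekableSource extends Closeable {
    void seek(long position) throws IOException;
    long position() throws IOException;
    long length() throws IOException;
    /** Returns the number of bytes read, or -1 at end of data. */
    int read(byte[] buffer, int offset, int length) throws IOException;
}

/** Local-disk implementation backed by java.io.RandomAccessFile. */
final class LocalFileSource implements SeekableSource {
    private final RandomAccessFile file;

    LocalFileSource(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    @Override public void seek(long position) throws IOException { file.seek(position); }
    @Override public long position() throws IOException { return file.getFilePointer(); }
    @Override public long length() throws IOException { return file.length(); }
    @Override public int read(byte[] buffer, int offset, int length) throws IOException {
        return file.read(buffer, offset, length);
    }
    @Override public void close() throws IOException { file.close(); }
}

/** Helpers written only against the interface, so they work for any backing store. */
final class SourceReads {
    private SourceReads() {}

    /** Fills the whole buffer, looping because a single read may return fewer bytes. */
    static void readFully(SeekableSource src, byte[] buffer) throws IOException {
        int done = 0;
        while (done < buffer.length) {
            int n = src.read(buffer, done, buffer.length - done);
            if (n < 0) {
                throw new EOFException("source ended after " + done + " bytes");
            }
            done += n;
        }
    }

    /** Reads a big-endian long at an absolute offset (same byte order as DataInput.readLong). */
    static long readLongAt(SeekableSource src, long position) throws IOException {
        src.seek(position);
        byte[] raw = new byte[8];
        readFully(src, raw);
        long value = 0;
        for (byte b : raw) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }
}
```

The point of the seam is that parsing code depends only on seek and read,
so it does not need to know whether the bytes come from local disk or
from HDFS. Whether the existing readers can be adapted this way is
something to confirm against the SSTableReader source.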

On Sat, Mar 23, 2013 at 7:01 PM, Amit Kumar <kumaramit01@gmail.com> wrote:
> We don't want to set up a parallel workflow for analytics. We already
> use Hadoop for that, and it would be trivial to periodically copy newly
> created sstables to HDFS and then have mappers read them in parallel.
> Going through Thrift is an option, but an inefficient one, and one that
> impacts production Cassandra.
>
> Amit
>
>
>
> On Sat, Mar 23, 2013 at 2:40 PM, Michael Kjellman
> <mkjellman@barracuda.com> wrote:
>> Just curious, why would you want to store sstables in HDFS?
>>
>> On 3/23/13 12:43 PM, "Amit Kumar" <kumaramit01@gmail.com> wrote:
>>
>>>I am starting some work on an input format that would let us read
>>>sstables stored in HDFS, and I wonder if anyone has worked on something
>>>similar before. I did come across
>>>
>>>http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
>>>
>>>However, it's not open sourced/available yet.
>>>
>>>I am writing for a sanity check before I go too deep into this.
>>>
>>>I have a few questions and am hoping someone here can help.
>>>
>>>So far, I have been able to read sstables stored on the local file
>>>system using SSTableScanner and SSTableReader. I am wondering
>>>what would be a good way to proceed. Would it be a custom implementation
>>>of RandomAccessFile (like RandomAccessReader and
>>>CompressedRandomAccessReader) that uses Hadoop's FileSystem API?
>>>
>>>
>>>I searched but may have missed it: is there any documentation
>>>on the binary format of the data, index, and stats files? That might
>>>make it simpler for me to prototype without having to go through the
>>>Cassandra internals. I am currently working off our production
>>>deployment, which is 1.1.0.
>>>
>>>Any guidance you can give would be appreciated (I am new to Cassandra
>>>internals).
>>>
>>>Many thanks
>>>Amit
>>
>>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced