cassandra-user mailing list archives

From Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com>
Subject Re: efficiently generate complete database dump in text format
Date Thu, 09 Oct 2014 15:38:20 GMT
The best way to generate dumps from Cassandra is via the Hadoop integration
(or Spark). You can find more info here:

http://www.datastax.com/documentation/cassandra/2.1/cassandra/configuration/configHadoop.html
http://wiki.apache.org/cassandra/HadoopSupport

On Thu, Oct 9, 2014 at 4:19 AM, Gaurav Bhatnagar <gbhatnagar@gmail.com>
wrote:

> Hi,
>    We have a Cassandra column family containing 320 million rows, and
> each row contains about 15 columns. We want to take a monthly dump of
> this single column family in text format.
>
> We are planning to take the following approach to implement this:
> 1. Take a snapshot of the Cassandra database using the nodetool utility,
>    passing the -cf flag with the column family name so that the snapshot
>    contains only the data for that single column family.
> 2. Back up this snapshot and move the backup to a separate physical
>    machine.
> 3. Use the "SSTable to JSON conversion" utility to convert all the data
>    files into JSON format.
>
> We have the following questions about this approach:
> a) The generated JSON records contain a "d" (IS_MARKED_FOR_DELETE) flag.
>    Can I safely ignore all such records?
> b) If I ignore all records marked with the "d" flag, can the JSON files
>    generated in step 3 still contain duplicate records, i.e. multiple
>    entries for the same key?
>
> Is there any better approach to generating data dumps in text
> format?
>
> Regards,
> Gaurav
>
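
On questions a) and b) above: the same row key can appear in more than one
SSTable, and a "d" entry in one file may shadow an older live value in
another, so dropping "d" entries before merging can bring deleted data back.
The sketch below is only an illustration, not the converter's own logic. It
assumes each file has already been parsed with json.load into a list of rows
shaped like {"key": ..., "columns": [[name, value, timestamp, optional
flag], ...]}; check that layout against the output of your Cassandra
version. The function name merge_dumps is hypothetical, and row-level
deletion metadata and expiring columns are not handled.

```python
def merge_dumps(dumps):
    """Merge parsed JSON dumps; the newest write per (row key, column) wins.

    `dumps` is an iterable of row lists, one list per converted data file
    (assumed layout, see the note above).
    """
    latest = {}  # (row key, column name) -> (timestamp, is_tombstone, value)
    for rows in dumps:
        for row in rows:
            for col in row.get("columns", []):
                name, value, timestamp = col[0], col[1], col[2]
                deleted = len(col) > 3 and col[3] == "d"
                slot = (row["key"], name)
                candidate = (timestamp, deleted)
                # Higher timestamp wins; on a tie, the tombstone wins.
                if slot not in latest or candidate > latest[slot][:2]:
                    latest[slot] = (timestamp, deleted, value)

    # Only now drop the tombstones, after they have shadowed older values.
    merged = {}
    for (key, name), (_, deleted, value) in latest.items():
        if not deleted:
            merged.setdefault(key, {})[name] = value
    return merged
```

With this approach the "d" entries are discarded only after the merge, and
each key ends up with a single entry. For 320 million rows the dictionary
would not fit in memory on most machines, so the same per-column rule would
have to be applied in a sorted or distributed merge instead.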



-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200
