cassandra-user mailing list archives

From Justin Cameron <jus...@instaclustr.com>
Subject Re: Extract big data to file
Date Wed, 08 Feb 2017 21:10:45 GMT
Actually, using BEGINTOKEN and ENDTOKEN will only give you what you want if
you're using ByteOrderedPartitioner (not the default Murmur3Partitioner). It
also looks like *datetimestamp* is a clustering column, so that suggestion
probably wouldn't have applied anyway.
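A toy sketch of why that is (using a simple multiplicative hash as a stand-in for Cassandra's actual Murmur3; the numbers are purely illustrative): a hash-based partitioner assigns tokens that don't preserve the ordering of partition keys, so a token range does not correspond to a contiguous range of key values.

```python
# Illustrative only: a Knuth-style multiplicative hash standing in for
# Murmur3. The point is that consecutive keys get scattered tokens.

def token(day: int) -> int:
    """Stand-in token function; Cassandra's default partitioner uses Murmur3."""
    return (day * 2654435761) % 2**32

days = [1, 2, 3]                    # e.g. three consecutive dates
tokens = [token(d) for d in days]
print(tokens == sorted(tokens))     # False: key order is not preserved
```

So with Murmur3, BEGINTOKEN/ENDTOKEN select a slice of the token ring, not a slice of your *datetimestamp* values.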

On Wed, 8 Feb 2017 at 13:04 Justin Cameron <justin@instaclustr.com> wrote:

> Ideally you would have the program/Spark job that receives the data from
> Kafka write it to a text file as it writes each row to Cassandra - that way
> you don't need to query Cassandra at all.
>
> If you need to dump this data ad-hoc, rather than on a regular schedule,
> your best bet is to write some code to do it as Kiril mentioned. A short
> python script would do the job, and you get the added bonus over CQLSH of
> being able to throttle the export if it is very large and likely to affect
> your cluster's performance.
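As a rough sketch of such a script (all names here are hypothetical; the fetch_page stub stands in for paged queries via the Python cassandra-driver, e.g. a SimpleStatement with fetch_size), throttling by sleeping between pages:

```python
# Hypothetical sketch of a throttled CSV export. fetch_page is a stub for
# paged Cassandra queries; replace it with real driver calls in practice.
import csv
import io
import time

def fetch_page(page_number, page_size):
    """Stand-in for one page of query results; yields fake rows."""
    start = page_number * page_size
    total = 250  # pretend the result set has 250 rows
    return [("kafka-row-%d" % i,) for i in range(start, min(start + page_size, total))]

def export_throttled(out, page_size=100, max_pages_per_sec=50.0):
    """Write all pages to `out` as CSV, sleeping between pages to throttle."""
    writer = csv.writer(out)
    writer.writerow(["kafka"])
    delay = 1.0 / max_pages_per_sec
    page = 0
    while True:
        rows = fetch_page(page, page_size)
        if not rows:
            break
        writer.writerows(rows)
        time.sleep(delay)  # cap the page rate to limit cluster load
        page += 1
    return page

buf = io.StringIO()
pages = export_throttled(buf, page_size=100)
print(pages)  # 3 pages for 250 rows at 100 rows per page
```

Tuning max_pages_per_sec (or adding a longer sleep every N pages) lets you trade export speed against impact on the cluster.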
>
> Alternatively, if *datetimestamp* is part of the table's partition key
> you could also use the BEGINTOKEN and ENDTOKEN options of CQLSH's COPY TO
> command to achieve what you want.
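A hedged example of what that COPY TO form looks like (the keyspace/table names and token values are placeholders, and as noted elsewhere in this thread it is only meaningful with ByteOrderedPartitioner):

```
COPY mykeyspace.red (kafka, datetimestamp) TO 'result.csv'
  WITH HEADER = true
  AND BEGINTOKEN = '<token of first key>'
  AND ENDTOKEN = '<token of last key>';
```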
>
>
> On Wed, 8 Feb 2017 at 11:40 Kiril Menshikov <kmenshikov@gmail.com> wrote:
>
> Have you tried fetching the data programmatically? cqlsh is probably not
> the right tool to fetch 360 GB.
>
>
>
> On Feb 8, 2017, at 12:34, Cogumelos Maravilha <cogumelosmaravilha@sapo.pt>
> wrote:
>
> Hi list,
>
> My database stores data from Kafka. Using C* 3.0.10
>
> In my cluster I'm using:
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>
> Extracting one day of data uncompressed comes to around 360 GB.
>
> I've found these approaches:
>
> echo "SELECT kafka from red where datetimestamp >= '2017-02-02 00:00:00'
> and datetimestamp < '2017-02-02 15:00:01';" | cqlsh 100.100.221.146 9042 >
> result.txt
> This way I only get 100 rows by default.
>
> Using CAPTURE result.csv with paging off I always get an out-of-memory
> error. With paging on I would need to put something heavy on top of the
> Enter key to keep paging through. It's crazy that I have to enable paging
> to get rid of the out-of-memory error! I've taken a look at the result
> file and it is empty; perhaps cqlsh builds the result in memory and only
> writes it to disk at the end.
>
> Is there an approach like this one from ACID databases:
> copy (select kafka from red where datetimestamp >= '2017-02-02 00:00:00'
> and datetimestamp < '2017-02-02 15:00:01') to 'result.csv' WITH CSV HEADER;
>
> Thanks in advance.
>
>
--

Justin Cameron

Senior Software Engineer | Instaclustr




This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.
