incubator-cassandra-user mailing list archives

From Marcelo Elias Del Valle <marc...@s1mbi0se.com.br>
Subject Re: migration to a new model
Date Wed, 04 Jun 2014 00:24:39 GMT
Hi Michael,

For sure I would be interested in this program!

I am new to both Python and CQL. I started creating this copier, but
was having problems with timeouts. Alex solved my problem here on the list,
but I think I will still have a lot of trouble making the copy work well.

I open sourced my version here:
https://github.com/s1mbi0se/cql_record_processor

Just in case it's useful for anything.

However, I saw that the CQL driver has support for concurrency itself, and
having something made by someone who knows the Python CQL driver better would
be very helpful.
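
As a rough sketch of what a concurrent copy between the two tables from the original question might look like with the DataStax Python driver: the chunk size, the concurrency level, and the helper names `chunked` and `copy_table` are assumptions made for illustration, not taken from either program mentioned in this thread. It reads the source table with automatic paging and writes each chunk with `execute_concurrent_with_args`.

```python
from itertools import islice

# Table and column names come from the original question in this thread.
SELECT_CQL = "SELECT name, value, entity_id FROM entitylookup"
INSERT_CQL = "INSERT INTO entity_lookup (name, value, entity_id) VALUES (?, ?, ?)"


def chunked(rows, size):
    """Yield lists of at most `size` items from any iterable (hypothetical helper)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def copy_table(session, chunk_size=1000, concurrency=50):
    """Copy entitylookup into entity_lookup and return the number of rows written.

    chunk_size and concurrency are illustrative defaults, not tuned values.
    """
    # Imported here so the chunking helper can be used without the driver installed.
    from cassandra.concurrent import execute_concurrent_with_args
    from cassandra.query import SimpleStatement

    insert = session.prepare(INSERT_CQL)
    # fetch_size makes the driver page through the table instead of loading it at once.
    select = SimpleStatement(SELECT_CQL, fetch_size=chunk_size)

    copied = 0
    for chunk in chunked(session.execute(select), chunk_size):
        params = [(row.name, row.value, row.entity_id) for row in chunk]
        # Runs the inserts with up to `concurrency` requests in flight;
        # by default it raises on the first failed insert.
        execute_concurrent_with_args(session, insert, params, concurrency=concurrency)
        copied += len(params)
    return copied
```

It would be called with a connected session, for example `copy_table(Cluster(["127.0.0.1"]).connect("my_keyspace"))`, where the contact point and keyspace name are placeholders. Smaller values for `chunk_size` and `concurrency` may be a reasonable starting point on a cluster that is already seeing timeouts.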

My two servers today are at OVH (ovh.com). We have servers at AWS, but in
several cases we prefer other hosts. Both servers have SSDs and 64 GB of RAM,
so I could use the script as a benchmark for you if you want. Besides, we have
some bigger clusters, and I could run it on them just to test the speed if that
is going to help.

Regards
Marcelo.


2014-06-03 11:40 GMT-03:00 Laing, Michael <michael.laing@nytimes.com>:

> Hi Marcelo,
>
> I could create a fast copy program by repurposing some python apps that I
> am using for benchmarking the python driver - do you still need this?
>
> With high levels of concurrency and multiple subprocess workers, based on
> my current actual benchmarks, I think I can get well over 1,000 rows/second
> on my mac and significantly more in AWS. I'm using variable size rows
> averaging 5kb.
>
> This would be the initial version of a piece of the benchmark suite we
> will release as part of our nyt⨍aбrik project on 21 June for my Cassandra
> Day NYC talk re the python driver.
>
> ml
>
>
> On Mon, Jun 2, 2014 at 2:15 PM, Marcelo Elias Del Valle <
> marcelo@s1mbi0se.com.br> wrote:
>
>> Hi Jens,
>>
>> Thanks for trying to help.
>>
>> Indeed, I know I can't do it using just CQL. But what would you use to
>> migrate data manually? I tried to create a python program using auto
>> paging, but I am getting timeouts. I also tried Hive, but no success.
>> I only have two nodes and less than 200Gb in this cluster, any simple way
>> to extract the data quickly would be good enough for me.
>>
>> Best regards,
>> Marcelo.
>>
>>
>>
>> 2014-06-02 15:08 GMT-03:00 Jens Rantil <jens.rantil@tink.se>:
>>
>>> Hi Marcelo,
>>>
>>> Looks like you can't do this without migrating your data manually:
>>> https://stackoverflow.com/questions/18421668/alter-cassandra-column-family-primary-key-using-cassandra-cli-or-cql
>>>
>>> Cheers,
>>> Jens
>>>
>>>
>>> On Mon, Jun 2, 2014 at 7:48 PM, Marcelo Elias Del Valle <
>>> marcelo@s1mbi0se.com.br> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have some cql CFs in a 2 node Cassandra 2.0.8 cluster.
>>>>
>>>> I realized I created my column family with the wrong partition. Instead
>>>> of:
>>>>
>>>> CREATE TABLE IF NOT EXISTS entity_lookup (
>>>>   name varchar,
>>>>   value varchar,
>>>>   entity_id uuid,
>>>>   PRIMARY KEY ((name, value), entity_id))
>>>> WITH
>>>>     caching=all;
>>>>
>>>> I used:
>>>>
>>>> CREATE TABLE IF NOT EXISTS entitylookup (
>>>>   name varchar,
>>>>   value varchar,
>>>>   entity_id uuid,
>>>>   PRIMARY KEY (name, value, entity_id))
>>>> WITH
>>>>     caching=all;
>>>>
>>>>
>>>> Now I need to migrate the data from the second CF to the first one.
>>>> I am using DataStax Community Edition.
>>>>
>>>> What would be the best way to convert data from one CF to the other?
>>>>
>>>> Best regards,
>>>> Marcelo.
>>>>
>>>
>>>
>>
>
