incubator-cassandra-user mailing list archives

From "Laing, Michael" <michael.la...@nytimes.com>
Subject Re: migration to a new model
Date Wed, 04 Jun 2014 15:49:19 GMT
OK Marcelo, I'll work on it today. -ml


On Tue, Jun 3, 2014 at 8:24 PM, Marcelo Elias Del Valle <
marcelo@s1mbi0se.com.br> wrote:

> Hi Michael,
>
> For sure I would be interested in this program!
>
> I am new to both Python and CQL. I started creating this copier, but
> was having problems with timeouts. Alex solved my problem here on the list,
> but I think I will still have a lot of trouble making the copy work well.
>
> I open sourced my version here:
> https://github.com/s1mbi0se/cql_record_processor
>
> Just in case it's useful for anything.
>
> However, I saw CQL has support for concurrency itself, and having something
> made by someone who knows the Python CQL driver better would be very helpful.
>
> My two servers today are at OVH (ovh.com). We have servers at AWS, but in
> several cases we prefer other hosts. Both servers have SSDs and 64 GB of RAM;
> I could use the script as a benchmark for you if you want. Besides, we have
> some bigger clusters; I could run it on them just to test the speed, if that
> is going to help.
>
> Regards
> Marcelo.
>
>
> 2014-06-03 11:40 GMT-03:00 Laing, Michael <michael.laing@nytimes.com>:
>
>> Hi Marcelo,
>>
>> I could create a fast copy program by repurposing some python apps that I
>> am using for benchmarking the python driver - do you still need this?
>>
>> With high levels of concurrency and multiple subprocess workers, based on
>> my current actual benchmarks, I think I can get well over 1,000 rows/second
>> on my Mac and significantly more in AWS. I'm using variable-size rows
>> averaging 5 KB.
>>
>> This would be the initial version of a piece of the benchmark suite we
>> will release as part of our nyt⨍aбrik project on 21 June for my
>> Cassandra Day NYC talk re the python driver.
>>
>> ml
>>
>>
>> On Mon, Jun 2, 2014 at 2:15 PM, Marcelo Elias Del Valle <
>> marcelo@s1mbi0se.com.br> wrote:
>>
>>> Hi Jens,
>>>
>>> Thanks for trying to help.
>>>
>>> Indeed, I know I can't do it using just CQL. But what would you use to
>>> migrate data manually? I tried to create a Python program using auto
>>> paging, but I am getting timeouts. I also tried Hive, without success.
>>> I only have two nodes and less than 200 GB in this cluster, so any simple
>>> way to extract the data quickly would be good enough for me.
>>>
>>> Best regards,
>>> Marcelo.
>>>
>>>
>>>
>>> 2014-06-02 15:08 GMT-03:00 Jens Rantil <jens.rantil@tink.se>:
>>>
>>>> Hi Marcelo,
>>>>
>>>> Looks like you can't do this without migrating your data manually:
>>>> https://stackoverflow.com/questions/18421668/alter-cassandra-column-family-primary-key-using-cassandra-cli-or-cql
>>>>
>>>> Cheers,
>>>> Jens
>>>>
>>>>
>>>> On Mon, Jun 2, 2014 at 7:48 PM, Marcelo Elias Del Valle <
>>>> marcelo@s1mbi0se.com.br> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have some CQL CFs in a 2-node Cassandra 2.0.8 cluster.
>>>>>
>>>>> I realized I created my column family with the wrong partition key.
>>>>> Instead of:
>>>>>
>>>>> CREATE TABLE IF NOT EXISTS entity_lookup (
>>>>>   name varchar,
>>>>>   value varchar,
>>>>>   entity_id uuid,
>>>>>   PRIMARY KEY ((name, value), entity_id))
>>>>> WITH
>>>>>     caching=all;
>>>>>
>>>>> I used:
>>>>>
>>>>> CREATE TABLE IF NOT EXISTS entitylookup (
>>>>>   name varchar,
>>>>>   value varchar,
>>>>>   entity_id uuid,
>>>>>   PRIMARY KEY (name, value, entity_id))
>>>>> WITH
>>>>>     caching=all;
>>>>>
>>>>>
>>>>> Now I need to migrate the data from the second CF to the first one.
>>>>> I am using DataStax Community Edition.
>>>>>
>>>>> What would be the best way to convert data from one CF to the other?
>>>>>
>>>>> Best regards,
>>>>> Marcelo.
>>>>>
>>>>
>>>>
>>>
>>
>
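
For readers of this thread: the copy discussed above (paged reads from
entitylookup, concurrent writes into entity_lookup) might be sketched roughly
as follows with the DataStax Python driver. This is not Michael's program and
not Marcelo's cql_record_processor. The function names copy_table and chunked,
and the chunk_size and concurrency defaults, are assumptions made for
illustration; the session is assumed to be already connected to the right
keyspace and to use the driver's default named-tuple rows.

```python
from itertools import islice

# Table and column names are taken from the CREATE TABLE statements in the thread.
SELECT_CQL = "SELECT name, value, entity_id FROM entitylookup"
INSERT_CQL = "INSERT INTO entity_lookup (name, value, entity_id) VALUES (?, ?, ?)"


def chunked(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def copy_table(session, chunk_size=1000, concurrency=50):
    """Copy every row from entitylookup into entity_lookup; return the row count.

    `session` is assumed to be a connected cassandra.cluster.Session.
    """
    # Imported here so that chunked() stays usable without the driver installed.
    from cassandra.concurrent import execute_concurrent_with_args
    from cassandra.query import SimpleStatement

    insert = session.prepare(INSERT_CQL)
    # fetch_size turns on automatic paging, so the full table is never
    # held in memory at once.
    select = SimpleStatement(SELECT_CQL, fetch_size=chunk_size)

    copied = 0
    for chunk in chunked(session.execute(select), chunk_size):
        params = [(row.name, row.value, row.entity_id) for row in chunk]
        # Keeps up to `concurrency` inserts in flight; by default this
        # raises on the first failed write.
        execute_concurrent_with_args(session, insert, params,
                                     concurrency=concurrency)
        copied += len(params)
    return copied
```

Hypothetical usage, with the contact point and keyspace name as placeholders:
session = Cluster(["127.0.0.1"]).connect("my_keyspace"), then
copy_table(session). Smaller chunk_size and concurrency values trade
throughput for less pressure on a two-node cluster.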
