cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Finding big rows
Date Wed, 11 May 2011 10:19:52 GMT
A couple of questions. You may also get some value from the #cassandra chat room, where
you can have more of a conversation.

- Did you run nodetool scrub when upgrading to 0.7.3? (Not related to the current problem,
just asking.)
- What client library was used to write the data?
- With DEBUG logging on, when you run the get that fails, do you see any log messages that
say "collecting %s of %s"? (These mean the columns are being read by the query even if they
are not returned.)
- I'm not sure how easy it's going to be to pull 400MB of data through the server in one call.
Take a look at thrift_max_message_length_in_mb and thrift_framed_transport_size_in_mb in the
config.
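
For the last point, reading the row a page of columns at a time keeps each response small. Below is a minimal sketch of that paging loop, not actual client code: the TreeMap is an in-memory stand-in for one wide row, and the names SlicePagingSketch, getSlice and readAll are made up for illustration. A real client would issue a get_slice() call with a start column and a count in place of the local getSlice().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only: the TreeMap stands in for one wide row
// whose columns are sorted by name. This is not the Cassandra Thrift API.
class SlicePagingSketch {

    // Returns up to `count` column names starting at `start` (inclusive).
    // An empty start means "from the beginning of the row".
    static List<String> getSlice(NavigableMap<String, String> row, String start, int count) {
        NavigableMap<String, String> tail = start.isEmpty() ? row : row.tailMap(start, true);
        List<String> names = new ArrayList<>();
        for (String name : tail.keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    // Walks the whole row in pages of `pageSize` columns and returns the
    // number of distinct columns seen.
    static int readAll(NavigableMap<String, String> row, int pageSize) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        int seen = 0;
        String start = "";
        boolean firstPage = true;
        while (true) {
            List<String> page = getSlice(row, start, pageSize);
            // Each page after the first begins with the last column of the
            // previous page, so skip that one duplicate.
            int skip = firstPage ? 0 : 1;
            if (page.size() <= skip) {
                break; // nothing new in this page
            }
            seen += page.size() - skip;
            if (page.size() < pageSize) {
                break; // short page: end of the row
            }
            start = page.get(page.size() - 1);
            firstPage = false;
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            row.put(String.format("c%02d", i), "value-" + i);
        }
        System.out.println(readAll(row, 4));
    }
}
```

The start column of each page is the last column of the previous one, which is why every page after the first drops its first entry.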

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11 May 2011, at 20:18, Meler Wojciech wrote:

> Thanks for the reply. My app uses 7-bit ASCII string row keys, so I assume they can
be used directly.
>  
> I’d like to fetch the whole row. I was able to dump the big row with sstable2json, but
both my app and the cli are unable to read the row from Cassandra.
> I see in the json dump that all columns are marked as "deletedAt": -9223372036854775808,
so SuperColumn::isMarkedForDelete() should return false. My cluster is running Cassandra 0.7.4
and its upgrade path was 0.7.0->0.7.2->0.7.3->0.7.4.
> What’s wrong? The Bloom filters seem to be OK - I couldn’t find a tool for reading them,
but the attached program does the job.
> I’m sure that both my app and the cli refer to the proper keys. This big row is getting bigger
and bigger as my app appends new super- and sub-columns to it, but can’t read it:
> get mycf[utf8('my-key')];
> Returned 0 results.
> I’m really confused – I tried turning debug on, but I can’t see anything interesting
in it. Any ideas what to check next?
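
The deletedAt reasoning above can be checked with a few lines of Java. This is only a sketch: the class name DeletedAtCheck is made up, and the comparison in isMarkedForDelete() is an assumption about how the 0.7 code treats the sentinel, not a copy of the Cassandra source.

```java
// Hypothetical check, not Cassandra code.
class DeletedAtCheck {

    // Assumption: "never deleted" is stored as Long.MIN_VALUE and any larger
    // value is the timestamp of a deletion.
    static boolean isMarkedForDelete(long deletedAt) {
        return deletedAt > Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        // The value as it appears in the sstable2json dump.
        long fromJson = Long.parseLong("-9223372036854775808");
        System.out.println(fromJson == Long.MIN_VALUE);   // prints true
        System.out.println(isMarkedForDelete(fromJson));  // prints false
    }
}
```

Under that assumption the columns in the dump are not tombstoned, so deletion markers alone would not explain the empty result.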
>  
>  
> Regards,
> Wojtek
>  
> From: aaron morton [mailto:aaron@thelastpickle.com] 
> Sent: Wednesday, May 11, 2011 12:29 AM
> To: user@cassandra.apache.org
> Subject: Re: Finding big rows
>  
> I'm not aware of anything to find the row sizes, and your code looks like a good approach.
Converting the key bytes to a string only makes sense if your app is doing the same thing.

>   
> In the cli, try using one of the data type functions to format the key the same way
your app does, e.g. get FooCF[utf8('my-key')]
>  
> The main limitation of Super Columns is that sub columns are not indexed (see http://wiki.apache.org/cassandra/CassandraLimitations).
If you have a huge row, use the get_slice() API call to get back slices of columns. The cli
does not support slicing columns.
>  
> Hope that helps. 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 10 May 2011, at 20:41, Meler Wojciech wrote:
> 
> 
> Hello,
>  
> I’ve noticed some very nice stats exposed via JMX. I was quite shocked when I saw that
MaxRowSize was about 400MB (it was expected to be several MB).
> What is the best way to find the keys of such big rows?
>  
> I couldn’t find anything, so I’ve written a simple program to dump sizes from the Index
files (see attachment),
> and got the keys, but when I used cassandra-cli to get such rows it said "Returned
0 results.".
> I’ve realised that my app creates such big rows because it can’t read them from Cassandra
and recreates them every time.
>  
> Are there any tuneable limits for getting a whole row? Any limits on supercolumns?
>  
> Regards,
> Wojtek
>  
> "WIRTUALNA POLSKA" Spolka Akcyjna (joint-stock company) with its registered office in Gdansk at ul. Traugutta 115 C, entered
in the National Court Register - Register of Entrepreneurs kept by the District Court
Gdansk - Polnoc in Gdansk under KRS number 0000068548, with share capital of 67,980,024.00
zlotys, fully paid up, and Tax Identification Number 957-07-51-216.
> 
> <IdxDump.java>
>  
> 
> 
> 
> <BFCheck.java>

