incubator-cassandra-user mailing list archives

From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: unable to read saved rowcache from disk
Date Thu, 15 Nov 2012 06:43:19 GMT
3G; the other JVM parameters are unchanged.


On Thu, Nov 15, 2012 at 2:40 PM, Wz1975 <wz1975@yahoo.com> wrote:

> How big is your heap?  Did you change the jvm parameter?
>
>
>
> Thanks.
> -Wei
>
> Sent from my Samsung smartphone on AT&T
>
>
> -------- Original message --------
> Subject: Re: unable to read saved rowcache from disk
> From: Manu Zhang <owenzhang1990@gmail.com>
> To: user@cassandra.apache.org
> CC:
>
>
> I added a counter and printed it out myself.
>
>
> On Thu, Nov 15, 2012 at 1:51 PM, Wz1975 <wz1975@yahoo.com> wrote:
>
>> Curious where did you see this?
>>
>>
>> Thanks.
>> -Wei
>>
>> Sent from my Samsung smartphone on AT&T
>>
>>
>> -------- Original message --------
>> Subject: Re: unable to read saved rowcache from disk
>> From: Manu Zhang <owenzhang1990@gmail.com>
>> To: user@cassandra.apache.org
>> CC:
>>
>>
>> OOM while deserializing the 747321st row
>>
>>
>> On Thu, Nov 15, 2012 at 9:08 AM, Manu Zhang <owenzhang1990@gmail.com> wrote:
>>
>>> Oh, as for the number of rows, it's 1650000. How long would you expect
>>> it to take to read back?
>>>
>>>
>>> On Thu, Nov 15, 2012 at 3:57 AM, Wei Zhu <wz1975@yahoo.com> wrote:
>>>
>>>> Good information, Edward.
>>>> In my case, we have a good amount of RAM (76G) and the heap is 8G, so I
>>>> set the row cache to 800M as recommended. Our columns are kind of big, so
>>>> the hit ratio for the row cache is around 20%; according to DataStax, we
>>>> might as well turn the row cache off altogether.
>>>> Anyway, on restart it took about 2 minutes to load the row cache:
>>>>
>>>>  INFO [main] 2012-11-14 11:43:29,810 AutoSavingCache.java (line 108)
>>>> reading saved cache /var/lib/cassandra/saved_caches/XXX-f2-RowCache
>>>>  INFO [main] 2012-11-14 11:45:12,612 ColumnFamilyStore.java (line 451)
>>>> completed loading (102801 ms; 21125 keys) row cache for XXX.f2
>>>>
>>>> Just for comparison, our key is a long, and the disk usage for the row
>>>> cache is 253K. (Only the keys are stored when the row cache is saved to
>>>> disk, so 253KB / 8 bytes = 31625 keys.) It's about right...
>>>> So for 15MB, there could be a lot of "narrow" rows. (If the key is a
>>>> Long, there could be more than 1M rows.)
>>>>
>>>> Thanks.
>>>> -Wei
>>>>   ------------------------------
>>>> *From:* Edward Capriolo <edlinuxguru@gmail.com>
>>>> *To:* user@cassandra.apache.org
>>>> *Sent:* Tuesday, November 13, 2012 11:13 PM
>>>> *Subject:* Re: unable to read saved rowcache from disk
>>>>
>>>> http://wiki.apache.org/cassandra/LargeDataSetConsiderations
>>>>
>>>> A negative side-effect of a large row-cache is start-up time. The
>>>> periodic saving of the row cache information only saves the keys that
>>>> are cached; the data has to be pre-fetched on start-up. On a large
>>>> data set, this is probably going to be seek-bound and the time it
>>>> takes to warm up the row cache will be linear with respect to the row
>>>> cache size (assuming sufficiently large amounts of data that the seek
>>>> bound I/O is not subject to optimization by disks)
>>>>
>>>> Assuming a row cache of 15MB and an average row of 300 bytes, that could
>>>> be 50,000 entries. 4 hours seems like a long time to read back 50K
>>>> entries, unless the source table was very large and you can only do a
>>>> small number of reads/sec.
>>>>
>>>> On Tue, Nov 13, 2012 at 9:47 PM, Manu Zhang <owenzhang1990@gmail.com> wrote:
>>>> > "incorrect"... what do you mean? I think it's only 15MB, which is not big.
>>>> >
>>>> >
>>>> > On Wed, Nov 14, 2012 at 10:38 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>> >>
>>>> >> Yes, the row cache "could be" incorrect, so on startup Cassandra verifies
>>>> >> the saved row cache by re-reading it. It takes a long time, so do not save
>>>> >> a big row cache.
>>>> >>
>>>> >>
>>>> >> On Tuesday, November 13, 2012, Manu Zhang <owenzhang1990@gmail.com> wrote:
>>>> >> > I have a row cache provided by SerializingCacheProvider.
>>>> >> > The data that has been read into it is about 500MB, as reported by
>>>> >> > jconsole. After saving the cache, it is around 15MB on disk. Hence, I
>>>> >> > suppose the size from jconsole is before serializing.
>>>> >> > Now, while restarting, Cassandra is unable to read the saved row cache
>>>> >> > back. By "unable", I mean it ran for around 4 hours and I had to abort it
>>>> >> > and remove the cache so as not to hold up other tasks.
>>>> >> > Since the data isn't huge, why can't Cassandra read it back?
>>>> >> > My Cassandra is 1.2.0-beta2.
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>
>
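
The back-of-envelope figures quoted in the thread can be reproduced with a short Python sketch. It uses only numbers that appear above (253KB of 8-byte keys, a 15MB cache of 300-byte rows, 102801 ms to load 21125 keys, 1650000 saved rows). The function names are made up for illustration, and the last step assumes that load time scales linearly with key count and that the per-key cost measured on Wei's node would carry over to another cluster, which may well not hold.

```python
# Rough estimates only; all inputs are figures quoted in the thread.
# Function names are hypothetical helpers, not part of Cassandra.

def estimated_entries(total_bytes, bytes_per_entry):
    """Number of entries that fit in total_bytes at a fixed size per entry."""
    return total_bytes // bytes_per_entry

def per_key_ms(load_ms, keys):
    """Average time spent loading one cached row, in milliseconds."""
    return load_ms / keys

def estimated_load_hours(rows, ms_per_key):
    """Linear extrapolation of load time; assumes a constant per-key cost."""
    return rows * ms_per_key / 1000 / 3600

if __name__ == "__main__":
    # Wei: 253KB saved cache with 8-byte (long) keys.
    print("keys in 253KB:", estimated_entries(253_000, 8))
    # Edward: 15MB row cache with 300-byte average rows.
    print("entries in 15MB:", estimated_entries(15_000_000, 300))
    # Wei's log: "completed loading (102801 ms; 21125 keys)".
    ms = per_key_ms(102_801, 21_125)
    print("ms per key: %.2f" % ms)
    # Manu: 1650000 rows in the saved cache.
    print("hours for 1650000 rows: %.2f" % estimated_load_hours(1_650_000, ms))
```

At roughly 5 ms per key, a saved cache of 1650000 rows would take on the order of hours to reload under these assumptions, which is consistent with the advice above not to save a big row cache.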
