hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-674) memcache size unreliable
Date Wed, 02 Jul 2008 23:46:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610088#action_12610088 ]

Jim Kellerman commented on HBASE-674:

There are a number of issues here:
- multiple inserts or deletes for the same row/column/timestamp are each counted and can inflate
the memcache size somewhat. This may not be a big issue because it is unlikely that someone is
using the same row/column/timestamp, especially if they do not specify a timestamp for puts
or deletes.
- because of the inaccuracies of the above, subtracting the actual number of flushed bytes
from the memcache size leads to the potential of the memcache size growing over time if fewer
bytes are flushed than what HRegion thinks is in the memcache. What we really need to do is
keep track of both updates and memcache size, so that during a flush, we accumulate the size
of updates that are taken after the snapshot. When the flush is completed, we can set the
size of the memcache to the number of bytes submitted as updates during the flush.
- why the memcache size seems to be going negative more frequently recently is somewhat of
a mystery. It is pretty easy to understand why we might flush less than what we think is in
the cache, but how would we flush more than what we think is in the cache?
- Finally, I don't particularly like the finished memcache flush message in HRegion. It reports
what it thinks is the current memcache size after the flush, but doesn't say that. It would
lead the casual observer to think that the size reported by HRegion after the flush is the
number of bytes flushed from the cache.
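
To make the second point concrete, here is a rough sketch of the accounting I have in mind.
It is not the HRegion or Memcache code: the class, its method names, and the size estimate
(key length plus value length) are all made up for illustration. The idea is that updates
arriving after the snapshot are tallied separately, and when the flush completes the memcache
size is set to that tally rather than decremented by the number of bytes flushed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed accounting; not the actual HBase classes. */
public class MemcacheSizeSketch {
    private Map<String, byte[]> live = new HashMap<>();
    private Map<String, byte[]> snapshot = new HashMap<>();
    private long memcacheSize = 0;       // what the region believes is cached
    private long sizeSinceSnapshot = 0;  // updates accepted while a flush is running
    private boolean flushing = false;

    /** Every update is counted, even an overwrite of the same row/column/timestamp. */
    public synchronized void update(String key, byte[] value) {
        long size = key.length() + value.length; // crude size estimate (assumption)
        live.put(key, value);                    // an overwrite keeps only the newest entry
        memcacheSize += size;
        if (flushing) {
            sizeSinceSnapshot += size;
        }
    }

    /** Take the snapshot and start tallying the updates that arrive after it. */
    public synchronized void snapshotForFlush() {
        snapshot = live;
        live = new HashMap<>();
        sizeSinceSnapshot = 0;
        flushing = true;
    }

    /** Finish the flush: reset the size to the post-snapshot tally instead of subtracting. */
    public synchronized long completeFlush() {
        long flushed = 0;
        for (Map.Entry<String, byte[]> e : snapshot.entrySet()) {
            flushed += e.getKey().length() + e.getValue().length;
        }
        snapshot = new HashMap<>();
        memcacheSize = sizeSinceSnapshot;
        flushing = false;
        return flushed; // bytes actually written, reported separately from the size
    }

    public synchronized long size() {
        return memcacheSize;
    }
}
```

With this scheme, two puts to the same key still inflate the size before the flush, but the
error does not carry over: whatever was over-counted in the snapshot is discarded when the
size is reset. Subtracting the flushed bytes instead would leave the over-count in the
running total. Returning the flushed byte count separately would also let the log message
report both numbers unambiguously.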

> memcache size unreliable
> ------------------------
>                 Key: HBASE-674
>                 URL: https://issues.apache.org/jira/browse/HBASE-674
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.1.2
>            Reporter: stack
>             Fix For: 0.2.0
>         Attachments: 674-v2.patch, 674.patch
> Multiple updates against same row/column/ts will be seen as increments to cache size
> on insert but when we then play the memcache at flush time, we'll only see the most recent
> entry and decrement the memcache size by whatever its size is; memcache will be off.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
