hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Data upgrade from 0.89x to 0.90.0.
Date Fri, 11 Feb 2011 07:23:35 GMT
we only major compact a region every 24 hours, so if it was JUST
compacted within the last 24 hours we skip it.
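
In rough form, the check being described is something like the following
(a minimal Java sketch; the class and field names are illustrative
assumptions, not the actual HBase 0.90 code):

    // Sketch of the age-based skip described above; identifiers are
    // illustrative, not the real HBase fields.
    class MajorCompactionAgeCheck {
        static final long MAJOR_COMPACTION_PERIOD_MS = 24L * 60 * 60 * 1000;

        // A region is due when its last major compaction is at least
        // 24 hours old; anything compacted more recently is skipped.
        static boolean isDue(long lastMajorCompactionTs, long now) {
            return now - lastMajorCompactionTs >= MAJOR_COMPACTION_PERIOD_MS;
        }
    }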

this is how it used to work, and how it should still work; not really
looking at the code right now, busy elsewhere :-)

-ryan

On Thu, Feb 10, 2011 at 11:17 PM, James Kennedy
<james.kennedy@troove.net> wrote:
> Can you define 'come due'?
>
> The NPE occurs at the first isMajorCompaction() test in the main loop of MajorCompactionChecker.
> That cycle is executed every 2.78 hours.
> Yet I know that I've kept healthy QA test data up and running for much longer than that.
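
The failing pattern being described is roughly the following (a sketch
with hypothetical stand-ins for StoreFile and TimeRangeTracker, not the
real HBase classes):

    // Sketch of the NPE site; these stand-in classes are hypothetical.
    class CheckerLoopSketch {
        static class TimeRangeTracker { long minimumTimestamp; }
        static class StoreFile {
            TimeRangeTracker tracker; // null for files written by 0.89x
        }

        static void checkRegion(Iterable<StoreFile> files) {
            for (StoreFile sf : files) {
                // Files written before the tracker existed carry no such
                // metadata, so sf.tracker is null here and the dereference
                // throws the NPE inside the compaction check.
                long minTs = sf.tracker.minimumTimestamp;
                // ... comparison against the 24-hour period would follow ...
            }
        }
    }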
>
>
> James Kennedy
> Project Manager
> Troove Inc.
>
> On 2011-02-10, at 10:46 PM, Ryan Rawson wrote:
>
>> I am speaking off the cuff here, but the major compaction algorithm
>> attempts to keep the number of major compactions to a minimum by
>> checking the timestamp of the file. So it's possible that the other
>> regions just 'didn't come due' yet.
>>
>> -ryan
>>
>> On Thu, Feb 10, 2011 at 10:42 PM, James Kennedy
>> <james.kennedy@troove.net> wrote:
>>> I've tested HBase 0.90 + HBase-trx 0.90.0 and I've run it over old data from
>>> 0.89x using a variety of seeded unit test/QA data and cluster configurations.
>>>
>>> But when it came time to upgrade some production data I got snagged on HBASE-3524.
>>> The gist of it is in Ryan's last points:
>>>
>>> * compaction is "optional", meaning if it fails no data is lost, so you
>>> should probably be fine.
>>>
>>> * Older versions of the code did not write out time tracker data and
>>> that is why your older files were giving you NPEs.
>>>
>>> Makes sense.  But why did I not encounter this with my initial data upgrades
>>> on very similar data packages?
>>>
>>> So I applied Ryan's patch, which simply assigns a default value (Long.MIN_VALUE)
>>> when a StoreFile lacks a timeRangeTracker, and I "fixed" the data by forcing major
>>> compactions on the regions affected.  Preliminary poking has not shown any
>>> instability in the data since.
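
The guard in that patch is, in spirit, something like this (a sketch
reusing the hypothetical stand-ins from the earlier snippet, not the
literal HBASE-3524 diff):

    // Treat a missing tracker as "arbitrarily old" instead of
    // dereferencing null; sketch only, not the actual patch.
    static long minimumTimestamp(StoreFile sf) {
        if (sf.tracker == null) {
            // Old 0.89x files never wrote tracker data; Long.MIN_VALUE
            // keeps them eligible for major compaction rather than
            // crashing the checker.
            return Long.MIN_VALUE;
        }
        return sf.tracker.minimumTimestamp;
    }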
>>>
>>> But I confess that I just don't have the time right now to really dig into the
>>> code and validate that there are no more gotchas or data corruption that could
>>> have resulted.
>>>
>>> I guess the questions that I have for the team are:
>>>
>>> * What state would 9 out of 50 tables be in to miss the new 0.90.0
>>> timeRangeTracker injection before the first major compaction check?
>>> * Where else is the new TimeRangeTracker used?  Could a StoreFile with a null
>>> timeRangeTracker have corrupted the data in other, subtler ways?
>>> * What other upgrade-related data changes might not have completed elsewhere?
>>>
>>> Thanks,
>>>
>>> James Kennedy
>>> Project Manager
>>> Troove Inc.
>>>
>>>
>
>
