hbase-issues mailing list archives

From "Qiang Tian (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
Date Tue, 14 Oct 2014 08:04:34 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Qiang Tian updated HBASE-11368:
    Attachment: hbase-11368-0.98.5.patch

I forgot that StoreScanner is per CF; the earlier analysis is wrong:
After DefaultStoreFileManager#storefiles is updated in HStore#bulkLoadHFile, notifyChangedReadersObservers
is called to reset StoreScanner#heap, so checkReseek->resetScannerStack will be triggered
on the next scan/read to recreate the store scanners based on the new store files.

So we could introduce a new region-level rwlock, multiCFLock. HRegion#bulkLoadHFiles acquires
the write lock before the multi-CF HStore.bulkLoadHFile calls, and StoreScanner#resetScannerStack
acquires the read lock. This way the scanners are recreated after all CFs' store files are

Instead, the new lock should be placed at the regionScanner layer. See the attached patch.
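
To illustrate the idea only (this is not the attached patch, and none of it is HBase code), here is a minimal sketch of a region-level read/write lock. The class RegionLockSketch and the methods bulkLoadHFiles and snapshotForScan are hypothetical names; store files are modelled as plain strings. The bulk load updates every CF's file list under the write lock, and a region-level reader takes the read lock, so it sees either none or all of the new files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region with several column families (CFs).
class RegionLockSketch {
    // Region-level lock shared by bulk loads (writers) and scans (readers).
    private final ReentrantReadWriteLock multiCFLock = new ReentrantReadWriteLock();
    // CF name -> store file names; only touched while holding multiCFLock.
    private final Map<String, List<String>> storeFiles = new HashMap<>();

    RegionLockSketch(List<String> families) {
        for (String cf : families) {
            storeFiles.put(cf, new ArrayList<>());
        }
    }

    // Adds one file per CF; all additions happen under the write lock.
    void bulkLoadHFiles(Map<String, String> fileByFamily) {
        multiCFLock.writeLock().lock();
        try {
            // Validate first so a bad request changes nothing.
            for (String cf : fileByFamily.keySet()) {
                if (!storeFiles.containsKey(cf)) {
                    throw new IllegalArgumentException("unknown CF: " + cf);
                }
            }
            for (Map.Entry<String, String> e : fileByFamily.entrySet()) {
                storeFiles.get(e.getKey()).add(e.getValue());
            }
        } finally {
            multiCFLock.writeLock().unlock();
        }
    }

    // Region-level read: copies the file lists of all CFs under the read lock.
    Map<String, List<String>> snapshotForScan() {
        multiCFLock.readLock().lock();
        try {
            Map<String, List<String>> copy = new HashMap<>();
            for (Map.Entry<String, List<String>> e : storeFiles.entrySet()) {
                copy.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
            return copy;
        } finally {
            multiCFLock.readLock().unlock();
        }
    }
}
```

Many readers can hold the read lock at once, so scans only block each other's way when a bulk load is actually in progress.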

The "mvn test" run and "TestHRegionServerBulkLoad" (the large test for atomic bulk load) passed.
I still need to run the large tests and a performance test (any suggestions for it? YCSB?).

The lock can be further limited to a smaller scope by splitting HStore#bulkLoadHFile into two parts:
1) rename the bulk load files and put the new files into the store files list; 2) notifyChangedReadersObservers.
Only #2 needs the lock.
If the HDFS file rename is fast, the split may not be needed.
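
A rough sketch of the two-part split, again with hypothetical names (StoreSplitSketch, TwoPhaseBulkLoadSketch, renameAndAdd, notifyChangedReaders) rather than real HBase classes. It assumes only one bulk load runs against the region at a time, and it models the HDFS rename as a plain list append. Part 1 runs without the region lock; only part 2, which changes what scanners see, runs under the write lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical per-CF store: a file list plus the view that scanners read.
class StoreSplitSketch {
    private final List<String> files = new CopyOnWriteArrayList<>();
    private volatile List<String> scannerView = new ArrayList<>();

    // Part 1: stands in for the file rename and adding to the store file list.
    void renameAndAdd(String hfile) {
        files.add(hfile);
    }

    // Part 2: stands in for notifying scanners that the file list changed.
    void notifyChangedReaders() {
        scannerView = new ArrayList<>(files);
    }

    int visibleFileCount() {
        return scannerView.size();
    }
}

// Hypothetical region with two CFs; assumes a single bulk load at a time.
class TwoPhaseBulkLoadSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final StoreSplitSketch cf1 = new StoreSplitSketch();
    private final StoreSplitSketch cf2 = new StoreSplitSketch();

    void bulkLoad(String fileForCf1, String fileForCf2) {
        // Part 1 for every CF: potentially slow, done without the region lock.
        cf1.renameAndAdd(fileForCf1);
        cf2.renameAndAdd(fileForCf2);
        // Part 2 for every CF: short, done under the write lock.
        lock.writeLock().lock();
        try {
            cf1.notifyChangedReaders();
            cf2.notifyChangedReaders();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Region-level read of how many files each CF currently exposes.
    int[] scanCounts() {
        lock.readLock().lock();
        try {
            return new int[] { cf1.visibleFileCount(), cf2.visibleFileCount() };
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In this sketch a reader never observes a new file in one CF without the matching file in the other, because the views only change inside the write-locked section.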

> Multi-column family BulkLoad fails if compactions go on too long
> ----------------------------------------------------------------
>                 Key: HBASE-11368
>                 URL: https://issues.apache.org/jira/browse/HBASE-11368
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Qiang Tian
>         Attachments: hbase-11368-0.98.5.patch
> Compactions take a read lock.  If a multi-column family region, before bulk loading,
> we want to take a write lock on the region.  If the compaction takes too long, the bulk load
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in HBASE-10882
> as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need a full write
> lock when multiple column families?  Can we fail more gracefully at least?

This message was sent by Atlassian JIRA
