hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "JH Lin" <match20...@163.com>
Subject is it unnecessary for 'flushlock' in Store?
Date Mon, 11 Dec 2017 15:57:35 GMT
hi all, I read some code recently about the flow of flush-table which entried in HBaseAdmin#flush(tableOrRegion).
i found that at least four locks in this flow(0.94):
-region lock
-region updatelock
-hlog cacheFlushLock
-store flushlock
(awesome locks usages)
then i have certain questions in it:
A.why not flush stores concurrently?
-this has been fixed with hbase-6466 related by hbase-6980,well done!
B.why not flush regions concurrently? 
-as in some scenarios only one family in a table, the fixed question A will not gain any performance
 so i dive in to master trunk and found that the flow has changed a lot :it has been delivered
to master to do it(but i am not care the details in it in fact)
publicvoidflush(final TableName tableName)throws IOException {
1164if(isTableDisabled(tableName)) {
1165 LOG.info("Table is disabled: "+ tableName.getNameAsString());
1168execProcedure("flush-table-proc", tableName.getNameAsString(),new HashMap<>());
publicbyte[]execProcedureWithReturn(String signature, String instance, Map<String, String>
2744throws IOException {
2745 ProcedureDescription desc = ProtobufUtil.buildProcedureDescription(signature, instance,
2746final ExecProcedureRequest request =
2747 ExecProcedureRequest.newBuilder().setProcedure(desc).build();
2748// run the procedure on the master
2749 ExecProcedureResponse response =executeCallable(
2750new MasterCallable<ExecProcedureResponse>(getConnection(),getRpcControllerFactory())
2752protected ExecProcedureResponse rpcCall()throws Exception {
2753return master.execProcedureWithRet(getRpcController(), request);
2757return response.hasReturnData() ? response.getReturnData().toByteArray() : null;

C.is it the 'flushlock' unnecessary in Store? ( **this thread FOCUS**)
  since this lock is used in only one place(no other lock racers) :Store#internalFlushCache(),i.e..after
grabbing scanner on snapshot and before finishing writing to hfile.
  and i saw this 'flushlock' retains in master trunk:
public List<Path>flushSnapshot(MemStoreSnapshot snapshot,long cacheFlushId,
48 MonitoredTask status, ThroughputController throughputController,
49 FlushLifeCycleTracker tracker)throws IOException {
50 ArrayList<Path> result =new ArrayList<>();
51int cellsCount = snapshot.getCellsCount();
52if(cellsCount ==0)return result;// don't flush if there are no entries
54// Use a store scanner to find which rows to flush.
55long smallestReadPoint = store.getSmallestReadPoint();
56 InternalScanner scanner =createScanner(snapshot.getScanners(), smallestReadPoint, tracker);
57 StoreFileWriter writer;
59// TODO: We can fail in the below block before we complete adding this flush to
60// list of store files. Add cleanup of anything put on filesystem if we fail.
61synchronized(flushLock) {
62 status.setStatus("Flushing "+ store +": creating writer");
63// Write the map out to the disk
64 writer = store.createWriterInTmp(cellsCount, store.getColumnFamilyDescriptor().getCompressionType(),
65/* isCompaction = */ false,
66/* includeMVCCReadpoint = */ true,
67/* includesTags = */ snapshot.isTagsPresent(),
68/* shouldDropBehind = */ false);
69 IOException e = null;
71performFlush(scanner, writer, smallestReadPoint, throughputController);
72}catch(IOException ioe) {
73 e = ioe;
74// throw the exception out
75throw ioe;
77if(e != null) {
78 writer.close();
80finalizeWriter(writer, cacheFlushId, status);
85 scanner.close();
87 LOG.info("Flushed, sequenceid="+ cacheFlushId +", memsize="
88+ StringUtils.TraditionalBinaryPrefix.long2String(snapshot.getDataSize(),"",1) +
89", hasBloomFilter="+ writer.hasGeneralBloom() +
90", into tmp file "+ writer.getPath());
91 result.add(writer.getPath());
92return result;

  in fact, the 'cacheFlushLock' will do this duty of it.so i think it's unnecessary or only
for tests as i found a TestStore will call it :
internalFlushCache(SortedSet<KeyValue>, long, TimeRangeTracker, AtomicLong, MonitoredTask)
: Path - org.apache.hadoop.hbase.regionserver.Store
flushCache(long, SortedSet<KeyValue>, TimeRangeTracker, AtomicLong, MonitoredTask) :
Path - org.apache.hadoop.hbase.regionserver.Store
flushCache(MonitoredTask) : void - org.apache.hadoop.hbase.regionserver.Store.StoreFlusherImpl
flushStore(Store, long) : void - org.apache.hadoop.hbase.regionserver.TestStore
internalFlushcache(HLog, long, MonitoredTask) : boolean - org.apache.hadoop.hbase.regionserver.HRegion
  that means in normal flow only HRegion#intervalFlushcache() will call it but TestStore is
a inserted flow(ie.unpossible flow)
  >>I just wonder is it redundant in fact though i think even if the removal of this
lock will not significantly improve  performance .<<
any input is appreciated ,thanks
--JH Lin

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message