hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores
Date Wed, 17 Oct 2012 03:36:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477575#comment-13477575 ]

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

Ramakrishna,

Thanks for your email.

#1. It is not clear why we even write a META entry for flushes...

{code}
private WALEdit completeCacheFlushLogEdit() {
  KeyValue kv = new KeyValue(METAROW, METAFAMILY, null,
    System.currentTimeMillis(), COMPLETE_CACHE_FLUSH);
  WALEdit e = new WALEdit();
  e.add(kv);
  return e;
}
{code}

The replayRecoveredEdits() logic skips over these entries anyway. And the only reference I
see for this special entry in HLog is in unit tests.
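
To illustrate what "skips over these entries" means in practice, here is a minimal, self-contained sketch. It does not use the real HBase classes: SketchEdit, ReplaySkipSketch, and the string-valued METAFAMILY constant are hypothetical stand-ins for KeyValue/WALEdit and the reserved family, chosen only to show the shape of the check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a recovered WAL entry; not the HBase KeyValue type.
final class SketchEdit {
    final String family;
    final String qualifier;
    final String value;

    SketchEdit(String family, String qualifier, String value) {
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }
}

final class ReplaySkipSketch {
    // Assumed marker family name, used here only for illustration.
    static final String METAFAMILY = "METAFAMILY";

    // Returns the edits that replay would apply; marker entries are dropped.
    static List<SketchEdit> editsToApply(List<SketchEdit> recovered) {
        List<SketchEdit> applied = new ArrayList<SketchEdit>();
        for (SketchEdit e : recovered) {
            if (METAFAMILY.equals(e.family)) {
                continue; // flush-complete marker carries no data to restore
            }
            applied.add(e);
        }
        return applied;
    }
}
```

If replay always filters the marker out like this, the marker has no effect on the recovered state, which is the basis for questioning why it is written at all.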

#2. Yes, currently there are a lot of comments (related to lastSeqWritten) before the function
startCacheFlush() in HLog.java, but the logic is not very clear to me. The changes were committed
as part of HBASE-3845. I think we should be able to simplify that logic. I think I see some
potential bugs there even as it stands now. I will need to spend some more time looking at this,
and will write down an update here.

But bottom line, I still don't see any good fundamental reason we need to hold this lock for
the duration of the entire flush (even given the lastSeqWritten map logic).
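
As an illustration of the general pattern being argued for (not the actual HLog code), the sketch below holds a lock only long enough to swap out the in-memory buffer and record the sequence id, then does the slow write without the lock. The class NarrowLockFlushSketch, its fields, and the use of a List as the "file" are all hypothetical. Whether the same narrowing is safe in HLog depends on the lastSeqWritten bookkeeping, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class NarrowLockFlushSketch {
    // Hypothetical stand-in for the lock discussed in the comment above.
    private final ReentrantLock flushLock = new ReentrantLock();
    private List<String> memstore = new ArrayList<String>();
    private long nextSeqId = 1;

    void put(String edit) {
        flushLock.lock();
        try {
            memstore.add(edit);
            nextSeqId++;
        } finally {
            flushLock.unlock();
        }
    }

    // Writes the current buffer to sink and returns the highest sequence id it covers.
    long flush(List<String> sink) {
        List<String> snapshot;
        long flushSeqId;
        flushLock.lock();
        try {
            // Short critical section: swap the buffer and note the sequence id.
            snapshot = memstore;
            memstore = new ArrayList<String>();
            flushSeqId = nextSeqId - 1;
        } finally {
            flushLock.unlock();
        }
        // Slow part (disk I/O in a real system) runs without the lock held,
        // so concurrent put() calls are not blocked for the whole flush.
        sink.addAll(snapshot);
        return flushSeqId;
    }
}
```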

                
> Parallel Flushing Of Memstores
> ------------------------------
>
>                 Key: HBASE-6980
>                 URL: https://issues.apache.org/jira/browse/HBASE-6980
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> For write-dominated workloads, single-threaded memstore flushing is an unnecessary bottleneck.
> With a single flusher thread, we are basically not set up to take advantage of the aggregate
> throughput that multi-disk nodes provide.
> * For puts with WAL enabled, the bottleneck is more likely the "single" WAL per region
> server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple
> commit logs per region server. (Topic for a separate JIRA: HBASE-6981.)
> * But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports),
> we should be able to support much better ingest rates with parallel flushing of memstores.
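
The idea in the issue description can be sketched with a fixed-size thread pool. This is only an illustration under stated assumptions: ParallelFlushSketch, flushAll, flushOne, and the flusherThreads parameter are hypothetical names, and "flushing" is reduced to counting characters so the example stays self-contained. It is not the HBase flusher implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelFlushSketch {
    // Flushes every memstore on a pool of flusherThreads workers and returns
    // the size written for each one, in the same order as the input.
    static List<Integer> flushAll(List<List<String>> memstores, int flusherThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(flusherThreads);
        try {
            List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
            for (final List<String> memstore : memstores) {
                Callable<Integer> task = new Callable<Integer>() {
                    @Override
                    public Integer call() {
                        return flushOne(memstore);
                    }
                };
                pending.add(pool.submit(task));
            }
            List<Integer> sizes = new ArrayList<Integer>();
            for (Future<Integer> f : pending) {
                sizes.add(f.get()); // wait for each flush to finish
            }
            return sizes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while flushing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("flush task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Stand-in for writing one memstore to disk: returns total characters "written".
    static int flushOne(List<String> memstore) {
        int written = 0;
        for (String edit : memstore) {
            written += edit.length();
        }
        return written;
    }
}
```

With flusherThreads set to 1 this degenerates to the single-flusher behavior described above; larger values let independent flushes overlap, which is where multi-disk nodes could help.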

