hbase-dev mailing list archives

From stack <st...@duboce.net>
Subject Re: [jira] Commented: (HBASE-728) Supporting for HLog appends
Date Fri, 24 Oct 2008 20:00:16 GMT
Replies to yours inline.

Jim Kellerman (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/HBASE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642525#action_12642525 ]
> Jim Kellerman commented on HBASE-728:
> -------------------------------------
>> From: Michael Stack [mailto:stack@duboce.net]
>> Sent: Wednesday, October 22, 2008 8:53 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: svn commit: r707247 - in /hadoop/hbase/trunk: ./ conf/ 
>> src/java/org/apache/hadoop/hbase/regionserver/
>> How does the new feature affect hbase throughput?  Does it make it slower?
>> Faster?  Any measurement done?
> I measured PerformanceEvaluation random write 1 with one region server
> before and after the appends patch.
> I would say that throughput is either the same or a little faster.
> I only ran one run on the code before appends, and this test completed
> in 2 minutes 31 seconds.
> In fixing up a couple of bugs in appends, I have run this test 5 times.
> The slowest was 2 minutes 33 seconds, but the other times were all faster:
> 2:24, 2:20, 2:21 and 2:21.
Would suggest running with much higher rates to see if it breaks; 
suggest many clients writing into the one regionserver.
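
As a rough sketch of that kind of test: several writer threads released at once against one target. EditSink is a hypothetical stand-in for "write one edit to the regionserver", not an HBase API, and the class and parameter names are assumptions for illustration only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for "write one edit to the regionserver"; not an HBase API.
interface EditSink {
  void write(int clientId, long editNumber) throws Exception;
}

class ConcurrentWriteLoad {
  // Runs `clients` writer threads, each issuing `editsPerClient` writes, and
  // returns the number of writes that completed without throwing.
  static long run(EditSink sink, int clients, long editsPerClient) {
    ExecutorService pool = Executors.newFixedThreadPool(clients);
    CountDownLatch start = new CountDownLatch(1);
    AtomicLong completed = new AtomicLong();
    for (int c = 0; c < clients; c++) {
      final int clientId = c;
      pool.execute(() -> {
        try {
          start.await(); // hold every writer until all are queued
          for (long i = 0; i < editsPerClient; i++) {
            sink.write(clientId, i);
            completed.incrementAndGet();
          }
        } catch (Exception e) {
          // A failed writer stops early; the shortfall shows up in the count.
        }
      });
    }
    long begin = System.nanoTime();
    start.countDown(); // release all writers at once
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
        pool.shutdownNow();
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
    }
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
    System.out.println(completed.get() + " writes in " + elapsedMs + " ms");
    return completed.get();
  }
}
```

Raising the client count while keeping the per-client edit count fixed is one way to look for the point where the log thread falls behind.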

>> I was thinking that the size of the log file is a better measure of 
>> when to rotate given that there can be a wide divergence in WAL log 
>> file size but maybe not given that flush sequenceids are pegged 
>> against a particular edit.
> This could be done either way and I have no preference. With the default
> settings, running PerformanceEvaluation random write 1 with one region
> server, the HLogs were about 160MB. It might be nice to use the file size
> so we can get closer to a multiple of HDFS block size. Doing so might
> be better in the general case, which is any application except
> PerformanceEvaluation. In some cases, we might put more updates into a
> log (if keys and values are small), and in others we might put fewer
> (when keys and values are large). Being close to a multiple of HDFS block
> size is probably a good thing, so I am kind of leaning toward log size
> instead of number of updates. What do others think?
I think it's better to have the roll based off edit counts rather than 
size, at least at first.  While there may be some mild performance 
benefit to our coming close to blocksize, we'll never hit it spot on and 
logs are let go based on whether they contain edits that are older than 
a sequenceid -- i.e. a particular edit, not an edit's size.
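
To make the argument concrete, here is a minimal sketch of rolling on edit count and letting logs go by sequence id. It is illustrative only: the class, its fields, and the threshold are assumptions, not the HLog code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: names and structure are assumptions, not the HLog code.
class EditCountRollingLog {
  private final int maxEditsPerLog; // roll threshold, counted in edits
  // Sequence id of the newest edit in each closed log, oldest log first.
  private final Deque<Long> closedLogLastSeq = new ArrayDeque<>();
  private int editsInCurrentLog = 0;
  private long lastSeq = 0;

  EditCountRollingLog(int maxEditsPerLog) {
    this.maxEditsPerLog = maxEditsPerLog;
  }

  // Appends one edit, returns its sequence id, and rolls when the count is hit.
  long append() {
    lastSeq++;
    editsInCurrentLog++;
    if (editsInCurrentLog >= maxEditsPerLog) {
      closedLogLastSeq.addLast(lastSeq);
      editsInCurrentLog = 0;
    }
    return lastSeq;
  }

  // Drops closed logs whose newest edit is older than the oldest edit that is
  // still unflushed, and returns how many logs were dropped.
  int releaseOlderThan(long oldestUnflushedSeq) {
    int released = 0;
    while (!closedLogLastSeq.isEmpty()
        && closedLogLastSeq.peekFirst() < oldestUnflushedSeq) {
      closedLogLastSeq.removeFirst();
      released++;
    }
    return released;
  }

  int closedLogCount() {
    return closedLogLastSeq.size();
  }
}
```

Note that the release check never looks at file size: a log is kept or dropped purely on the sequence id of its newest edit.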

>> We have a convention for naming threads.  It's the name of the server -- 
>> master/regionserver host and port -- followed by what the thread does 
>> (This used to be hlog?  Or log?).  Makes it easy sorting them out in a 
>> thread dump.
> Currently the thread is named HLog. Would it be preferable to name it
> <servername>.HLog ? Log entries only appear in one region server's log.
> Does it matter?
Minor.  If multiple regionservers run in the one JVM, as in unit tests, 
it'll help.  But I care more that this new thread name aligns with how 
all other threads in hbase are named.
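
A small sketch of the convention, for illustration. ThreadNames is a hypothetical helper, and the exact "host:port.purpose" layout and the port number in the comments are assumptions; only the ordering (server first, then what the thread does) comes from the convention described above.

```java
// Hypothetical helper; the "host:port.purpose" layout is an assumption.
class ThreadNames {
  // Builds a name such as "rs1.example.org:60020.HLog" (example values).
  static String forServer(String host, int port, String purpose) {
    return host + ":" + port + "." + purpose;
  }

  // Creates (but does not start) a thread carrying the conventional name.
  static Thread newThread(String host, int port, String purpose, Runnable work) {
    return new Thread(work, forServer(host, port, purpose));
  }
}
```

With the server name first, the threads of each regionserver sort together in a thread dump even when several servers share one JVM.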

>> Should this Log thread inherit from Chore?
> Currently only the root, meta scanners and CleanOldTransactions (in
> regionserver.transactional) extend chore. This change was made a while
> back, but I can't remember why. Should all the threads in HRS and HMaster
> extend Chore? We would need to add the "interrupt politely" method,
> but I can't think of a reason we shouldn't do this (as a separate Jira).
Agreed. Separate, low-priority JIRA.

>> There is a place in HRS where all service threads are started.   Now
>> that HLog is a Thread, should it be moved in there? Into startServiceThreads?
> Currently, the HLog thread is started by HRS.setupHLog. Since it is called
> from multiple locations, moving the thread start to startServiceThreads
> would involve extra synchronization.
It looks like it's called from two places, on init and when [...]
not have two HLog Threads running?
> However, I note that the HLog thread is not set to be a daemon thread, which
> should probably be fixed.
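
For reference, a minimal sketch of the daemon fix using the standard java.lang.Thread API. DaemonThreads is a hypothetical helper, not code from the patch. A non-daemon thread keeps the JVM alive after main exits, and setDaemon must be called before start().

```java
// Hypothetical helper, not code from the patch.
class DaemonThreads {
  // Creates (but does not start) a named daemon thread.
  static Thread newDaemon(String name, Runnable work) {
    Thread t = new Thread(work, name);
    t.setDaemon(true); // must happen before start(), else IllegalThreadStateException
    return t;
  }
}
```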


