hbase-dev mailing list archives

From "Todd Lipcon" <t...@cloudera.com>
Subject Re: Review Request: hbase-2437 log split code revamp by Cosmin
Date Wed, 26 May 2010 20:34:48 GMT


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1210
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1210>
> >
> >     Having looked at this function a few times and been very confused, I'd love to
> >     see a javadoc that explains the entire process - e.g. what the "batching" does,
> >     what the output looks like, etc.
> 
> Cosmin Lehene wrote:
>     The original version of this function is what prompted me to start refactoring in the
>     first place. I'll add the description, but if it's still confusing it might need more work.

Oh, it's much much improved! Thanks! I still think a "high level overview" would be good to
see.


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1237
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1237>
> >
> >     This conf is incorrectly named, right? It's not a number of threads, but rather
> >     the number of logs to read in each round?
> 
> Cosmin Lehene wrote:
>     perhaps hbase.regionserver.hlog.splitlog.batch.size?

Why the .regionserver. part? I'd say just hbase.hlog.split.batch.size or something.


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1238
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1238>
> >
> >     this conf is very vaguely named
> 
> Cosmin Lehene wrote:
>     I know, and I tried to get a better name when I created it. Can you suggest something
>     better? I can't come up with a name that's short and descriptive enough.
>

If this applies only to hlog splitting, maybe hbase.hlog.split.skip.errors.
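
For what it's worth, here's roughly how I'd picture both settings being read at split time.
This is only a sketch: the property names are the ones proposed in this thread, and the
defaults are invented for illustration, not taken from any committed patch.

    import org.apache.hadoop.conf.Configuration;

    // Sketch only: property names are the ones suggested above in this review;
    // the defaults are made up for illustration.
    public class SplitLogSettings {
      /** How many logs each round of the split reads in. */
      static int batchSize(Configuration conf) {
        return conf.getInt("hbase.hlog.split.batch.size", 3);
      }
      /** Whether a corrupt log aborts the split or is just skipped with a WARN. */
      static boolean skipErrors(Configuration conf) {
        return conf.getBoolean("hbase.hlog.split.skip.errors", false);
      }
    }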


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1525
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1525>
> >
> >     Typo, and we should fix this before committing. We can use the
> >     org.apache.hadoop.util.VersionInfo class for this, or use reflection to look for
> >     the "hflush()" function.
> 
> Cosmin Lehene wrote:
>     I'd rather have these differences dealt with at the lowest level (the writers) and
>     abstracted away than spread across the code.
>     What do you think?

Which do you mean by "writers" here? I'd support factoring this function out into an HdfsUtil
class somewhere.
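
Something like the following is what I had in mind for that HdfsUtil class. It's just a
sketch; which method names to probe for (hflush() vs. syncFs()) is my assumption about the
Hadoop versions we care about, and comparing org.apache.hadoop.util.VersionInfo.getVersion()
strings would be the alternative.

    public final class HdfsUtil {
      /**
       * Sketch: true if the given output stream class exposes a flush-to-datanodes
       * call - hflush() on newer Hadoop, syncFs() on 0.20-append - detected by
       * reflection rather than by parsing version strings.
       */
      public static boolean supportsSyncToDatanodes(Class<?> streamClass) {
        for (String name : new String[] {"hflush", "syncFs"}) {
          try {
            streamClass.getMethod(name);
            return true;
          } catch (NoSuchMethodException e) {
            // not available on this Hadoop version; try the next candidate
          }
        }
        return false;
      }
    }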


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1745
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1745>
> >
> >     Yes, I think so. The RS could have crashed right after opening but before writing
> >     any data, and if the master failed to recover that, then we'd never recover that
> >     region. I say ignore with a WARN.
> 
> Cosmin Lehene wrote:
>     More aspects here:
>     I think the reported size will be >0 after recovery, even if the file has no records.
>     I was asking if we should add logic to check whether it's the last log.
>     EOF on a file with non-zero length and non-zero records means the file is corrupted.

I agree if it has no records (I think - do we syncFs after writing the SequenceFile header?).
But there's also the case where inside SequenceFile we call create but never actually write
any bytes. That is still worth recovering.

In general I think a corrupt tail means we should drop that record (an incompletely written
record) but not shut down. This is only true if it's the tail record, though.
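
Concretely, I'm picturing something like this in the split loop. Illustrative only - the
LOG field and the HLog.Reader/HLog.Entry usage are stand-ins for whatever the patch actually
does, not a claim about its final shape.

    import java.io.EOFException;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.fs.Path;

    // Sketch of "tolerate a corrupt tail": treat an EOF as a partially written
    // last edit, warn, and keep everything read before it, instead of failing
    // the whole split. LOG is a placeholder commons-logging Log.
    static List<HLog.Entry> readAllEdits(HLog.Reader reader, Path logPath)
        throws IOException {
      List<HLog.Entry> edits = new ArrayList<HLog.Entry>();
      try {
        HLog.Entry entry;
        while ((entry = reader.next()) != null) {
          edits.add(entry);
        }
      } catch (EOFException eof) {
        LOG.warn("EOF while reading " + logPath
            + "; assuming a partially written trailing edit and skipping it", eof);
        // fall through: edits read before the EOF still get split out normally
      }
      return edits;
    }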


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1768
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1768>
> >
> >     I think we need to handle EOF specially here too, though OK to leave this for
> >     another JIRA. IIRC one of the FB guys opened this already.
> 
> Cosmin Lehene wrote:
>     What's the other JIRA? See my above comments.

Can't find it now... does my above comment make sense?


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1811
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1811>
> >
> >     No point in calling .sync() here; it just wastes a bunch of IO writing "sync markers"
> >     that we don't make any real use of.
> 
> Cosmin Lehene wrote:
>     sync() used to call syncFs(). It looks like HBASE-2544 changed things a bit, but it
>     doesn't only add the SequenceFile sync marker.
>     
>     I added this after seeing inconsistent results when running splitLog on bigger hlogs.
>     Try copying a log from the cluster locally and running splitLog from the command line
>     a few times without flushing after each append. I used to get inconsistent results
>     between runs, and calling sync fixed it.
>     
>     There's this "//TODO: test the split of a large (lots of regions > 500 file). In my
>     tests it seems without hflush" in TestHLogSplit.
>     
>     We could do some testing to figure out why log entries would be lost when running locally.
>     
>     What would be a better way to flush the writer?

This seems really voodoo... if anything, we're probably masking a real bug by doing this.
Can you write a unit test which shows this problem? Even if it takes 30 minutes to run, it
would be good to have in our arsenal.
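
To be concrete, the shape of test I'm asking for is something like this. It's only an
outline: writeEdits() and countSplitEntries() are hypothetical helpers, and the splitLog()
arguments are approximate rather than the real signature.

    // Outline only: append a known number of edits without an explicit flush,
    // split the log, and assert that nothing acknowledged went missing.
    @Test
    public void testSplitOfLargeUnflushedLog() throws Exception {
      int written = writeEdits(logDir, 500 /* regions */, 10 /* edits per region */);
      HLog.splitLog(rootDir, logDir, oldLogDir, fs, conf);
      assertEquals(written, countSplitEntries(rootDir));
    }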


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java, line 197
> > <http://review.hbase.org/r/74/diff/1/?file=509#file509line197>
> >
> >     This is the case I take issue with - the split should still proceed past trailing
> >     garbage (with warnings) even if we've told it not to skip errors.
> 
> Cosmin Lehene wrote:
>     I'd like to be able to investigate the trailing garbage. I don't think this should
>     ever happen (do you see any scenarios?). If it did, we might lose data. We used to fix
>     NameNode edits for the fsImage by adding a missing byte to a corrupted entry.
>     
>     I'd like to reflect more on this and maybe get other opinions.

The case where this happens is if you crash in the middle of appending a long edit. Consider
the case where a single edit might have 1MB of data (large rows). We can easily crash in the
middle of transferring it, before we call sync on the edit. In this case, the client never
received an "ack" for the write, so we can feel free to throw it away (this isn't data loss,
it's correct operation).
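
If it helps, that scenario is easy to fake in a test: copy a good log and rewrite it minus
the last few bytes so the final edit is truncated, then check that the split drops only that
edit. Rough sketch below; readLogBytes() is a hypothetical helper and the 100-byte cut is
arbitrary.

    // Simulate a crash mid-append by rewriting the log without its last few
    // bytes, leaving a truncated final edit that was never acked to a client.
    byte[] bytes = readLogBytes(fs, goodLog);           // hypothetical helper
    FSDataOutputStream out = fs.create(truncatedLog, true);
    out.write(bytes, 0, bytes.length - 100);            // chop off the tail
    out.close();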


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/74/#review40
-----------------------------------------------------------


On 2010-05-21 12:11:59, stack wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/74/
> -----------------------------------------------------------
> 
> (Updated 2010-05-21 12:11:59)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This is a version of Cosmin's patch that applies to trunk, doctored to work w/ hadoop 0.20.
> 
> 
> This addresses bug hbase-2437.
>     http://issues.apache.org/jira/browse/hbase-2437
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e479eae 
>   src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java PRE-CREATION
> 
> Diff: http://review.hbase.org/r/74/diff
> 
> 
> Testing
> -------
> 
> All his new tests pass.
> 
> 
> Thanks,
> 
> stack
> 
>

