hbase-dev mailing list archives

From "Lars George (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log
Date Sun, 22 Nov 2009 18:57:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781189#action_12781189 ]

Lars George commented on HBASE-1994:

I read the BigTable paper again and found that my "approach" is shunned there:

{quote}
One approach would be for each new tablet server to read this full commit log file and apply
just the entries needed for the tablets it needs to recover. However, under such a scheme,
if 100 machines were each assigned a single tablet from a failed tablet server, then the log
file would be read 100 times (once by each server).
{quote}

Makes sense: if an RS hosted 100 regions (often a rather low figure compared with what users
report), those regions will be spread across the cluster, which could result in N RSs each
reading the whole file to find the entries they need. So we could keep the threaded split we
have, or think about simply doing the sort as suggested by the BigTable paper and then have
simple range reads on each RS.

{quote}
We avoid duplicating log reads by first sorting the commit log entries in order of the keys
(table, row name, log sequence number). In the sorted output, all mutations for a particular
tablet are contiguous and can therefore be read efficiently with one disk seek followed by
a sequential read. To parallelize the sorting, we partition the log file into 64 MB segments,
and sort each segment in parallel on different tablet servers. This sorting process is coordinated
by the master and is initiated when a tablet server indicates that it needs to recover mutations
from some commit log file.
{quote}
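
To make the sort-then-range-read idea concrete, here is a minimal single-process sketch in Java. It is only an illustration under assumptions: the class, the Entry fields, and the method names are hypothetical and are not HBase or BigTable APIs, and it leaves out the 64 MB partitioning and the parallel sort across servers that the paper describes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: every name here is illustrative, not an HBase or BigTable API.
final class LogSortSketch {

    // One commit log entry, keyed as in the quoted passage.
    static final class Entry {
        final String table;
        final String row;
        final long seq;
        final String mutation;

        Entry(String table, String row, long seq, String mutation) {
            this.table = table;
            this.row = row;
            this.seq = seq;
            this.mutation = mutation;
        }
    }

    // Order by (table, row name, log sequence number).
    static final Comparator<Entry> KEY_ORDER =
        Comparator.comparing((Entry e) -> e.table)
                  .thenComparing(e -> e.row)
                  .thenComparingLong(e -> e.seq);

    // Returns a sorted copy; the input list is left untouched.
    static List<Entry> sortLog(List<Entry> log) {
        List<Entry> sorted = new ArrayList<>(log);
        sorted.sort(KEY_ORDER);
        return sorted;
    }

    // Binary search for the first entry whose (table, row) is >= (table, startRow).
    static int lowerBound(List<Entry> sorted, String table, String startRow) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            Entry e = sorted.get(mid);
            int c = e.table.compareTo(table);
            if (c == 0) {
                c = e.row.compareTo(startRow);
            }
            if (c < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Range read for one region: rows in [startRow, endRow) of the given table.
    // A null endRow means "to the end of the table".
    static List<Entry> rangeRead(List<Entry> sorted, String table,
                                 String startRow, String endRow) {
        List<Entry> out = new ArrayList<>();
        for (int i = lowerBound(sorted, table, startRow); i < sorted.size(); i++) {
            Entry e = sorted.get(i);
            if (!e.table.equals(table)) {
                break; // left the table, so the contiguous run is over
            }
            if (endRow != null && e.row.compareTo(endRow) >= 0) {
                break; // left the region's row range
            }
            out.add(e);
        }
        return out;
    }
}
```

In this sketch each recovering server would do one lookup (the "seek") followed by a sequential scan over its own contiguous run, instead of scanning the whole log.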

What did you discuss during the hackathon?

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
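
The log above suggests the EOFException surfaces while the existing (empty) oldlogfile.log is being copied, after which the 795 new entries are not written. As a rough sketch of one possible guard, and not the actual HBase code or fix, the Java below uses plain local files and a made-up file format to show the failure mode and a size check that avoids it. The class name, the MAGIC header, and all method names are assumptions introduced for illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch on local files; it does not use HBase or HDFS APIs.
final class EmptyOldLogSketch {

    // Made-up header value for this sketch's toy file format.
    static final int MAGIC = 0x484C4F47;

    // Writes a header, an entry count, and then the entries.
    static void writeEntries(Path file, List<String> entries) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(MAGIC);
            out.writeInt(entries.size());
            for (String e : entries) {
                out.writeUTF(e);
            }
        }
    }

    // Reads all entries. On a zero-length file the header read throws EOFException.
    static List<String> readEntries(Path file) throws IOException {
        List<String> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            if (in.readInt() != MAGIC) {
                throw new IOException("unexpected header in " + file);
            }
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                entries.add(in.readUTF());
            }
        }
        return entries;
    }

    // Merges an existing old log with new entries, skipping the old log
    // when it is missing or empty so the new entries are never dropped.
    static List<String> mergeWithExisting(Path oldLog, List<String> newEntries)
            throws IOException {
        List<String> merged = new ArrayList<>();
        if (Files.exists(oldLog) && Files.size(oldLog) > 0) {
            merged.addAll(readEntries(oldLog));
        }
        merged.addAll(newEntries);
        return merged;
    }
}
```

The design choice here is to treat a zero-length old log as "nothing to copy" rather than as an error, so the new split entries still get written.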

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
