hbase-dev mailing list archives

From Michael Stack <st...@duboce.net>
Subject Re: svn commit: r707247 - in /hadoop/hbase/trunk: ./ conf/ src/java/org/apache/hadoop/hbase/regionserver/
Date Thu, 23 Oct 2008 03:52:55 GMT
How does the new feature affect hbase throughput?  Does it make it slower?  
Faster?  Any measurement done?  Do appends work for hbase?  Did you try 
crashing an hbase server to see if it comes back up with only a few 
edits lost?
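One rough way to get a first number for the "slower or faster?" question is sketched below. Big assumptions: it times plain local-disk appends and uses `FileChannel.force` as a stand-in for a log sync, so it says nothing about HDFS append/sync itself. The class and method names and the 128-byte edit size are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical micro-benchmark, not HBase code: append fixed-size fake
// edits to a local temp file and force them to disk every `syncEvery`
// edits, roughly what a WAL that syncs every N entries does.
final class SyncCostBench {

  static final class Result {
    final int syncs;   // number of force() calls made
    final long nanos;  // wall-clock time for the whole run
    Result(int syncs, long nanos) {
      this.syncs = syncs;
      this.nanos = nanos;
    }
  }

  static Result run(int edits, int syncEvery) throws IOException {
    Path tmp = Files.createTempFile("wal-bench", ".log");
    ByteBuffer edit = ByteBuffer.allocate(128); // assumed 128-byte edit
    int syncs = 0;
    long start = System.nanoTime();
    try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
      for (int i = 1; i <= edits; i++) {
        edit.clear(); // rewind so the whole buffer is written again
        while (edit.hasRemaining()) {
          ch.write(edit);
        }
        if (i % syncEvery == 0) {
          ch.force(false); // push file data (not metadata) to the device
          syncs++;
        }
      }
      if (edits % syncEvery != 0) {
        ch.force(false); // sync the partial batch at the tail
        syncs++;
      }
    } finally {
      Files.deleteIfExists(tmp);
    }
    return new Result(syncs, System.nanoTime() - start);
  }

  public static void main(String[] args) throws IOException {
    int edits = 500;
    for (int n : new int[] {1, 10, 100}) {
      Result r = run(edits, n);
      System.out.printf("syncEvery=%d syncs=%d edits/sec=%.0f%n",
          n, r.syncs, edits / (r.nanos / 1e9));
    }
  }
}
```

Comparing edits/sec across `syncEvery` values gives a feel for what a knob like `hbase.regionserver.flushlogentries` costs, on local disk only; a real answer needs the same run against HDFS and a region server.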

Other comments in-line below.

jimk@apache.org wrote:
>    <property>
>      <name>hbase.regionserver.maxlogentries</name>
> -    <value>30000</value>
> +    <value>100000</value>
>      <description>Rotate the HRegion HLogs when count of entries exceeds this
> -    value.  Default: 30,000.  Value is checked by a thread that runs every
> +    value.  Default: 100,000.  Value is checked by a thread that runs every
>      hbase.server.thread.wakefrequency.
>      </description>
>    </property>
>   
I was thinking that the size of the log file is a better measure of when 
to rotate, given that WAL log file sizes can diverge widely for the same 
entry count.  But maybe not, given that flush sequenceids are pegged 
against a particular edit.
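For comparison, a roll trigger that looks at both entry count and bytes written could be as small as the sketch below. It is hypothetical: the committed code only compares an entry count against `hbase.regionserver.maxlogentries`, and the byte limit here is not an existing config.

```java
// Hypothetical roll policy, not the committed HLog code: ask for a log
// roll when either the entry count or the bytes written exceed a limit.
final class RollPolicy {
  private final int maxEntries; // plays the role of hbase.regionserver.maxlogentries
  private final long maxBytes;  // hypothetical size limit; no such config in the commit
  private int entries;
  private long bytes;

  RollPolicy(int maxEntries, long maxBytes) {
    this.maxEntries = maxEntries;
    this.maxBytes = maxBytes;
  }

  /** Accounts for one appended edit; returns true if a roll should be requested. */
  boolean append(int editSizeBytes) {
    entries++;
    bytes += editSizeBytes;
    return entries > maxEntries || bytes > maxBytes;
  }

  /** Called after the log has been rolled. */
  void reset() {
    entries = 0;
    bytes = 0;
  }

  public static void main(String[] args) {
    RollPolicy p = new RollPolicy(100000, 64L * 1024 * 1024);
    // Large (100 KB) edits trip the size limit long before the entry limit.
    int appended = 0;
    boolean roll = false;
    while (!roll) {
      roll = p.append(100 * 1024);
      appended++;
    }
    System.out.println("roll requested after " + appended + " edits");
  }
}
```

Whether the size check is worth having depends on the point above: if flush sequence ids are tied to individual edits, entry count may be the more natural unit anyway.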
...

> Modified: hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/Flusher.java
> URL: http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/Flusher.java?rev=707247&r1=707246&r2=707247&view=diff
> ==============================================================================
> --- hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/Flusher.java (original)
> +++ hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/Flusher.java Wed Oct 22 19:30:35 2008
> @@ -25,7 +25,6 @@
>  import java.util.concurrent.locks.ReentrantLock
I like how Flusher has had a bunch of code purged.


> Modified: hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/HLog.java
> URL: http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/HLog.java?rev=707247&r1=707246&r2=707247&view=diff
> ==============================================================================
> --- hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/HLog.java (original)
> +++ hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/HLog.java Wed Oct 22 19:30:35 2008
> @@ -36,6 +36,7 @@
>  import org.apache.hadoop.fs.FileStatus;
>  import org.apache.hadoop.fs.FileSystem;
>  import org.apache.hadoop.fs.Path;
> +import org.apache.hadoop.fs.Syncable;
>  import org.apache.hadoop.hbase.HBaseConfiguration;
>  import org.apache.hadoop.hbase.HConstants;
>  import org.apache.hadoop.hbase.HRegionInfo;
> @@ -82,16 +83,8 @@
>   * rolling is not. To prevent log rolling taking place during this period, a
>   * separate reentrant lock is used.
>   *
> - * <p>
> - * TODO: Vuk Ercegovac also pointed out that keeping HBase HRegion edit logs in
> - * HDFS is currently flawed. HBase writes edits to logs and to a memcache. The
> - * 'atomic' write to the log is meant to serve as insurance against abnormal
> - * RegionServer exit: on startup, the log is rerun to reconstruct an HRegion's
> - * last wholesome state. But files in HDFS do not 'exist' until they are cleanly
> - * closed -- something that will not happen if RegionServer exits without
> - * running its 'close'.
>   */
> -public class HLog implements HConstants {
> +public class HLog extends Thread implements HConstants, Syncable {
>    private static final Log LOG = LogFactory.getLog(HLog.class);
>    private static final String HLOG_DATFILE = "hlog.dat.";
>    static final byte [] METACOLUMN = Bytes.toBytes("METACOLUMN:");
> @@ -100,8 +93,12 @@
>    final Path dir;
>    final Configuration conf;
>    final LogRollListener listener;
> -  final long threadWakeFrequency;
>    private final int maxlogentries;
> +  private final long optionalFlushInterval;
> +  private final int flushlogentries;
> +  private volatile int unflushedEntries = 0;
> +  private volatile long lastLogFlushTime;
> +  final long threadWakeFrequency;
>  
>    /*
>     * Current log file.
> @@ -153,13 +150,22 @@
>     */
>    public HLog(final FileSystem fs, final Path dir, final Configuration conf,
>        final LogRollListener listener) throws IOException {
> +    
> +    super();
> +    
>      this.fs = fs;
>      this.dir = dir;
>      this.conf = conf;
>      this.listener = listener;
> -    this.threadWakeFrequency = conf.getLong(THREAD_WAKE_FREQUENCY, 10 * 1000);
> +    this.setName(this.getClass().getSimpleName());
>   

We have a convention for naming threads: the name of the server 
(master/regionserver host and port) followed by what the thread does 
(this used to be hlog?  Or log?).  It makes it easy to sort them out in 
a thread dump.
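A minimal sketch of that convention, assuming a `<role>/<host>:<port>.<task>` format (the exact format HBase uses is assumed here, and `ServiceThreadNames` is a made-up helper):

```java
// Hypothetical helper illustrating the naming convention:
// "<role>/<host>:<port>.<task>". The exact HBase format is assumed.
final class ServiceThreadNames {
  static String name(String role, String host, int port, String task) {
    return role + "/" + host + ":" + port + "." + task;
  }

  public static void main(String[] args) throws InterruptedException {
    // Instead of setName(getClass().getSimpleName()), which yields a bare
    // "HLog" for every server in the JVM, prefix the owning server.
    Thread t = new Thread(() -> { }, name("regionserver", "example.org", 60020, "hlog"));
    t.start();
    System.out.println(t.getName()); // prints regionserver/example.org:60020.hlog
    t.join();
  }
}
```

This matters most when several servers share one JVM, as in a mini-cluster test, where a bare class name cannot tell their log threads apart.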

>      this.maxlogentries =
> -      conf.getInt("hbase.regionserver.maxlogentries", 30 * 1000);
> +      conf.getInt("hbase.regionserver.maxlogentries", 100000);
> +    this.flushlogentries =
> +      conf.getInt("hbase.regionserver.flushlogentries", 100);
> +    this.optionalFlushInterval =
> +      conf.getLong("hbase.regionserver.optionallogflushinterval", 10 * 1000);
> +    this.threadWakeFrequency = conf.getLong(THREAD_WAKE_FREQUENCY, 10 * 1000);
> +    this.lastLogFlushTime = System.currentTimeMillis();
>      if (fs.exists(dir)) {
>        throw new IOException("Target HLog directory already exists: " + dir);
>      }
> @@ -168,7 +174,7 @@
>    }
>  
>    /*
> -   * Accessor for tests.
> +   * Accessor for tests. Not a part of the public API.
>     * @return Current state of the monotonically increasing file id.
>     */
>    public long getFilenum() {
> @@ -313,6 +319,7 @@
>            }
>          }
>          this.numEntries = 0;
> +        updateLock.notifyAll();
>        }
>      } finally {
>        this.cacheFlushLock.unlock();
> @@ -354,11 +361,15 @@
>      cacheFlushLock.lock();
>      try {
>        synchronized (updateLock) {
> +        this.closed = true;
> +        if (this.isAlive()) {
> +          this.interrupt();
> +        }
>          if (LOG.isDebugEnabled()) {
>            LOG.debug("closing log writer in " + this.dir.toString());
>          }
>          this.writer.close();
> -        this.closed = true;
> +        updateLock.notifyAll();
>        }
>      } finally {
>        cacheFlushLock.unlock();
> @@ -415,11 +426,40 @@
>  
>          this.numEntries++;
>        }
> +      updateLock.notifyAll();
>      }
>      if (this.numEntries > this.maxlogentries) {
>          requestLogRoll();
>      }
>    }
> +  
> +  /** {@inheritDoc} */
> +  @Override
> +  public void run() {
> +    while (!this.closed) {
> +      synchronized (updateLock) {
> +        if (((System.currentTimeMillis() - this.optionalFlushInterval) >
> +              this.lastLogFlushTime) && this.unflushedEntries > 0) {
>   

Should this Log thread inherit from Chore?
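To show what that could look like, here is the optional-flush check from the committed `run()` loop rewritten as a periodic task. `PeriodicTask` is a hypothetical stand-in for a Chore-style base class (stop flag, sleep between iterations); it does not reproduce HBase's actual `Chore` API. The sync itself is simulated by a counter.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a Chore-style base class (not HBase's actual
// Chore API): loop until the shared stop flag is set, calling chore()
// and then sleeping periodMs.
abstract class PeriodicTask extends Thread {
  private final long periodMs;
  protected final AtomicBoolean stop;

  PeriodicTask(String name, long periodMs, AtomicBoolean stop) {
    super(name);
    this.periodMs = periodMs;
    this.stop = stop;
    setDaemon(true);
  }

  @Override
  public void run() {
    while (!stop.get()) {
      chore();
      try {
        Thread.sleep(periodMs);
      } catch (InterruptedException e) {
        // Woken early, e.g. by close(); the loop re-checks the stop flag.
      }
    }
  }

  protected abstract void chore();
}

// The optional-flush condition from the committed run(), as a periodic task.
class LogFlushTask extends PeriodicTask {
  private final Object updateLock = new Object();
  private final long optionalFlushIntervalMs;
  private int unflushedEntries = 0;
  private long lastLogFlushTime;
  private int flushes = 0;

  LogFlushTask(String name, long periodMs, long optionalFlushIntervalMs,
      AtomicBoolean stop, long startTime) {
    super(name, periodMs, stop);
    this.optionalFlushIntervalMs = optionalFlushIntervalMs;
    this.lastLogFlushTime = startTime;
  }

  void append() {
    synchronized (updateLock) {
      unflushedEntries++;
    }
  }

  int flushes() {
    synchronized (updateLock) {
      return flushes;
    }
  }

  @Override
  protected void chore() {
    flushIfDue(System.currentTimeMillis());
  }

  /** "Syncs" if entries are pending and the optional interval has elapsed. */
  void flushIfDue(long now) {
    synchronized (updateLock) {
      if (now - optionalFlushIntervalMs > lastLogFlushTime && unflushedEntries > 0) {
        // A real log would sync its writer here.
        unflushedEntries = 0;
        lastLogFlushTime = now;
        flushes++;
      }
    }
  }
}

class LogFlushDemo {
  public static void main(String[] args) throws InterruptedException {
    AtomicBoolean stop = new AtomicBoolean(false);
    LogFlushTask t = new LogFlushTask("regionserver/example.org:60020.logFlusher",
        10, 50, stop, System.currentTimeMillis());
    t.start();
    t.append();
    Thread.sleep(200);
    stop.set(true);
    t.interrupt();
    t.join();
    System.out.println("flushes: " + t.flushes());
  }
}
```

Factoring the loop out this way would leave HLog itself with only the flush decision, and shutdown would go through the shared stop flag rather than a per-class `closed` check.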

> ewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?rev=707247&r1=707246&r2=707247&view=diff
> ==============================================================================
> --- hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java (original)
> +++ hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Wed Oct 22 19:30:35 2008
> @@ -561,7 +561,9 @@
>          "running at " + this.serverInfo.getServerAddress().toString() +
>          " because logdir " + logdir.toString() + " exists");
>      }
> -    return new HLog(fs, logdir, conf, logRoller);
> +    HLog newlog = new HLog(fs, logdir, conf, logRoller);
> +    newlog.start();
> +    return newlog;
>   

There is a place in HRS where all service threads are started.  Now that 
HLog is a Thread, should starting it be moved in there, into startServiceThreads?
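A hedged sketch of the suggested arrangement: the log thread is constructed where the log is set up, but `start()` happens only in the one method that starts every service thread. `SketchServer`, `setupLog`, and the `logRoller`/`cacheFlusher` placeholders are hypothetical and are not HRegionServer's real code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not HRegionServer itself: construct the log thread
// during setup, start it only from startServiceThreads().
class SketchServer {
  private final String threadPrefix; // e.g. "regionserver/host:port"
  private final List<Thread> serviceThreads = new ArrayList<>();
  private Thread hlog;

  SketchServer(String threadPrefix) {
    this.threadPrefix = threadPrefix;
  }

  /** Analogous to building the HLog: construct, but do not start(). */
  Thread setupLog(Runnable logLoop) {
    hlog = new Thread(logLoop, threadPrefix + ".hlog");
    hlog.setDaemon(true);
    return hlog;
  }

  /** Analogous to startServiceThreads(): the one place threads start. */
  void startServiceThreads(Runnable logRoller, Runnable cacheFlusher) {
    if (hlog == null) {
      throw new IllegalStateException("setupLog() must run first");
    }
    serviceThreads.add(newDaemon(logRoller, "logRoller"));
    serviceThreads.add(newDaemon(cacheFlusher, "cacheFlusher"));
    serviceThreads.add(hlog);
    for (Thread t : serviceThreads) {
      t.start();
    }
  }

  List<Thread> serviceThreads() {
    return Collections.unmodifiableList(serviceThreads);
  }

  private Thread newDaemon(Runnable r, String task) {
    Thread t = new Thread(r, threadPrefix + "." + task);
    t.setDaemon(true);
    return t;
  }

  public static void main(String[] args) {
    SketchServer s = new SketchServer("regionserver/example.org:60020");
    s.setupLog(() -> { });
    s.startServiceThreads(() -> { }, () -> { });
    for (Thread t : s.serviceThreads()) {
      System.out.println(t.getName());
    }
  }
}
```

Keeping every `start()` in one method makes startup order explicit and gives shutdown a single list to walk.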

St.Ack
