Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Mon, 10 Feb 2014 11:56:19 +0000 (UTC)
From: "Ding Yuan (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12692735.1391369935485.16560.1392033379953@arcas>
In-Reply-To: <JIRA.12692735.1391369935485@arcas>
References: <JIRA.12692735.1391369935485@arcas>
Subject: [jira] [Commented] (HBASE-10452) Potential bugs in exception
 handlers
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896424#comment-13896424 ] 

Ding Yuan commented on HBASE-10452:
-----------------------------------

Thanks for the comments! Attached a new patch to address them. As for the possible integer overflow error from TimeRange: an IOException instead of RuntimeException is now thrown so the upper levels will deal with it. Let me know if this is fine.

> Potential bugs in exception handlers
> ------------------------------------
>
>                 Key: HBASE-10452
>                 URL: https://issues.apache.org/jira/browse/HBASE-10452
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, master, regionserver, util
>    Affects Versions: 0.96.1
>            Reporter: Ding Yuan
>         Attachments: HBase-10452-trunk-v2.patch, HBase-10452-trunk-v3.patch, HBase-10452-trunk.patch
>
>
> Hi HBase developers,
> We are a group of researchers on software reliability. Recently we did a study and found that majority of the most severe failures in HBase are caused by bugs in exception handling logic -- that it is hard to anticipate all the possible real-world error scenarios. Therefore we built a simple checking tool that automatically detects some bug patterns that have caused some very severe real-world failures. I am reporting some of the results here. Any feedback is much appreciated!
> Ding
> =========================
> Case 1:
>   Line: 134, File: "org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java"
> {noformat}
>   protected void releaseTableLock() {
>     if (this.tableLock != null) {
>       try {
>         this.tableLock.release();
>       } catch (IOException ex) {
>         LOG.warn("Could not release the table lock", ex);
>         //TODO: if we get here, and not abort RS, this lock will never be released
>       }
>     }
> {noformat}
> The lock is not released if the exception occurs, causing potential deadlock or starvation.
> Similar code pattern can be found at:
>   Line: 135, File: "org/apache/hadoop/hbase/regionserver/SplitRequest.java"
> ==========================================
> =========================
> Case 2:
>   Line: 252, File: "org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java"
> {noformat}
>     try {
>       Field fEnd = SequenceFile.Reader.class.getDeclaredField("end");
>       fEnd.setAccessible(true);
>       end = fEnd.getLong(this.reader);
>     } catch(Exception e) { /* reflection fail. keep going */ }
> {noformat}
> The caught Exception seems to be too general.
> While reflection-related errors might be harmless, the try block can throw
> other exceptions including "SecurityException", "IllegalAccessException", etc. Currently
> all those exceptions are ignored. Maybe
> the safe way is to ignore the specific reflection-related errors while logging and
> handling other types of unexpected exceptions.
> ==========================================
> =========================
> Case 3:
>   Line: 148, File: "org/apache/hadoop/hbase/HBaseConfiguration.java"
> {noformat}
>     try {
>       if (Class.forName("org.apache.hadoop.conf.ConfServlet") != null) {
>         isShowConf = true;
>       }
>     } catch (Exception e) {
>     }
> {noformat}
> Similar to the previous case, the exception handling is too general. While ClassNotFound error might be the normal case and ignored, Class.forName can also throw other exceptions (e.g., LinkageError) under some unexpected and rare error cases. If that happens, the error will be lost. So maybe change it to below:
> {noformat}
>     try {
>       if (Class.forName("org.apache.hadoop.conf.ConfServlet") != null) {
>         isShowConf = true;
>       }
>     } catch (LinkageError e) {
>       LOG.warn("..");
>       // handle linkage error
>     } catch (ExceptionInInitializerError e) {
>       LOG.warn("..");
>       // handle Initializer error
>     } catch (ClassNotFoundException e) {
>      LOG.debug("..");
>      // ignore
>     }
> {noformat}
> ==========================================
> =========================
> Case 4:
>   Line: 163, File: "org/apache/hadoop/hbase/client/Get.java"
> {noformat}
>   public Get setTimeStamp(long timestamp) {
>     try {
>       tr = new TimeRange(timestamp, timestamp+1);
>     } catch(IOException e) {
>       // Will never happen
>     }
>     return this;
>   }
> {noformat}
> Even if the IOException never happens right now, is it possible to happen in the future due to code change?
> At least there should be a log message. The current behavior is dangerous since if the exception ever happens
> in any unexpected scenario, it will be silently swallowed.
> Similar code pattern can be found at:
>   Line: 300, File: "org/apache/hadoop/hbase/client/Scan.java"
> ==========================================
> =========================
> Case 5:
>   Line: 207, File: "org/apache/hadoop/hbase/util/JVM.java"
> {noformat}
>    if (input != null){
>         try {
>           input.close();
>         } catch (IOException ignored) {
>         }
>       }
> {noformat}
> Any exception encountered in close is completely ignored, not even logged.
> In particular, the same exception scenario was handled differently in other methods in the same file:
> Line: 154, same file
> {noformat}
>        if (in != null){
>          try {
>            in.close();
>          } catch (IOException e) {
>            LOG.warn("Not able to close the InputStream", e);
>          }
>        }
> {noformat}
> Line: 248, same file
> {noformat}
>       if (in != null){
>         try {
>           in.close();
>         } catch (IOException e) {
>           LOG.warn("Not able to close the InputStream", e);
>         }
>       }
> {noformat}
> ==========================================
> =========================
> Case 6: empty handler for exception: java.io.IOException
>   Line: 312, File: "org/apache/hadoop/hbase/rest/RowResource.java"
> {noformat}
>     } finally {
>       if (table != null) try {
>         table.close();
>       } catch (IOException ioe) { }
>     }
> {noformat}
> IOException is completely ignored. This behavior is inconsistent with the same
> code snippet at line 249 in the same file, where the IOExceptions was logged:
> {noformat}
>    } finally {
>       if (table != null) try {
>         table.close();
>       } catch (IOException ioe) {
>         LOG.debug("Exception received while closing the table", ioe);
>       }
>     }
> {noformat}
> ==========================================
> =========================
> Case 7:
>   Line: 95, File: "org/apache/hadoop/hbase/master/handler/EnableTableHandler.java"
> {noformat}
>         try {
>           this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
>         } catch (KeeperException e) {
>           // TODO : Use HBCK to clear such nodes
>           LOG.warn("Failed to delete the ENABLING node for the table " + tableName
>               + ".  The table will remain unusable. Run HBCK to manually fix the problem.");
>         }
> {noformat}
> The log message in the exception handler and the comment seem to suggest that such nodes should be cleared using HBCK.
> ==========================================
> =========================
> Case 8:
>   Line: 463, File: "org/apache/hadoop/hbase/client/ClientScanner.java"
> {noformat}
>         try {
>           this.caller.callWithRetries(callable);
>         } catch (IOException e) {
>           // We used to catch this error, interpret, and rethrow. However, we
>           // have since decided that it's not nice for a scanner's close to
>           // throw exceptions. Chances are it was just an UnknownScanner
>           // exception due to lease time out.
>         }
> {noformat}
> Currently the handler is empty because it may be caused by "just an UnknownScanner exception".
> But what if it has other causes? Maybe the catch block can differentiate the exception into
> different causes, ignoring the ones caused by UnknownScanner while handle others differently?
> ==========================================


--
This message was sent by Atlassian JIRA
(v6.1.5#6160)