hbase-dev mailing list archives

From "KSHITIJ GAUTAM (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-21211) Can't Read Partitions File - Partitions File deleted
Date Wed, 19 Sep 2018 22:35:00 GMT
KSHITIJ GAUTAM created HBASE-21211:
--------------------------------------

             Summary: Can't Read Partitions File - Partitions File deleted 
                 Key: HBASE-21211
                 URL: https://issues.apache.org/jira/browse/HBASE-21211
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.5.0, 1.6.0
         Environment: * HBase Version: 1.2.0-cdh5.11.1 (the line that deletes the file still exists)
 * hadoop version
 * Hadoop 2.6.0-cdh5.11.1
 * Subversion http://github.com/cloudera/hadoop -r b581c269ca3610c603b6d7d1da0d14dfb6684aa3
 * From source with checksum c6cbc4f20a8a571dd7c9f743984da1
 * This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.11.1.jar
            Reporter: KSHITIJ GAUTAM
             Fix For: 1.5.0, 1.6.0
         Attachments: 0001-do-not-delete-the-partitions-file-if-the-session-is-.patch

Hi team, we have a MapReduce job that imports data via bulk load instead of direct puts,
e.g.:
{code:java}
HFileOutputFormat2.configureIncrementalLoad(job, table, locator);{code}
However, we keep running into a situation where the partitions file is deleted when the JVM
process terminates. That process kicks off the MapReduce job, but it is also the one that
runs `configureIncrementalLoad`, which executes `configurePartitioner`.

 

_Error: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)_

 

We think line 827 of [HFileOutputFormat2|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L827]
could be the root cause:

 
{code:java}
fs.deleteOnExit(partitionsPath);{code}
 

We created a custom HFileOutputFormat that does not delete the partitions file, and this
fixed the problem on our cluster. We propose adding a cleanup method that deletes the
partitions file once all the mappers have finished.
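
A minimal sketch of the cleanup idea proposed above, using only the JDK so it can be run without a cluster. It is not the attached patch and not HBase code: the class `PartitionsCleanupSketch` and its methods are hypothetical names, a local temp file stands in for the partitions file, and a `BooleanSupplier` stands in for the MapReduce job. The point is only the ordering: the file is deleted explicitly after the job has finished, instead of being tied to the exit of the submitting JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BooleanSupplier;

// JDK-only model of the proposed cleanup (assumption: not the real HBase API).
class PartitionsCleanupSketch {

    // Hypothetical helper: creates a local stand-in for the partitions file.
    static Path createPartitionsFile() {
        try {
            return Files.createTempFile("partitions_", ".lst");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the job first, then deletes the partitions file whether the job
    // succeeded or failed. Returns the job's own result.
    static boolean runThenCleanup(Path partitionsFile, BooleanSupplier job) {
        try {
            return job.getAsBoolean();
        } finally {
            try {
                Files.deleteIfExists(partitionsFile);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        Path partitions = createPartitionsFile();
        // The stand-in "job" only checks that it can still read the file.
        boolean sawFile = runThenCleanup(partitions, () -> Files.isReadable(partitions));
        System.out.println("job could read partitions file: " + sawFile);
        System.out.println("file exists after cleanup: " + Files.exists(partitions));
    }
}
```

In a real driver the equivalent step would presumably be a `FileSystem.delete` call on the partitions path after `Job.waitForCompletion` returns; that mapping is our assumption and would need to be checked against the actual HFileOutputFormat2 code.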



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
