hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HFileOutputFormat2 hardcodes default FileOutputCommitter
Date Tue, 26 Sep 2017 13:07:47 GMT
HBase doesn't use pull requests.
Can you open a JIRA and attach the patch there?
-------- Original message --------
From: ShaoFeng Shi <shaofengshi@apache.org>
Date: 9/26/17 4:00 AM (GMT-08:00)
To: dev@hbase.apache.org
Subject: Re: HFileOutputFormat2 hardcodes default FileOutputCommitter
Here is the pull request:


2017-09-26 17:16 GMT+08:00 ShaoFeng Shi <shaofengshi@apache.org>:

> Hello gentlemen,
> This is Shaofeng Shi from the Apache Kylin community. We use HBase as the
> storage engine, and we use an MR job to generate HFiles before bulk load. We
> received a user report that, when S3 is configured as the output
> location for HFiles, the files are generated in the "_temporary" folder and
> are never committed to the target path, so no data is loaded
> in the end. We can reproduce this problem easily. The original report
> is in [1].
> Kylin uses HBase's HFileOutputFormat2.java to configure the MR job. After
> some investigation, I found that this class always uses the default
> "FileOutputCommitter" (see [2]), regardless of the job's configuration, so
> it always writes to the "_temporary" folder. Since AWS EMR is configured to use
> DirectOutputCommitter for S3, the problem occurs: Hadoop expects to
> see the files directly under the output path, while the RecordWriter generates
> them in the "_temporary" folder.
> Have you received such reports before? I have a temporary fix in my fork now.
> I am just wondering what you think about it; if okay, I will file a JIRA.
> Thanks!
> [1] https://issues.apache.org/jira/browse/KYLIN-2788
> [2] https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L193
> --
> Best regards,
> Shaofeng Shi 史少锋

Best regards,

Shaofeng Shi 史少锋
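
The mismatch described in the quoted message can be illustrated with a toy model in plain Java. This is only a sketch: `Committer`, `STAGING`, `DIRECT`, `hardcodedWritePath`, and `configuredWritePath` are hypothetical stand-ins invented for illustration, not Hadoop or HBase classes, and the path `/bucket/hfiles` is made up. It is not the fix from the fork mentioned above. It only shows why a writer that always assumes a staging committer leaves files stranded when the job is configured with a direct one.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Toy model only: these types are hypothetical stand-ins, not Hadoop or HBase classes.
class CommitterSketch {

    /** Stand-in for an output committer: it decides where task output is written. */
    interface Committer {
        Path workPath(Path outputDir);
    }

    /** Models a staging committer: tasks write under "_temporary", files are moved on commit. */
    static final Committer STAGING = out -> out.resolve("_temporary");

    /** Models a direct committer: tasks write straight into the output directory. */
    static final Committer DIRECT = out -> out;

    /** The reported behaviour: the writer ignores the configured committer. */
    static Path hardcodedWritePath(Path outputDir, Committer configured) {
        return STAGING.workPath(outputDir);
    }

    /** The alternative: the writer asks the configured committer for its work path. */
    static Path configuredWritePath(Path outputDir, Committer configured) {
        return configured.workPath(outputDir);
    }

    public static void main(String[] args) {
        Path out = Paths.get("/bucket/hfiles");
        // With a direct committer, nothing ever moves files out of "_temporary",
        // so the hardcoded path never lines up with where the job expects output.
        System.out.println("hardcoded:  " + hardcodedWritePath(out, DIRECT));
        System.out.println("configured: " + configuredWritePath(out, DIRECT));
    }
}
```

In this model the two methods agree whenever the job uses `STAGING` and diverge only under `DIRECT`, which matches the report that the problem shows up on S3 with a direct committer configured.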