hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Tablesplit.getLength returns 0
Date Mon, 03 Feb 2014 17:20:16 GMT
You can take a look at the following method in HBaseTestingUtility:

  public HTable createTable(byte[] tableName, byte[][] families,
      int numVersions, byte[] startKey, byte[] endKey, int numRegions)
      throws IOException {

I saw you opened a GitHub pull request. Please generate a patch based on trunk
and attach it to the JIRA; the HBase source repo is currently in Subversion.
In https://github.com/apache/hbase/pull/8/files , I can't find the new config
parameter that turns this feature on or off.

Regards

On Mon, Feb 3, 2014 at 8:37 AM, Lukas Nalezenec <
lukas.nalezenec@firma.seznam.cz> wrote:

> I have made some changes; see
> https://issues.apache.org/jira/browse/HBASE-10413 for more discussion.
>
> I need help with the unit test. Is there a simple unit test helper or utility
> I can use? I need to create a table with some regions and then work with
> their sizes. It should be local, and there should be some level of abstraction.
>
> The code works well but there are outliers: one map with a 1.6 GB region and
> 250 MB of "Map output bytes" takes 1 hour (it should take a few minutes). Do
> you have any idea why this happens?
>
> 2014-02-03 14:28:43,052 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 200
> 2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: data buffer = 159383552/199229440
> 2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: record buffer = 524288/655360
>
> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 118888312; bufvoid = 199229440
> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 524288; length = 655360
> 2014-02-03 15:00:24,993 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
> 2014-02-03 15:00:25,831 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: bufstart = 118888312; bufend = 27185690; bufvoid = 199229433
> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: kvstart = 524288; kvend = 393215; length = 655360
> 2014-02-03 15:35:30,517 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
> 2014-02-03 15:39:03,759 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
> 2014-02-03 15:39:04,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
> 2014-02-03 15:39:04,895 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
> 2014-02-03 15:39:04,904 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
>
>
> Lukas
>
>
>
> On 31.1.2014 16:35, Ted Yu wrote:
>
>> +  public void setLength(long length) {
>>
>> This method in TableSplit can be package private.
>>
>> +  final Log LOG = LogFactory.getLog(MultiTableInputFormatBase.class);
>>
>>
>> Name of class is wrong.
>>
>>
>> +    makeFamilyFilter(families);
>>
>>
>> The return value is ignored.
>>
>>
>>
>> Can you make a patch for trunk and attach to JIRA ?
>>
>>
>> Thanks
>>
>>
>> On Fri, Jan 31, 2014 at 6:55 AM, Lukas Nalezenec <
>> lukas.nalezenec@firma.seznam.cz> wrote:
>>
>>> Hi,
>>> I have written a first draft:
>>> https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
>>> Can you please review it and let me know whether it is a feasible solution?
>>> Lukas
>>>
>>>
>>> On 30.1.2014 18:14, Nick Dimiduk wrote:
>>>
>>>> Sounds good, I'll watch for your patch!
>>>>
>>>> On Thursday, January 30, 2014, Lukas Nalezenec <
>>>> lukas.nalezenec@firma.seznam.cz> wrote:
>>>>
>>>>> I talked with the person who worked on this, and he said our production
>>>>> issue was probably not directly caused by getLength() returning 0.
>>>>> Anyway, we are interested in fixing that; estimating the length from the
>>>>> files is a good idea.
>>>>>
>>>>> Lukas
>>>>>
>>>>>> InputSplit.getLength() and RecordReader.getProgress() are important for
>>>>>> the MR framework to be able to show progress etc. It would be good to
>>>>>> return raw data sizes in getLength() computed from region's total size of
>>>>>> store files, and progress being calculated from scanner's amount of raw
>>>>>> data seen.
>>>>>>
>>>>>>   Enis
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
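Editor's sketch of the idea in Enis's message above (split length taken from the
total size of a region's store files, progress taken from the raw bytes the
scanner has seen). This is not the code from the patch under review. The class
and method names are hypothetical, and store files are modelled as a plain list
of sizes in bytes so that the example runs without HBase on the classpath.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of HBase or of the patch in HBASE-10413.
final class SplitSizeSketch {

    // Estimated split length: the sum of the region's store file sizes, in bytes.
    public static long estimateSplitLength(List<Long> storeFileSizes) {
        long total = 0L;
        for (long size : storeFileSizes) {
            total += size;
        }
        return total;
    }

    // Progress as the fraction of raw bytes seen so far, clamped to [0, 1].
    public static float progress(long bytesSeen, long splitLength) {
        if (splitLength <= 0L) {
            return 0.0f; // size unknown: report no progress instead of dividing by zero
        }
        return Math.min(1.0f, (float) bytesSeen / (float) splitLength);
    }

    public static void main(String[] args) {
        // Assumed example sizes for three store files of one region.
        List<Long> sizes = Arrays.asList(100L, 200L, 300L);
        long length = estimateSplitLength(sizes);
        System.out.println("estimated length = " + length);
        System.out.println("progress after 300 bytes = " + progress(300L, length));
    }
}
```

Clamping matters because store file sizes are on-disk sizes, so the raw bytes a
scanner reports can exceed the estimate when files are compressed.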
>
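Editor's note on the MapTask log quoted in Lukas's message above: the buffer
limits it prints can be reproduced from io.sort.mb = 200 with simple arithmetic.
The sketch below assumes 16 bytes of accounting space per record, 5% of the sort
buffer reserved for record accounting, and an 80% spill threshold. None of these
three values is stated in the thread; they are assumptions that happen to
reproduce the logged numbers. Percentages are passed as whole numbers to keep
the arithmetic in integers.

```java
// Hypothetical helper; reproduces the numbers in the quoted MapTask log lines.
final class SortBufferSketch {

    // Assumed accounting bytes per map output record (not stated in the thread).
    static final long RECORD_BYTES = 16L;

    // Returns {dataSoftLimit, dataCapacity, recordSoftLimit, recordCapacity}.
    public static long[] limits(int sortMb, int recordPercent, int spillPercent) {
        long total = (long) sortMb << 20;                   // io.sort.mb in bytes
        long accounting = total * recordPercent / 100;      // bytes reserved for record accounting
        accounting -= accounting % RECORD_BYTES;            // whole records only
        long dataCapacity = total - accounting;             // bytes left for serialized output
        long recordCapacity = accounting / RECORD_BYTES;    // number of records that fit
        long dataSoftLimit = dataCapacity * spillPercent / 100;
        long recordSoftLimit = recordCapacity * spillPercent / 100;
        return new long[] {dataSoftLimit, dataCapacity, recordSoftLimit, recordCapacity};
    }

    // Average serialized bytes per record for a spill that did not wrap the buffer.
    public static long averageRecordBytes(long bufStart, long bufEnd, long records) {
        return (bufEnd - bufStart) / records;
    }

    public static void main(String[] args) {
        long[] l = limits(200, 5, 80);
        System.out.println("data buffer = " + l[0] + "/" + l[1]);
        System.out.println("record buffer = " + l[2] + "/" + l[3]);
        // First spill in the log: bufstart = 0, bufend = 118888312, kvend = 524288.
        System.out.println("avg bytes/record = " + averageRecordBytes(0L, 118888312L, 524288L));
    }
}
```

Both spills in the log report "record full = true", which means the record
count limit (524288) was reached before the data limit (159383552 bytes). The
first spill holds 118888312 bytes in 524288 records, so the records are small,
and the timestamps show the task took until 15:00:16 to produce the first
524288 records and until 15:35:27 to produce the next batch.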
