hbase-dev mailing list archives

From Lukas Nalezenec <lukas.naleze...@firma.seznam.cz>
Subject Re: Tablesplit.getLength returns 0
Date Mon, 03 Feb 2014 16:37:08 GMT
I have made some changes; see
https://issues.apache.org/jira/browse/HBASE-10413 for more discussion.

I need help with the unit test. Is there a simple unit test
helper or utility I can use? I need to create a table with several regions
and then work with their sizes. It should run locally and provide some
level of abstraction.

The code works well, but there are outliers: one map task with a 1.6 GB
region and 250 MB of "Map output bytes" takes 1 hour (it should take a few
minutes). Do you have any idea why this happens?

2014-02-03 14:28:43,052 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 200
2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: data buffer = 159383552/199229440
2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: record buffer = 524288/655360

2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 118888312; bufvoid = 199229440
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 524288; length = 655360
2014-02-03 15:00:24,993 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2014-02-03 15:00:25,831 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: bufstart = 118888312; bufend = 27185690; bufvoid = 199229433
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: kvstart = 524288; kvend = 393215; length = 655360
2014-02-03 15:35:30,517 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2014-02-03 15:39:03,759 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2014-02-03 15:39:04,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
2014-02-03 15:39:04,895 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2014-02-03 15:39:04,904 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
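
One way to read the log above is to compare how long each spill takes with how long the task runs between spills. The sketch below is only an illustration: the class and method names (SpillGaps, timestampOf, secondsBetween) are made up for this note and are not part of Hadoop or HBase. It assumes every log line starts with a 23-character timestamp in the layout shown above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hadoop or HBase.
final class SpillGaps {
    // Timestamp layout of the task log lines, e.g. "2014-02-03 15:00:16,085".
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // Parses the leading 23 characters of a log line as its timestamp.
    static LocalDateTime timestampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), LOG_TIME);
    }

    // Whole seconds elapsed between two log lines (fractions are dropped).
    static long secondsBetween(String earlierLine, String laterLine) {
        return Duration.between(timestampOf(earlierLine), timestampOf(laterLine))
                       .getSeconds();
    }
}
```

Applied to the lines above, the two spills themselves take about 9 and 2 seconds, while the task runs for about 31 minutes before the first spill and about 35 minutes between "Finished spill 0" and the second spill. That suggests the time goes into producing records (the scan or the map function) rather than into sorting and spilling; the log alone does not say which of the two.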


Lukas


On 31.1.2014 16:35, Ted Yu wrote:
> +  public void setLength(long length) {
>
> This method in TableSplit can be package private.
>
> +  final Log LOG = LogFactory.getLog(MultiTableInputFormatBase.class);
>
>
> Name of class is wrong.
>
>
> +    makeFamilyFilter(families);
>
>
> The return value is ignored.
>
>
>
> Can you make a patch for trunk and attach to JIRA ?
>
>
> Thanks
>
>
> On Fri, Jan 31, 2014 at 6:55 AM, Lukas Nalezenec <
> lukas.nalezenec@firma.seznam.cz> wrote:
>
>> Hi,
>> I have written a first draft:
>> https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
>> Can you please review it and let me know if it is a feasible solution?
>> Lukas
>>
>>
>> On 30.1.2014 18:14, Nick Dimiduk wrote:
>>
>>> Sounds good, I'll watch for your patch!
>>>
>>> On Thursday, January 30, 2014, Lukas Nalezenec <
>>> lukas.nalezenec@firma.seznam.cz> wrote:
>>>
>>>> I talked with the guy who worked on this, and he said our production issue
>>>> was probably not directly caused by getLength() returning 0.
>>>> Anyway, we are interested in fixing that; estimating the length from files
>>>> is a good idea.
>>>>
>>>> Lukas
>>>>
>>>>> InputSplit.getLength() and RecordReader.getProgress() are important for
>>>>> the MR framework to be able to show progress etc. It would be good to
>>>>> return raw data sizes in getLength(), computed from the total size of the
>>>>> region's store files, with progress calculated from the amount of raw
>>>>> data the scanner has seen.
>>>>>
>>>>> Enis
>>>>
>>>>
>>>>
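
The idea quoted above (length taken from the total size of a region's store files, progress taken from the raw data the scanner has seen) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the HBASE-10413 patch: the class RegionSplitEstimate and its methods are hypothetical, and the store file sizes are assumed to be already known to the caller.

```java
import java.util.List;

// Hypothetical sketch, not the HBase implementation.
final class RegionSplitEstimate {
    private final long lengthBytes; // sum of the region's store file sizes
    private long bytesSeen;         // raw bytes returned by the scanner so far

    RegionSplitEstimate(List<Long> storeFileSizes) {
        long total = 0L;
        for (long size : storeFileSizes) {
            total += size;
        }
        this.lengthBytes = total;
    }

    // Plays the role of InputSplit.getLength(): an estimate in bytes.
    long getLength() {
        return lengthBytes;
    }

    // Called as the scanner hands back rows of the given raw size.
    void recordRawBytes(long rawBytes) {
        bytesSeen += rawBytes;
    }

    // Plays the role of RecordReader.getProgress(): a value in [0, 1].
    float getProgress() {
        if (lengthBytes <= 0) {
            return 0.0f; // unknown size: report no progress rather than divide by zero
        }
        return Math.min(1.0f, (float) bytesSeen / lengthBytes);
    }
}
```

The clamp to 1.0 is a design choice for this sketch: store file sizes are only an estimate of the bytes a scan returns, so the amount seen can exceed the estimated length.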

