hbase-dev mailing list archives

From Lukas Nalezenec <lukas.naleze...@firma.seznam.cz>
Subject Re: Tablesplit.getLength returns 0
Date Mon, 03 Feb 2014 17:41:01 GMT
Hi, thanks.
I was planning to attach the patch to JIRA; I opened the pull request only
for code review.

The configuration option is at line 63 of RegionSizeCalculator.java.

https://github.com/lukasnalezenec/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java#L63
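
To illustrate what such an on/off option does, here is a minimal sketch. It
is not the code from the pull request: the class name, the key string, and
the plain maps standing in for the HBase configuration and for the per-region
store file sizes are all assumptions made for illustration. When the flag is
off, every region reports size 0, which matches the old getLength() behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not the RegionSizeCalculator from the pull request.
class RegionSizeSketch {
    // Assumed key name for illustration; the real key is defined in
    // RegionSizeCalculator.java at the line linked above.
    static final String ENABLE_KEY = "example.regionsizecalculator.enable";

    private final Map<String, Long> sizeByRegion = new HashMap<>();

    // conf stands in for the job configuration; storeFileBytesByRegion stands in
    // for whatever the cluster reports as total store file size per region.
    RegionSizeSketch(Map<String, String> conf, Map<String, Long> storeFileBytesByRegion) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLE_KEY, "true"));
        if (enabled) {
            sizeByRegion.putAll(storeFileBytesByRegion);
        }
        // When disabled, the map stays empty and every lookup falls back to 0.
    }

    // Returns the estimated size in bytes, or 0 if unknown or disabled.
    long getRegionSize(String regionName) {
        Long size = sizeByRegion.get(regionName);
        return size == null ? 0L : size;
    }
}
```

A split's length would then be set from getRegionSize() for its region, so a
job with the flag turned off sees the same zero lengths as before the change.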

Lukas


On 3.2.2014 18:20, Ted Yu wrote:
> You can take a look at the following method in HBaseTestingUtility:
>
>    public HTable createTable(byte[] tableName, byte[][] families,
>        int numVersions, byte[] startKey, byte[] endKey, int numRegions)
>        throws IOException {
>
> I saw you issued a git pull request - please generate a patch based on trunk
> and attach it to JIRA. The HBase source repo is currently in Subversion.
> In https://github.com/apache/hbase/pull/8/files , I can't find the
> new config parameter which turns this feature on/off.
>
> Regards
>
> On Mon, Feb 3, 2014 at 8:37 AM, Lukas Nalezenec <
> lukas.nalezenec@firma.seznam.cz> wrote:
>
>> I have made some changes - see
>> https://issues.apache.org/jira/browse/HBASE-10413 for more discussion.
>>
>> I need help with the unit test. Is there a simple unit test helper/utility
>> I can use? I need to create a table with some regions and then work with
>> their sizes. It should run locally, with some level of abstraction.
>>
>> The code works well but there are outliers - one map with a 1.6G region and
>> 250MB "Map output bytes" takes 1 hour (it should take a few minutes). Do you
>> have any idea why this happens?
>>
>> 2014-02-03 14:28:43,052 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb
>> = 200
>> 2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: data
>> buffer = 159383552/199229440
>> 2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: record
>> buffer = 524288/655360
>>
>> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: Spilling
>> map output: record full = true
>> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: bufstart =
>> 0; bufend = 118888312; bufvoid = 199229440
>> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: kvstart =
>> 0; kvend = 524288; length = 655360
>> 2014-02-03 15:00:24,993 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new compressor [.snappy]
>> 2014-02-03 15:00:25,831 INFO org.apache.hadoop.mapred.MapTask: Finished
>> spill 0
>> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: Spilling
>> map output: record full = true
>> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: bufstart =
>> 118888312; bufend = 27185690; bufvoid = 199229433
>> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: kvstart =
>> 524288; kvend = 393215; length = 655360
>> 2014-02-03 15:35:30,517 INFO org.apache.hadoop.mapred.MapTask: Finished
>> spill 1
>> 2014-02-03 15:39:03,759 INFO org.apache.hadoop.mapred.MapTask: Starting
>> flush of map output
>> 2014-02-03 15:39:04,884 INFO org.apache.hadoop.mapred.MapTask: Finished
>> spill 2
>> 2014-02-03 15:39:04,895 INFO org.apache.hadoop.mapred.Merger: Merging 3
>> sorted segments
>> 2014-02-03 15:39:04,904 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor [.snappy]
>>
>>
>> Lukas
>>
>>
>>
>> On 31.1.2014 16:35, Ted Yu wrote:
>>
>>> +  public void setLength(long length) {
>>>
>>> This method in TableSplit can be package private.
>>>
>>> +  final Log LOG = LogFactory.getLog(MultiTableInputFormatBase.class);
>>>
>>>
>>> Name of class is wrong.
>>>
>>>
>>> +    makeFamilyFilter(families);
>>>
>>>
>>> The return value is ignored.
>>>
>>>
>>>
>>> Can you make a patch for trunk and attach to JIRA ?
>>>
>>>
>>> Thanks
>>>
>>>
>>> On Fri, Jan 31, 2014 at 6:55 AM, Lukas Nalezenec <
>>> lukas.nalezenec@firma.seznam.cz> wrote:
>>>
>>>> Hi,
>>>> I have written a first draft:
>>>> https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
>>>> Can you please review it and let me know whether it is a feasible solution?
>>>> Lukas
>>>>
>>>>
>>>> On 30.1.2014 18:14, Nick Dimiduk wrote:
>>>>
>>>>> Sounds good, I'll watch for your patch!
>>>>>
>>>>> On Thursday, January 30, 2014, Lukas Nalezenec <
>>>>> lukas.nalezenec@firma.seznam.cz> wrote:
>>>>>
>>>>>> I talked with the guy who worked on this and he said our production issue
>>>>>> was probably not directly caused by getLength() returning 0.
>>>>>> Anyway, we are interested in fixing that; estimating length from files
>>>>>> is a good idea.
>>>>>>
>>>>>> Lukas
>>>>>>
>>>>>>> InputSplit.getLength() and RecordReader.getProgress() are important for
>>>>>>> the MR framework to be able to show progress etc. It would be good to
>>>>>>> return raw data sizes in getLength() computed from the region's total size
>>>>>>> of store files, with progress calculated from the amount of raw data the
>>>>>>> scanner has seen.
>>>>>>>
>>>>>>>    Enis
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
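
A rough way to read the MapTask log quoted above is to measure the gaps
between its timestamps. The sketch below does only that arithmetic; the class
and method names are made up for illustration, and the timestamps and the
record count (kvend = 524288) are copied from the log. About 31 minutes pass
before the first spill starts and roughly 35 more before the second, while
each spill itself finishes within seconds. That suggests the hour is spent
producing records (in the scan or the map function) rather than in
sorting or spilling, though the log alone cannot confirm the cause.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the quoted log; not part of Hadoop or HBase.
class SpillGaps {
    // Timestamp layout used by the quoted log lines, e.g. "2014-02-03 15:00:16,085".
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Whole seconds elapsed between two log timestamps (fraction truncated).
    static long secondsBetween(String from, String to) {
        LocalDateTime start = LocalDateTime.parse(from, FMT);
        LocalDateTime end = LocalDateTime.parse(to, FMT);
        return Duration.between(start, end).getSeconds();
    }

    // Average number of records emitted per second over an interval.
    static double recordsPerSecond(long records, long seconds) {
        return (double) records / seconds;
    }

    public static void main(String[] args) {
        // Buffer set up -> first "Spilling map output" line.
        long fill = secondsBetween("2014-02-03 14:28:43,183", "2014-02-03 15:00:16,085");
        // First "Spilling map output" line -> "Finished spill 0".
        long spill = secondsBetween("2014-02-03 15:00:16,085", "2014-02-03 15:00:25,831");
        System.out.println("seconds to fill record buffer: " + fill);
        System.out.println("seconds to write spill 0: " + spill);
        System.out.println("records per second: " + recordsPerSecond(524288, fill));
    }
}
```

At a few hundred records per second, the task is far from saturating a 200 MB
sort buffer, so tuning io.sort.mb is unlikely to help with this outlier.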

