hbase-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject Re: Bulk load moving HFiles to the wrong region
Date Mon, 16 Dec 2013 16:19:24 GMT
I've managed to isolate the problem.
I implemented an extension of HFileOutputFormat. Because each bulk load
imports data into the newly created regions only, I pass the prefix
(yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so
that getRegionStartKeys returns only the start keys matching that prefix.
I did this to avoid running 2000 reducers when my target is only 15
regions...
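
To illustrate the idea, here is a minimal sketch in plain Java. It is not the actual MyHFileOutputFormat code and it calls no HBase APIs. The class and method names (PrefixStartKeys, filterByPrefix, partitionFor) and the sample keys are hypothetical, and row keys are modelled as ASCII Strings instead of byte[]. It keeps only the start keys with a given day prefix and then picks a reducer for a row key by counting the split points at or below it, which is roughly how a total-order partitioner assigns keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PrefixStartKeys {

    /** Returns, in sorted order, only the start keys that begin with the day prefix. */
    static List<String> filterByPrefix(List<String> startKeys, String prefix) {
        List<String> kept = new ArrayList<String>();
        for (String key : new TreeSet<String>(startKeys)) {
            if (key.startsWith(prefix)) {
                kept.add(key);
            }
        }
        return kept;
    }

    /**
     * Returns the partition index for a row key: the number of split points
     * that are less than or equal to the key. N split points give N + 1 partitions.
     */
    static int partitionFor(List<String> splitPoints, String rowKey) {
        int index = 0;
        for (String split : splitPoints) {
            if (rowKey.compareTo(split) >= 0) {
                index++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical start keys: the table's first region, one earlier day, one new day.
        List<String> all = Arrays.asList(
                "", "20131215_h", "20131215_p",
                "20131216", "20131216_h", "20131216_p");

        List<String> today = filterByPrefix(all, "20131216");
        // The first retained start key is the lower bound of the load,
        // so only the remaining keys act as split points between reducers.
        List<String> splits = today.subList(1, today.size());

        System.out.println("start keys kept: " + today);
        System.out.println("reducers needed: " + (splits.size() + 1));
        System.out.println("partition for 20131216_news.example: "
                + partitionFor(splits, "20131216_news.example"));
    }
}
```

In this sketch the number of reducers follows the number of retained start keys rather than the total number of regions in the table, which is the effect I was after.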

When I use the original HFileOutputFormat it seems to work. But I don't
understand why the problem doesn't occur in other tables (some smaller and
some much, much bigger), or why even in that table it only happens every
once in a while.

Any ideas?



On Mon, Dec 16, 2013 at 4:37 PM, Amit Sela <amits@infolinks.com> wrote:

> Loaded regions are listed in .META. table and the ENCODED field in the
> table points to an existing directory. But all family directories in this
> region are empty...
>
>
> On Mon, Dec 16, 2013 at 4:29 PM, Amit Sela <amits@infolinks.com> wrote:
>
>> I ran the hbck tool, and while I do have some inconsistencies they are
>> not in the table that has the bulk load issues.
>>
>>
>>
>> On Mon, Dec 16, 2013 at 4:22 PM, Amit Sela <amits@infolinks.com> wrote:
>>
>>> The logs of the RegionServer that the files are moved to do indeed show
>>> that all files are moved to that region (when the problem doesn't occur,
>>> they show only 1 file per family moved to a RegionServer)
>>>
>>>
>>> On Mon, Dec 16, 2013 at 4:21 PM, Amit Sela <amits@infolinks.com> wrote:
>>>
>>>> In the first step, the files are read correctly and regionGroups is
>>>> created as it should be.
>>>> When debugging, in LoadIncrementalHFiles.tryAtomicRegionLoad() I notice
>>>> that ServerCallable's regionName returned from the server is the wrong
>>>> region (the last region from before the pre-split).
>>>> The previous last region is not supposed to be deleted. I'm just adding
>>>> new regions (always following it lexicographically), so the region that
>>>> was last before the pre-split is no longer the last.
>>>> It seems that wherever the ServerCallable is running, it is not updated
>>>> with the new regions... I tried major compacting (the new regions) after
>>>> the pre-split and before the bulk load, but that didn't help.
>>>>
>>>>
>>>>
>>>> On Mon, Dec 16, 2013 at 3:07 PM, Bijieshan <bijieshan@huawei.com> wrote:
>>>>
>>>>> As we know, bulk load has two steps:
>>>>> 1. Create HFiles by MapReduce.
>>>>> 2. Load HFiles into HBase.
>>>>>
>>>>> I wonder whether it read the right partition information during the
>>>>> first step. Have you run the hbck tool to check the cluster's health?
>>>>> You mentioned you see the new regions in the webapp. The fact that the
>>>>> files were moved to the previous old region indicates that the old region
>>>>> directory was still there. So did you start the bulk load just after the
>>>>> region split? (The old region directory is deleted by CatalogJanitor soon
>>>>> after a region split, once compaction has finished.)
>>>>>
>>>>> I suggest checking the regionserver logs.
>>>>>
>>>>> Jieshan.
>>>>> -----Original Message-----
>>>>> From: Amit Sela [mailto:amits@infolinks.com]
>>>>> Sent: Monday, December 16, 2013 2:29 PM
>>>>> To: user@hbase.apache.org
>>>>> Subject: RE: Bulk load moving HFiles to the wrong region
>>>>>
>>>>> Every split executed is a new day. The row key design is yyyyMMdd_URL,
>>>>> and the split points are yyyyMMdd_x, yyyyMMdd_y etc., chosen so that the
>>>>> entire load is (almost) evenly spread.
>>>>> The problem I described causes the bulk load to load all files to
>>>>> the last region of the previous day.
>>>>> Thanks.
>>>>> On Dec 16, 2013 3:43 AM, "Bijieshan" <bijieshan@huawei.com> wrote:
>>>>>
>>>>> > Hi Amit:
>>>>> > Can you provide the split-keys of the new regions and your row-key
>>>>> > design?
>>>>> >
>>>>> > Thank you.
>>>>> > Jieshan.
>>>>> > -----Original Message-----
>>>>> > From: Amit Sela [mailto:amits@infolinks.com]
>>>>> > Sent: Monday, December 16, 2013 7:09 AM
>>>>> > To: user@hbase.apache.org
>>>>> > Subject: Bulk load moving HFiles to the wrong region
>>>>> >
>>>>> > Hi all,
>>>>> > I'm using Hadoop 1.0.4 and HBase 0.94.12.
>>>>> > When trying to bulk load using the Java API I sometimes get the
>>>>> > HFiles moved to the wrong directory.
>>>>> > I'm pre-splitting regions and the new regions are always the last
>>>>> > (lexicographically), so when this happens all files move to the
>>>>> > last region pre-split. But the split does work. I see the new regions
>>>>> > in the webapp before bulk load executes. Once a table has this problem
>>>>> > (not all the time) it keeps on until I restart HBase.
>>>>> >
>>>>> > Anyone seen something similar ?
>>>>> >
>>>>> > Thanks,
>>>>> > Amit.
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
