hbase-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject Re: Bulk load moving HFiles to the wrong region
Date Mon, 16 Dec 2013 14:37:07 GMT
The loaded regions are listed in the .META. table, and the ENCODED field in
that table points to an existing directory. But all family directories in this
region are empty...


On Mon, Dec 16, 2013 at 4:29 PM, Amit Sela <amits@infolinks.com> wrote:

> I ran the hbck tool, and while I do have some inconsistencies they are not
> in the table that has the bulk load issues.
>
>
>
> On Mon, Dec 16, 2013 at 4:22 PM, Amit Sela <amits@infolinks.com> wrote:
>
>> The logs of the RegionServer that the files are moved to do indeed show
>> that all files are moved to that region (when the problem doesn't occur,
>> they show only 1 file per family moved to a RegionServer).
>>
>>
>> On Mon, Dec 16, 2013 at 4:21 PM, Amit Sela <amits@infolinks.com> wrote:
>>
>>> In the first step, the files are read correctly and regionGroups is
>>> created as it should be.
>>> When debugging, in LoadIncrementalHFiles.tryAtomicRegionLoad() I notice
>>> that the ServerCallable's regionName returned from the server is the wrong
>>> region (the last region from before the pre-split).
>>> The previous last region is not supposed to be deleted. I'm just adding new
>>> regions (always following lexicographically), so that the last region before
>>> the pre-split is no longer the last.
>>> It seems that wherever the ServerCallable is running, it is not updated
>>> with the new regions... I tried major compacting (the new regions) after
>>> pre-split and before the bulkload, but that didn't help.
>>>
>>>
>>>
>>> On Mon, Dec 16, 2013 at 3:07 PM, Bijieshan <bijieshan@huawei.com> wrote:
>>>
>>>> As we know, bulk load has two steps:
>>>> 1. Create HFiles by MapReduce.
>>>> 2. Load HFiles into HBase.
>>>>
>>>> I wonder whether it read the right partition information during the
>>>> first step. Have you run the hbck tool to check the cluster's health?
>>>> You mentioned that you see the new regions in the webapp. The fact that
>>>> the files were moved to the old region indicates that the old region
>>>> directory was still there. So did you start the bulk load just after the
>>>> region split? (The old region directory is deleted by CatalogJanitor soon
>>>> after a region split, once compaction has finished.)
>>>>
>>>> I suggest checking the RegionServer logs.
>>>>
>>>> Jieshan.
>>>> -----Original Message-----
>>>> From: Amit Sela [mailto:amits@infolinks.com]
>>>> Sent: Monday, December 16, 2013 2:29 PM
>>>> To: user@hbase.apache.org
>>>> Subject: RE: Bulk load moving HFiles to the wrong region
>>>>
>>>> Each split I execute is for a new day. The row key design is yyyyMMdd_URL,
>>>> and the split points are yyyyMMdd_x, yyyyMMdd_y, etc., chosen so that the
>>>> entire load is (almost) evenly spread.
>>>> The problem I described causes the bulk load to load all files to
>>>> the last region of the previous day.
>>>> Thanks.
>>>> On Dec 16, 2013 3:43 AM, "Bijieshan" <bijieshan@huawei.com> wrote:
>>>>
>>>> > Hi Amit:
>>>> > Can you provide the split-keys of the new regions and your row-key
>>>> > design?
>>>> >
>>>> > Thank you.
>>>> > Jieshan.
>>>> > -----Original Message-----
>>>> > From: Amit Sela [mailto:amits@infolinks.com]
>>>> > Sent: Monday, December 16, 2013 7:09 AM
>>>> > To: user@hbase.apache.org
>>>> > Subject: Bulk load moving HFiles to the wrong region
>>>> >
>>>> > Hi all,
>>>> > I'm using Hadoop 1.0.4 and HBase 0.94.12.
>>>> > When trying to bulk load using the Java API, I sometimes get the HFiles
>>>> > moved to the wrong directory.
>>>> > I'm pre-splitting regions and the new regions are always the last
>>>> > (lexicographically), so when this happens all files move to what was
>>>> > the last region before the pre-split. But the split does work. I see
>>>> > the new regions in the webapp before the bulk load executes. Once a
>>>> > table has this problem (which does not always happen), it persists
>>>> > until I restart HBase.
>>>> >
>>>> > Has anyone seen something similar?
>>>> >
>>>> > Thanks,
>>>> > Amit.
>>>> >
>>>>
>>>
>>>
>>
>
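
The symptom discussed in this thread can be illustrated without a cluster. The following is a minimal sketch in plain Java, not HBase code. It assumes (the thread does not confirm this) that the loader resolves regions from an out-of-date list of region start keys. The class name RegionLookupSketch, its methods, the dates, and the split suffixes _a and _m are all hypothetical; only the yyyyMMdd_URL row key design comes from the messages above.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration only: models region lookup as a search over sorted start keys.
class RegionLookupSketch {

    // A region covers [startKey, nextStartKey), so the owning region is the one
    // with the greatest start key that is <= the row key.
    // Note: Java String ordering matches HBase byte ordering only for ASCII keys.
    static String regionStartKeyFor(NavigableSet<String> startKeys, String rowKey) {
        return startKeys.floor(rowKey);
    }

    // Assumed view of the table BEFORE the new day's pre-split.
    static NavigableSet<String> staleView() {
        NavigableSet<String> keys = new TreeSet<>();
        keys.add("");            // first region has an empty start key
        keys.add("20131215_a");
        keys.add("20131215_m");  // last region of the previous day
        return keys;
    }

    // Assumed view AFTER the pre-split added regions for the new day.
    static NavigableSet<String> freshView() {
        NavigableSet<String> keys = staleView();
        keys.add("20131216_a");
        keys.add("20131216_m");
        return keys;
    }

    public static void main(String[] args) {
        String[] rows = {"20131216_b.example.com/page", "20131216_n.example.com/page"};
        for (String row : rows) {
            System.out.println(row
                + " fresh=" + regionStartKeyFor(freshView(), row)
                + " stale=" + regionStartKeyFor(staleView(), row));
        }
    }
}
```

With the stale view, every row for the new day resolves to the last region of the previous day, which matches the behaviour reported above. Whether a stale region location cache is the actual cause on 0.94.12 is not established in this thread.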
