hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: LoadIncrementalHFiles now deleting the hfiles?
Date Sun, 01 May 2011 05:50:40 GMT
Hi Adam,

It's always been this way.

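The only time you'll see them copied is if you run the load from a remote
filesystem, i.e. if you specify a URL that doesn't match the URL used in
hbase.rootdir. When the two match, the files are moved into place rather
than copied, so they disappear from the directory you loaded them from.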

See the bulkLoadHFile() method in Store.java:
    // Move the file if it's on another filesystem
    FileSystem srcFs = srcPath.getFileSystem(conf);
    if (!srcFs.equals(fs)) {
      LOG.info("File " + srcPath + " on different filesystem than " +
          "destination store - moving to this filesystem.");
      Path tmpPath = getTmpPath();
      FileUtil.copy(srcFs, srcPath, fs, tmpPath, false, conf);
      LOG.info("Copied to temporary path on dst filesystem: " + tmpPath);
      srcPath = tmpPath;
    }
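
If you want to sanity-check which branch you'd hit before running the load,
here is a rough sketch using only java.net.URI. It is not HBase code: the
class and method names (FsMatchSketch, sameFileSystem, predict) and the
namenode URIs are made up for illustration, and it assumes that comparing
scheme and authority is a fair stand-in for the srcFs.equals(fs) check
above. Both URIs need to be fully qualified for it to mean anything.

```java
import java.net.URI;

// Hypothetical helper, not part of HBase or Hadoop.
final class FsMatchSketch {

    // Assumption: two paths are on the "same" filesystem when their URI
    // scheme and authority (host[:port]) match, ignoring case. This only
    // approximates the srcFs.equals(fs) check in the snippet above.
    static boolean sameFileSystem(URI src, URI rootDir) {
        return equalsIgnoreCase(src.getScheme(), rootDir.getScheme())
            && equalsIgnoreCase(src.getAuthority(), rootDir.getAuthority());
    }

    // "move" when the filesystems match, "copy" when they differ.
    static String predict(URI src, URI rootDir) {
        return sameFileSystem(src, rootDir) ? "move" : "copy";
    }

    // Null-safe, case-insensitive string comparison.
    private static boolean equalsIgnoreCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        URI rootDir = URI.create("hdfs://nn1:8020/hbase");      // made-up hbase.rootdir
        URI sameFs  = URI.create("hdfs://nn1:8020/out/hfiles"); // same scheme and authority
        URI noPort  = URI.create("hdfs://nn1/out/hfiles");      // authority lacks the port
        System.out.println(sameFs + " -> " + predict(sameFs, rootDir));
        System.out.println(noPort + " -> " + predict(noPort, rootDir));
    }
}
```

Note that in this sketch something as small as leaving the port off the
namenode address makes the two URLs look like different filesystems, which
is the kind of slight config difference I have in mind.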

Perhaps your config changed slightly during the upgrade?

-Todd

On Fri, Apr 29, 2011 at 1:11 PM, Adam Phelps <amp@opendns.com> wrote:

> I could believe that, although I was under the impression that these files
> are actually incorporated into the existing region files.  Still, it's
> definitely a different behavior than what we were seeing before our recent
> upgrade.
>
> - Adam
>
>
> On 4/29/11 10:41 AM, Patrick Angeles wrote:
>
>> Adam,
>>
>> They are probably not deleted, but moved to the appropriate region
>> subdirectory under /hbase.
>>
>> On Fri, Apr 29, 2011 at 1:15 PM, Adam Phelps<amp@opendns.com>  wrote:
>>
>>> I just verified this, and the hfiles seem to be deleted one at a time as
>>> the bulk load runs.
>>>
>>> - Adam
>>>
>>>
>>> On 4/28/11 4:28 PM, Stack wrote:
>>>
>>>> I took a look through the code and don't see any explicit removes and
>>>> looking through history of changes to the file, I don't see any change
>>>> of substance.
>>>>
>>>> Can you figure what is doing the delete? At what stage?  Is it as
>>>> completebulkload runs?
>>>>
>>>> St.Ack
>>>>
>>>> On Thu, Apr 28, 2011 at 10:59 AM, Adam Phelps<amp@opendns.com>   wrote:
>>>>
>>>>> We were using a backup scheme for our system where we have map-reduce
>>>>> jobs generating HFiles, which we then loaded using LoadIncrementalHFiles
>>>>> before making a remote copy of them using distcp.
>>>>>
>>>>> However we just upgraded hbase (we're using cloudera's package, so we
>>>>> went from CDH3B4 to CDH3U0, both of which are versions of 0.90.1), and
>>>>> discovered that the HFiles now get deleted by the load operation.  Is
>>>>> this a recent change?  Is there a configuration variable to revert this
>>>>> behavior?
>>>>>
>>>>> We can work around it by doing the copy before the load, but that is
>>>>> less than optimal in our scenario as we'd prefer to have quicker access
>>>>> to the data in HBase.
>>>>>
>>>>> - Adam
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
