hbase-dev mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: Should HTable.put() return a Future?
Date Tue, 06 Apr 2010 16:43:04 GMT
That is my issue: you sort of fire and forget the updates. Even
flushing the writes will not help, as far as I can see. If a server
fails while persisting its MemStore data, the error is not sent back
to the caller. Only a deep log file analysis may reveal the issue, and
even then telling what is missing will be difficult, as all you see is
an IOE?
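
A minimal sketch of the idea under discussion, in plain Java with no
HBase classes. TrackedWriter, Persister and persisted() are hypothetical
names invented for illustration; none of them is an HBase API, and the
"server-side" persist step is simulated in-process. The only point shown
is that a Future handed back by put() gives the caller somewhere to
observe a failure that happens later, at flush time.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class FuturePutSketch {

    /** Hypothetical stand-in for the server-side persist step (not an HBase API). */
    interface Persister {
        void persist(List<String> edits) throws IOException;
    }

    /** Hypothetical client: buffers edits and hands back one Future per put(). */
    static class TrackedWriter {
        private final Persister persister;
        private final List<String> buffer = new ArrayList<>();
        private final List<CompletableFuture<Void>> pending = new ArrayList<>();

        TrackedWriter(Persister persister) {
            this.persister = persister;
        }

        /** Buffers the edit; the returned Future completes only when flush() runs. */
        Future<Void> put(String edit) {
            CompletableFuture<Void> result = new CompletableFuture<>();
            buffer.add(edit);
            pending.add(result);
            return result;
        }

        /** Persists the buffered edits and reports the outcome to every pending Future. */
        void flush() {
            try {
                persister.persist(new ArrayList<>(buffer));
                for (CompletableFuture<Void> f : pending) {
                    f.complete(null);
                }
            } catch (IOException e) {
                for (CompletableFuture<Void> f : pending) {
                    f.completeExceptionally(e);
                }
            } finally {
                buffer.clear();
                pending.clear();
            }
        }
    }

    /** Returns true if the edit behind this Future was persisted, false if the flush failed. */
    static boolean persisted(Future<Void> f) throws InterruptedException {
        try {
            f.get();
            return true;
        } catch (ExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Healthy case: the simulated persist step succeeds.
        TrackedWriter healthy = new TrackedWriter(edits -> { });
        Future<Void> a = healthy.put("row1");
        healthy.flush();
        System.out.println(persisted(a));

        // Failure case: the simulated persist step throws, as a failing server would.
        TrackedWriter failing = new TrackedWriter(edits -> {
            throw new IOException("simulated failure while persisting");
        });
        Future<Void> b = failing.put("row2");
        failing.flush();
        System.out.println(persisted(b));
    }
}
```

In this sketch a caller such as an MR task could keep the Futures it
received and check them before reporting success, instead of learning
about lost edits from the logs afterwards.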

On Tue, Apr 6, 2010 at 6:36 PM, Todd Lipcon <todd@cloudera.com> wrote:
> On Tue, Apr 6, 2010 at 9:31 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> The issue isn't with the write buffer here, it's the WAL. Your edits
>> are in the MemStore so as far as your clients can tell, the data is
>> all persisted. In this case you would need to know when all the
>> memstores that contain your data are flushed... Best practice when
>> turning off WAL is force flushing the tables after the job is done,
>> else you can't guarantee durability for the last edits.
> You still can't guarantee durability for any of the edits, since a failure
> in the middle of your job is undetectable :)
> -Todd
>> J-D
>> On Tue, Apr 6, 2010 at 4:02 AM, Lars George <lars.george@gmail.com> wrote:
>> > Hi,
>> >
>> > I have an issue where I do a bulk import, and since the WAL is off and
>> > a default write buffer is used (TableOutputFormat), I am running into
>> > situations where the MR job completes successfully but not all data is
>> > actually restored. The issue seems to be a failure on the RS side, as
>> > it cannot flush the write buffers because the MR job overloads the
>> > cluster (usually the RS hosting .META. is the breaking point) or causes
>> > the underlying DFS to go slow, and that has repercussions all the way
>> > up to the RSs.
>> >
>> > My question is, would it make sense as with any other asynchronous IO
>> > to return a Future from the put() that will help checking the status
>> > of the actual server side async flush operation? Or am I misguided
>> > here? Please advise.
>> >
>> > Lars
>> >
> --
> Todd Lipcon
> Software Engineer, Cloudera