hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Is it "legal" to write to the same HBase table you're scanning?
Date Thu, 30 Sep 2010 19:45:26 GMT
When you "write" to the context you are really just creating values
that get sent to the reduce phase where the actual puts to HBase

Could you describe more why you think TIF is failing to return all
rows?  Setup code for your map reduce, existing row counts, etc would
all be helpful.  In other words, no, we have not heard of any issues of
TIF failing to read all rows of a table - although it of course isn't


On Thu, Sep 30, 2010 at 12:29 PM, Curt Allred <curt@mediosystems.com> wrote:
> Thanks for the reply.  I wasn't clear on one point: the scan and put are both in the
> map phase, i.e.:
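> public class TestMapper extends TableMapper<ImmutableBytesWritable, Put> {
>   @Override
>   protected void map(ImmutableBytesWritable rowId, Result result, Context context)
>       throws IOException, InterruptedException {
>     // read & process row data
>     ...
>     // now write new data to the same row of the same table
>     Put put = new Put(rowId.get());
>     put.add(family, qualifier, newStuff); // family, qualifier, newStuff are placeholders
>     context.write(rowId, put); // using TableOutputFormat
>   }
> }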
> I don't expect to see the new data I just wrote, but it needs to be there for later map-reduce
> When I run this, the scanner iterator fails to return all rows, without an exception or
> any indication of failure.
> I could write the new data to an interim location and write it to the table during the
> Reduce phase, but this seems inefficient since I'm not actually doing any reduction.
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Thursday, September 30, 2010 11:27 AM
> To: user@hbase.apache.org
> Subject: Re: Is it "legal" to write to the same HBase table you're scanning?
> Something else is going on, since TableInputFormat and
> TableOutputFormat in the same map reduce are not concurrent... the
> maps run, then the reduces run, and there is no overlap.  A feature of
> mapreduce.
> So if you were expecting to see the rows you were 'just writing'
> during your map phase, you won't, alas.
> -ryan
> On Thu, Sep 30, 2010 at 11:11 AM, Curt Allred <curt@mediosystems.com> wrote:
>> I can't find any documentation which says you shouldn't write to the same HBase table
>> you're scanning.  But it doesn't seem to work...  I have a mapper (subclass of TableMapper)
>> which scans a table, and for each row encountered during the scan, it updates a column of
>> the row, writing it back to the same table immediately (using TableOutputFormat).  Occasionally
>> the scanner ends without completing the scan.  No exception is thrown, no indication of failure;
>> it just says it's done when it hasn't actually returned all the rows.  This happens even if
>> the scan has no timestamp specification or filters.  It seems to only happen when I use a
>> cache size greater than 1 (hbase.client.scanner.caching).  This behavior is also repeatable
>> using an HTable outside of a map-reduce job.
>> The following blog entry implies that it might be risky, or worse, socially unacceptable
>> http://www.larsgeorge.com/2009/01/how-to-use-hbase-with-hadoop.html:
>>  > Again, I have cases where I do not output but save back to
>>  > HBase. I could easily write the records back into HBase in
>>  > the reduce step, so why pass them on first? I think this is
>>  > in some cases just common sense or being a "good citizen".
>>  > ...
>>  > Writing back to the same HBase table is OK when doing it in
>>  > the Reduce phase as all scanning has concluded in the Map
>>  > phase beforehand, so all rows and columns are saved to an
>>  > intermediate Hadoop SequenceFile internally and when you
>>  > process these and write back to HBase you have no problems
>>  > that there is still a scanner for the same job open reading
>>  > the table.
>>  >
>>  > Otherwise it is OK to write to a different HBase table even
>>  > during the Map phase.
>> But I also found a JIRA issue which indicates it "should" work, but there was a bug
>> a while back which was fixed:
>>  > https://issues.apache.org/jira/browse/HBASE-810: Prevent temporary
>>  > deadlocks when, during a scan with write operations, the region splits
>> Anyone else writing while scanning? Or know of documentation that addresses this
>> Thanks,
>> -Curt
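
The approach described in the blog post quoted above (collect updates while
scanning, apply them only once the scan has finished) can be sketched without
HBase at all. The following is a toy model under stated assumptions, not HBase
code: a TreeMap stands in for the table, and the class name
DeferredWriteSketch and the methods scanPhase and writePhase are hypothetical
names invented for this illustration. It says nothing about why the scanner in
this thread stopped early; it only shows the two-phase shape of the workaround.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class DeferredWriteSketch {

    /**
     * Scan phase: visit every row and record the update we want to make.
     * Nothing is written to the table here. The "new data" is simply the
     * upper-cased old value, chosen only to make the effect visible.
     */
    static Map<String, String> scanPhase(TreeMap<String, String> table) {
        Map<String, String> pending = new LinkedHashMap<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            pending.put(row.getKey(), row.getValue().toUpperCase());
        }
        return pending;
    }

    /** Write phase: apply the buffered updates after the scan is complete. */
    static void writePhase(TreeMap<String, String> table, Map<String, String> pending) {
        for (Map.Entry<String, String> update : pending.entrySet()) {
            table.put(update.getKey(), update.getValue());
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for a table: row key -> single column value.
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1", "a");
        table.put("row2", "b");

        Map<String, String> pending = scanPhase(table);
        System.out.println("rows scanned: " + pending.size());

        writePhase(table, pending);
        System.out.println("table after write phase: " + table);
    }
}
```

In a real job the buffer would be the map output and the write phase would be
the reduce step, which is the trade-off Curt describes: an extra pass in
exchange for never writing to a table that still has an open scanner.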
