hbase-user mailing list archives

From: rajeshbabu chintaguntla <rajeshbabu.chintagun...@huawei.com>
Subject: RE: PutSortReducer memory threshold
Date: Wed, 06 Nov 2013 11:01:23 GMT

When we execute context.write(null, null), the current writer (which has a store file open) is closed,
and on the next write request a new writer is created for another store file.
So if a row key has Puts whose total size exceeds the threshold, they will be written to multiple
store files, i.e. the same row key's data will be distributed across multiple store files (see the sketch below).
In the outer while loop we continue the reduce from the point at which we flushed or
rolled, so we will not omit any data.

From: Amit Sela [amits@infolinks.com]
Sent: Wednesday, November 06, 2013 3:54 PM
To: user@hbase.apache.org
Subject: PutSortReducer memory threshold

Looking at the code of PutSortReducer, I see that if my key has Puts whose total
size exceeds the memory threshold, the iteration stops and all Puts up to the
threshold point are written to the context.
If the iterator has more Puts, context.write(null, null) is executed.
Does this tell the bulk load tool to re-execute the reduce from that point
in some way (if so, how?), or is the rest of the data just omitted?


