hbase-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject Re: PutSortReducer memory threshold
Date Wed, 06 Nov 2013 12:55:16 GMT
So, if I have a lot of puts per row, say 100 times the memory threshold, will
100 different store files (at least) be written to the same region?
Will this trigger a major compaction for the region during/after the bulk load?
Is the trigger #storeFiles > hbase.hstore.compactionThreshold ?
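
To make sure I'm reading the rolling behaviour right, here is a simplified
sketch of the reduce loop as I understand it (paraphrased against 0.94-era
APIs; the threshold key name and its default below are from memory, so treat
them as assumptions rather than the actual PutSortReducer source):

import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Simplified sketch of the rolling loop, not the exact PutSortReducer source.
public class RollingPutReducerSketch
    extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, KeyValue> {

  @Override
  protected void reduce(ImmutableBytesWritable row, Iterable<Put> puts, Context context)
      throws IOException, InterruptedException {
    // Assumption: the RAM threshold comes from a config key like this one.
    long threshold = context.getConfiguration()
        .getLong("putsortreducer.row.threshold", 1L * (1 << 30));
    Iterator<Put> iter = puts.iterator();
    while (iter.hasNext()) {
      // Inner loop: buffer KeyValues in sorted order until the threshold is hit.
      TreeSet<KeyValue> sorted = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
      long curSize = 0;
      while (iter.hasNext() && curSize < threshold) {
        Put p = iter.next();
        for (List<KeyValue> kvs : p.getFamilyMap().values()) {
          for (KeyValue kv : kvs) {
            sorted.add(kv);
            curSize += kv.getLength();
          }
        }
      }
      // Emit the buffered batch to HFileOutputFormat's writer.
      for (KeyValue kv : sorted) {
        context.write(row, kv);
      }
      // More puts left for this row: roll the writer so the next batch goes
      // into a new storefile, then the outer loop continues where we stopped.
      if (iter.hasNext()) {
        context.write(null, null);
      }
    }
  }
}

If that reading is right, a row with ~100x the threshold worth of puts would
cause on the order of 100 rolls, which is where my storefile count question
above comes from.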


On Wed, Nov 6, 2013 at 1:01 PM, rajeshbabu chintaguntla <
rajeshbabu.chintaguntla@huawei.com> wrote:

>
> When we execute context.write(null, null), we close the current writer
> (which had a storefile open), and on the next write request we create a
> new writer for another storefile.
> If a row key has puts whose total size exceeds the threshold, they will be
> written to multiple store files, so the same row key's data will be
> distributed across multiple storefiles.
> In the outer while loop we continue the reduce from the point at which we
> flushed/rolled, so no data is omitted.
>
> ________________________________________
> From: Amit Sela [amits@infolinks.com]
> Sent: Wednesday, November 06, 2013 3:54 PM
> To: user@hbase.apache.org
> Subject: PutSortReducer memory threshold
>
> Looking at the code of PutSortReducer, I see that if my key has puts whose
> size exceeds the memory threshold, the iteration stops and all puts up to
> the threshold point are written to the context.
> If the iterator has more puts, context.write(null, null) is executed.
> Does this tell the bulk load tool to re-execute the reduce from that point
> in some way (if so, how?), or is the rest of the data just omitted?
>
> Thanks,
>
> Amit.
>
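
P.S. To be concrete about the compaction question at the top, this is the
kind of check I have in mind. It is a rough sketch only, not the actual
compaction selection code; the config keys and the default of 3 are my
assumptions here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Rough sketch only, not the actual compaction selection code.
public class CompactionThresholdCheck {

  // Assumption: a store becomes a (minor) compaction candidate once its file
  // count reaches the configured minimum, with the older key as a fallback.
  public static boolean isCompactionCandidate(Configuration conf, int storeFileCount) {
    int threshold = conf.getInt("hbase.hstore.compaction.min",
        conf.getInt("hbase.hstore.compactionThreshold", 3));
    return storeFileCount >= threshold;
  }

  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // e.g. roughly 100 files landing in one region after the bulk load
    System.out.println(isCompactionCandidate(conf, 100));
  }
}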
