hbase-user mailing list archives

From Varun Sharma <va...@pinterest.com>
Subject Re: Filtering/Collection columns during Major Compaction
Date Tue, 11 Dec 2012 05:09:25 GMT
So, I actually wrote something that uses the preCompactScannerOpen hook and
initializes a StoreScanner in exactly the same way as we do for a major
compaction, except that I add the filter I need (ColumnPaginationFilter) to
this scanner. I guess that should accomplish the same thing.
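
For illustration, here is a rough sketch of the per-row column limit itself,
written in plain Java with no HBase dependency so that it runs standalone.
Cell, ColumnLimitingIterator and keepFirstColumns are hypothetical names made
up for this example, not HBase classes. It assumes the input is already sorted
by row and then by qualifier, as the cells of a single store would be, and it
keeps every version of a column that survives the limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ColumnLimitSketch {

    /** Hypothetical stand-in for a KeyValue: only a row key and a column qualifier. */
    static final class Cell {
        final String row;
        final String qualifier;

        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier;
        }
    }

    /** Wraps a sorted cell iterator and passes through only the first `limit` columns of each row. */
    static final class ColumnLimitingIterator implements Iterator<Cell> {
        private final Iterator<Cell> delegate;
        private final int limit;
        private String currentRow;        // row of the most recently read cell
        private String currentQualifier;  // qualifier of the most recently read cell
        private int columnsInRow;         // distinct columns seen so far in currentRow
        private Cell next;                // next cell to hand out, or null when exhausted

        ColumnLimitingIterator(Iterator<Cell> delegate, int limit) {
            this.delegate = delegate;
            this.limit = limit;
            advance();
        }

        /** Reads from the delegate until a cell within the limit is found. */
        private void advance() {
            next = null;
            while (delegate.hasNext()) {
                Cell c = delegate.next();
                if (!c.row.equals(currentRow)) {
                    // New row: reset the per-row counters.
                    currentRow = c.row;
                    currentQualifier = null;
                    columnsInRow = 0;
                }
                if (!c.qualifier.equals(currentQualifier)) {
                    // New column within the row (not just another version).
                    currentQualifier = c.qualifier;
                    columnsInRow++;
                }
                if (columnsInRow <= limit) {
                    next = c;
                    return;
                }
                // Past the limit for this row: drop the cell and keep reading.
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public Cell next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            Cell result = next;
            advance();
            return result;
        }
    }

    /** Runs the limiting iterator over a sorted list and returns the surviving cells as strings. */
    static List<String> keepFirstColumns(List<Cell> sortedCells, int limit) {
        List<String> kept = new ArrayList<>();
        Iterator<Cell> it = new ColumnLimitingIterator(sortedCells.iterator(), limit);
        while (it.hasNext()) {
            kept.add(it.next().toString());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> input = Arrays.asList(
                new Cell("r1", "a"), new Cell("r1", "b"), new Cell("r1", "c"),
                new Cell("r2", "a"));
        System.out.println(keepFirstColumns(input, 2)); // prints [r1/a, r1/b, r2/a]
    }
}
```

In a coprocessor the same counting would sit inside a wrapper around the
compaction scanner rather than around a plain Iterator; the exact scanner
interface depends on the HBase version, so that part is left out here.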

On Mon, Dec 10, 2012 at 9:06 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> You can replace (or post filter) the scanner used for the compaction using
> coprocessors.
> Take a look at RegionObserver.preCompact, which is passed a scanner that
> will iterate over all KVs that should make it into the new store file.
> You can now wrap this scanner and then do any filtering you'd like.
>
>
>
> ________________________________
>  From: Varun Sharma <varun@pinterest.com>
> To: user@hbase.apache.org
> Sent: Monday, December 10, 2012 5:58 AM
> Subject: Filtering/Collection columns during Major Compaction
>
> Hi,
>
> My understanding of major compaction is that it rewrites everything into
> one store file: it merges the memstore and the store files on disk, cleans
> out delete tombstones and the puts prior to them, and cleans out excess
> versions. We want to limit the number of columns per row in HBase. Also,
> we want to limit them in lexicographically sorted order, which means we
> take, say, the 100 smallest columns (in the lexicographical sense) and
> keep only those while discarding the rest.
>
> One way to do this would be to clean out columns in a daily mapreduce job.
> Another way is to clean them out during the major compaction, which can
> be run daily too. I see from the code that a major compaction essentially
> invokes a Scan over the region, so if the Scan is invoked with the
> appropriate filter (say ColumnCountGetFilter), would that do the trick?
>
> Thanks
> Varun
>
