hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Questions about HBase
Date Wed, 05 Jun 2013 04:29:05 GMT
bq. But I am not very sure if we can control the files getting selected for
compaction in the older versions.

The same mechanism is available in 0.94.

Take a look
at src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
where you would find the following methods (and more):

  public void preCompactSelection(
      final ObserverContext<RegionCoprocessorEnvironment> c,
      final Store store, final List<StoreFile> candidates,
      final CompactionRequest request)
  public InternalScanner preCompact(
      ObserverContext<RegionCoprocessorEnvironment> e,
      final Store store, final InternalScanner scanner) throws IOException {
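
To make the idea concrete, here is a minimal sketch of the kind of filtering a preCompactSelection override might apply to its candidates list. It is deliberately self-contained: FileInfo, maxTimestampMs and keepOnlyOlderThan are hypothetical stand-ins invented for this illustration, not HBase classes or methods, and the sketch assumes the hook may alter the selection by mutating the list it is handed.

```java
import java.util.ArrayList;
import java.util.List;

final class CompactionSelectionSketch {
    // Hypothetical stand-in for a store file: a name plus the newest
    // cell timestamp it contains. Not an HBase type.
    static final class FileInfo {
        final String name;
        final long maxTimestampMs;

        FileInfo(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    // Mutates the candidate list in place, dropping every file whose newest
    // cell is at or after the cutoff, so only older files remain eligible.
    static void keepOnlyOlderThan(List<FileInfo> candidates, long cutoffMs) {
        candidates.removeIf(f -> f.maxTimestampMs >= cutoffMs);
    }

    public static void main(String[] args) {
        List<FileInfo> candidates = new ArrayList<>();
        candidates.add(new FileInfo("old-1", 1_000L));
        candidates.add(new FileInfo("old-2", 2_000L));
        candidates.add(new FileInfo("recent", 9_000L));

        keepOnlyOlderThan(candidates, 5_000L);
        for (FileInfo f : candidates) {
            System.out.println(f.name);
        }
    }
}
```

In a real observer the same filtering would run against the List<StoreFile> passed to the hook, with the file's age derived from whatever metadata your version exposes.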

Cheers

On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> >> Does minor compaction remove HFiles in which all entries are out of
>    TTL, or does only major compaction do that?
> Yes, this applies to minor compactions as well.
> >>Is there a way of configuring major compaction to compact only files
>    older than a certain time or to compress all the files except the latest
>    few?
> In the latest trunk version the compaction algorithm itself can be plugged in.
> There are also coprocessor hooks that give control over the scanner that
> gets created for compaction, with which we can control the KVs being
> selected. But I am not sure whether we can control the files getting
> selected for compaction in the older versions.
> >> The above excerpt seems to imply to me that the search for key inside a
>    block is linear and I feel I must be reading it wrong. I would expect
>    the scan to be a binary search.
> Once the data block is identified for a key, we seek to the beginning of
> the block and then do a linear search until we reach the exact key that we
> are looking for. This is because internally the data (KVs) are stored as
> byte buffers per block, following this pattern:
> <keylength><valuelength><keybytearray><valuebytearray>
> >>Is there a way to warm up the bloom filter and block index cache for
>    a table?
> Do you always want the bloom filters and block index to be in cache?
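
The linear search described in the reply above can be sketched as follows. This is an illustration only, not HBase's reader code: the 4-byte length fields, the BlockScanSketch class and its put/find helpers are assumptions made for the example. It does show why a plain binary search does not fit this layout: entries have variable length, so the start of the n-th entry is only known after walking the entries before it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BlockScanSketch {
    // Appends one entry as <keylength><valuelength><keybytes><valuebytes>.
    // Lengths are written as 4-byte ints (an assumption for this sketch).
    static void put(ByteBuffer block, String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        block.putInt(k.length).putInt(v.length).put(k).put(v);
    }

    // Walks a read-ready (flipped) block from its start, entry by entry.
    // Returns the value for the key, or null if the key is not in the block.
    static String find(ByteBuffer block, String key) {
        byte[] wanted = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = block.duplicate(); // own position, shared bytes
        while (buf.remaining() >= 8) {
            int keyLen = buf.getInt();
            int valLen = buf.getInt();
            byte[] k = new byte[keyLen];
            buf.get(k);
            int cmp = Arrays.compareUnsigned(k, wanted);
            if (cmp == 0) {
                byte[] v = new byte[valLen];
                buf.get(v);
                return new String(v, StandardCharsets.UTF_8);
            }
            if (cmp > 0) {
                return null; // keys are sorted, so the key cannot appear later
            }
            buf.position(buf.position() + valLen); // skip this entry's value
        }
        return null;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocate(256);
        put(block, "row1", "a");
        put(block, "row2", "b");
        put(block, "row3", "c");
        block.flip(); // switch from writing to reading

        System.out.println(find(block, "row2"));
        System.out.println(find(block, "row15"));
    }
}
```

The early exit relies on entries being written in sorted key order, as they are within a block.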
>
>
> On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta <pankaj@brightroll.com>
> wrote:
>
> > Hi,
> >
> > I have a few small questions regarding HBase. I've searched the forum but
> > couldn't find clear answers hence asking them here:
> >
> >
> >    1. Does minor compaction remove HFiles in which all entries are out of
> >    TTL, or does only major compaction do that? I found this jira:
> >    https://issues.apache.org/jira/browse/HBASE-5199 but I don't know if
> >    the compaction being talked about there is minor or major.
> >    2. Is there a way of configuring major compaction to compact only
> >    files older than a certain time or to compress all the files except
> >    the latest few? We basically want to use the time based filtering
> >    optimization in HBase to get the latest additions to the table and
> >    since major compaction bunches everything into one file, it would
> >    defeat the optimization.
> >    3. Is there a way to warm up the bloom filter and block index cache
> >    for a table? This is for a case where I always want the bloom filters
> >    and index to be all in memory, but not the data blocks themselves.
> >    4. This one is related to what I read in the HBase definitive guide
> >    bloom filter section:
> >    "Given a random row key you are looking for, it is very likely that
> >    this key will fall in between two block start keys. The only way for
> >    HBase to figure out if the key actually exists is by loading the
> >    block and scanning it to find the key."
> >    The above excerpt seems to imply to me that the search for key inside
> >    a block is linear and I feel I must be reading it wrong. I would
> >    expect the scan to be a binary search.
> >
> >
> > Thanks in Advance,
> > Pankaj
> >
>
