hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: Questions about HBase
Date Thu, 06 Jun 2013 05:38:46 GMT
> I feel that warming up the block and
index cache could be a useful feature for many workflows. Would it be a
good idea to have a JIRA for that?

This runs against the design of the multi-level index block structure in
HFile v2, where we deliberately avoid loading the whole index into memory
up front; that matters mainly when the HFile is very large. So it really
comes down to the usage pattern. You can experiment with it and measure the
performance difference. As long as it is implemented as an optional warm-up,
it could help some use cases.
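As a rough illustration of the trade-off (a self-contained toy model, not HBase code; `LazyIndex`, `leafFor`, and `warmUp` are names made up for this sketch): a multi-level index keeps only the root resident and pays one disk read per leaf index block on first touch, which is exactly the cost an optional warm-up would move to startup.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of HFile v2's multi-level index: the root is always resident,
// leaf index blocks are loaded (and cached) on first access.
class LazyIndex {
    private final int numLeaves;
    private final Map<Integer, int[]> leafCache = new HashMap<>();
    int diskReads = 0; // counts simulated leaf-block loads

    LazyIndex(int numLeaves) {
        this.numLeaves = numLeaves;
    }

    // Root lookup picks a leaf; a cache miss costs one "disk read".
    int[] leafFor(int key) {
        int leaf = key % numLeaves; // stand-in for a root-level binary search
        return leafCache.computeIfAbsent(leaf, l -> {
            diskReads++;            // first touch of this leaf hits disk
            return new int[] { l }; // stand-in for the leaf's contents
        });
    }

    // An optional warm-up simply touches every leaf once, up front.
    void warmUp() {
        for (int l = 0; l < numLeaves; l++) {
            leafFor(l);
        }
    }
}
```

Without warmUp(), those same reads show up spread across the first queries instead, which matches the behaviour reported in the quoted logs below.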

One more confirmation: these index block misses occurred only during the
initial period, and there was no such indication in the logs later on?

Good discussion ....

-Anoop-

On Thu, Jun 6, 2013 at 8:22 AM, Pankaj Gupta <pankaj@brightroll.com> wrote:

> I'm not sure what caused so many index block misses. At the time I ran the
> experiment I had over 12 GB of RAM assigned to the block cache. My
> understanding is that, since I had restarted HBase before running this
> experiment, it was loading index blocks as and when needed, so the index
> misses were spread over a period of time. I monitored the region server
> during this debugging session and didn't see a single block eviction, so
> the index blocks couldn't have been evicted by something else.
>
> I've gotten some really good information from this thread, and I thank you
> all. The blockSeek function in HFileReaderV2 clearly confirms the linear
> nature of the scan for finding a key within a block. I feel that warming up
> the block and index caches could be a useful feature for many workflows.
> Would it be a good idea to open a JIRA for that?
>
> Thanks,
> Pankaj
>
>
> On Wed, Jun 5, 2013 at 1:24 AM, Anoop John <anoop.hbase@gmail.com> wrote:
>
> > Why are there so many misses for the index blocks? How much block cache
> > memory are you using?
> >
> > On Wed, Jun 5, 2013 at 12:37 PM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > I get your point, Pankaj.
> > > Going through the code to confirm it:
> > >
> > >     // Data index. We also read statistics about the block index written
> > >     // after the root level.
> > >     dataBlockIndexReader.readMultiLevelIndexRoot(
> > >         blockIter.nextBlockWithBlockType(BlockType.ROOT_INDEX),
> > >         trailer.getDataIndexCount());
> > >
> > >     // Meta index.
> > >     metaBlockIndexReader.readRootIndex(
> > >         blockIter.nextBlockWithBlockType(BlockType.ROOT_INDEX),
> > >         trailer.getMetaIndexCount());
> > >
> > > We read the root level of the multi-level data index and the root meta
> > > index. So, as and when we need new index blocks, we will be hitting the
> > > disk; your observation is correct. Sorry if I confused you on this.
> > > The new version of HFile was mainly meant to address the concern in the
> > > previous version, where the entire index was held in memory. Version 2
> > > addressed that by keeping only the root level (something like metadata
> > > for the indices) in memory; from there you can fetch further index
> > > blocks as needed.
> > > But there is a chance that if your region size is small you may have
> > > only one level, and the entire index may be in memory.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > On Wed, Jun 5, 2013 at 11:56 AM, Pankaj Gupta <pankaj@brightroll.com>
> > > wrote:
> > >
> > > > Sorry, I forgot to mention that I added the log statements to the
> > > > readBlock method in HFileReaderV2.java. I'm on HBase 0.94.2.
> > > >
> > > >
> > > > On Tue, Jun 4, 2013 at 11:16 PM, Pankaj Gupta <pankaj@brightroll.com
> >
> > > > wrote:
> > > >
> > > > > Some context on how I observed bloom filters being loaded
> > > > > constantly. I added the following logging statements to the
> > > > > readBlock method in HFileReaderV2.java:
> > > > >
> > > > >         if (!useLock) {
> > > > >           // check cache again with lock
> > > > >           useLock = true;
> > > > >           continue;
> > > > >         }
> > > > >
> > > > >         // Load block from filesystem.
> > > > >         long startTimeNs = System.nanoTime();
> > > > >         HFileBlock hfileBlock = fsBlockReader.readBlockData(dataBlockOffset,
> > > > >             onDiskBlockSize, -1, pread);
> > > > >         hfileBlock = dataBlockEncoder.diskToCacheFormat(hfileBlock,
> > > > >             isCompaction);
> > > > >         validateBlockType(hfileBlock, expectedBlockType);
> > > > >         passSchemaMetricsTo(hfileBlock);
> > > > >         BlockCategory blockCategory = hfileBlock.getBlockType().getCategory();
> > > > >
> > > > >         // My logging statements ---->
> > > > >         if (blockCategory == BlockCategory.INDEX) {
> > > > >           LOG.info("index block miss, reading from disk " + cacheKey);
> > > > >         } else if (blockCategory == BlockCategory.BLOOM) {
> > > > >           LOG.info("bloom block miss, reading from disk " + cacheKey);
> > > > >         } else {
> > > > >           LOG.info("block miss other than index or bloom, reading from disk " + cacheKey);
> > > > >         }
> > > > >         // -------------->
> > > > >         final long delta = System.nanoTime() - startTimeNs;
> > > > >         HFile.offerReadLatency(delta, pread);
> > > > >         getSchemaMetrics().updateOnCacheMiss(blockCategory, isCompaction,
> > > > >             delta);
> > > > >
> > > > >         // Cache the block if necessary
> > > > >         if (cacheBlock && cacheConf.shouldCacheBlockOnRead(
> > > > >             hfileBlock.getBlockType().getCategory())) {
> > > > >           cacheConf.getBlockCache().cacheBlock(cacheKey, hfileBlock,
> > > > >               cacheConf.isInMemory());
> > > > >         }
> > > > >
> > > > >         if (hfileBlock.getBlockType() == BlockType.DATA) {
> > > > >           HFile.dataBlockReadCnt.incrementAndGet();
> > > > >         }
> > > > >
> > > > > With these in place, I saw the following statements in the log:
> > > > >
> > > > > 2013-06-05 01:04:55,281 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_30361506
> > > > > 2013-06-05 01:05:00,579 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_28779560
> > > > > 2013-06-05 01:07:41,335 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_4199735
> > > > > 2013-06-05 01:08:58,460 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_8519720
> > > > > 2013-06-05 01:11:01,545 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_12838948
> > > > > 2013-06-05 01:11:03,035 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_3973250
> > > > > 2013-06-05 01:11:36,339 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_17159812
> > > > > 2013-06-05 01:12:35,398 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_21478349
> > > > > 2013-06-05 01:13:02,572 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_25798003
> > > > > 2013-06-05 01:13:03,260 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_8068381
> > > > > 2013-06-05 01:13:20,265 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_30118048
> > > > > 2013-06-05 01:13:20,522 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_60833137
> > > > > 2013-06-05 01:13:32,261 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_34545951
> > > > > 2013-06-05 01:13:48,504 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_38865311
> > > > > 2013-06-05 01:13:49,951 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_12161793
> > > > > 2013-06-05 01:14:02,073 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_43185677
> > > > > 2013-06-05 01:14:12,956 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_47506066
> > > > > 2013-06-05 01:14:25,132 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_51825831
> > > > > 2013-06-05 01:14:25,946 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_16257519
> > > > > 2013-06-05 01:14:34,478 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_56145793
> > > > > 2013-06-05 01:14:45,319 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_60466405
> > > > > 2013-06-05 01:14:45,998 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_91304775
> > > > > 2013-06-05 01:14:58,203 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_64893493
> > > > > 2013-06-05 01:14:58,463 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_20352561
> > > > > 2013-06-05 01:15:09,299 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_69214092
> > > > > 2013-06-05 01:15:32,944 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_73533616
> > > > > 2013-06-05 01:15:46,903 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_77865906
> > > > > 2013-06-05 01:15:47,273 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_24448138
> > > > > 2013-06-05 01:15:55,312 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_82185687
> > > > > 2013-06-05 01:16:07,591 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_86506129
> > > > > 2013-06-05 01:16:20,728 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_90825624
> > > > > 2013-06-05 01:16:22,551 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_28542144
> > > > > 2013-06-05 01:16:22,810 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_121777484
> > > > > 2013-06-05 01:16:23,035 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_57670002
> > > > > 2013-06-05 01:16:33,196 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_95253904
> > > > > 2013-06-05 01:16:48,187 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_99574899
> > > > > 2013-06-05 01:17:06,648 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_103895087
> > > > > 2013-06-05 01:17:10,526 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_32744846
> > > > > 2013-06-05 01:17:22,939 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_108214936
> > > > > 2013-06-05 01:17:36,010 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_112535209
> > > > > 2013-06-05 01:17:46,028 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_116855742
> > > > > 2013-06-05 01:17:47,029 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_36838416
> > > > > 2013-06-05 01:17:54,472 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_121174753
> > > > > 2013-06-05 01:17:55,491 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_152248177
> > > > > 2013-06-05 01:18:05,912 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_125601238
> > > > > 2013-06-05 01:18:15,417 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_129921797
> > > > > 2013-06-05 01:18:16,713 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_40933856
> > > > > 2013-06-05 01:18:29,521 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_134242324
> > > > > 2013-06-05 01:18:38,653 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_138561860
> > > > > 2013-06-05 01:18:49,280 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_142881436
> > > > > 2013-06-05 01:18:50,052 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_45029905
> > > > > 2013-06-05 01:18:58,339 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_147201737
> > > > > 2013-06-05 01:19:06,371 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_151533253
> > > > > 2013-06-05 01:19:07,782 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_182719269
> > > > >
> > > > > I kept seeing these statements appear constantly over a long
> > > > > period. This seemed to confirm to me that bloom filter blocks are
> > > > > being loaded over a period of time, which also matched what I read
> > > > > about HFile v2. Maybe I am wrong about both. I would love to
> > > > > understand what's really going on.
> > > > >
> > > > > Thanks in advance,
> > > > > Pankaj
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jun 4, 2013 at 11:05 PM, ramkrishna vasudevan <
> > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >
> > > > >> Whenever the region is opened, all the bloom filter metadata are
> > > > >> loaded into memory. I think his concern is that every time, all
> > > > >> the store files are read and then loaded into memory, and he wants
> > > > >> some faster way of doing it.
> > > > >> Asaf, you are right.
> > > > >>
> > > > >> Regards
> > > > >> Ram
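One reason those bloom blocks are worth keeping resident: a Get can consult the bloom filter and skip an HFile entirely without reading any of its data blocks. A minimal sketch of the data structure (not HBase's actual implementation; the two hash functions here are made up for illustration):

```java
import java.util.BitSet;

// Minimal Bloom filter sketch showing why keeping it resident pays off:
// one cheap in-memory check can rule out an entire HFile without touching
// any of its data blocks.
class TinyBloom {
    private final BitSet bits;
    private final int size;

    TinyBloom(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Two toy hash functions; real implementations use stronger hashing.
    private int h1(String k) { return Math.floorMod(k.hashCode(), size); }
    private int h2(String k) { return Math.floorMod(k.hashCode() * 31 + 7, size); }

    void add(String key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // false => key definitely absent; true => key possibly present
    boolean mightContain(String key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }
}
```

A "false" answer is definitive, which is what lets a read skip store files; a "true" answer may still be a false positive, so the data block is then consulted.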
> > > > >>
> > > > >>
> > > > >> On Wed, Jun 5, 2013 at 11:22 AM, Asaf Mesika <
> asaf.mesika@gmail.com
> > >
> > > > >> wrote:
> > > > >>
> > > > >> > When you do the first read of this region, wouldn't that load
> > > > >> > all the bloom filters?
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Jun 5, 2013 at 8:43 AM, ramkrishna vasudevan <
> > > > >> > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >> >
> > > > >> > > As for the question of whether you can do a warm-up of the
> > > > >> > > bloom filters and block cache: I don't think that is possible
> > > > >> > > right now.
> > > > >> > >
> > > > >> > > Regards
> > > > >> > > Ram
> > > > >> > >
> > > > >> > >
> > > > >> > > On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika <
> > > asaf.mesika@gmail.com
> > > > >
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > If you read the HFile v2 document on the HBase site, you
> > > > >> > > > will understand completely how the search for a record
> > > > >> > > > works, and why there is a linear search within the block but
> > > > >> > > > a binary search to get to the right block.
> > > > >> > > > Also bear in mind that the number of keys in a block is not
> > > > >> > > > large, since an HFile block is 64 KB by default; thus from a
> > > > >> > > > 10 GB HFile you are only linearly scanning about 64 KB of it.
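The two-phase lookup described above can be sketched in miniature (a self-contained illustration, not HBase's actual code; `BlockLookup`, `findBlock`, and `seekInBlock` are names made up here): a binary search over the blocks' first keys picks the block, then a linear scan inside that block finds the key.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of HFile-style lookup: binary search over block start keys picks
// the block; a linear scan inside that (small) block finds the key.
class BlockLookup {
    // Index of the last block whose first key is <= the search key.
    static int findBlock(String[] blockFirstKeys, String key) {
        int lo = 0, hi = blockFirstKeys.length - 1, ans = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (blockFirstKeys[mid].compareTo(key) <= 0) {
                ans = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return ans;
    }

    // Linear scan within the block, mirroring blockSeek's behaviour.
    static boolean seekInBlock(List<String> blockKeys, String key) {
        for (String k : blockKeys) {
            int cmp = k.compareTo(key);
            if (cmp == 0) return true;  // exact match
            if (cmp > 0) return false;  // passed where the key would be
        }
        return false;
    }
}
```

Because the block is bounded (64 KB by default), the linear portion stays cheap regardless of total HFile size.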
> > > > >> > > >
> > > > >> > > > On Wednesday, June 5, 2013, Pankaj Gupta wrote:
> > > > >> > > >
> > > > >> > > > > Thanks for the replies. I'll take a look at
> > > > >> > > > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
> > > > >> > > > >
> > > > >> > > > > @ramkrishna: I do want the bloom filters and block index
> > > > >> > > > > available all the time. They're critical for good read
> > > > >> > > > > performance in my workflow. The worry is that when HBase is
> > > > >> > > > > restarted, it will take a long time for them to get
> > > > >> > > > > populated again, and performance will suffer. If there were
> > > > >> > > > > a way to load them quickly and warm up the table, then we
> > > > >> > > > > would be able to restart HBase without causing a slowdown
> > > > >> > > > > in processing.
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > bq. But i am not very sure if we can control the files
> > > > >> > > > > > getting selected for compaction in the older versions.
> > > > >> > > > > >
> > > > >> > > > > > The same mechanism is available in 0.94.
> > > > >> > > > > >
> > > > >> > > > > > Take a look at
> > > > >> > > > > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
> > > > >> > > > > > where you will find the following methods (and more):
> > > > >> > > > > >
> > > > >> > > > > >   public void preCompactSelection(final
> > > > >> > > > > >       ObserverContext<RegionCoprocessorEnvironment> c,
> > > > >> > > > > >       final Store store, final List<StoreFile> candidates,
> > > > >> > > > > >       final CompactionRequest request)
> > > > >> > > > > >
> > > > >> > > > > >   public InternalScanner preCompact(
> > > > >> > > > > >       ObserverContext<RegionCoprocessorEnvironment> e,
> > > > >> > > > > >       final Store store, final InternalScanner scanner)
> > > > >> > > > > >       throws IOException {
> > > > >> > > > > > Cheers
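For what it's worth, the age-based policy asked about in question 2 is the kind of thing a preCompactSelection hook could implement by pruning the candidate list. Here is a sketch of just the selection logic in plain Java (no HBase types; `FileInfo` and the age cutoff are hypothetical stand-ins for StoreFile and a configured threshold):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of age-based compaction candidate selection: keep only files
// older than a cutoff, so the most recent files survive and time-range
// filtering on them still pays off.
class AgeSelection {
    static class FileInfo {        // hypothetical stand-in for StoreFile
        final String name;
        final long createdMillis;
        FileInfo(String name, long createdMillis) {
            this.name = name;
            this.createdMillis = createdMillis;
        }
    }

    // Drop candidates newer than maxAgeMillis (relative to nowMillis).
    static List<FileInfo> selectOlderThan(List<FileInfo> candidates,
                                          long nowMillis, long maxAgeMillis) {
        List<FileInfo> selected = new ArrayList<>();
        for (FileInfo f : candidates) {
            if (nowMillis - f.createdMillis >= maxAgeMillis) {
                selected.add(f);
            }
        }
        return selected;
    }
}
```

In a real coprocessor, the equivalent filtering would mutate the `candidates` list passed to preCompactSelection before the compaction request proceeds.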
> > > > >> > > > > >
> > > > >> > > > > > On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna
vasudevan <
> > > > >> > > > > > ramkrishna.s.vasudevan@gmail.com>
wrote:
> > > > >> > > > > >
> > > > >> > > > > > > >> Does Minor compaction remove HFiles in which all
> > > > >> > > > > > > entries are out of TTL, or does only Major compaction
> > > > >> > > > > > > do that?
> > > > >> > > > > > > Yes, this applies to minor compactions as well.
> > > > >> > > > > > > >> Is there a way of configuring major compaction to
> > > > >> > > > > > > compact only files older than a certain time, or to
> > > > >> > > > > > > compact all the files except the latest few?
> > > > >> > > > > > > In the latest trunk version the compaction algorithm
> > > > >> > > > > > > itself can be plugged in. There are some coprocessor
> > > > >> > > > > > > hooks that give control over the scanner that gets
> > > > >> > > > > > > created for compaction, with which we can control the
> > > > >> > > > > > > KVs being selected. But I am not very sure whether we
> > > > >> > > > > > > can control the files getting selected for compaction
> > > > >> > > > > > > in the older versions.
> > > > >> > > > > > > >> The above excerpt seems to imply to me that the
> > > > >> > > > > > > search for a key inside a block is linear, and I feel I
> > > > >> > > > > > > must be reading it wrong. I would expect the scan to be
> > > > >> > > > > > > a binary search.
> > > > >> > > > > > > Once the data block is identified for a key, we seek to
> > > > >> > > > > > > the beginning of the block and then do a linear search
> > > > >> > > > > > > until we reach the exact key that we are looking for.
> > > > >> > > > > > > That is because internally the data (KVs) are stored as
> > > > >> > > > > > > byte buffers per block, following this pattern:
> > > > >> > > > > > > <keylength><valuelength><keybytearray><valuebytearray>
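As a rough illustration of that record layout (a self-contained sketch, not HBase's actual KeyValue parsing, which packs more fields into the key), seeking within a block means repeatedly reading the two length prefixes and skipping ahead:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of scanning a block laid out as repeated
// <keylength><valuelength><keybytearray><valuebytearray> records.
class KvScan {
    // Append one key/value record to the buffer.
    static void put(ByteBuffer buf, byte[] key, byte[] value) {
        buf.putInt(key.length).putInt(value.length).put(key).put(value);
    }

    // Linear scan from the start of the block until the key is found.
    static byte[] seek(ByteBuffer block, byte[] wanted) {
        ByteBuffer buf = block.duplicate();
        buf.flip(); // read back what was written
        while (buf.hasRemaining()) {
            int klen = buf.getInt();
            int vlen = buf.getInt();
            byte[] key = new byte[klen];
            buf.get(key);
            byte[] value = new byte[vlen];
            buf.get(value);
            if (Arrays.equals(key, wanted)) {
                return value;
            }
        }
        return null; // not in this block
    }
}
```

The length prefixes are what make the linear scan possible without any per-record index inside the block.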
> > > > >> > > > > > > >> Is there a way to warm up the bloom filter and
> > > > >> > > > > > > block index cache for a table?
> > > > >> > > > > > > Do you always want the bloom filters and block index
> > > > >> > > > > > > to be in cache?
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > On Wed, Jun 5, 2013 at 7:45 AM,
Pankaj Gupta <
> > > > >> > > pankaj@brightroll.com>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Hi,
> > > > >> > > > > > > >
> > > > >> > > > > > > > I have a few small questions regarding HBase. I've
> > > > >> > > > > > > > searched the forum but couldn't find clear answers,
> > > > >> > > > > > > > hence asking them here:
> > > > >> > > > > > > >
> > > > >> > > > > > > >    1. Does Minor compaction remove HFiles in which
> > > > >> > > > > > > >    all entries are out of TTL, or does only Major
> > > > >> > > > > > > >    compaction do that? I found this JIRA:
> > > > >> > > > > > > >    https://issues.apache.org/jira/browse/HBASE-5199
> > > > >> > > > > > > >    but I don't know if the compaction being talked
> > > > >> > > > > > > >    about there is minor or major.
> > > > >> > > > > > > >    2. Is there a way of configuring major compaction
> > > > >> > > > > > > >    to compact only files older than a certain time,
> > > > >> > > > > > > >    or to compact all the files except the latest few?
> > > > >> > > > > > > >    We basically want to use the time-based filtering
> > > > >> > > > > > > >    optimization in HBase to get the latest additions
> > > > >> > > > > > > >    to the table, and since major compaction bunches
> > > > >> > > > > > > >    everything into one file, it would defeat the
> > > > >> > > > > > > >    optimization.
> > > > >> > > > > > > >    3. Is there a way to warm up the bloom filter and
> > > > >> > > > > > > >    block index cache for a table? This is for a case
> > > > >> > > > > > > >    where I always want the bloom filters and index
> > > > >> > > > > > > >    to be all in memory, but not the
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *P* | (415) 677-9222 ext. 205 *F* | (415) 677-0895 | pankaj@brightroll.com
> > > > > Pankaj Gupta | Software Engineer
> > > > > *BrightRoll, Inc.* | Smart Video Advertising | www.brightroll.com
> > > > > United States | Canada | United Kingdom | Germany
> > > > > We're hiring <http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7>!
> > > >
> > > >
> > > >
> > >
> >
>
>
>
