hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: How to scan only Memstore from end point co-processor
Date Mon, 01 Jun 2015 06:45:42 GMT
If your scan has a time range specified, HBase will internally check it
against the time range of the files etc. and will skip those which are
clearly outside the time range you are interested in.  You don't have to
do anything for this.  Just make sure you set the TimeRange for your read.
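
To illustrate the pruning described above, here is a minimal sketch in plain
Java. It is a simplified model, not HBase source code: the class
TimeRangeSketch, the Source type and the sample timestamps are all
hypothetical. It assumes each source (memstore or store file) tracks an
inclusive [min, max] range of cell timestamps and that the scan range is
half-open, [minStamp, maxStamp), which is how Scan#setTimeRange(minStamp,
maxStamp) defines it on the client side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of time-range pruning; all names here are hypothetical.
class TimeRangeSketch {

    /** One scan source with the inclusive [minTs, maxTs] range of its cells. */
    static final class Source {
        final String name;
        final long minTs;
        final long maxTs;

        Source(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** True if the source may hold cells inside the scan range [scanMin, scanMax). */
    static boolean mayContain(Source s, long scanMin, long scanMax) {
        return s.minTs < scanMax && s.maxTs >= scanMin;
    }

    /** Names of the sources that survive pruning for the given scan range. */
    static List<String> select(List<Source> sources, long scanMin, long scanMax) {
        List<String> kept = new ArrayList<>();
        for (Source s : sources) {
            if (mayContain(s, scanMin, scanMax)) {
                kept.add(s.name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Source> sources = Arrays.asList(
            new Source("hfile-old", 0, 499),
            new Source("hfile-recent", 500, 899),
            new Source("memstore", 900, 1000));

        System.out.println(select(sources, 950, 1100)); // only the memstore overlaps
        System.out.println(select(sources, 800, 1100)); // memstore plus the recent file
        System.out.println(select(sources, 100, 400));  // only the old file
    }
}
```

The three calls in main correspond to cases a, b and c from the original
question: with a time range set on the scan, the same selection falls out of
the overlap check without any memstore-specific logic in the coprocessor.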

-Anoop-

On Mon, Jun 1, 2015 at 12:09 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> We have a postScannerOpen hook in the CP, but it may not give you direct
> access to know which of the internal scanners are on the memstore and
> which are on the store files. It is possible, but we may need to add some
> new hooks at the place where we explicitly add the internal scanners
> required for a scan.
>
> But still a general question: are you sure that your data will be only in
> the memstore, and that the latest data will not have been flushed from
> your memstore to the HFiles by that time?  I see that your scenario is
> write centric, so how can you guarantee that your data is in the memstore
> only?  Your time range may say it is the latest data (maybe 10 to 15 min),
> but you should be able to configure your memstore flushing in such a way
> that no flushes happen for the latest data in that 10 to 15 min window.
> Just sharing my thoughts here.
>
>
>
>
> On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah <gborah@appdynamics.com>
> wrote:
>
> > Hi all,
> >
> > Here is our use case,
> >
> > We have a very write-heavy cluster. We also run periodic endpoint
> > coprocessor based jobs, every 10 minutes, that operate on the data
> > written in the last 10-15 minutes.
> >
> > Is there a way to query only the MemStore from the endpoint
> > coprocessor? The periodic job scans for data using a time range. We
> > would like to implement a simple logic:
> >
> > a. If the query time range is within the MemStore's TimeRangeTracker,
> > then query only the MemStore.
> > b. If the end time of the query time range is within the MemStore's
> > TimeRangeTracker, but the query start time is outside it (a memstore
> > flush happened), then query both the MemStore and the files.
> > c. If the start time and end time of the query are both outside the
> > MemStore's TimeRangeTracker, query only the files.
> >
> > The incoming data is time series, and we do not allow old data (out of
> > sync with the clock) to come into the system (HBase).
> >
> > Cloudera has a scanner org.apache.hadoop.hbase.regionserver.InternalScan,
> > that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is
> > this available in Trunk?
> >
> > Also, how do I access the Memstore for a Column Family in the end point
> > co-processor from CoprocessorEnvironment?
> >
>
