hbase-user mailing list archives

From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: Different time ranges for different cfs when using TableInputFormat
Date Wed, 04 Mar 2015 18:23:39 GMT
Have a look at the versions of TableMapReduceUtil#initTableMapperJob that
take a List<Scan>. Does that provide what you're looking for?


On Wed, Mar 4, 2015 at 6:05 AM, Dave Latham <latham@davelink.net> wrote:

> That's not possible with HBase today.  The simplest thing may be to set
> your Scan time range to include both today's and yesterday's data and then
> filter down to only the data you want inside your map task.  Other
> possibilities would be creating a custom filter to do the filtering on the
> server side or even changing your input format or map task to run two
> concurrent scans with different families/time ranges and merging the
> results.
> Being able to specify different time ranges for different column families
> is something I'd like to do as well.  Perhaps we'll get that into HBase at
> some point.
> Dave
> On Tue, Mar 3, 2015 at 5:23 PM, Felipe Sodré Silva <fsodre@gmail.com>
> wrote:
> > When using TableInputFormat to make HBase data available to map/reduce
> > jobs we can use the settings SCAN_TIMERANGE_START and
> > SCAN_TIMERANGE_END to specify a time range during scan.
> > Is it possible to somehow have different time ranges for different
> > column families?
> >
> > This is my problem:
> > I have table X with column families cf1, cf2 and cf3. I want to run a
> > map/reduce job on it using the most recent versions of columns in cf1
> > and cf2, but I want to use yesterday's data from cf3. Is this
> > possible?
> >
> > Felipe
> >
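
A minimal sketch of the map-side filtering Dave describes above, written in
plain Java so that it runs without an HBase dependency. SimpleCell and
PerFamilyTimeFilter are hypothetical names made up for illustration, not HBase
classes; in a real TableMapper the same check would be applied to each cell of
the incoming Result. The ranges are half-open, [min, max), on the assumption
that this matches how Scan time ranges behave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an HBase cell: only the fields the filter needs.
final class SimpleCell {
    final String family;
    final String qualifier;
    final long timestamp;

    SimpleCell(String family, String qualifier, long timestamp) {
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }
}

// Hypothetical helper: keeps a cell only if its timestamp falls inside the
// range registered for its column family. Families without a registered
// range are passed through unchanged.
final class PerFamilyTimeFilter {
    // family -> {minInclusive, maxExclusive}
    private final Map<String, long[]> ranges = new HashMap<>();

    PerFamilyTimeFilter withRange(String family, long minInclusive, long maxExclusive) {
        if (minInclusive > maxExclusive) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        ranges.put(family, new long[] {minInclusive, maxExclusive});
        return this;
    }

    boolean accepts(SimpleCell cell) {
        long[] range = ranges.get(cell.family);
        if (range == null) {
            return true; // no restriction configured for this family
        }
        return cell.timestamp >= range[0] && cell.timestamp < range[1];
    }

    // Returns the accepted cells in their original order.
    List<SimpleCell> filter(List<SimpleCell> cells) {
        List<SimpleCell> kept = new ArrayList<>();
        for (SimpleCell cell : cells) {
            if (accepts(cell)) {
                kept.add(cell);
            }
        }
        return kept;
    }
}
```

For Felipe's case one would register only cf3 with yesterday's range and leave
cf1 and cf2 unrestricted. The scan would presumably also need to request more
than one version, otherwise only the newest cell per column comes back and
yesterday's cf3 values never reach the mapper.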
