hbase-user mailing list archives

From Andrew Mains <andrew.ma...@kontagent.com>
Subject Re: Can TableSnapshotInputFormat support multiple snapshots as the MR input?
Date Fri, 22 May 2015 10:06:09 GMT
In the latest release, no; however I've filed a ticket here 
https://issues.apache.org/jira/browse/HBASE-13356 for this feature, and 
uploaded a patch for review.

The patch provides a MultiTableSnapshotInputFormat which can run a list 
of scans over multiple snapshots. Jobs can be initialized using:

  public static void initMultiTableSnapshotMapperJob(
      Map<String, Collection<Scan>> snapshotScans,
      Class<? extends TableMapper> mapper,
      Class<?> outputKeyClass,
      Class<?> outputValueClass,
      Job job,
      boolean addDependencyJars,
      Path tmpRestoreDir) throws IOException {
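
To illustrate the shape of the snapshotScans argument, here is a minimal sketch of how a snapshot-name-to-scans map might be assembled. It is not taken from the patch: ScanRange is a hypothetical stand-in for HBase's Scan so that the example compiles without HBase on the classpath, and SnapshotScanPlan, addScan and the snapshot names are likewise made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.hbase.client.Scan.
// It only records a row-key range; an empty string means "unbounded".
final class ScanRange {
    final String startRow; // inclusive
    final String stopRow;  // exclusive

    ScanRange(String startRow, String stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }
}

// Hypothetical helper that builds the snapshot -> scans mapping.
final class SnapshotScanPlan {

    // Registers one more scan range under the given snapshot name,
    // creating the per-snapshot collection on first use.
    static void addScan(Map<String, Collection<ScanRange>> plan,
                        String snapshotName, String startRow, String stopRow) {
        plan.computeIfAbsent(snapshotName, k -> new ArrayList<>())
            .add(new ScanRange(startRow, stopRow));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the snapshots in insertion order.
        Map<String, Collection<ScanRange>> plan = new LinkedHashMap<>();

        // Two disjoint ranges over the first (made-up) snapshot.
        addScan(plan, "table_a_snapshot", "", "m");
        addScan(plan, "table_a_snapshot", "m", "");

        // One unbounded scan over the second (made-up) snapshot.
        addScan(plan, "table_b_snapshot", "", "");

        for (Map.Entry<String, Collection<ScanRange>> e : plan.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " scan(s)");
        }
    }
}
```

In a real job the map values would presumably be HBase Scan objects rather than ScanRange, and the finished map would be passed as the first argument of the method above.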


Hope this helps!

Andrew

On 5/22/15 2:35 AM, Shi, Shaofeng wrote:
> Hello,
>
> We have a scenario in which we need to merge multiple HBase tables into one table periodically.
> To get better performance and minimize the impact on the HBase server, we are evaluating
> TableSnapshotInputFormat (http://www.slideshare.net/enissoz/mapreduce-over-snapshots).
> However, from the API it appears to accept only one snapshot as input. Is it possible to change it
> to allow multiple snapshots?
>
> Thanks in advance for any advice.
>
> Shaofeng Shi
> Apache Kylin
>

