hbase-issues mailing list archives

From "deepankar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11484) Provide a way in TableSnapshotInputFormat, not to restore the regions to a path for running MR every time, rather reuse a already restored path
Date Wed, 09 Jul 2014 19:41:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056646#comment-14056646
] 

deepankar commented on HBASE-11484:
-----------------------------------

Thanks for the suggestions. Yes, I know restore is relatively cheap, but since Hive does not
support any cleanup hook (as far as I know), we would have to clean up all the restored
directories ourselves. Instead, we were thinking of restoring once, after exporting the snapshot
(ExportSnapshot job), and then reusing that restored directory for all further queries on this snapshot.
[~mbertozzi] Thanks for the idea of the ClientSideRegionScanner. It is a very good idea, but
that class is not available in 0.94, which unfortunately we still use, and I would also have to manage
creating the splits myself (which TableSnapshotIF takes care of).
[~enis] Just to be safe, I will write a function that verifies that for each
reference in the snapshot, there is an HFileLink or HFile
present in the restoreDir.
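
A rough sketch of what such a verification function could look like, only to make the idea concrete. It is not the patch and does not use HBase classes: it works on a local directory via java.nio.file instead of the Hadoop FileSystem API, the names RestoreDirCheck, StoreFileRef and findMissing are hypothetical, and it assumes a restored layout of restoreDir/<region>/<family>/ with link files named "<table>=<region>-<hfile>".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RestoreDirCheck {

    /** Hypothetical stand-in for one store file referenced by the snapshot. */
    public static final class StoreFileRef {
        public final String region;
        public final String family;
        public final String hfile;

        public StoreFileRef(String region, String family, String hfile) {
            this.region = region;
            this.family = family;
            this.hfile = hfile;
        }
    }

    /**
     * Returns every reference that has neither a plain file nor a
     * link-style file under restoreDir/region/family.
     * An empty result means the restore dir covers the snapshot.
     */
    public static List<StoreFileRef> findMissing(Path restoreDir, List<StoreFileRef> refs)
            throws IOException {
        List<StoreFileRef> missing = new ArrayList<>();
        for (StoreFileRef ref : refs) {
            Path familyDir = restoreDir.resolve(ref.region).resolve(ref.family);
            if (!isPresent(familyDir, ref.hfile)) {
                missing.add(ref);
            }
        }
        return missing;
    }

    private static boolean isPresent(Path familyDir, String hfile) throws IOException {
        if (!Files.isDirectory(familyDir)) {
            return false;
        }
        // Case 1: the HFile itself is present.
        if (Files.exists(familyDir.resolve(hfile))) {
            return true;
        }
        // Case 2: a link to it is present. Assumed naming: "<table>=<region>-<hfile>".
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(familyDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (name.contains("=") && name.endsWith("-" + hfile)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

The caller would restore again only when findMissing returns a non-empty list. Note that this checks file presence by name only; it does not detect a restore dir that contains extra or stale files.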

> Provide a way in TableSnapshotInputFormat, not to restore the regions to a path for running
MR every time, rather reuse a already restored path
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11484
>                 URL: https://issues.apache.org/jira/browse/HBASE-11484
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>            Reporter: deepankar
>            Priority: Minor
>
> We are trying to back a Hive table with MapReduce over snapshots, and we don't want
> to restore the snapshot to a restoreDir every time we execute a query. It would be
> nice to have a boolean parameter in
> * TableSnapshotInputFormat.setInput * that is also exposed through
> * TableMapReduceUtil.initTableSnapshotMapperJob *. When this boolean is set,
> the input format would check whether the snapshot and the restore dir are in sync, rather than
> restoring again.
> Does this idea look OK to you, or do you have any other suggestions? I will put up a
> patch if the idea is acceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
