phoenix-dev mailing list archives

From "ravi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build
Date Mon, 10 Aug 2015 02:24:45 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679480#comment-14679480 ]

ravi commented on PHOENIX-2154:
-------------------------------

Right now, a single job does both tasks: generating the HFiles and then loading them into
the target table. Should we try to break it into two steps, where
1) The job runs only the HFile generation code. We start it with submit() rather than
waitForCompletion(), so the client returns immediately.
2) Once the job finishes, the client runs org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
to load the HFiles into the table.
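
A minimal sketch of the ordering in the two steps above, in plain Java with no Hadoop
dependency. TwoPhaseIndexBuild, submit and loadIfSuccessful are hypothetical stand-ins,
not Phoenix code: in the real job, step 1 would be the submit() call on the MR job and
step 2 would be the LoadIncrementalHFiles run. It only shows that the submit call returns
immediately and that the load happens only if generation succeeded.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class TwoPhaseIndexBuild {

    // Step 1 (stand-in for submitting the HFile generation job):
    // starts the work on another thread and returns without waiting for it.
    static CompletableFuture<Boolean> submit(Supplier<Boolean> generateHFiles) {
        return CompletableFuture.supplyAsync(generateHFiles);
    }

    // Step 2 (stand-in for the bulk load): waits for step 1 to finish
    // and runs the load only if generation reported success.
    static String loadIfSuccessful(CompletableFuture<Boolean> job, Runnable bulkLoad) {
        boolean succeeded = job.join();
        if (!succeeded) {
            return "SKIPPED";
        }
        bulkLoad.run();
        return "LOADED";
    }

    public static void main(String[] args) {
        // Pretend the generation job succeeds.
        CompletableFuture<Boolean> job = submit(() -> true);
        System.out.println("client is free while the job runs");
        System.out.println(loadIfSuccessful(job, () -> System.out.println("bulk load runs here")));
    }
}
```

If generation fails, loadIfSuccessful returns "SKIPPED" and the load callback is never
invoked, so a failed build does not touch the target table.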

Currently, the mapper output runs through KeyValueSortReducer, a Reducer class that is
responsible for writing the output in HFile format. To keep state across jobs (when failures
happen), we would have to write the map output to HDFS and then run a subsequent job that
reads the previous map output and writes it to HFiles through KeyValueSortReducer. Not sure
if we want to go down this path.



 


> Failure of one mapper should not affect other mappers in MR index build
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2154
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done in the event
> of the failure of one of the other mappers. The initial population of an index is based on
> a snapshot in time, so new rows getting added *after* the index build has started and/or failed
> do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so there's really
> no need to dedup. However, the index rows will have a different row key than the data table,
> so I'm not sure how the HFiles are split. Will they potentially overlap and is this an issue?
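
A small illustration of the overlap question in the description, as a plain Java sketch.
The key layout (indexed value, a zero separator, then the data row key) and the sample
rows are assumptions made up for the example, not Phoenix's actual encoding. It shows
that two mappers reading disjoint data row key ranges can still emit index keys whose
ranges overlap.

```java
import java.util.Arrays;
import java.util.List;

class IndexKeyOverlap {

    // Hypothetical index row key: indexed value, zero separator, data row key.
    static String indexKey(String indexedValue, String dataRowKey) {
        return indexedValue + '\u0000' + dataRowKey;
    }

    // Returns {min, max} of the index keys produced from one mapper's rows.
    // Each row is {dataRowKey, indexedValue}.
    static String[] keyRange(List<String[]> rows) {
        String min = null;
        String max = null;
        for (String[] row : rows) {
            String key = indexKey(row[1], row[0]);
            if (min == null || key.compareTo(min) < 0) min = key;
            if (max == null || key.compareTo(max) > 0) max = key;
        }
        return new String[] {min, max};
    }

    // Two closed ranges overlap when each one starts before the other ends.
    static boolean overlaps(String[] a, String[] b) {
        return a[0].compareTo(b[1]) <= 0 && b[0].compareTo(a[1]) <= 0;
    }

    public static void main(String[] args) {
        // Mapper A reads data rows row1..row3, mapper B reads row4..row6.
        List<String[]> splitA = Arrays.asList(
            new String[] {"row1", "zoe"},
            new String[] {"row2", "adam"},
            new String[] {"row3", "mike"});
        List<String[]> splitB = Arrays.asList(
            new String[] {"row4", "bob"},
            new String[] {"row5", "yuri"},
            new String[] {"row6", "carl"});
        // The data ranges are disjoint, but the index key ranges are not.
        System.out.println(overlaps(keyRange(splitA), keyRange(splitB))); // prints "true"
    }
}
```

Here mapper A's index keys span "adam" to "zoe" and mapper B's span "bob" to "yuri", so
files written directly per mapper would cover overlapping key ranges.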



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
