phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build
Date Mon, 17 Aug 2015 23:06:45 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700396#comment-14700396
] 

James Taylor commented on PHOENIX-2154:
---------------------------------------

One simple way, which [~samarthjain] came up with, to mark the index as active once all the
mappers are complete is to do it in the reduce phase. Since we don't need a reducer at all
when we're using the regular HBase APIs, we can configure our MR index builder job to have
a single reducer that simply marks the index as active.

Does that make sense, [~maghamravikiran], [~gabriel.reid], [~tdsilva]?
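
To make the proposal concrete, here is a rough plain-Java simulation of the control flow, not actual Hadoop or Phoenix code. The names IndexState, IndexBuildSketch and runJob are hypothetical and exist only for illustration; each BooleanSupplier stands in for one mapper, and the final step stands in for the single reducer, which in this sketch is assumed to run only after every mapper has succeeded.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the index state; not the real Phoenix type.
enum IndexState { BUILDING, ACTIVE }

class IndexBuildSketch {
    /**
     * Simulates a map-only index build followed by one "reducer" step.
     * Each supplier represents a mapper and returns true on success.
     */
    static IndexState runJob(List<BooleanSupplier> mappers) {
        IndexState state = IndexState.BUILDING;
        boolean allSucceeded = true;

        // Map phase: every mapper runs on its own; a failure is recorded
        // but does not stop or undo the work of the other mappers.
        for (BooleanSupplier mapper : mappers) {
            if (!mapper.getAsBoolean()) {
                allSucceeded = false;
            }
        }

        // "Reduce" phase: assumed to run only when all mappers completed,
        // and its only job is to flip the index to ACTIVE.
        if (allSucceeded) {
            state = IndexState.ACTIVE;
        }
        return state;
    }
}
```

In a real job the equivalent would presumably be a single reduce task configured on the MR job, with the framework itself guaranteeing that the reducer does not finish before the map phase has completed.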

> Failure of one mapper should not affect other mappers in MR index build
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2154
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: IndexTool.java
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done in the event
of the failure of one of the other mappers. The initial population of an index is based on
a snapshot in time, so new rows getting added *after* the index build has started and/or failed
do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so there's really
no need to dedup. However, the index rows will have a different row key than the data table,
so I'm not sure how the HFiles are split. Will they potentially overlap and is this an issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
