phoenix-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] BinShi-SecularBird commented on issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR integration on…
Date Thu, 10 Jan 2019 00:33:59 GMT
BinShi-SecularBird commented on issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using
MR integration on…
URL: https://github.com/apache/phoenix/pull/419#issuecomment-452925496
 
 
   > > > I didn't see my concern described in PHOENIX-4009 being addressed in this change,
   > > > and I still think it should be addressed: "if the time since the last update is less
   > > > than a certain threshold, the job doesn't need to update the regions' stats again.
   > > > Suppose one MR job failed but some regions still got updated; the rerun job only
   > > > needs to update the regions that the first job didn't update."
   > > 
   > > 
   > > The first aspect should be fairly simple to add. We can address it as part of
   > > this Jira or PHOENIX-5091.
   > > The second aspect, re-running only the necessary tasks, is a bit complicated.
   > > The mappers retry a particular region when they fail (up to the max-attempts limit).
   > > However, I don't really feel the need for that optimization as of now. Let me know
   > > what your thoughts are.
   > 
   > I didn't mean "the second aspect of re-running only the necessary tasks". What I mean
   > is this: assume the first MR job scheduled 100 mappers, 90 of them succeeded and 10 of
   > them failed, so the first job successfully updated 90 regions but the job as a whole
   > failed. The second job (the retry job; assume it succeeds) still schedules 100 mappers,
   > but only 10 mappers should actually update the stats, on the 10 regions that failed in
   > the first job. The other 90 mappers should skip the stats update and succeed.
   
   This is a common case: we always have some jobs that finish most of their work
(more than 90% of regions updated) and then fail for some reason. The retry job should do
only the minimal remaining work instead of updating the stats of the whole table again.
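
The behavior described above can be sketched roughly as follows. This is a plain-Python simulation for illustration, not Phoenix or MapReduce code: `should_skip`, `run_job`, the in-memory `stats_store` dict, and the `min_age_seconds` threshold are all hypothetical names, and how the last-update time would actually be stored and read per region is left open.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


def should_skip(last_updated: Optional[float], now: float, min_age_seconds: float) -> bool:
    """Skip a region whose stats were updated less than min_age_seconds ago."""
    return last_updated is not None and (now - last_updated) < min_age_seconds


def run_job(
    regions: List[str],
    stats_store: Dict[str, float],   # hypothetical: region -> last successful update time
    now: float,
    min_age_seconds: float,
    failing: FrozenSet[str] = frozenset(),  # regions whose mapper fails in this run
) -> Tuple[List[str], List[str], List[str]]:
    """Simulate one job: one 'mapper' per region. Returns (updated, skipped, failed)."""
    updated, skipped, failed = [], [], []
    for region in regions:
        if should_skip(stats_store.get(region), now, min_age_seconds):
            skipped.append(region)   # mapper succeeds without doing any work
        elif region in failing:
            failed.append(region)    # timestamp is left untouched on failure
        else:
            stats_store[region] = now  # record the successful update
            updated.append(region)
    return updated, skipped, failed


if __name__ == "__main__":
    regions = [f"region-{i}" for i in range(100)]
    store: Dict[str, float] = {}
    # First job: 10 of the 100 mappers fail, so the job as a whole fails.
    first = run_job(regions, store, now=1000.0, min_age_seconds=3600.0,
                    failing=frozenset(regions[:10]))
    # Retry job one minute later: all 100 mappers are scheduled again.
    second = run_job(regions, store, now=1060.0, min_age_seconds=3600.0)
    print([len(x) for x in first], [len(x) for x in second])
```

With these assumed numbers, the retry still runs 100 mappers, but only the 10 regions that failed the first time are updated and the other 90 are skipped. The key design point is that the timestamp is recorded per region on success, so a failed job still leaves usable progress behind.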

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
