phoenix-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] BinShi-SecularBird edited a comment on issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR integration on…
Date Fri, 11 Jan 2019 18:27:40 GMT
BinShi-SecularBird edited a comment on issue #419: PHOENIX-4009 Run UPDATE STATISTICS command
by using MR integration on…
URL: https://github.com/apache/phoenix/pull/419#issuecomment-453271634
 
 
   > > This is a common case: some jobs always fail after finishing most of their work
   > > (stats updated for > 90% of regions), and the retry job should do only minimal work
   > > instead of updating the stats of the whole table again.
   > 
   > This should not be the common case. The common case should be that a few mappers fail
   > for some reason, and we have retries for that. The job should NOT fail as a whole. Even
   > the MR framework doesn't persist data between jobs, and doing so can get really tricky
   > depending on the use case. A better way is to make the job idempotent so that a re-run
   > doesn't affect it.
   > 
   > Also, it is hard in this case since we don't know what changed between retries. If the
   > snapshot name changed, that can potentially affect region boundaries. We would need
   > another level of orchestration that persists stats MR job information in some table,
   > which we would look up before running the current job. These cases would be difficult
   > to handle. I would prefer that we avoid that complexity here.
   
   Some jobs always fail (perhaps a few a day) because a few mappers in a job keep failing
and even retries can't get past the issue, which causes the whole job to fail. This is the
case I'm talking about, and it happens more frequently when something bad happens in the
cluster. In this case, I want the retry job to skip the regions whose stats have already been
updated and do only minimal work, so that it doesn't worsen the bad situation in the cluster
and we can easily catch up and avoid missing the SLA. If the failure rate of Phoenix MR jobs
in our clusters is low, I'm OK with the current approach.
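
   As a rough illustration of the skip-on-retry idea (a sketch only, not code from this PR):
assume each region has a recorded timestamp of its last successful stats update, and that the
retry knows when the first attempt of the run started. The `Region` class, the `last_updated`
map, and `regions_needing_stats` are all hypothetical names introduced for this example; none
of them is a Phoenix or HBase API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """Hypothetical region descriptor; not a Phoenix or HBase class."""
    name: str
    start_key: bytes
    end_key: bytes


def regions_needing_stats(
    regions: List[Region],
    last_updated: Dict[str, int],
    run_started_at: int,
) -> List[Region]:
    """Return the regions whose stats were not refreshed by this logical run.

    last_updated maps region name -> timestamp (ms) of the last successful
    stats update. A region counts as done only if that timestamp is at or
    after run_started_at, the start time of the run's first attempt.
    """
    pending = []
    for region in regions:
        ts = last_updated.get(region.name)
        if ts is None or ts < run_started_at:
            pending.append(region)
    return pending


# The first attempt started at t=1000 and updated r1 and r2 before failing.
regions = [
    Region("r1", b"", b"g"),
    Region("r2", b"g", b"p"),
    Region("r3", b"p", b""),
]
last_updated = {"r1": 1200, "r2": 1350, "r3": 400}  # r3 still has stale stats
pending = regions_needing_stats(regions, last_updated, run_started_at=1000)
print([r.name for r in pending])  # ['r3']
```

In this sketch a region that is missing from `last_updated`, for example because region
boundaries changed between attempts and the region has a new name, is simply treated as
pending and processed again.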

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
