giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alessandro Presta" <alessan...@fb.com>
Subject Re: Review Request: Fix log messages produced by aggregators
Date Wed, 20 Mar 2013 00:13:56 GMT


> On March 19, 2013, 10:08 p.m., Alessandro Presta wrote:
> > Thanks for taking the time to revise this code.
> > Can you please post a sample log?
> > The main thing I'm interested in is telling the user what we're waiting on (we're
waiting on certain machines to send certain numbers of requests relative to aggregators).
> > Another issue is that this mechanism obscures which machines are actually ready
and which ones are still working. It would be good to have each worker print which workers
it's waiting on, when they fall under a certain number (like we do with other requests), since
we can't see that in the master status anymore.
> 
> Maja Kabiljo wrote:
>     I understood that, it's what this patch is providing :-)
>     Here is the log example:
>     
>     2013-03-19 14:46:09,795 [main] org.apache.giraph.utils.TaskIdsPermitsBarrier  - waitForRequiredPermits:
Waiting for 10 more tasks to send their data, task ids: [1, 4, 5, 9, 11, 13, 16, 25, 26, 28]
>     
>     I set the limit for logging the exact ids to 10.
>     No in the complicated case when we have a lot of aggregator requests, and if each
worker sent at least something, we still won't know what workers are we waiting for. But in
the practice I never saw this happening for longer than a few seconds, so I don't think it's
a big deal. If there are some workers doing other stuff then we are not going to receive any
aggregator request from them, and will see this kind of log.
> 
> Alessandro Presta wrote:
>     Sorry but it was difficult to see exactly what changed, since it shows it as a completely
new file...
>     The log message looks better now. Is there a way we can wrap it into something that
refers to aggregators?
>     As I understand it, now we log the missing task ids if we haven't received anything
from them. I guess that's ok for most applications (where there are no or just a few aggregators),
but seems problematic in case the application actually uses aggregators heavily. If it's a
simple change, I would make it so that we track workers that either have sent no information
at all, or only part of it. Anyway, this already solves my main issue with troubleshooting
jobs.
> 
> Maja Kabiljo wrote:
>     Yeah, I don't know why the patch didn't realize the file was just renamed.
>     I could change the log line to say that we are waiting for aggregators, but I don't
think it's a good idea since it's misleading - as you know in all the cases when we see lines
about waiting for permits, it's nothing related to aggregators, but it's like that just because
aggregators are last step in the superstep. I already see user mailing list full of questions
like "I'm not using aggregators but my application waits on aggregators..." :-) What do others
think?
>     I'm not sure how complicated it would be to make this print nice logs when we received
something but not everything from a worker, if we find a need for it in the future I can work
on it. Aggregator requests are by default limited to 1MB, so if your aggregator data is less
than numOfWorkers * 1MB there won't be multiple requests. And, as I said, even in our application
which heavily uses aggregators, we still very rarely come in this situation, and it takes
very short time.

If it's not related to aggregators, then what is it related to? I think the problem is that
we have these log lines even when there are 0 aggregators (which, as you say, can be confusing).
"Permits" is no less misleading than "aggregators", except the user probably doesn't know
what the former means. It would be better if we could skip this whole permits mechanism when
there are no aggregators.
I'm ok with the current logic for logging task ids, but I still think the log message should
be more transparent.
Anyway up to you. Maybe someone else can comment on this...


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10029/#review18105
-----------------------------------------------------------


On March 19, 2013, 10:35 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10029/
> -----------------------------------------------------------
> 
> (Updated March 19, 2013, 10:35 p.m.)
> 
> 
> Review request for giraph and Alessandro Presta.
> 
> 
> Description
> -------
> 
> Now we'll be writing which workers are we waiting for.
> 
> 
> This addresses bug GIRAPH-537.
>     https://issues.apache.org/jira/browse/GIRAPH-537
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/comm/aggregators/AllAggregatorServerData.java
dddd1cb 
>   giraph-core/src/main/java/org/apache/giraph/comm/aggregators/OwnerAggregatorServerData.java
70ff7fe 
>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterClient.java 86ea8dc

>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerAggregatorRequestProcessor.java
cd24219 
>   giraph-core/src/main/java/org/apache/giraph/comm/requests/ByteArrayWithSenderTaskIdRequest.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/comm/requests/SendAggregatorsToOwnerRequest.java
21b1b2d 
>   giraph-core/src/main/java/org/apache/giraph/comm/requests/SendAggregatorsToWorkerRequest.java
7e84e17 
>   giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerAggregatorsRequest.java
264f03a 
>   giraph-core/src/main/java/org/apache/giraph/utils/ExpectedBarrier.java ccd137c 
>   giraph-core/src/main/java/org/apache/giraph/utils/TaskIdsPermitsBarrier.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/worker/WorkerAggregatorHandler.java 001cf59

> 
> Diff: https://reviews.apache.org/r/10029/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> AggregatorsBenchmark
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message