From: Jorge Alpedrinha Ramos
Date: Fri, 24 Mar 2017 19:42:21 +0000
Subject: Re: Logs And Docker
To: dev@airflow.incubator.apache.org

If you are not running workers, that means it is trying to access its own
logs service (part of Airflow) but may not be able to resolve the address
(30751ef88b7a), or you're having the same problem I was mentioning. I'm
assuming this from the line:

> > > *** Fetching here:
> > > http://30751ef88b7a:8793/log/dag-id/task-id/2017-03-23T23:00:0

This line states that your worker is trying to fetch logs from
30751ef88b7a, which must be your webserver container ID.

I hope this at least points you in a good direction to solve the problem.

On 24 Mar 2017 5:56 p.m., "Nicholas Hodgkinson"
<nik.hodgkinson@collectivehealth.com> wrote:

So first off, my particular setup isn't using Celery at all; this is just
a scheduler and a web server in separate Docker containers; there are no
workers in the mix. The log says that the requested log isn't local;
however, the logs are mounted to the same path on both machines. This
makes me wonder how Airflow is determining that the log is or isn't
local. I need to look through the responses a little more closely, but
that's what I'm thinking right now.

On Fri, Mar 24, 2017 at 4:03 AM Jorge Alpedrinha Ramos
<jalpedrinharamos@gmail.com> wrote:

> Found that this may be related:
> https://github.com/kennethreitz/requests/issues/2422
>
> On Fri, Mar 24, 2017 at 10:36 AM Jorge Alpedrinha Ramos
> <jalpedrinharamos@gmail.com> wrote:
>
> > I just read the stacktrace more carefully and Celery is not the
> > culprit. I thought that logs were served by Celery, but it's the
> > serve_logs app from Airflow that is responsible for this. This sounds
> > like some configuration on the max buffer size for responses, but not
> > being an expert in Flask, this is just a wild guess.
> >
> > Nicholas, is this the behavior you're seeing? If this isn't the case,
> > you may have a situation where your webserver container is not able
> > to communicate with the worker.
> >
> > As promised, here is the stacktrace:
> >
> > worker_1 | [2017-03-24 10:29:39,574] {_internal.py:87} INFO - 172.18.0.8 - - [24/Mar/2017 10:29:39] "GET /log/rates.ticker-to-analytics/parse-syslog/2017-01-12T07:45:00 HTTP/1.1" 200 -
> > worker_1 | Traceback (most recent call last):
> > worker_1 |   File "/usr/local/bin/airflow", line 28, in <module>
> > worker_1 |     args.func(args)
> > worker_1 |   File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 858, in serve_logs
> > worker_1 |     host='0.0.0.0', port=WORKER_LOG_SERVER_PORT)
> > worker_1 |   File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 843, in run
> > worker_1 |     run_simple(host, port, self, **options)
> > worker_1 |   File "/usr/local/lib/python2.7/site-packages/werkzeug/serving.py", line 736, in run_simple
> > worker_1 |     inner()
> > worker_1 |   File "/usr/local/lib/python2.7/site-packages/werkzeug/serving.py", line 699, in inner
> > worker_1 |     srv.serve_forever()
> > worker_1 |   File "/usr/local/lib/python2.7/site-packages/werkzeug/serving.py", line 536, in serve_forever
> > worker_1 |     HTTPServer.serve_forever(self)
> > worker_1 |   File "/usr/local/lib/python2.7/SocketServer.py", line 233, in serve_forever
> > worker_1 |     self._handle_request_noblock()
> > worker_1 |   File "/usr/local/lib/python2.7/SocketServer.py", line 292, in _handle_request_noblock
> > worker_1 |     self.handle_error(request, client_address)
> > worker_1 |   File "/usr/local/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
> > worker_1 |     self.process_request(request, client_address)
> > worker_1 |   File "/usr/local/lib/python2.7/SocketServer.py", line 318, in process_request
> > worker_1 |     self.finish_request(request, client_address)
> > worker_1 |   File "/usr/local/lib/python2.7/SocketServer.py", line 331, in finish_request
> > worker_1 |     self.RequestHandlerClass(request, client_address, self)
> > worker_1 |   File "/usr/local/lib/python2.7/SocketServer.py", line 654, in __init__
> > worker_1 |     self.finish()
> > worker_1 |   File "/usr/local/lib/python2.7/SocketServer.py", line 713, in finish
> > worker_1 |     self.wfile.close()
> > worker_1 |   File "/usr/local/lib/python2.7/socket.py", line 283, in close
> > worker_1 |     self.flush()
> > worker_1 |   File "/usr/local/lib/python2.7/socket.py", line 307, in flush
> > worker_1 |     self._sock.sendall(view[write_offset:write_offset+buffer_size])
> > worker_1 | socket.error: [Errno 32] Broken pipe
> >
> > On Fri, Mar 24, 2017 at 10:27 AM Jorge Alpedrinha Ramos
> > <jalpedrinharamos@gmail.com> wrote:
> >
> > Hi,
> >
> > I have run into this in a specific scenario where the log file is
> > huge and the worker throws an error when responding to the webserver.
> > You'll probably see some issue with writing to a buffer if you check
> > the worker logs. I'll send a second email with a stacktrace of this.
> > I believe this is somehow related to Celery, but I'll need to do some
> > more digging.
> >
> > Another possible solution would be to make the Airflow webserver use
> > the remote logs location for fetching the log instead of always
> > relying on the worker (which may have been decommissioned after a
> > scaling operation to handle a spike, and so no longer be available).
> >
> > On Fri, Mar 24, 2017 at 9:37 AM Gael Magnan wrote:
> >
> > Hi,
> >
> > We have encountered the same problem on some machines but not all of
> > them.
> >
> > One of our developers can't access logs on his Mac since the move to
> > Airflow 1.8.0, but our production machine on Ubuntu doesn't have the
> > problem. For him it seems like the log file name the UI tries to
> > access is not the one the worker created (i.e. a different date
> > format).
> >
> > We haven't found the cause of the problem, so if you find something
> > I'm interested.
> >
> > Regards
> > Gael
> >
> > On Fri, 24 Mar 2017 at 01:26, Nicholas Hodgkinson
> > <nik.hodgkinson@collectivehealth.com> wrote:
> >
> > > So I'm running my scheduler and webserver in different Docker
> > > containers on the same host; everything seems to be working fine
> > > with the exception of access to logs from the UI. When doing so I
> > > get this:
> > >
> > > *** Log file isn't local.
> > > *** Fetching here:
> > > http://30751ef88b7a:8793/log/dag-id/task-id/2017-03-23T23:00:00
> > > *** Failed to fetch log file from worker.
> > >
> > > *** Reading remote logs...
> > > *** Unsupported remote log location.
> > >
> > > However, both containers have the same log directory mounted as a
> > > volume inside the container, which is specified correctly as an
> > > environment variable. Resources on this problem are scarce and I'm
> > > not sure how to solve it. Thoughts?
> > >
> > > -Nik
> > > nik.hodgkinson@collectivehealth.com
> > >
> > > --
> > >
> > > *This message may contain confidential, proprietary, or protected
> > > information. If you are not the intended recipient, you may not
> > > review, copy, or distribute this message. If you received this
> > > message in error, please notify the sender by reply email and
> > > delete this message.*

--
-N
nik.hodgkinson@collectivehealth.com
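
To check the two failure modes suggested at the top of this thread (the
hostname in the "Fetching here" URL does not resolve, or it resolves but
the log server does not answer), a small Python script along these lines
could be run from inside the webserver container. This is only a
diagnostic sketch and not part of Airflow: the function name
diagnose_log_url is made up for illustration, and the host, port and path
in the usage comment are simply the ones from the log message quoted
above. It also assumes that container-to-container traffic should bypass
any configured HTTP proxy.

```python
import socket
import urllib.error
import urllib.request


def diagnose_log_url(host, port, path, timeout=5.0):
    """Describe how far a log fetch from http://host:port/path gets.

    Hypothetical helper for illustration; it is not an Airflow API.
    """
    # Step 1: can this container resolve the hostname at all?
    try:
        address = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return "cannot resolve {!r}: {}".format(host, exc)

    # Step 2: does anything answer on the log server port?
    url = "http://{}:{}{}".format(host, port, path)
    # An empty ProxyHandler ignores http_proxy-style environment settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return "resolved to {}, HTTP {}".format(address, response.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404.
        return "resolved to {}, HTTP {}".format(address, exc.code)
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, broken connection, and so on.
        return "resolved to {}, but request failed: {}".format(address, exc)


# Example call, using the values from the error message in this thread:
# print(diagnose_log_url("30751ef88b7a", 8793,
#                        "/log/dag-id/task-id/2017-03-23T23:00:00"))
```

A "cannot resolve" result would point at container name resolution, while
a result that resolves but fails on the request would point at the log
server itself or the network path to it.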