airflow-dev mailing list archives

From Jorge Alpedrinha Ramos <jalpedrinhara...@gmail.com>
Subject Re: Logs And Docker
Date Fri, 24 Mar 2017 19:42:21 GMT
If you are not running workers, that means it is trying to access its own
logs service (part of Airflow) but either cannot resolve the address
(30751ef88b7a
<http://30751ef88b7a:8793/log/dag-id/task-id/2017-03-23T23:00:00>) or
you're hitting the same problem I was mentioning.

I'm assuming this from the line:

> > > *** Fetching here:
> > > http://30751ef88b7a:8793/log/dag-id/task-id/2017-03-23T23:00:00


This line shows that your webserver is trying to fetch logs from
30751ef88b7a
<http://30751ef88b7a:8793/log/dag-id/task-id/2017-03-23T23:00:00>, which
must be the ID of one of your containers (presumably the one where the
task actually ran).

I hope this at least points you in a good direction to solve the
problem.
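To check the address-resolution hypothesis quickly, one can test from inside the webserver container whether the hostname in the fetch URL resolves at all. A minimal sketch (30751ef88b7a is just the container ID taken from the log output above):

```python
import socket

def resolvable(hostname):
    """Return True if the hostname resolves to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# Run this inside the webserver container; "30751ef88b7a" is the
# container ID taken from the fetch URL in the log output.
print(resolvable("30751ef88b7a"))  # False when Docker DNS can't see it
print(resolvable("localhost"))     # sanity check: should be True
```

If this returns False for the container ID, the fix is on the Docker side (shared network, stable hostname), not in Airflow.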


On 24 Mar 2017 5:56 p.m., "Nicholas Hodgkinson" <
nik.hodgkinson@collectivehealth.com> wrote:

So first off, my particular setup isn't using Celery at all; this is just
a scheduler and a webserver in separate Docker containers; there are no
workers in the mix. The log says that the requested log isn't local,
however the logs are mounted at the same path in both containers.

This makes me wonder how airflow is determining that the log is or isn't
local.
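For what it's worth, the 1.8-era check is essentially a filesystem test on the webserver side: if the expected log path doesn't exist locally, it builds a fetch URL from the hostname the task instance recorded. A simplified sketch of that decision (not Airflow's actual code; the path layout is assumed from the fetch URL in the log):

```python
import os

def resolve_log_source(base_log_folder, dag_id, task_id, execution_date,
                       hostname, port=8793):
    """Decide whether a task log is read locally or fetched remotely.

    Mirrors the idea in the Airflow 1.8 webserver: a local file wins;
    otherwise a URL is built from the task instance's recorded hostname.
    """
    loc = os.path.join(base_log_folder, dag_id, task_id, execution_date)
    if os.path.exists(loc):
        return ("local", loc)
    # Not local: ask the serve_logs endpoint on the recorded host.
    url = "http://{h}:{p}/log/{d}/{t}/{e}".format(
        h=hostname, p=port, d=dag_id, t=task_id, e=execution_date)
    return ("remote", url)
```

So even with the directory mounted into both containers, the path the webserver computes (including the execution-date component) has to match the file the task actually wrote, byte for byte.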

I need to look through the responses a little more closely, but that's
what I'm thinking right now.



On Fri, Mar 24, 2017 at 4:03 AM Jorge Alpedrinha Ramos <
jalpedrinharamos@gmail.com> wrote:

> Found that this may be related:
> https://github.com/kennethreitz/requests/issues/2422
>
> On Fri, Mar 24, 2017 at 10:36 AM Jorge Alpedrinha Ramos <
> jalpedrinharamos@gmail.com> wrote:
>
> > I just read the stacktrace with more attention and Celery is not the
> > culprit; I thought the logs were served by Celery, but it's the
> > serve_logs app from Airflow that is responsible for this. This sounds
> > like some configuration of the max buffer size for responses, but not
> > being an expert in Flask this is just a wild guess.
> >
> > Nicholas, is this the behavior you're seeing? If this isn't the case,
> > you may have a situation where your webserver container is not able to
> > communicate with the worker.
> >
> > As promised here is the stacktrace:
> >
> > worker_1         | [2017-03-24 10:29:39,574] {_internal.py:87} INFO - 172.18.0.8 - - [24/Mar/2017 10:29:39] "GET /log/rates.ticker-to-analytics/parse-syslog/2017-01-12T07:45:00 HTTP/1.1" 200 -
> > worker_1         | Traceback (most recent call last):
> > worker_1         |   File "/usr/local/bin/airflow", line 28, in <module>
> > worker_1         |     args.func(args)
> > worker_1         |   File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 858, in serve_logs
> > worker_1         |     host='0.0.0.0', port=WORKER_LOG_SERVER_PORT)
> > worker_1         |   File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 843, in run
> > worker_1         |     run_simple(host, port, self, **options)
> > worker_1         |   File "/usr/local/lib/python2.7/site-packages/werkzeug/serving.py", line 736, in run_simple
> > worker_1         |     inner()
> > worker_1         |   File "/usr/local/lib/python2.7/site-packages/werkzeug/serving.py", line 699, in inner
> > worker_1         |     srv.serve_forever()
> > worker_1         |   File "/usr/local/lib/python2.7/site-packages/werkzeug/serving.py", line 536, in serve_forever
> > worker_1         |     HTTPServer.serve_forever(self)
> > worker_1         |   File "/usr/local/lib/python2.7/SocketServer.py", line 233, in serve_forever
> > worker_1         |     self._handle_request_noblock()
> > worker_1         |   File "/usr/local/lib/python2.7/SocketServer.py", line 292, in _handle_request_noblock
> > worker_1         |     self.handle_error(request, client_address)
> > worker_1         |   File "/usr/local/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
> > worker_1         |     self.process_request(request, client_address)
> > worker_1         |   File "/usr/local/lib/python2.7/SocketServer.py", line 318, in process_request
> > worker_1         |     self.finish_request(request, client_address)
> > worker_1         |   File "/usr/local/lib/python2.7/SocketServer.py", line 331, in finish_request
> > worker_1         |     self.RequestHandlerClass(request, client_address, self)
> > worker_1         |   File "/usr/local/lib/python2.7/SocketServer.py", line 654, in __init__
> > worker_1         |     self.finish()
> > worker_1         |   File "/usr/local/lib/python2.7/SocketServer.py", line 713, in finish
> > worker_1         |     self.wfile.close()
> > worker_1         |   File "/usr/local/lib/python2.7/socket.py", line 283, in close
> > worker_1         |     self.flush()
> > worker_1         |   File "/usr/local/lib/python2.7/socket.py", line 307, in flush
> > worker_1         |     self._sock.sendall(view[write_offset:write_offset+buffer_size])
> > worker_1         | socket.error: [Errno 32] Broken pipe
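The broken pipe at the bottom means the webserver side closed the connection while serve_logs was still writing, which fits the huge-log-file theory: the whole file is pushed through the socket at once. A generic mitigation (a sketch, not a setting Airflow exposes) is to stream the file in bounded chunks so neither side has to buffer the entire log:

```python
def iter_log_chunks(path, chunk_size=64 * 1024):
    """Yield a log file in fixed-size chunks instead of one large buffer."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# A Flask view could return Response(iter_log_chunks(path)) so the
# server streams the response instead of materializing it in memory.
```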
> >
> > On Fri, Mar 24, 2017 at 10:27 AM Jorge Alpedrinha Ramos <
> > jalpedrinharamos@gmail.com> wrote:
> >
> > Hi,
> >
> > I have run into this in a specific scenario where the log file is huge
> > and the worker throws an error when responding to the webserver. You'll
> > probably see some issue with writing to a buffer if you check the
> > worker logs. I'll send a second email with a stacktrace of this; I
> > believe it is somehow related to Celery, but I'll need to do some more
> > digging.
> >
> > Another thing that could be a solution would be to make the Airflow
> > webserver use the remote logs location for fetching the log instead of
> > always relying on the worker (which may be decommissioned after a
> > scaling operation to handle a spike, and no longer be available).
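For reference, Airflow 1.8 can already be pointed at remote log storage through airflow.cfg; the webserver falls back to it when the worker fetch fails (the "Reading remote logs..." line in the output Nicholas posted). A sketch of the relevant [core] settings, with a hypothetical bucket name and connection ID:

```
[core]
# Hypothetical bucket; an s3:// (or gcs://) URL reachable via the connection.
remote_base_log_folder = s3://my-airflow-logs/logs
remote_log_conn_id = s3_log_conn
```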
> >
> > On Fri, Mar 24, 2017 at 9:37 AM Gael Magnan <gaelmagnan@gmail.com>
> wrote:
> >
> > Hi,
> >
> > We have encountered the same problem on some machines, but not all of
> > them.
> >
> > One of our developers can't access logs on his Mac since the move to
> > Airflow 1.8.0, but our production machine on Ubuntu doesn't have the
> > problem. For him, it seems like the log file name the UI tries to
> > access is not the one the worker created (i.e. a different date
> > format).
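The date-format theory is plausible because the log file name embeds the execution date as rendered by isoformat(); if the two sides serialize it differently (for example, with a microsecond component on one of them), the names won't match. A quick illustration:

```python
from datetime import datetime

# The log path component for an execution date, as isoformat() renders it.
print(datetime(2017, 3, 23, 23, 0, 0).isoformat())       # 2017-03-23T23:00:00

# With microseconds present the same call yields a longer name, so a path
# built on one side won't find the file written by the other.
print(datetime(2017, 3, 23, 23, 0, 0, 500).isoformat())  # 2017-03-23T23:00:00.000500
```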
> >
> > We haven't found the cause of the problem, so if you find something,
> > I'm interested.
> >
> > Regards
> > Gael
> >
> > On Fri, Mar 24, 2017 at 01:26, Nicholas Hodgkinson <
> > nik.hodgkinson@collectivehealth.com> wrote:
> >
> > > So I'm running my scheduler and webserver in separate Docker
> > > containers on the same host, and everything seems to be working fine
> > > with the exception of access to logs from the UI. When doing so I
> > > get this:
> > >
> > > *** Log file isn't local.
> > > *** Fetching here:
> > > http://30751ef88b7a:8793/log/dag-id/task-id/2017-03-23T23:00:00
> > > *** Failed to fetch log file from worker.
> > >
> > > *** Reading remote logs...
> > > *** Unsupported remote log location.
> > >
> > > However, both containers have the same log directory mounted as a
> > > volume, which is specified correctly as an environment variable.
> > > Resources on this problem are scarce and I'm not sure how to solve
> > > it. Thoughts?
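A compose file for this kind of setup would mount the same log directory into both containers and can also pin a stable hostname on the container that runs the tasks, so the webserver fetches from a resolvable name rather than an ephemeral container ID. All names and paths below are hypothetical:

```yaml
version: "2"
services:
  scheduler:
    image: my-airflow:1.8.0              # hypothetical image/tag
    hostname: scheduler                  # stable name the webserver can resolve
    volumes:
      - ./logs:/usr/local/airflow/logs
  webserver:
    image: my-airflow:1.8.0
    ports:
      - "8080:8080"
    volumes:
      - ./logs:/usr/local/airflow/logs   # same path in both containers
```

On a shared Compose network the service name resolves automatically, which sidesteps the container-ID resolution problem entirely.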
> > >
> > > -Nik
> > > nik.hodgkinson@collectivehealth.com
> > >
> > > --
> > >
> > >
> > > Read our founder's story.
> > > <https://collectivehealth.com/blog/started-collective-health/>
> > >
> > > *This message may contain confidential, proprietary, or protected
> > > information.  If you are not the intended recipient, you may not
> review,
> > > copy, or distribute this message. If you received this message in
> error,
> > > please notify the sender by reply email and delete this message.*
> > >
> >
> >
>
--

-N
nik.hodgkinson@collectivehealth.com
(913) 927-4891

