www-infrastructure-issues mailing list archives

From "Sam Ruby (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (INFRA-9865) Planet not loading some RSS feeds due to bad cache / redirect loop
Date Mon, 31 Aug 2015 15:50:45 GMT

    [ https://issues.apache.org/jira/browse/INFRA-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723582#comment-14723582 ]

Sam Ruby commented on INFRA-9865:

First, an HTTP 301 redirect indicates a permanent redirect (302 is for temporary redirects).
So, yes, Venus does retain ("cache") this information.
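
To illustrate the distinction, here is a minimal Python sketch of a client that remembers 301 targets but not 302 targets. It is not the Venus or httplib2 code; the resolve function, the cache dict, and the responses table standing in for the network are all hypothetical names made up for this example.

```python
# Hypothetical sketch, not Venus or httplib2 code.
PERMANENT = {301}   # target may be remembered for later requests
TEMPORARY = {302}   # target is followed once and never remembered


def resolve(url, responses, cache, max_redirects=5):
    """Return the final URL for `url`.

    `responses` maps a URL to (status, location) and stands in for the
    network; URLs missing from it answer 200. `cache` maps a URL to the
    permanent redirect target remembered for it.
    """
    for _ in range(max_redirects):
        # A remembered permanent redirect is applied before asking the server.
        url = cache.get(url, url)
        status, location = responses.get(url, (200, None))
        if status in PERMANENT:
            cache[url] = location
            url = location
        elif status in TEMPORARY:
            url = location
        else:
            return url
    raise RuntimeError("too many redirects")
```

In this model, a remembered entry that points at a URL which now redirects back to the original never settles, so resolve runs out of redirects. That is one way a stale entry could produce a loop; whether it matches what is happening here is a guess.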

Rather than debugging this further, my first suggestion would be to wipe the relevant
subdirectory in minotaur:/home/applanet/planet/cache.  Other than a one-time refetch of all
feeds, there should be no negative effects of wiping the cache.  The only possible exception
that I recall is that feeds that don't contain valid timestamps may be treated as having been
updated, so they will briefly bubble back to the top of the page, but from that point they
will work their way back down into obscurity.
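
For whoever does the wipe, a rough Python sketch of emptying one subdirectory while leaving the rest of the cache alone. The function name wipe_cache_subdir and the choice to keep the now-empty directory in place are my assumptions; the name of the subdirectory is deliberately left as a parameter.

```python
import shutil
from pathlib import Path


def wipe_cache_subdir(cache_root, subdir):
    """Delete everything inside cache_root/subdir; return the entry count.

    Hypothetical helper. The directory itself is kept (an assumption) so
    that its ownership and permissions are unchanged for the next run.
    """
    root = Path(cache_root).resolve()
    target = (root / subdir).resolve()
    # Refuse to touch the cache root itself or anything outside it.
    if target == root or root not in target.parents:
        raise ValueError(f"{target} is not a subdirectory of {root}")
    if not target.is_dir():
        raise FileNotFoundError(target)
    removed = 0
    for entry in target.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed
```

It would be called with /home/applanet/planet/cache as cache_root and the relevant subdirectory name as subdir.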

> Planet not loading some RSS feeds due to bad cache / redirect loop
> ------------------------------------------------------------------
>                 Key: INFRA-9865
>                 URL: https://issues.apache.org/jira/browse/INFRA-9865
>             Project: Infrastructure
>          Issue Type: Bug
>            Reporter: Roger Ignazio
>            Priority: Minor
> Greetings!
> I'm creating this ticket as a follow-up of a conversation between myself, Gavin McDonald,
> and [~cml] on the infrastructure@ mailing list.
> Several weeks back, I enabled SSL for my blog and created 301 redirects from http to
> https. I noticed that Planet Mesos wasn't being updated because of a TLSv1 hello problem with
> Cloudflare's free SSL certificates. Around the same time, I also realized that the {{atom:link}}
> tag in my XML feed was incorrect. Both of those problems have now been resolved.
> It appears that, in my case, I've run into an issue with how Venus (Planet) has cached
> URLs for its feeds: it doesn't query the feed URL present in the configuration, but instead
> caches the URL for later use. This caching is also updated when (and only when) a blog/site
> sends back a 301 redirect to an updated location.
> Some mix of stale/bad/inconsistent cache is causing Venus to fail with the following:
> {noformat}
> 2015-06-22 22:06:24,135 ERROR   File "/x1/home/applanet/git/venus/planet/spider.py", line 326, in httpThread
>     (resp, content) = h.request(idna, 'GET', headers=headers)
> 2015-06-22 22:06:24,135 ERROR   File "/x1/home/applanet/git/venus/planet/vendor/httplib2/__init__.py", line 1041, in request
>     (response, new_content) = self.request(info['-x-permanent-redirect-url'], "GET", headers = headers, redirections = redirections - 1)
> {noformat}
> For what it's worth, we also see this problem for at least one other feed on Planet Apache
> (committers): http://planet.apache.org/committers/venus.log
> I've tried unsuccessfully to reproduce this locally with a clean install of Venus, but
> it works as expected. https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Frogerignazio.com%2Fblog%2Fcategories%2Fmesos%2Ffeed.xml
> reports some minor errors, but otherwise claims it’s valid.
> I've disabled _all_ redirects on my side, and made the use of SSL optional for my blog,
> but I still continue to experience this problem and would appreciate it if someone could take
> a look at Planet / Venus.
> Chris also suggested we might need to ping [~rubys] to dig into Venus a bit deeper, or
> at least to provide guidance on how we might be able to resolve this issue. I wouldn't mind
> doing some of the legwork myself if I could get a copy of the relevant cache files from Venus.

This message was sent by Atlassian JIRA
