nutch-dev mailing list archives

From Paul Tomblin <ptomb...@xcski.com>
Subject Is this a bug?
Date Mon, 10 Aug 2009 20:27:16 GMT
I was wondering why Nutch was refetching pages that haven't changed in
a decade, when I discovered this code in
org.apache.nutch.protocol.http.HttpResponse.java:

      if (datum.getModifiedTime() > 0) {
        reqStr.append("If-Modified-Since: "
            + HttpDateFormat.toString(datum.getModifiedTime()));
        reqStr.append("\r\n");
      }

Shouldn't it be sending the time of the last fetch rather than the
last modification?  At least in my testing, "last modification" is 0,
but last fetch is correctly set.  Maybe that should be

if (datum.getModifiedTime() > 0) {
   // Prefer the recorded modification time when there is one.
   reqStr.append("If-Modified-Since: "
       + HttpDateFormat.toString(datum.getModifiedTime()));
   reqStr.append("\r\n");
} else if (datum.getFetchTime() > 0) {
   // Otherwise fall back to the time of the last fetch.
   reqStr.append("If-Modified-Since: "
       + HttpDateFormat.toString(datum.getFetchTime()));
   reqStr.append("\r\n");
}
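
To try the fallback outside Nutch, here is a minimal standalone sketch. It is not Nutch code: the class name ConditionalGetSketch and both method names are made up for illustration, the two times are passed in as plain epoch-millisecond longs instead of being read from the datum, and java.time stands in for HttpDateFormat.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of Nutch.
public class ConditionalGetSketch {

    // HTTP-date layout (e.g. "Mon, 10 Aug 2009 20:27:16 GMT"), always in GMT.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    // Prefer the modification time; fall back to the fetch time; 0 means "neither is set".
    static long pickTimestamp(long modifiedTime, long fetchTime) {
        if (modifiedTime > 0) {
            return modifiedTime;
        }
        if (fetchTime > 0) {
            return fetchTime;
        }
        return 0L;
    }

    // Returns the full header line including CRLF, or "" when no usable time is known.
    static String ifModifiedSinceHeader(long modifiedTime, long fetchTime) {
        long t = pickTimestamp(modifiedTime, fetchTime);
        if (t <= 0) {
            return "";
        }
        return "If-Modified-Since: " + HTTP_DATE.format(Instant.ofEpochMilli(t)) + "\r\n";
    }

    public static void main(String[] args) {
        // Modification time unset (0), fetch time set: the fetch time is used.
        System.out.print(ifModifiedSinceHeader(0L, 1249936036000L));
    }
}
```

Passing 0 for both times yields an empty string, so no conditional header is sent and the request stays an unconditional GET.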

-- 
http://www.linkedin.com/in/paultomblin
