<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>nutch-dev@lucene.apache.org Archives</title>
<link rel="self" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/?format=atom"/>
<link href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/"/>
<id>http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/</id>
<updated>2009-12-09T22:27:52Z</updated>
<entry>
<title>java.net.URL synchronization</title>
<author><name>Otis Gospodnetic &lt;ogjunk-nutch@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c573860.37417.qm@web50303.mail.re2.yahoo.com%3e"/>
<id>urn:uuid:%3c573860-37417-qm@web50303-mail-re2-yahoo-com%3e</id>
<updated>2009-12-09T22:12:10Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hello,

Has anyone seen this:
http://www.supermind.org/blog/580/java-net-url-synchronization-bottleneck ?

Is this something that needs to be addressed in Nutch (and thus in Bixo, and thus in the common
crawler project)?


Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: State of nutchbase</title>
<author><name>Doğacan Güney &lt;dogacan@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3ce761bb3c0912071711i7d4fd32er78251f6047ca57e2@mail.gmail.com%3e"/>
<id>urn:uuid:%3ce761bb3c0912071711i7d4fd32er78251f6047ca57e2@mail-gmail-com%3e</id>
<updated>2009-12-08T01:11:50Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hey everyone,

So I restarted the nutchbase effort by adding an abstraction over the HBase
API. The idea is to use an intermediate Nutch API (which then talks to
HBase) instead of communicating with HBase directly. This allows us a) to
not be completely tied down to HBase, making a move to another db in the
future easier, and b) perhaps to immediately support multiple databases with
easy data migration between them.

What I have is very very (VERY) early and extremely alpha, but I am quite
happy with the overall idea, so I am sharing it for suggestions and reviews.
Again, instead of using HBase directly, Nutch will use a nice Java bean with
getters and setters. Nutch will then figure out what to read from and write
to HBase.

I decided to use Avro because it has a very clean design. Here is a very
basic WebTableRow class:
{"namespace": "org.apache.nutch.storage",
 "protocol": "Web",

 "types": [
     {"name": "WebTableRow", "type": "record",
      "fields": [
          {"name": "rowKey", "type": "string"},
          {"name": "fetchTime", "type": "long"},
          {"name": "title", "type": "string"},
          {"name": "text", "type": "string"},
          {"name": "status", "type": "int"}
      ]
     }
 ]
}

(ignore "protocol". I haven't yet figured out how to compile schemas without
protocols)

I have copied and modified Avro's SpecificCompiler to generate a Java class.
It is mostly the same as Avro's SpecificCompiler; however, the variables
are all private and are accessed through getters and setters. Here is a
portion of the generated file:

public class WebTableRow extends NutchTableRow&lt; Utf8&gt; implements
SpecificRecord {
  @RowKey // these are used for reflection
  private Utf8 rowKey;
  @RowField
  private long fetchTime;
  @RowField
  private Utf8 title;
  @RowField
  private Utf8 text;
  @RowField
  private int status;
  public Utf8 getRowKey() { .... }
  public void setRowKey(Utf8 value) {....}
  public long getFetchTime() { .... }
  public void setFetchTime(long value) { .... }
  .....

Note that NutchTableRow extends SpecificRecordBase so this is a proper avro
record. In the future, once hadoop MR supports avro as a serialization
format NutchTableRow-s can easily be output through maps and reduces which
is a nice bonus.

We need to force the use of setters instead of direct access to variables,
because one of the nice things about HBase is that you only update the
columns that you changed. However, to know which fields were updated (and
thus map them to HBase columns), we must keep track of what changed.
Currently, NutchTableRow keeps a BitSet for all fields, and all setter
functions update this BitSet, so we know exactly what changed.
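
The dirty-tracking idea above can be sketched roughly as follows. This is only
a minimal illustration, not the actual NutchTableRow code from the branch: the
class names DirtyTrackingRow and SketchRow, the field indices, and the
isDirty/clearDirty methods are all assumptions made up for this example.

```java
import java.util.BitSet;

// Hypothetical base class: NOT the real NutchTableRow, just an illustration.
abstract class DirtyTrackingRow {
  // One bit per field; a set bit means the field changed since the last clear.
  private final BitSet dirty = new BitSet();

  // Called by every generated setter.
  protected void markDirty(int fieldIndex) {
    dirty.set(fieldIndex);
  }

  public boolean isDirty(int fieldIndex) {
    return dirty.get(fieldIndex);
  }

  // A serializer could call this once the changed fields have been written.
  public void clearDirty() {
    dirty.clear();
  }
}

// Hypothetical row with two of the fields from the schema above.
class SketchRow extends DirtyTrackingRow {
  static final int FETCH_TIME = 0; // assumed field index
  static final int STATUS = 1;     // assumed field index

  private long fetchTime;
  private int status;

  public long getFetchTime() {
    return fetchTime;
  }

  public void setFetchTime(long value) {
    fetchTime = value;
    markDirty(FETCH_TIME);
  }

  public int getStatus() {
    return status;
  }

  public void setStatus(int value) {
    status = value;
    markDirty(STATUS);
  }
}
```

With something like this, a serializer would only need to write the columns
whose bit is set, and could call clearDirty() afterwards.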

There is also a new interface called NutchSerializer that defines readRow
and writeRow methods (it also needs scans, row deletes, etc., but that's for
later). Currently HbaseSerializer implements NutchSerializer and reads and
writes WebTableRow-s. HbaseSerializer currently works via reflection. It
should be easy to add code generation to our SpecificCompiler so that we can
also output a WebTableRowHbaseSerializer along with WebTableRow instead of
using reflection.

What I have currently can read and write primitive types + strings into and
from hbase. You can check it out from github.com/dogacan/nutchbase (branch
master, package o.a.n.storage). Again, I would like to note that the code is
very very alpha and is not in a good shape but it should be a good starting
point if you are interested.

Once hbase support is solid, I intend to add support for other databases
(bdb, cassandra and sql come to mind). If I got everything right, then
moving data from one database to another is an incredibly trivial task. So,
you can start with, say, bdb then switch over to hbase once your data gets
large.

Oh, I forgot... HbaseSerializer reads an hbase-mapping.xml file that describes
the mapping between fields and HBase columns:

&lt;table name="webtable" class="org.apache.nutch.storage.WebTableRow"&gt;
  &lt;description&gt;
    &lt;!-- Families can also have params like compression, bloom filters --&gt;
    &lt;family name="p"/&gt;
    &lt;family name="f"/&gt;
  &lt;/description&gt;
  &lt;fields&gt;
    &lt;field name="fetchTime" family="f" qualifier="ts"/&gt;
    &lt;field name="title" family="p" qualifier="t"/&gt;
    &lt;field name="text" family="p" qualifier="c"/&gt;
    &lt;field name="status" family="f" qualifier="st"/&gt;
  &lt;/fields&gt;
&lt;/table&gt;

Sorry for the long and rambling email. Feel free to ask if anything is
unclear (and I assume it must be, given my incoherent description :)
-- 
Doğacan Güney


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: State of nutchbase</title>
<author><name>Andrzej Bialecki &lt;ab@getopt.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c4B1C1A4E.6050309@getopt.org%3e"/>
<id>urn:uuid:%3c4B1C1A4E-6050309@getopt-org%3e</id>
<updated>2009-12-06T20:55:42Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Alban Mouton wrote:
&gt; Hello,
&gt; 
&gt; I have looked a little into the Nutch code and mailing lists. I think the 
&gt; nutchbase branch (http://issues.apache.org/jira/browse/NUTCH-650) is 
&gt; very interesting, with good potential to improve code clarity and 
&gt; flexibility (I find the data structures quite obscure in the current version). 
&gt; The issue has been untouched since last August, so my question is: can 
&gt; nutchbase really be part of Nutch 1.1? 

Definitely not. Release 1.1 will be an update to 1.0, with no major 
design changes. However, we intend to integrate the nutchbase branch 
with trunk at some point - but since this would be a major change, it 
would come under a 2.0 branch or so ...


&gt; Is there still much work to do, 
&gt; or is it almost ready? Is it a worthy issue for an interested developer 
&gt; with a (still!) limited knowledge of the project?

Please contact Dogacan, who is leading the work on this branch. AFAIK 
he's going to update the design soon.

&gt; 
&gt; So far I have only tried to run nutchbase in Eclipse by following the 
&gt; tutorial (http://wiki.apache.org/nutch/RunNutchInEclipse1.0), but I run 
&gt; into errors when building, mostly from Parser and tests. I may start by 
&gt; cleaning this up.

See above - please coordinate with Dogacan to avoid duplication of effort.

-- 
Best regards,
Andrzej Bialecki     &lt;&gt;&lt;
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-770) Timebomb for Fetcher</title>
<author><name>&quot;MilleBii (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c889751709.1260031881085.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c889751709-1260031881085-JavaMail-jira@brutus%3e</id>
<updated>2009-12-05T16:51:21Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12786443#action_12786443
] 

MilleBii commented on NUTCH-770:
--------------------------------

Tried it successfully on a Windows platform.

It does not work on an Ubuntu, pseudo-distributed Hadoop configuration with mappers running
in parallel ????



&gt; Timebomb for Fetcher
&gt; --------------------
&gt;
&gt;                 Key: NUTCH-770
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-770
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: log-770, NUTCH-770-v2.patch, NUTCH-770-v3.patch, NUTCH-770.patch
&gt;
&gt;
&gt; This patch provides the Fetcher with a timebomb mechanism. By default the timebomb is
&gt; not activated; it can be set using the parameter fetcher.timebomb.mins. The number of minutes
&gt; is relative to the start of the Fetch job. When the number of minutes is reached, the QueueFeeder
&gt; skips all remaining entries, then all active queues are purged. This keeps the Fetch
&gt; step under control and works well in combination with NUTCH-769.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Issue Comment Edited: (NUTCH-770) Timebomb for Fetcher</title>
<author><name>&quot;MilleBii (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1188654834.1260031881184.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1188654834-1260031881184-JavaMail-jira@brutus%3e</id>
<updated>2009-12-05T16:51:21Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12786443#action_12786443
] 

MilleBii edited comment on NUTCH-770 at 12/5/09 4:50 PM:
---------------------------------------------------------

Tried it successfully on a Windows platform.

It does not work on an Ubuntu, pseudo-distributed Hadoop configuration with two mappers running
in parallel ????



      was (Author: millebii):
    Tried it successfully on a Windows platform.

It does not work on an Ubuntu, pseudo-distributed Hadoop configuration with mappers running
in parallel ????


  
&gt; Timebomb for Fetcher
&gt; --------------------
&gt;
&gt;                 Key: NUTCH-770
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-770
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: log-770, NUTCH-770-v2.patch, NUTCH-770-v3.patch, NUTCH-770.patch
&gt;
&gt;
&gt; This patch provides the Fetcher with a timebomb mechanism. By default the timebomb is
&gt; not activated; it can be set using the parameter fetcher.timebomb.mins. The number of minutes
&gt; is relative to the start of the Fetch job. When the number of minutes is reached, the QueueFeeder
&gt; skips all remaining entries, then all active queues are purged. This keeps the Fetch
&gt; step under control and works well in combination with NUTCH-769.




</pre>
</div>
</content>
</entry>
<entry>
<title>State of nutchbase</title>
<author><name>Alban Mouton &lt;alban83@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c42de0d300912050656w701fa1bdy3de543acea5c0934@mail.gmail.com%3e"/>
<id>urn:uuid:%3c42de0d300912050656w701fa1bdy3de543acea5c0934@mail-gmail-com%3e</id>
<updated>2009-12-05T14:56:49Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hello,

I have looked a little into the Nutch code and mailing lists. I think the
nutchbase branch (http://issues.apache.org/jira/browse/NUTCH-650) is very
interesting, with good potential to improve code clarity and flexibility
(I find the data structures quite obscure in the current version). The issue
has been untouched since last August, so my question is: can nutchbase really
be part of Nutch 1.1? Is there still much work to do, or is it almost ready?
Is it a worthy issue for an interested developer with a (still!) limited
knowledge of the project?

So far I have only tried to run nutchbase in Eclipse by following the
tutorial (http://wiki.apache.org/nutch/RunNutchInEclipse1.0), but I run into
errors when building, mostly from Parser and tests. I may start by cleaning
this up.

Eclipse build errors:

Description    Resource    Path    Location    Type
FetcherOutputFormat cannot be resolved to a type    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 362    Java Problem
Generator.GENERATE_MAX_PER_HOST_BY_IP cannot be resolved    TestGenerator.java    /nutchbase/src/test/org/apache/nutch/crawl    line 202    Java Problem
ParseImpl cannot be resolved to a type    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 229    Java Problem
ParseImpl cannot be resolved to a type    BasicFields.java    /nutchbase/src/java/org/apache/nutch/indexer/field    line 335    Java Problem
ParseImpl cannot be resolved to a type    ExtParser.java    /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext    line 138    Java Problem
ParseImpl cannot be resolved to a type    MSBaseParser.java    /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms    line 108    Java Problem
ParseImpl cannot be resolved to a type    OOParser.java    /nutchbase/src/plugin/parse-oo/src/java/org/apache/nutch/parse/oo    line 103    Java Problem
ParseImpl cannot be resolved to a type    PdfParser.java    /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf    line 155    Java Problem
ParseImpl cannot be resolved to a type    RSSParser.java    /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss    line 187    Java Problem
ParseImpl cannot be resolved to a type    SWFParser.java    /nutchbase/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf    line 113    Java Problem
ParseImpl cannot be resolved to a type    TestIndexingFilters.java    /nutchbase/src/test/org/apache/nutch/indexer    line 45    Java Problem
ParseImpl cannot be resolved to a type    TestMoreIndexingFilter.java    /nutchbase/src/plugin/index-more/src/test/org/apache/nutch/indexer/more    line 61    Java Problem
ParseImpl cannot be resolved to a type    TextParser.java    /nutchbase/src/plugin/parse-text/src/java/org/apache/nutch/parse/text    line 55    Java Problem
ParseImpl cannot be resolved to a type    ZipParser.java    /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip    line 105    Java Problem
ParseResult cannot be resolved    ExtParser.java    /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext    line 137    Java Problem
ParseResult cannot be resolved    MSBaseParser.java    /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms    line 107    Java Problem
ParseResult cannot be resolved    OOParser.java    /nutchbase/src/plugin/parse-oo/src/java/org/apache/nutch/parse/oo    line 103    Java Problem
ParseResult cannot be resolved    PdfParser.java    /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf    line 155    Java Problem
ParseResult cannot be resolved    RSSParser.java    /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss    line 187    Java Problem
ParseResult cannot be resolved    SWFParser.java    /nutchbase/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf    line 113    Java Problem
ParseResult cannot be resolved    TextParser.java    /nutchbase/src/plugin/parse-text/src/java/org/apache/nutch/parse/text    line 55    Java Problem
ParseResult cannot be resolved    ZipParser.java    /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip    line 105    Java Problem
ParseResult cannot be resolved to a type    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 159    Java Problem
ParseResult cannot be resolved to a type    CCParseFilter.java    /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch    line 267    Java Problem
ParseResult cannot be resolved to a type    CCParseFilter.java    /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch    line 267    Java Problem
ParseResult cannot be resolved to a type    ExtParser.java    /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext    line 69    Java Problem
ParseResult cannot be resolved to a type    FeedParser.java    /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed    line 106    Java Problem
ParseResult cannot be resolved to a type    FeedParser.java    /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed    line 108    Java Problem
ParseResult cannot be resolved to a type    FeedParser.java    /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed    line 108    Java Problem
ParseResult cannot be resolved to a type    FeedParser.java    /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed    line 211    Java Problem
ParseResult cannot be resolved to a type    FeedParser.java    /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed    line 221    Java Problem
ParseResult cannot be resolved to a type    HTMLLanguageParser.java    /nutchbase/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang    line 90    Java Problem
ParseResult cannot be resolved to a type    HTMLLanguageParser.java    /nutchbase/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang    line 90    Java Problem
ParseResult cannot be resolved to a type    MSBaseParser.java    /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms    line 64    Java Problem
ParseResult cannot be resolved to a type    MSExcelParser.java    /nutchbase/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel    line 40    Java Problem
ParseResult cannot be resolved to a type    MSPowerPointParser.java    /nutchbase/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint    line 44    Java Problem
ParseResult cannot be resolved to a type    MSWordParser.java    /nutchbase/src/plugin/parse-msword/src/java/org/apache/nutch/parse/msword    line 43    Java Problem
ParseResult cannot be resolved to a type    OOParser.java    /nutchbase/src/plugin/parse-oo/src/java/org/apache/nutch/parse/oo    line 63    Java Problem
ParseResult cannot be resolved to a type    PdfParser.java    /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf    line 69    Java Problem
ParseResult cannot be resolved to a type    RSSParser.java    /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss    line 80    Java Problem
ParseResult cannot be resolved to a type    RelTagParser.java    /nutchbase/src/plugin/microformats-reltag/src/java/org/apache/nutch/microformats/reltag    line 68    Java Problem
ParseResult cannot be resolved to a type    RelTagParser.java    /nutchbase/src/plugin/microformats-reltag/src/java/org/apache/nutch/microformats/reltag    line 68    Java Problem
ParseResult cannot be resolved to a type    SWFParser.java    /nutchbase/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf    line 64    Java Problem
ParseResult cannot be resolved to a type    SWFParser.java    /nutchbase/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf    line 125    Java Problem
ParseResult cannot be resolved to a type    TestFeedParser.java    /nutchbase/src/plugin/feed/src/test/org/apache/nutch/parse/feed    line 94    Java Problem
ParseResult cannot be resolved to a type    TextParser.java    /nutchbase/src/plugin/parse-text/src/java/org/apache/nutch/parse/text    line 41    Java Problem
ParseResult cannot be resolved to a type    ZipParser.java    /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip    line 55    Java Problem
The constructor Fetcher(Configuration) is undefined    TestFetcher.java    /nutchbase/src/test/org/apache/nutch/fetcher    line 100    Java Problem
The constructor Fetcher(Configuration) is undefined    TestFetcher.java    /nutchbase/src/test/org/apache/nutch/fetcher    line 177    Java Problem
The constructor Generator(Configuration) is undefined    TestFetcher.java    /nutchbase/src/test/org/apache/nutch/fetcher    line 94    Java Problem
The constructor Generator(Configuration) is undefined    TestGenerator.java    /nutchbase/src/test/org/apache/nutch/crawl    line 312    Java Problem
The constructor Injector(Configuration) is undefined    TestFetcher.java    /nutchbase/src/test/org/apache/nutch/fetcher    line 90    Java Problem
The constructor Injector(Configuration) is undefined    TestInjector.java    /nutchbase/src/test/org/apache/nutch/crawl    line 70    Java Problem
The constructor NutchWritable(ParseImpl) is undefined    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 229    Java Problem
The import org.apache.nutch.fetcher.FetcherOutputFormat cannot be resolved    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 44    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 50    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    BasicFields.java    /nutchbase/src/java/org/apache/nutch/indexer/field    line 61    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    ExtParser.java    /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext    line 26    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    MSBaseParser.java    /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms    line 39    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    PdfParser.java    /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf    line 41    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    RSSParser.java    /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss    line 41    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    TestExtParser.java    /nutchbase/src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext    line 26    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    TestIndexingFilters.java    /nutchbase/src/test/org/apache/nutch/indexer    line 26    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    TestMSWordParser.java    /nutchbase/src/plugin/parse-msword/src/test/org/apache/nutch/parse/msword    line 26    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    TestMoreIndexingFilter.java    /nutchbase/src/plugin/index-more/src/test/org/apache/nutch/indexer/more    line 29    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    TestZipParser.java    /nutchbase/src/plugin/parse-zip/src/test/org/apache/nutch/parse/zip    line 26    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    ZipParser.java    /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip    line 33    Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved    ZipTextExtractor.java    /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip    line 41    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 51    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    ExtParser.java    /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext    line 21    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    FeedParser.java    /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed    line 43    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    HTMLLanguageParser.java    /nutchbase/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang    line 33    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    MSBaseParser.java    /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms    line 40    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    MSExcelParser.java    /nutchbase/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel    line 20    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    MSPowerPointParser.java    /nutchbase/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint    line 20    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    MSWordParser.java    /nutchbase/src/plugin/parse-msword/src/java/org/apache/nutch/parse/msword    line 21    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    PdfParser.java    /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf    line 37    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    RSSParser.java    /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss    line 36    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    RelTagParser.java    /nutchbase/src/plugin/microformats-reltag/src/java/org/apache/nutch/microformats/reltag    line 38    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    TestFeedParser.java    /nutchbase/src/plugin/feed/src/test/org/apache/nutch/parse/feed    line 32    Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved    ZipParser.java    /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip    line 34    Java Problem
The method calculate(WebTableRow, Parse) in the type Signature is not applicable for the arguments (Content, Parse)    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 187    Java Problem
The method calculate(WebTableRow, Parse) in the type Signature is not applicable for the arguments (Content, Parse)    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 208    Java Problem
The method fetch(String, int, boolean) from the type Fetcher is not visible    TestFetcher.java    /nutchbase/src/test/org/apache/nutch/fetcher    line 178    Java Problem
The method fetch(String, int, boolean) in the type Fetcher is not applicable for the arguments (Path, int, boolean)    TestFetcher.java    /nutchbase/src/test/org/apache/nutch/fetcher    line 101    Java Problem
The method generate(String, long, long, boolean) in the type Generator is not applicable for the arguments (Path, Path, int, int, long, boolean, boolean)    TestGenerator.java    /nutchbase/src/test/org/apache/nutch/crawl    line 313    Java Problem
The method generate(String, long, long, boolean) in the type Generator is not applicable for the arguments (Path, Path, int, long, long, boolean, boolean)    TestFetcher.java    /nutchbase/src/test/org/apache/nutch/fetcher    line 95    Java Problem
The method getData() is undefined for the type Parse    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 200    Java Problem
The method getData() is undefined for the type Parse    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 211    Java Problem
The method getData() is undefined for the type Parse    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 213    Java Problem
The method getData() is undefined for the type Parse    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 216    Java Problem
The method getData() is undefined for the type Parse    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 230    Java Problem
The method getData() is undefined for the type Parse    ArcSegmentCreator.java    /nutchbase/src/java/org/apache/nutch/tools/arc    line 244    Java Problem
The method getData() is undefined for the type Parse    BasicFields.java    /nutchbase/src/java/org/apache/nutch/indexer/field    line 386    Java Problem
The method getData() is undefined for the type Parse    BasicFields.java    /nutchbase/src/java/org/apache/nutch/indexer/field    line 395    Java Problem
The method getData() is undefined for the type Parse    CCIndexingFilter.java    /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch    line 55    Java Problem
The method getData() is undefined for the type Parse    CCParseFilter.java    /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch    line 280    Java Problem
The method getData() is undefined for the type Parse    CCParseFilter.java    /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch    line 286    Java Problem
The method getData() is undefined for the type Parse    CCParseFilter.java    /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch    line 291    Java Problem
The method getData() is undefined for the type Parse    FeedIndexingFilter.java    /nutchbase/src/plugin/feed/src/java/org/apache/nutch/indexer/feed    line 76    Java Problem


</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Updated: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Julien Nioche (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c975998736.1260023120945.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c975998736-1260023120945-JavaMail-jira@brutus%3e</id>
<updated>2009-12-05T14:25:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Julien Nioche updated NUTCH-767:
--------------------------------

    Attachment: NUTCH-767-part3.patch

The problems with the test come from the fact that Tika's content-based detection of MIME types
returns "text/plain" when no MIME type can be identified, e.g. in our case because
we have an empty byte array as content.

Tika's MimeTypes used to have a default value, which MimeUtil used to determine when
to use the type guessed by Tika, but it has since been removed. The best course of action is
probably to take Tika's guess into account only if it is not "text/plain" or "application/octet-stream",
which is what this patch implements.
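
The rule described above could be sketched roughly like this. It is only an
illustration under stated assumptions, not the code from the attached patch:
the class MimeGuessSketch, the method chooseType and its fallback parameter
(the type obtained by other means) are hypothetical names made up here.

```java
// Hypothetical helper: NOT the actual MimeUtil change from the patch.
final class MimeGuessSketch {

  private MimeGuessSketch() {
  }

  // Returns tikaGuess unless it is missing or one of the two catch-all
  // types that Tika reports when it cannot identify the content.
  static String chooseType(String tikaGuess, String fallback) {
    if (tikaGuess == null
        || tikaGuess.equals("text/plain")
        || tikaGuess.equals("application/octet-stream")) {
      return fallback;
    }
    return tikaGuess;
  }
}
```

For example, chooseType("application/pdf", "text/html") keeps Tika's guess,
while chooseType("text/plain", "text/html") falls back to "text/html".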

The expected mime types in the test class are set to their original values (pre patch v2)
apart from the one which used Tika's default Mime Type.  

J.

&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767-part2.patch, NUTCH-767-part3.patch, NUTCH-767.patch
&gt;
&gt;   Original Estimate: 0h
&gt;  Remaining Estimate: 0h
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place the tika-core.jar in the main nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Reopened: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Julien Nioche (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1250537540.1260022760884.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1250537540-1260022760884-JavaMail-jira@brutus%3e</id>
<updated>2009-12-05T14:19:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Julien Nioche reopened NUTCH-767:
---------------------------------


The problem with the test class has been investigated. I am reopening the issue so that we can
mark it as definitely fixed.

&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767-part2.patch, NUTCH-767.patch
&gt;
&gt;   Original Estimate: 0h
&gt;  Remaining Estimate: 0h
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place the tika-core.jar in the main nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>Hudson build is back to normal: Nutch-trunk #1002</title>
<author><name>Apache Hudson Server &lt;hudson@hudson.zones.apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c8143476.7231259988863113.JavaMail.hudson@hudson.zones.apache.org%3e"/>
<id>urn:uuid:%3c8143476-7231259988863113-JavaMail-hudson@hudson-zones-apache-org%3e</id>
<updated>2009-12-05T04:54:23Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
See &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1002/changes&gt;




</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Hudson (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c308593077.1259988860721.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c308593077-1259988860721-JavaMail-jira@brutus%3e</id>
<updated>2009-12-05T04:54:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12786339#action_12786339
] 

Hudson commented on NUTCH-767:
------------------------------

Integrated in Nutch-trunk #1002 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1002/])
     Fix a failing test - still needs more work.


&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767-part2.patch, NUTCH-767.patch
&gt;
&gt;   Original Estimate: 0h
&gt;  Remaining Estimate: 0h
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place the tika-core.jar in the main nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Closed: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c2128930866.1259924420887.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c2128930866-1259924420887-JavaMail-jira@brutus%3e</id>
<updated>2009-12-04T11:00:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Andrzej Bialecki  closed NUTCH-767.
-----------------------------------

    Resolution: Fixed

&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767-part2.patch, NUTCH-767.patch
&gt;
&gt;   Original Estimate: 0h
&gt;  Remaining Estimate: 0h
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place the tika-core.jar in the main nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Updated: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c2141446225.1259924420867.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c2141446225-1259924420867-JavaMail-jira@brutus%3e</id>
<updated>2009-12-04T11:00:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Andrzej Bialecki  updated NUTCH-767:
------------------------------------

    Remaining Estimate: 0h
     Original Estimate: 0h

I applied the patch, and I'm closing this issue - we will track the test failures when we
upgrade to Tika 0.6, which is imminent.

&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767-part2.patch, NUTCH-767.patch
&gt;
&gt;   Original Estimate: 0h
&gt;  Remaining Estimate: 0h
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place the tika-core.jar in the main nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>Build failed in Hudson: Nutch-trunk #1001</title>
<author><name>Apache Hudson Server &lt;hudson@hudson.zones.apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c16874727.6341259899424131.JavaMail.hudson@hudson.zones.apache.org%3e"/>
<id>urn:uuid:%3c16874727-6341259899424131-JavaMail-hudson@hudson-zones-apache-org%3e</id>
<updated>2009-12-04T04:03:44Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
See &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1001/&gt;

------------------------------------------
[...truncated 4696 lines...]
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-prefix/classes&gt;
    [javac] Note: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-prefix/src/java/org/apache/nutch/urlfilter/prefix/PrefixURLFilter.java&gt;
uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-prefix/urlfilter-prefix.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-prefix&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-prefix&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-prefix&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/test/data&gt;
     [copy] Copying 6 files to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/test/data&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/classes&gt;

init-plugin:

deps-jar:

init:

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: lib-regex-filter

jar:

init:

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: lib-regex-filter

compile-test:

compile:
     [echo] Compiling plugin: urlfilter-regex
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/urlfilter-regex.jar&gt;

deps-test:

init:

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: lib-regex-filter

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlfilter-suffix
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes&gt;
    [javac] Note: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-suffix/src/java/org/apache/nutch/urlfilter/suffix/SuffixURLFilter.java&gt;
uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/urlfilter-suffix.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlfilter-validator
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/urlfilter-validator.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-basic
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/urlnormalizer-basic.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-pass
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/urlnormalizer-pass.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test/data&gt;
     [copy] Copying 4 files to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test/data&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/classes&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-regex
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/classes&gt;
    [javac] Note: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java&gt;
uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/urlnormalizer-regex.jar&gt;

deps-test:

init:

init-plugin:

compile:

jar:
      [jar] Warning: skipping jar archive &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/nutch-extensionpoints/nutch-extensionpoints.jar&gt;
because no files were included.

deps-test:

deploy:

copy-generated-lib:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex&gt;

compile:

job:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/nutch-2009-12-04_04-00-50.job&gt;

compile-core-test:
    [javac] Compiling 43 source files to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/test/classes&gt;
    [javac] &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/test/org/apache/nutch/protocol/TestContent.java&gt;:102:
cannot find symbol
    [javac] symbol  : variable DEFAULT
    [javac] location: class org.apache.tika.mime.MimeTypes
    [javac]     assertEquals(MimeTypes.DEFAULT, c.getContentType());
    [javac]                           ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 1 error

BUILD FAILED
&lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build.xml&gt;:229: Compile
failed; see the compiler error output for details.

Total time: 1 minute 6 seconds
Archiving artifacts
Publishing Javadoc
Recording test results



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Updated: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Julien Nioche (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c257852668.1259836940752.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c257852668-1259836940752-JavaMail-jira@brutus%3e</id>
<updated>2009-12-03T10:42:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Julien Nioche updated NUTCH-767:
--------------------------------

    Attachment: NUTCH-767-part2.patch

Fixes compilation issues for the test class src/test/org/apache/nutch/protocol/TestContent.java
and temporarily makes sure that the expected mime-type values correspond to what is returned
by Tika. I will investigate why Tika does not return text/html when the HTML document has no
text.

&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767-part2.patch, NUTCH-767.patch
&gt;
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place the tika-core.jar in the main nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>Build failed in Hudson: Nutch-trunk #1000</title>
<author><name>Apache Hudson Server &lt;hudson@hudson.zones.apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1036624.5441259815480661.JavaMail.hudson@hudson.zones.apache.org%3e"/>
<id>urn:uuid:%3c1036624-5441259815480661-JavaMail-hudson@hudson-zones-apache-org%3e</id>
<updated>2009-12-03T04:44:40Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
See &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1000/&gt;

------------------------------------------
[...truncated 4696 lines...]
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-prefix/classes&gt;
    [javac] Note: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-prefix/src/java/org/apache/nutch/urlfilter/prefix/PrefixURLFilter.java&gt;
uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-prefix/urlfilter-prefix.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-prefix&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-prefix&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-prefix&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/test/data&gt;
     [copy] Copying 6 files to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/test/data&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/classes&gt;

init-plugin:

deps-jar:

init:

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: lib-regex-filter

jar:

init:

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: lib-regex-filter

compile-test:

compile:
     [echo] Compiling plugin: urlfilter-regex
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/urlfilter-regex.jar&gt;

deps-test:

init:

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: lib-regex-filter

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlfilter-suffix
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes&gt;
    [javac] Note: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-suffix/src/java/org/apache/nutch/urlfilter/suffix/SuffixURLFilter.java&gt;
uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/urlfilter-suffix.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlfilter-validator
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/urlfilter-validator.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-basic
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/urlnormalizer-basic.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-pass
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/urlnormalizer-pass.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test/data&gt;
     [copy] Copying 4 files to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test/data&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/classes&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-regex
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/classes&gt;
    [javac] Note: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java&gt;
uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/urlnormalizer-regex.jar&gt;

deps-test:

init:

init-plugin:

compile:

jar:
      [jar] Warning: skipping jar archive &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/nutch-extensionpoints/nutch-extensionpoints.jar&gt;
because no files were included.

deps-test:

deploy:

copy-generated-lib:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex&gt;

compile:

job:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/nutch-2009-12-03_04-42-24.job&gt;

compile-core-test:
    [javac] Compiling 43 source files to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/test/classes&gt;
    [javac] &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/test/org/apache/nutch/protocol/TestContent.java&gt;:102:
cannot find symbol
    [javac] symbol  : variable DEFAULT
    [javac] location: class org.apache.tika.mime.MimeTypes
    [javac]     assertEquals(MimeTypes.DEFAULT, c.getContentType());
    [javac]                           ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 1 error

BUILD FAILED
&lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build.xml&gt;:229: Compile
failed; see the compiler error output for details.

Total time: 47 seconds
Archiving artifacts
Publishing Javadoc
Recording test results



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Updated: (NUTCH-774) Retry interval in crawl date is set to 0</title>
<author><name>&quot;Reinhard Schwab (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c991610051.1259755940891.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c991610051-1259755940891-JavaMail-jira@brutus%3e</id>
<updated>2009-12-02T12:12:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Reinhard Schwab updated NUTCH-774:
----------------------------------

    Attachment: NUTCH-774.patch

Also fixes a minor typo in AbstractFetchSchedule.java.

&gt; Retry interval in crawl date is set to 0
&gt; ----------------------------------------
&gt;
&gt;                 Key: NUTCH-774
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-774
&gt;             Project: Nutch
&gt;          Issue Type: Bug
&gt;          Components: fetcher
&gt;    Affects Versions: 1.0.0
&gt;            Reporter: Reinhard Schwab
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-774.patch
&gt;
&gt;
&gt; When I fetch and parse a feed with the feed plugin,
&gt; http://www.wachauclimbing.net/home/impressum-disclaimer/feed/
&gt; another crawl date is generated:
&gt; http://www.wachauclimbing.net/home/impressum-disclaimer/comment-page-1/
&gt; After fetching a second round,
&gt; the dump of the crawl db still shows a retry interval of 0:
&gt; http://www.wachauclimbing.net/home/impressum-disclaimer/comment-page-1/ Version: 7
&gt; Status: 2 (db_fetched)
&gt; Fetch time: Wed Dec 02 12:48:22 CET 2009
&gt; Modified time: Thu Jan 01 01:00:00 CET 1970
&gt; Retries since fetch: 0
&gt; Retry interval: 0 seconds (0 days)
&gt; Score: 1.0833334
&gt; Signature: db9ab2193924cd2d0b53113a500ca604
&gt; Metadata: _pst_: success(1), lastModified=0
&gt; A check should be done in DefaultFetchSchedule (or AbstractFetchSchedule), in the
&gt; setFetchSchedule method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Reopened: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c844553263.1259755940701.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c844553263-1259755940701-JavaMail-jira@brutus%3e</id>
<updated>2009-12-02T12:12:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Andrzej Bialecki  reopened NUTCH-767:
-------------------------------------


&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767.patch
&gt;
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place tika-core.jar in the main Nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c503365350.1259755940872.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c503365350-1259755940872-JavaMail-jira@brutus%3e</id>
<updated>2009-12-02T12:12:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12784790#action_12784790
] 

Andrzej Bialecki  commented on NUTCH-767:
-----------------------------------------

Reopening this issue because TestContent is now failing. After fixing a trivial compilation
problem, the remaining problem seems to be that the type for empty content is auto-detected as "text/plain",
and this value overrides the hint from the Content-Type header.

&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767.patch
&gt;
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place tika-core.jar in the main Nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Updated: (NUTCH-774) Retry interval in crawl date is set to 0</title>
<author><name>&quot;Reinhard Schwab (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1519920798.1259755940905.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1519920798-1259755940905-JavaMail-jira@brutus%3e</id>
<updated>2009-12-02T12:12:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Reinhard Schwab updated NUTCH-774:
----------------------------------

    Patch Info: [Patch Available]

&gt; Retry interval in crawl date is set to 0
&gt; ----------------------------------------
&gt;
&gt;                 Key: NUTCH-774
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-774
&gt;             Project: Nutch
&gt;          Issue Type: Bug
&gt;          Components: fetcher
&gt;    Affects Versions: 1.0.0
&gt;            Reporter: Reinhard Schwab
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-774.patch
&gt;
&gt;
&gt; When I fetch and parse a feed with the feed plugin,
&gt; http://www.wachauclimbing.net/home/impressum-disclaimer/feed/
&gt; another crawl date is generated:
&gt; http://www.wachauclimbing.net/home/impressum-disclaimer/comment-page-1/
&gt; After fetching a second round,
&gt; the dump of the crawl db still shows a retry interval of 0:
&gt; http://www.wachauclimbing.net/home/impressum-disclaimer/comment-page-1/ Version: 7
&gt; Status: 2 (db_fetched)
&gt; Fetch time: Wed Dec 02 12:48:22 CET 2009
&gt; Modified time: Thu Jan 01 01:00:00 CET 1970
&gt; Retries since fetch: 0
&gt; Retry interval: 0 seconds (0 days)
&gt; Score: 1.0833334
&gt; Signature: db9ab2193924cd2d0b53113a500ca604
&gt; Metadata: _pst_: success(1), lastModified=0
&gt; A check should be done in DefaultFetchSchedule (or AbstractFetchSchedule), in the
&gt; setFetchSchedule method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Created: (NUTCH-774) Retry interval in crawl date is set to 0</title>
<author><name>&quot;Reinhard Schwab (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c2046675996.1259755580661.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c2046675996-1259755580661-JavaMail-jira@brutus%3e</id>
<updated>2009-12-02T12:06:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Retry interval in crawl date is set to 0
----------------------------------------

                 Key: NUTCH-774
                 URL: https://issues.apache.org/jira/browse/NUTCH-774
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.0.0
            Reporter: Reinhard Schwab
             Fix For: 1.1


When I fetch and parse a feed with the feed plugin,
http://www.wachauclimbing.net/home/impressum-disclaimer/feed/
another crawl date is generated:
http://www.wachauclimbing.net/home/impressum-disclaimer/comment-page-1/

After fetching a second round,
the dump of the crawl db still shows a retry interval of 0:

http://www.wachauclimbing.net/home/impressum-disclaimer/comment-page-1/ Version: 7
Status: 2 (db_fetched)
Fetch time: Wed Dec 02 12:48:22 CET 2009
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 0 seconds (0 days)
Score: 1.0833334
Signature: db9ab2193924cd2d0b53113a500ca604
Metadata: _pst_: success(1), lastModified=0

A check should be done in DefaultFetchSchedule (or AbstractFetchSchedule), in the
setFetchSchedule method.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool</title>
<author><name>&quot;Raja Santosh Panda (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c526147615.1259738540721.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c526147615-1259738540721-JavaMail-jira@brutus%3e</id>
<updated>2009-12-02T07:22:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12784680#action_12784680
] 

Raja Santosh Panda commented on NUTCH-666:
------------------------------------------

Hi,

I would like to use only the language identifier plugin (language-identifier.jar)
to identify Chinese, Japanese and Korean.

Can someone help me with this?

Is this already implemented? If yes, how can I get the dev version and use it?

Can I use the version 1.0 language identifier and train it (create N-gram profiles) to
identify these three languages?

Any help is highly appreciated.

Regards
Raja

&gt; Analysis plugins for multiple language and new Language Identifier Tool
&gt; -----------------------------------------------------------------------
&gt;
&gt;                 Key: NUTCH-666
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-666
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;    Affects Versions: 1.1
&gt;         Environment: All
&gt;            Reporter: Dennis Kubes
&gt;            Assignee: Dennis Kubes
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-666-1-20081126.patch
&gt;
&gt;
&gt; Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch, Russian, and
&gt; Thai. Also includes a new Language Identifier tool that uses the new indexing framework in
&gt; NUTCH-646.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>Build failed in Hudson: Nutch-trunk #999</title>
<author><name>Apache Hudson Server &lt;hudson@hudson.zones.apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c11094238.4641259726656129.JavaMail.hudson@hudson.zones.apache.org%3e"/>
<id>urn:uuid:%3c11094238-4641259726656129-JavaMail-hudson@hudson-zones-apache-org%3e</id>
<updated>2009-12-02T04:04:16Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
See &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/999/&gt;

------------------------------------------
[...truncated 4696 lines...]
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-prefix/classes&gt;
    [javac] Note: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-prefix/src/java/org/apache/nutch/urlfilter/prefix/PrefixURLFilter.java&gt;
uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-prefix/urlfilter-prefix.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-prefix&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-prefix&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-prefix&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/test/data&gt;
     [copy] Copying 6 files to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/test/data&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/classes&gt;

init-plugin:

deps-jar:

init:

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: lib-regex-filter

jar:

init:

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: lib-regex-filter

compile-test:

compile:
     [echo] Compiling plugin: urlfilter-regex
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/urlfilter-regex.jar&gt;

deps-test:

init:

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: lib-regex-filter

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlfilter-suffix
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes&gt;
    [javac] Note: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-suffix/src/java/org/apache/nutch/urlfilter/suffix/SuffixURLFilter.java&gt;
uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/urlfilter-suffix.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlfilter-validator
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/urlfilter-validator.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-basic
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/urlnormalizer-basic.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-pass
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes&gt;

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/urlnormalizer-pass.jar&gt;

deps-test:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass&gt;
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test/data&gt;
     [copy] Copying 4 files to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test/data&gt;

init:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/classes&gt;

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-regex
    [javac] Compiling 1 source file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/classes&gt;
    [javac] Note: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java&gt;
uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/urlnormalizer-regex.jar&gt;

deps-test:

init:

init-plugin:

compile:

jar:
      [jar] Warning: skipping jar archive &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/nutch-extensionpoints/nutch-extensionpoints.jar&gt;
because no files were included.

deps-test:

deploy:

copy-generated-lib:

deploy:
    [mkdir] Created dir: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex&gt;
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex&gt;

copy-generated-lib:
     [copy] Copying 1 file to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex&gt;

compile:

job:
      [jar] Building jar: &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/nutch-2009-12-02_04-00-50.job&gt;

compile-core-test:
    [javac] Compiling 43 source files to &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/test/classes&gt;
    [javac] &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/test/org/apache/nutch/protocol/TestContent.java&gt;:102:
cannot find symbol
    [javac] symbol  : variable DEFAULT
    [javac] location: class org.apache.tika.mime.MimeTypes
    [javac]     assertEquals(MimeTypes.DEFAULT, c.getContentType());
    [javac]                           ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 1 error

BUILD FAILED
&lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build.xml&gt;:229: Compile
failed; see the compiler error output for details.

Total time: 1 minute 5 seconds
Archiving artifacts
Publishing Javadoc
Recording test results



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c7794072.1259694201187.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c7794072-1259694201187-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T19:03:21Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12784337#action_12784337
] 

Andrzej Bialecki  commented on NUTCH-767:
-----------------------------------------

Fixed in rev. 885869. Thank you!

&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767.patch
&gt;
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place tika-core.jar in the main Nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Closed: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c110318532.1259694201146.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c110318532-1259694201146-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T19:03:21Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Andrzej Bialecki  closed NUTCH-767.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
         Assignee: Andrzej Bialecki   (was: Chris A. Mattmann)

&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-767.patch
&gt;
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place tika-core.jar in the main Nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Updated: (NUTCH-767) Update Tika to v0.5  for the MimeType detection</title>
<author><name>&quot;Julien Nioche (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1147125684.1259684180654.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1147125684-1259684180654-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T16:16:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Julien Nioche updated NUTCH-767:
--------------------------------

    Description: 
The version 0.5 of TIka requires a few changes to the MimeType implementation. Tika is now
split in several jars, we need to place the tika-core.jar in the main nutch lib.


  was:
The version 5 of TIka requires a few changes to the MimeType implementation. Tika is now split
in several jars, we need to place the tika-core.jar in the main nutch lib.


        Summary: Update Tika to v0.5  for the MimeType detection  (was: Update Tika to v5.0
 for the MimeType detection)

&gt; Update Tika to v0.5  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Chris A. Mattmann
&gt;         Attachments: NUTCH-767.patch
&gt;
&gt;
&gt; Version 0.5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place tika-core.jar in the main Nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Updated: (NUTCH-767) Update Tika to v5.0  for the MimeType detection</title>
<author><name>&quot;Julien Nioche (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c512599829.1259683940773.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c512599829-1259683940773-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T16:12:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Julien Nioche updated NUTCH-767:
--------------------------------

    Description: 
The version 5 of TIka requires a few changes to the MimeType implementation. Tika is now split
in several jars, we need to place the tika-core.jar in the main nutch lib.


  was:
The latest version of TIka requires a few changes to the MimeType implementation. Tika is
now split in several jars, we need to place the tika-core.jar in the main nutch lib.


        Summary: Update Tika to v5.0  for the MimeType detection  (was: Update version of
Tika for the MimeType detection)

&gt; Update Tika to v5.0  for the MimeType detection
&gt; -----------------------------------------------
&gt;
&gt;                 Key: NUTCH-767
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-767
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Chris A. Mattmann
&gt;         Attachments: NUTCH-767.patch
&gt;
&gt;
&gt; The version 5 of Tika requires a few changes to the MimeType implementation. Tika is
&gt; now split into several jars, so we need to place tika-core.jar in the main Nutch lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Closed: (NUTCH-769) Fetcher to skip queues for URLS getting repeated exceptions</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c973268738.1259680580728.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c973268738-1259680580728-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T15:16:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Andrzej Bialecki  closed NUTCH-769.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
         Assignee: Andrzej Bialecki 

&gt; Fetcher to skip queues for URLS getting repeated exceptions  
&gt; -------------------------------------------------------------
&gt;
&gt;                 Key: NUTCH-769
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-769
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;          Components: fetcher
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;            Priority: Minor
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-769-2.patch, NUTCH-769.patch
&gt;
&gt;
&gt; As discussed on the mailing list (see http://www.mail-archive.com/nutch-user@lucene.apache.org/msg15360.html),
this patch allows clearing URL queues in the Fetcher when more than a set number of exceptions
have been encountered in a row. This can speed up fetching substantially in cases where
target hosts are not responsive (as a TimeoutException would be thrown) and limits cases where
a whole Fetch step is slowed down because of a few queues.
&gt; By default the parameter fetcher.max.exceptions.per.queue has a value of -1 and is deactivated.
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-769) Fetcher to skip queues for URLS getting repeated exceptions</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c2042358626.1259680580750.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c2042358626-1259680580750-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T15:16:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12784260#action_12784260
] 

Andrzej Bialecki  commented on NUTCH-769:
-----------------------------------------

I had to apply this patch by hand, due to NUTCH-770. I also added conf/nutch-default.xml documentation.
This was committed in rev. 885785 - thanks!

&gt; Fetcher to skip queues for URLS getting repeated exceptions  
&gt; -------------------------------------------------------------
&gt;
&gt;                 Key: NUTCH-769
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-769
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;          Components: fetcher
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;            Priority: Minor
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-769-2.patch, NUTCH-769.patch
&gt;
&gt;
&gt; As discussed on the mailing list (see http://www.mail-archive.com/nutch-user@lucene.apache.org/msg15360.html),
this patch allows clearing URL queues in the Fetcher when more than a set number of exceptions
have been encountered in a row. This can speed up fetching substantially in cases where
target hosts are not responsive (as a TimeoutException would be thrown) and limits cases where
a whole Fetch step is slowed down because of a few queues.
&gt; By default the parameter fetcher.max.exceptions.per.queue has a value of -1 and is deactivated.
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Closed: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20</title>
<author><name>&quot;Dennis Kubes (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1903168933.1259679560646.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1903168933-1259679560646-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T14:59:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Dennis Kubes closed NUTCH-768.
------------------------------

    Resolution: Fixed

Weird.  The hsqldb license file had the same checksum as the one pulled from hadoop.  It must
have had the Windows EOL in the hadoop distribution as well.  I changed it anyway.  Everything
was committed with revision 885778.

&gt; Upgrade Nutch 1.0 to use Hadoop 0.20
&gt; ------------------------------------
&gt;
&gt;                 Key: NUTCH-768
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-768
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;    Affects Versions: 1.1
&gt;         Environment: All
&gt;            Reporter: Dennis Kubes
&gt;            Assignee: Dennis Kubes
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-768-1-20091125.patch
&gt;
&gt;
&gt; Upgrade Nutch 1.0 to use the Hadoop 0.20 release.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-770) Timebomb for Fetcher</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1924852546.1259679080831.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1924852546-1259679080831-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T14:51:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12784250#action_12784250
] 

Andrzej Bialecki  commented on NUTCH-770:
-----------------------------------------

Fixed in rev. 885776. Thank you!

&gt; Timebomb for Fetcher
&gt; --------------------
&gt;
&gt;                 Key: NUTCH-770
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-770
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: log-770, NUTCH-770-v2.patch, NUTCH-770-v3.patch, NUTCH-770.patch
&gt;
&gt;
&gt; This patch provides the Fetcher with a timebomb mechanism. By default the timebomb is
not activated; it can be set using the parameter fetcher.timebomb.mins. The number of minutes
is relative to the start of the Fetch job. When the number of minutes is reached, the QueueFeeder
skips all remaining entries, then all active queues are purged. This keeps the Fetch
step under control and works well in combination with NUTCH-769.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Closed: (NUTCH-770) Timebomb for Fetcher</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1200831498.1259679080773.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1200831498-1259679080773-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T14:51:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Andrzej Bialecki  closed NUTCH-770.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
         Assignee: Andrzej Bialecki 

&gt; Timebomb for Fetcher
&gt; --------------------
&gt;
&gt;                 Key: NUTCH-770
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-770
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;            Assignee: Andrzej Bialecki 
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: log-770, NUTCH-770-v2.patch, NUTCH-770-v3.patch, NUTCH-770.patch
&gt;
&gt;
&gt; This patch provides the Fetcher with a timebomb mechanism. By default the timebomb is
not activated; it can be set using the parameter fetcher.timebomb.mins. The number of minutes
is relative to the start of the Fetch job. When the number of minutes is reached, the QueueFeeder
skips all remaining entries, then all active queues are purged. This keeps the Fetch
step under control and works well in combination with NUTCH-769.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1158044850.1259671580712.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1158044850-1259671580712-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T12:46:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12784206#action_12784206
] 

Andrzej Bialecki  commented on NUTCH-768:
-----------------------------------------

+1.

Minor nit: the file lib/hsqldb-1.8.0.10.LICENSE.txt uses Windows EOL style; this should probably be
corrected before commit.

&gt; Upgrade Nutch 1.0 to use Hadoop 0.20
&gt; ------------------------------------
&gt;
&gt;                 Key: NUTCH-768
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-768
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;    Affects Versions: 1.1
&gt;         Environment: All
&gt;            Reporter: Dennis Kubes
&gt;            Assignee: Dennis Kubes
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-768-1-20091125.patch
&gt;
&gt;
&gt; Upgrade Nutch 1.0 to use the Hadoop 0.20 release.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20</title>
<author><name>&quot;Dennis Kubes (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c1195204345.1259645300664.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1195204345-1259645300664-JavaMail-jira@brutus%3e</id>
<updated>2009-12-01T05:28:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12784066#action_12784066
] 

Dennis Kubes commented on NUTCH-768:
------------------------------------

If there are no objections, I will commit this sometime tomorrow.

&gt; Upgrade Nutch 1.0 to use Hadoop 0.20
&gt; ------------------------------------
&gt;
&gt;                 Key: NUTCH-768
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-768
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;    Affects Versions: 1.1
&gt;         Environment: All
&gt;            Reporter: Dennis Kubes
&gt;            Assignee: Dennis Kubes
&gt;             Fix For: 1.1
&gt;
&gt;         Attachments: NUTCH-768-1-20091125.patch
&gt;
&gt;
&gt; Upgrade Nutch 1.0 to use the Hadoop 0.20 release.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>Build failed in Hudson: Nutch-trunk #998</title>
<author><name>Apache Hudson Server &lt;hudson@hudson.zones.apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200912.mbox/%3c20601360.3981259640164494.JavaMail.hudson@hudson.zones.apache.org%3e"/>
<id>urn:uuid:%3c20601360-3981259640164494-JavaMail-hudson@hudson-zones-apache-org%3e</id>
<updated>2009-12-01T04:02:44Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
See &lt;http://hudson.zones.apache.org/hudson/job/Nutch-trunk/998/&gt;

------------------------------------------
A timer trigger started this job
Building remotely on lucene.zones.apache.org (Solaris 10)
Checking out http://svn.apache.org/repos/asf/lucene/nutch/trunk
ERROR: Failed to check out http://svn.apache.org/repos/asf/lucene/nutch/trunk
org.tmatesoft.svn.core.SVNException: svn: timed out waiting for server
svn: OPTIONS request failed on '/repos/asf/lucene/nutch/trunk'
	at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:103)
	at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:87)
	at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:616)
	at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:273)
	at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:261)
	at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:516)
	at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:98)
	at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1001)
	at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:178)
	at org.tmatesoft.svn.core.wc.SVNBasicClient.getRevisionNumber(SVNBasicClient.java:482)
	at org.tmatesoft.svn.core.wc.SVNBasicClient.getLocations(SVNBasicClient.java:851)
	at org.tmatesoft.svn.core.wc.SVNBasicClient.createRepository(SVNBasicClient.java:534)
	at org.tmatesoft.svn.core.wc.SVNUpdateClient.doCheckout(SVNUpdateClient.java:893)
	at org.tmatesoft.svn.core.wc.SVNUpdateClient.doCheckout(SVNUpdateClient.java:791)
	at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:617)
	at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:543)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2052)
	at hudson.remoting.UserRequest.perform(UserRequest.java:69)
	at hudson.remoting.UserRequest.perform(UserRequest.java:23)
	at hudson.remoting.Request$2.run(Request.java:200)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
	at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
	at java.net.Socket.connect(Socket.java:519)
	at org.tmatesoft.svn.core.internal.util.SVNSocketConnection.run(SVNSocketConnection.java:57)
	... 1 more
Archiving artifacts
Publishing Javadoc
Recording test results



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: wrong wiki front page</title>
<author><name>Alban Mouton &lt;alban83@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200911.mbox/%3c42de0d300911300822g60d0ba6amb222b1f5e41b9a1a@mail.gmail.com%3e"/>
<id>urn:uuid:%3c42de0d300911300822g60d0ba6amb222b1f5e41b9a1a@mail-gmail-com%3e</id>
<updated>2009-11-30T16:22:00Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks. There is already a JIRA issue:
https://issues.apache.org/jira/browse/INFRA-2251 but it has been pending since
last month!

2009/11/30 Andrzej Bialecki &lt;ab@getopt.org&gt;

&gt; Alban Mouton wrote:
&gt;
&gt;&gt; No reaction ? Isn't the Wiki admin on this mailing list ? I don't see any
&gt;&gt; link on the Wiki to contact the admin.
&gt;&gt;
&gt;&gt; The french frontpage is still the generic MoinMoin wiki home page and that
&gt;&gt; can make a bad impression to newcomers !
&gt;&gt;
&gt;
&gt; We have little control over the MoinMoin config (AFAIK it's configured for
&gt; multiple projects), and what you noticed is probably a fallout of the recent
&gt; wiki upgrade - please create a JIRA issue here:
&gt; https://issues.apache.org/jira/browse/INFRA (don't forget to mention the
&gt; project name).
&gt;
&gt;
&gt; --
&gt; Best regards,
&gt; Andrzej Bialecki     &lt;&gt;&lt;
&gt;  ___. ___ ___ ___ _ _   __________________________________
&gt; [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
&gt; ___|||__||  \|  ||  |  Embedded Unix, System Integration
&gt; http://www.sigram.com  Contact: info at sigram dot com
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: wrong wiki front page</title>
<author><name>Andrzej Bialecki &lt;ab@getopt.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200911.mbox/%3c4B13EB5C.2020903@getopt.org%3e"/>
<id>urn:uuid:%3c4B13EB5C-2020903@getopt-org%3e</id>
<updated>2009-11-30T15:57:16Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Alban Mouton wrote:
&gt; No reaction ? Isn't the Wiki admin on this mailing list ? I don't see 
&gt; any link on the Wiki to contact the admin.
&gt; 
&gt; The french frontpage is still the generic MoinMoin wiki home page and 
&gt; that can make a bad impression to newcomers !

We have little control over the MoinMoin config (AFAIK it's configured 
for multiple projects), and what you noticed is probably a fallout of 
the recent wiki upgrade - please create a JIRA issue here: 
https://issues.apache.org/jira/browse/INFRA (don't forget to mention the 
project name).


-- 
Best regards,
Andrzej Bialecki     &lt;&gt;&lt;
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: wrong wiki front page</title>
<author><name>Alban Mouton &lt;alban83@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200911.mbox/%3c42de0d300911300736t6642a202hd21f46a5b8a9bf40@mail.gmail.com%3e"/>
<id>urn:uuid:%3c42de0d300911300736t6642a202hd21f46a5b8a9bf40@mail-gmail-com%3e</id>
<updated>2009-11-30T15:36:42Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
No reaction? Isn't the Wiki admin on this mailing list? I don't see any
link on the Wiki to contact the admin.

The French front page is still the generic MoinMoin wiki home page, and that
can make a bad impression on newcomers!

2009/11/26 Alban Mouton &lt;alban83@gmail.com&gt;

&gt; Issue and solutions described here :http://wiki.apache.org/httpd/HelpOnConfiguration#Default_front_page
&gt;
&gt; 2009/11/24 Alban Mouton &lt;alban83@gmail.com&gt;
&gt;
&gt;&gt; Hello everybody,
&gt;&gt;
&gt;&gt;
&gt;&gt; I don't know if it is a known issue, but it's been like that since at
&gt;&gt; least a couple of days so I figured I should tell someone. The root url for
&gt;&gt; the nutch wiki http://wiki.apache.org/nutch/ doesn't redirect to
&gt;&gt; http://wiki.apache.org/nutch/FrontPage ! It's annoying because that's the
&gt;&gt; url given by google and the nutch website. It might be a language detection
&gt;&gt; problem because I see this ugly and not very helpful page :
&gt;&gt; http://wiki.apache.org/nutch/PageD%27Accueil (page d'accueil = home page
&gt;&gt; in french).
&gt;&gt;
&gt;&gt; Not much of a contribution for my first message here, but I hope to do
&gt;&gt; more soon.
&gt;&gt;
&gt;&gt; Alban Mouton
&gt;&gt;
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Updated: (NUTCH-770) Timebomb for Fetcher</title>
<author><name>&quot;Julien Nioche (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200911.mbox/%3c785340982.1259591480677.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c785340982-1259591480677-JavaMail-jira@brutus%3e</id>
<updated>2009-11-30T14:31:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Julien Nioche updated NUTCH-770:
--------------------------------

    Attachment: NUTCH-770-v3.patch

The v2 patch applied the Lucene code formatting to the whole Java file, which caused far too many
changes. The v3 patch does the same as v2 (adds the parameter and its description to nutch-default.xml
and renames timebomb to timelimit) but applies the code formatting only to the relevant portions of code.

&gt; Timebomb for Fetcher
&gt; --------------------
&gt;
&gt;                 Key: NUTCH-770
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-770
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;         Attachments: log-770, NUTCH-770-v2.patch, NUTCH-770-v3.patch, NUTCH-770.patch
&gt;
&gt;
&gt; This patch provides the Fetcher with a timebomb mechanism. By default the timebomb is
not activated; it can be set using the parameter fetcher.timebomb.mins. The number of minutes
is relative to the start of the Fetch job. When the number of minutes is reached, the QueueFeeder
skips all remaining entries, then all active queues are purged. This keeps the Fetch
step under control and works well in combination with NUTCH-769.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Updated: (NUTCH-770) Timebomb for Fetcher</title>
<author><name>&quot;Julien Nioche (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200911.mbox/%3c1497006966.1259588360652.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1497006966-1259588360652-JavaMail-jira@brutus%3e</id>
<updated>2009-11-30T13:39:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

     [ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Julien Nioche updated NUTCH-770:
--------------------------------

    Attachment: NUTCH-770-v2.patch

* renamed timebomb to timelimit
* added parameter and its description in nutch-default.xml
* applied Lucene codestyle from http://wiki.apache.org/lucene-java/HowToContribute?action=AttachFile&amp;do=view&amp;target=Eclipse-Lucene-Codestyle.xml

&gt; Timebomb for Fetcher
&gt; --------------------
&gt;
&gt;                 Key: NUTCH-770
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-770
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;         Attachments: log-770, NUTCH-770-v2.patch, NUTCH-770.patch
&gt;
&gt;
&gt; This patch provides the Fetcher with a timebomb mechanism. By default the timebomb is
not activated; it can be set using the parameter fetcher.timebomb.mins. The number of minutes
is relative to the start of the Fetch job. When the number of minutes is reached, the QueueFeeder
skips all remaining entries, then all active queues are purged. This keeps the Fetch
step under control and works well in combination with NUTCH-769.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
<entry>
<title>[jira] Commented: (NUTCH-770) Timebomb for Fetcher</title>
<author><name>&quot;Andrzej Bialecki  (JIRA)&quot; &lt;jira@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200911.mbox/%3c1356054090.1259585962524.JavaMail.jira@brutus%3e"/>
<id>urn:uuid:%3c1356054090-1259585962524-JavaMail-jira@brutus%3e</id>
<updated>2009-11-30T12:59:22Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

    [ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12783638#action_12783638
] 

Andrzej Bialecki  commented on NUTCH-770:
-----------------------------------------

bq.   "time limit" is definitely better than timebomb (but not as amusing). 

:) let's go for "informative" and "less confusing" now ... Could you please also add the
nutch-default.xml property and its documentation?

Re: FetchQueues - ok, you have a point here.

Re: code style - yes.

&gt; Timebomb for Fetcher
&gt; --------------------
&gt;
&gt;                 Key: NUTCH-770
&gt;                 URL: https://issues.apache.org/jira/browse/NUTCH-770
&gt;             Project: Nutch
&gt;          Issue Type: Improvement
&gt;            Reporter: Julien Nioche
&gt;         Attachments: log-770, NUTCH-770.patch
&gt;
&gt;
&gt; This patch provides the Fetcher with a timebomb mechanism. By default the timebomb is
not activated; it can be set using the parameter fetcher.timebomb.mins. The number of minutes
is relative to the start of the Fetch job. When the number of minutes is reached, the QueueFeeder
skips all remaining entries, then all active queues are purged. This keeps the Fetch
step under control and works well in combination with NUTCH-769.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



</pre>
</div>
</content>
</entry>
</feed>
