Mailing list archives: April 2008

Message list (page 1 of 3)
Doğacan Güney Re: protocol-http vs. -httpclient, HTTP 1.1 vs 1.0 Fri, 18 Apr, 18:49
Doğacan Güney Re: Files removed from https://svn.apache.org/repos/asf/lucene/nutch/trunk/bin??? Fri, 18 Apr, 20:21
Doğacan Güney Re: Normalizing host names (e.g. www1|www2 => www) Sun, 27 Apr, 09:41
Aldarris Nutch 0.9: CMD works, web gui does not Tue, 29 Apr, 15:23
Aldarris Re: Nutch 0.9: CMD works, web gui does not Tue, 29 Apr, 15:59
Andrew85 image download help Sat, 19 Apr, 17:45
Andrzej Bialecki Re: Handling slow/timeout servers Wed, 09 Apr, 10:56
Andrzej Bialecki Re: Weirdness: 2 Fetcher2 instances? Thu, 10 Apr, 08:32
Andrzej Bialecki Re: CrawlDatum: mislabeling? Thu, 10 Apr, 08:39
Andrzej Bialecki Re: Fetch task 100% done, but still fetching Thu, 10 Apr, 21:55
Andrzej Bialecki Re: Fetch task 100% done, but still fetching Fri, 11 Apr, 09:51
Andrzej Bialecki Re: Handling slow/timeout servers Fri, 11 Apr, 10:16
Andrzej Bialecki Re: Next Generation Nutch Mon, 14 Apr, 17:01
Andrzej Bialecki Re: Efficiently Finding the Segment of a Single URL Tue, 15 Apr, 06:29
Andrzej Bialecki Re: DomainStatistics Tue, 15 Apr, 15:59
Andrzej Bialecki Re: Parallel operations in fetch Wed, 16 Apr, 12:03
Andrzej Bialecki Re: Any HDFS protocol plugin like File protocol plugin ? Wed, 16 Apr, 12:31
Andrzej Bialecki Re: Parallel operations in fetch Thu, 17 Apr, 08:05
Andrzej Bialecki Re: Efficiently Finding the Segment of a Single URL Thu, 17 Apr, 08:07
Andrzej Bialecki Re: Parallel operations in fetch Thu, 17 Apr, 08:37
Andrzej Bialecki Re: protocol-http vs. -httpclient, HTTP 1.1 vs 1.0 Sat, 19 Apr, 21:46
Andrzej Bialecki Re: Parallel operations in fetch Sat, 19 Apr, 21:54
Andrzej Bialecki Re: Distributing code changes to nodes Sat, 19 Apr, 22:00
Andrzej Bialecki Re: Fetching inefficiency Wed, 23 Apr, 08:23
Arkadi.Kosmy...@csiro.au RE: Custom fields Mon, 31 Mar, 23:29
Arkadi.Kosmy...@csiro.au RE: Nutch fetching skipped files Wed, 02 Apr, 23:06
Bill Meltzer tika-mimetypes errors Tue, 29 Apr, 17:18
Bill Meltzer RE: tika-mimetypes errors Tue, 29 Apr, 17:28
Boris Lau was hadoop copy being slow? Fri, 04 Apr, 19:12
Bradford Stephens Difficulty w/ Distributed Crawl with Separate Nutch/Hadoop Thu, 03 Apr, 17:42
Bradford Stephens Re: Difficulty w/ Distributed Crawl with Separate Nutch/Hadoop Thu, 03 Apr, 18:40
Bradford Stephens Slow Crawl Speed and Tika Error Media type alias already exists: text/xml Sat, 05 Apr, 00:14
Bradford Stephens Re: Slow Crawl Speed and Tika Error Media type alias already exists: text/xml Mon, 07 Apr, 16:52
Bradford Stephens Re: Slow Crawl Speed and Tika Error Media type alias already exists: text/xml Wed, 09 Apr, 23:29
Bradford Stephens Nutch Remote Access API Wed, 09 Apr, 23:38
Bradford Stephens Efficiently Finding the Segment of a Single URL Mon, 14 Apr, 22:14
Bradford Stephens Re: Efficiently Finding the Segment of a Single URL Mon, 14 Apr, 23:49
Bradford Stephens Re: Efficiently Finding the Segment of a Single URL Tue, 15 Apr, 17:29
Bradford Stephens Re: Efficiently Finding the Segment of a Single URL Wed, 16 Apr, 18:21
Bradford Stephens Re: Efficiently Finding the Segment of a Single URL Wed, 16 Apr, 23:48
Bradford Stephens Re: Efficiently Finding the Segment of a Single URL Thu, 17 Apr, 17:44
Bradford Stephens Running other Hadoop Tasks on Nutch Servers? Thu, 24 Apr, 18:38
Bradford Stephens Cache URL Rewriting Not Working... Fri, 25 Apr, 19:10
Bradford Stephens Re: Cache URL Rewriting Not Working... Mon, 28 Apr, 17:29
Brent Walker Searching for Quoted Phrases Thu, 24 Apr, 14:25
Brian Ulicny Re: Search for Just PDF documents Wed, 16 Apr, 16:01
Brian Ulicny Extracting Embedded Outlinks Wed, 23 Apr, 15:45
Brian Ulicny RE: Extracting Embedded Outlinks Wed, 23 Apr, 17:41
Chris Fellows MultiSearcher: searching across multiple indices Mon, 21 Apr, 16:08
Chris Hane Re: Next Generation Nutch Fri, 18 Apr, 04:32
Chris Mattmann Re: Slow Crawl Speed and Tika Error Media type alias already exists: text/xml Sat, 05 Apr, 00:58
Chris Mattmann Re: Next Generation Nutch Sat, 12 Apr, 01:10
Chris Mattmann Re: Next Generation Nutch Sat, 12 Apr, 04:29
Dennis Kubes Re: Code to be modified Wed, 02 Apr, 14:34
Dennis Kubes Re: description of db.ignore.internal.links property Wed, 02 Apr, 14:40
Dennis Kubes Re: Fetch task 100% done, but still fetching Thu, 10 Apr, 21:41
Dennis Kubes Next Generation Nutch Fri, 11 Apr, 21:59
Dennis Kubes Re: Parallel operations in fetch Sun, 13 Apr, 15:11
Dennis Kubes Re: Next Generation Nutch Sun, 13 Apr, 15:29
Dennis Kubes Re: Next Generation Nutch Sun, 13 Apr, 15:35
Dennis Kubes Re: Next Generation Nutch Sun, 13 Apr, 15:44
Dennis Kubes Re: Next Generation Nutch Sun, 13 Apr, 15:48
Dennis Kubes Re: Merging Two Crawls Sun, 13 Apr, 15:50
Dennis Kubes Re: Next Generation Nutch Mon, 14 Apr, 15:37
Dennis Kubes Re: JobStream.py Tue, 15 Apr, 15:52
Dennis Kubes Re: Next Generation Nutch Tue, 15 Apr, 19:04
Dennis Kubes Re: Parallel operations in fetch Wed, 16 Apr, 04:56
Dennis Kubes Re: nutch data on *nix and windows Thu, 17 Apr, 05:42
Dennis Kubes Re: Next Generation Nutch Thu, 17 Apr, 19:33
Dennis Kubes Re: Fetching inefficiency Mon, 21 Apr, 23:43
Dennis Kubes Re: Fetching inefficiency Tue, 22 Apr, 13:58
Dennis Kubes Re: Generator: 0 records selected for fetching, exiting ... Tue, 22 Apr, 14:04
Dennis Kubes Re: Generator: 0 records selected for fetching, exiting ... Tue, 22 Apr, 17:22
Dennis Kubes Re: Generator: 0 records selected for fetching, exiting ... Wed, 23 Apr, 15:01
Devang - Google RE: score of freshly injected urls Wed, 30 Apr, 18:39
Euan Clark generate.maxurls.per.domain.default exceptions file? Mon, 21 Apr, 00:33
Euan Clark File format for generate.maxurls.per.domain.exceptions.file ? Tue, 22 Apr, 00:23
Euan Clark On-page javascript treated as relative link Sun, 27 Apr, 22:40
Evgeny Zhulenev Reduce tasks doesn't start Wed, 02 Apr, 17:57
Evgeny Zhulenev Re: Reduce tasks doesn't start Wed, 02 Apr, 18:09
Evgeny Zhulenev Re: Reduce tasks doesn't start Wed, 02 Apr, 22:52
Evgeny Zhulenev Nutch inject fails on reduce Thu, 03 Apr, 13:37
Evgeny Zhulenev Re: Reduce tasks doesn't start Thu, 03 Apr, 15:04
Evgeny Zhulenev Re: Reduce tasks doesn't start Thu, 03 Apr, 15:50
Evgeny Zhulenev Re: Reduce tasks doesn't start Thu, 03 Apr, 17:08
Evgeny Zhulenev Re: Reduce tasks doesn't start Thu, 03 Apr, 17:20
Evgeny Zhulenev Writing nutch plugin. Testing problem Thu, 17 Apr, 23:41
Garnier Garnier Crawling relative URLS with Nutch Tue, 01 Apr, 03:32
Gene Campbell Question about adding tags or attributes to indexed info Tue, 29 Apr, 12:33
Gene Campbell Fwd: Question about adding tags or attributes to indexed info Tue, 29 Apr, 20:20
Gene Campbell Please reply Tue, 29 Apr, 22:00
Gene Campbell Test Wed, 30 Apr, 03:06
Gene Campbell unit tests for indexing Wed, 30 Apr, 05:07
Gene Campbell Re: unit tests for indexing Wed, 30 Apr, 05:33
Gene Campbell Re: unit tests for indexing Wed, 30 Apr, 06:39
Gene Campbell Storing fields best practice question Wed, 30 Apr, 11:02
Gene Campbell Storing fields best practice question Wed, 30 Apr, 11:12
Gene Campbell Re: unit tests for indexing Wed, 30 Apr, 20:29
Hilkiah Lavinier nutch results: cache and search summary Thu, 10 Apr, 20:35
Hilkiah Lavinier index-more problem? Thu, 17 Apr, 22:59