Mailing list archives: March 2008

Siddharth Jha (JIRA) [jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed Mon, 03 Mar, 17:14
Siddharth Jha (JIRA) [jira] Created: (NUTCH-617) Cached Text Only Tue, 04 Mar, 08:45
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-617) Cached Text Only Tue, 04 Mar, 19:23
Frederic Wenzel Nightly builds unavailable Wed, 05 Mar, 10:11
Sami Siren Re: Nightly builds unavailable Wed, 05 Mar, 18:27
Andrzej Bialecki (JIRA) [jira] Created: (NUTCH-618) Tika error "Media type alias already exists" Thu, 06 Mar, 07:17
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-618) Tika error "Media type alias already exists" Fri, 07 Mar, 01:30
Chris A. Mattmann (JIRA) [jira] Assigned: (NUTCH-618) Tika error "Media type alias already exists" Fri, 07 Mar, 06:32
Chris A. Mattmann (JIRA) [jira] Commented: (NUTCH-618) Tika error "Media type alias already exists" Fri, 07 Mar, 06:34
Chris A. Mattmann (JIRA) [jira] Work started: (NUTCH-618) Tika error "Media type alias already exists" Fri, 07 Mar, 06:34
Euan Clark Confine nutch to one NIC? Sun, 09 Mar, 20:24
dong chen I have some problem with nutch result Tue, 11 Mar, 05:34
ogjunk-nu...@yahoo.com Re: Confine nutch to one NIC? Tue, 11 Mar, 20:21
Otis Gospodnetic (JIRA) [jira] Commented: (NUTCH-296) Image Search Wed, 12 Mar, 01:48
naveen.gosw...@wipro.com Problem in running Nutch where proxy authentication is required. Wed, 12 Mar, 16:09
naveen.gosw...@wipro.com Problem in running Nutch where proxy authentication is required. Wed, 12 Mar, 16:20
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-616) Reset Fetch Retry counter when fetch is successful Fri, 14 Mar, 12:13
Andrzej Bialecki (JIRA) [jira] Updated: (NUTCH-616) Reset Fetch Retry counter when fetch is successful Fri, 14 Mar, 13:27
Andrzej Bialecki (JIRA) [jira] Assigned: (NUTCH-616) Reset Fetch Retry counter when fetch is successful Fri, 14 Mar, 13:29
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval Fri, 14 Mar, 14:02
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-613) Empty Summaries and Cached Pages Fri, 14 Mar, 14:24
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-613) Empty Summaries and Cached Pages Fri, 14 Mar, 14:24
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl Fri, 14 Mar, 14:38
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl Fri, 14 Mar, 14:38
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-610) Can't Update or modify an index while web gui is running Fri, 14 Mar, 14:44
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-601) Recrawling on existing crawl directory using force option Fri, 14 Mar, 14:54
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option Fri, 14 Mar, 14:54
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-590) Index multiple docs per call using IndexingFilter extension point Fri, 14 Mar, 15:00
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-590) Index multiple docs per call using IndexingFilter extension point Fri, 14 Mar, 15:00
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-592) Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED Fri, 14 Mar, 15:00
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-592) Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED Fri, 14 Mar, 15:00
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null Fri, 14 Mar, 15:10
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-575) NPE in OpenSearchServlet when summary is null Fri, 14 Mar, 15:10
Jesiel Trevisan Re: [jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null Fri, 14 Mar, 16:16
Andrzej Bialecki Re: [jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null Fri, 14 Mar, 17:24
Susam Pal Re: Problem in running Nutch where proxy authentication is required. Fri, 14 Mar, 17:41
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-566) Sun's URL class has bug in creation of relative query URLs Fri, 14 Mar, 23:34
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks Fri, 14 Mar, 23:38
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb Fri, 14 Mar, 23:42
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-70) duplicate pages - virtual hosts in db. Fri, 14 Mar, 23:58
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-70) duplicate pages - virtual hosts in db. Fri, 14 Mar, 23:58
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-126) Fetching via https does not work with a proxy (patch) Sat, 15 Mar, 00:18
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-126) Fetching via https does not work with a proxy (patch) Sat, 15 Mar, 00:18
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-157) Problem during parsing msword document . It fetching properly but parsing is not working. Please show me the way how can i parse it Sat, 15 Mar, 00:20
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-157) Problem during parsing msword document . It fetching properly but parsing is not working. Please show me the way how can i parse it Sat, 15 Mar, 00:20
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-168) setting http.content.limit to -1 seems to break text parsing on some files Sat, 15 Mar, 00:22
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-168) setting http.content.limit to -1 seems to break text parsing on some files Sat, 15 Mar, 00:22
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-189) Injection infinite loop Sat, 15 Mar, 00:24
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-189) Injection infinite loop Sat, 15 Mar, 00:24
Hudson (JIRA) [jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null Sat, 15 Mar, 04:15
Hudson (JIRA) [jira] Commented: (NUTCH-126) Fetching via https does not work with a proxy (patch) Sat, 15 Mar, 04:15
Hudson (JIRA) [jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option Sat, 15 Mar, 04:15
Hudson (JIRA) [jira] Commented: (NUTCH-613) Empty Summaries and Cached Pages Sat, 15 Mar, 04:15
Hudson (JIRA) [jira] Commented: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl Sat, 15 Mar, 04:15
naveen.gosw...@wipro.com FW: Problem in running Nutch where proxy authentication is required. Sat, 15 Mar, 11:57
naveen.gosw...@wipro.com Thread behaviour in Nutch Crawl Sat, 15 Mar, 11:58
Vinci (JIRA) [jira] Created: (NUTCH-619) Another Language Identifier Plugin using Unicode code point range Sat, 15 Mar, 15:40
Vinci zh.ngp Sat, 15 Mar, 16:17
Vinci How can I change the analyzer of nutch query by plugin? Sat, 15 Mar, 16:26
Mark DeSpain (JIRA) [jira] Created: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash Sun, 16 Mar, 07:22
Mark DeSpain (JIRA) [jira] Updated: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash Sun, 16 Mar, 07:36
Vinci Chnage the Analyzer by plugin - how to dealing with the query? Sun, 16 Mar, 09:30
Vinci Write back to the segment? Sun, 16 Mar, 11:10
Vinci Re: Chnage the Analyzer by plugin - how to dealing with the query? Query always use the default analyzer! Sun, 16 Mar, 11:43
Vinci Cached page - can it be changed? Sun, 16 Mar, 12:12
Vinci (nutch 1.0) Query processing problem: NutchBeans and webapps search fail, but Luke sucess Sun, 16 Mar, 12:28
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash Sun, 16 Mar, 20:06
Emmanuel Joke (JIRA) [jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval Mon, 17 Mar, 02:45
Emmanuel Joke (JIRA) [jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb Mon, 17 Mar, 02:59
Mark DeSpain (JIRA) [jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash Mon, 17 Mar, 06:11
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval Mon, 17 Mar, 10:01
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval Mon, 17 Mar, 12:35
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-616) Reset Fetch Retry counter when fetch is successful Mon, 17 Mar, 12:43
Andrzej Bialecki Retire the original Fetcher before the release? Mon, 17 Mar, 13:05
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash Mon, 17 Mar, 13:23
Dennis Kubes Re: Retire the original Fetcher before the release? Mon, 17 Mar, 14:01
Andrzej Bialecki Re: Retire the original Fetcher before the release? Mon, 17 Mar, 14:20
Dennis Kubes Re: Retire the original Fetcher before the release? Mon, 17 Mar, 14:36
Andrzej Bialecki Re: Retire the original Fetcher before the release? Mon, 17 Mar, 15:17
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException Mon, 17 Mar, 16:23
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException Mon, 17 Mar, 16:23
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-223) Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN Mon, 17 Mar, 16:44
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-223) Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN Mon, 17 Mar, 16:44
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression Mon, 17 Mar, 16:50
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression Mon, 17 Mar, 16:50
Andrzej Bialecki (JIRA) [jira] Closed: (NUTCH-610) Can't Update or modify an index while web gui is running Mon, 17 Mar, 16:52
Mark DeSpain (JIRA) [jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash Tue, 18 Mar, 02:24
Mark DeSpain (JIRA) [jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash Tue, 18 Mar, 04:43
Siddhartha Reddy Current OPIC implementation Tue, 18 Mar, 05:16
Hudson (JIRA) [jira] Commented: (NUTCH-616) Reset Fetch Retry counter when fetch is successful Tue, 18 Mar, 05:33
Hudson (JIRA) [jira] Commented: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException Tue, 18 Mar, 05:33
Hudson (JIRA) [jira] Commented: (NUTCH-223) Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN Tue, 18 Mar, 05:33
Hudson (JIRA) [jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval Tue, 18 Mar, 05:33
Apache Hudson Server Build failed in Hudson: Nutch-trunk #393 Tue, 18 Mar, 05:34
Andrzej Bialecki Re: Current OPIC implementation Tue, 18 Mar, 09:18
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation Tue, 18 Mar, 10:05
Andrzej Bialecki (JIRA) [jira] Updated: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation Tue, 18 Mar, 10:05
Andrzej Bialecki (JIRA) [jira] Updated: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation Tue, 18 Mar, 10:05
Grant Ingersoll (JIRA) [jira] Created: (NUTCH-621) Nutch needs to declare it's crypto usage Tue, 18 Mar, 13:01
Andrzej Bialecki (JIRA) [jira] Commented: (NUTCH-609) Allow Plugins to be Loaded from Jar File(s) Tue, 18 Mar, 14:51