| Doğacan Güney |
Re: ant test failures |
Sat, 01 Sep, 12:48 |
| Doğacan Güney (JIRA) |
[jira] Created: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 03 Sep, 07:47 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 03 Sep, 07:49 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler |
Mon, 03 Sep, 07:53 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time |
Mon, 03 Sep, 08:27 |
| Doğacan Güney (JIRA) |
[jira] Resolved: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time |
Mon, 03 Sep, 13:38 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 03 Sep, 18:14 |
| Andrzej Bialecki |
Re: [jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 03 Sep, 18:20 |
| Emmanuel Joke (JIRA) |
[jira] Closed: (NUTCH-526) Use a combiner in LinkDbMerger to improve the performance as in LinkDb |
Tue, 04 Sep, 03:36 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format |
Tue, 04 Sep, 07:16 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Tue, 04 Sep, 08:46 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 10:34 |
| Emmanuel Joke (JIRA) |
[jira] Created: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 10:34 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 10:38 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 11:42 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Tue, 04 Sep, 11:42 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time |
Tue, 04 Sep, 12:32 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 15:30 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time |
Tue, 04 Sep, 17:00 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Wed, 05 Sep, 15:07 |
| Marc Brette (JIRA) |
[jira] Commented: (NUTCH-251) Administration GUI |
Wed, 05 Sep, 16:33 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-251) Administration GUI |
Wed, 05 Sep, 18:09 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-546) file URL are filtered out by the crawler |
Thu, 06 Sep, 12:56 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb |
Thu, 06 Sep, 13:24 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-524) Generate Problem with Single Node |
Thu, 06 Sep, 13:26 |
| Jeff Maki |
Meta Tags and Indexing |
Thu, 06 Sep, 14:45 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Thu, 06 Sep, 16:32 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb |
Thu, 06 Sep, 17:38 |
| Jeff Maki |
Labeling URLs a-la Google |
Thu, 06 Sep, 20:04 |
| Marcin Okraszewski |
Limiting outlink tags. |
Thu, 06 Sep, 21:09 |
| crossany (JIRA) |
[jira] Created: (NUTCH-549) Bug |
Fri, 07 Sep, 02:35 |
| Doğacan Güney |
Re: bug with generate performance |
Fri, 07 Sep, 07:37 |
| Doğacan Güney |
Re: Limiting outlink tags. |
Fri, 07 Sep, 07:55 |
| Doğacan Güney (JIRA) |
[jira] Created: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Fri, 07 Sep, 08:29 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Fri, 07 Sep, 08:29 |
| Andrzej Bialecki |
Re: bug with generate performance |
Fri, 07 Sep, 10:50 |
| ogjunk-nu...@yahoo.com |
Re: Labeling URLs a-la Google |
Fri, 07 Sep, 20:36 |
| Jim (JIRA) |
[jira] Created: (NUTCH-551) performance for generate is often really bad |
Fri, 07 Sep, 23:43 |
| misc |
Re: bug with generate performance |
Fri, 07 Sep, 23:47 |
| Jim (JIRA) |
[jira] Commented: (NUTCH-551) performance for generate is often really bad |
Sat, 08 Sep, 02:14 |
| m.harig |
Pl...Give me example |
Sat, 08 Sep, 04:23 |
| r...@rosa.com |
Daniel Udatny is out of the office. |
Sat, 08 Sep, 08:09 |
| Susam Pal (JIRA) |
[jira] Updated: (NUTCH-44) too many search results |
Sat, 08 Sep, 09:55 |
| Susam Pal (JIRA) |
[jira] Updated: (NUTCH-44) too many search results |
Sat, 08 Sep, 11:08 |
| Susam Pal (JIRA) |
[jira] Updated: (NUTCH-44) too many search results |
Sat, 08 Sep, 11:25 |
| Susam Pal (JIRA) |
[jira] Updated: (NUTCH-281) cached.jsp: base-href needs to be outside comments |
Sun, 09 Sep, 10:57 |
| Doğacan Güney (JIRA) |
[jira] Resolved: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Mon, 10 Sep, 19:41 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-549) Bug |
Mon, 10 Sep, 19:41 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Mon, 10 Sep, 19:41 |
| Doğacan Güney (JIRA) |
[jira] Resolved: (NUTCH-546) file URL are filtered out by the crawler |
Mon, 10 Sep, 19:47 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-491) dedup fails with ArrayIndexOutOfBoundsException |
Mon, 10 Sep, 19:49 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Mon, 10 Sep, 19:53 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-551) performance for generate is often really bad |
Mon, 10 Sep, 20:00 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 10 Sep, 20:25 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 10 Sep, 20:44 |
| Jim (JIRA) |
[jira] Commented: (NUTCH-551) performance for generate is often really bad |
Tue, 11 Sep, 01:59 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #203 |
Tue, 11 Sep, 06:37 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler |
Tue, 11 Sep, 06:39 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Tue, 11 Sep, 06:39 |
| Doğacan Güney |
Re: Build failed in Hudson: Nutch-Nightly #203 |
Tue, 11 Sep, 07:43 |
| eyal edri |
Downloading file types to file system |
Tue, 11 Sep, 08:41 |
| Susam Pal |
Re: Build failed in Hudson: Nutch-Nightly #203 |
Tue, 11 Sep, 09:30 |
| Doğacan Güney |
Re: Build failed in Hudson: Nutch-Nightly #203 |
Tue, 11 Sep, 10:43 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Tue, 11 Sep, 11:30 |
| Andrzej Bialecki |
GoogleMini URL rewriting |
Tue, 11 Sep, 20:01 |
| Jim (JIRA) |
[jira] Commented: (NUTCH-551) performance for generate is often really bad |
Tue, 11 Sep, 22:08 |
| hud...@lucene.zones.apache.org |
Hudson build is back to normal: Nutch-Nightly #204 |
Wed, 12 Sep, 04:22 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler |
Wed, 12 Sep, 04:22 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-551) performance for generate is often really bad |
Wed, 12 Sep, 06:24 |
| Andrzej Bialecki |
Scoring API issues (LONG) |
Thu, 13 Sep, 15:44 |
| Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-552) Upgrade Nutch to Hadoop 0.14.x |
Thu, 13 Sep, 16:09 |
| Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-553) Add more normalization rules to regex-normalize file. |
Thu, 13 Sep, 16:41 |
| Jim (JIRA) |
[jira] Commented: (NUTCH-551) performance for generate is often really bad |
Fri, 14 Sep, 20:16 |
| Susam Pal |
protocol-httpclient Authentication schemes |
Fri, 14 Sep, 21:40 |
| Brian Whitman (JIRA) |
[jira] Commented: (NUTCH-434) Replace usage of ObjectWritable with something based on GenericWritable |
Fri, 14 Sep, 22:47 |
| Brian Whitman (JIRA) |
[jira] Updated: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog |
Fri, 14 Sep, 23:34 |
| Brian Whitman (JIRA) |
[jira] Created: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol |
Sat, 15 Sep, 15:16 |
| Karsten Dello (JIRA) |
[jira] Created: (NUTCH-555) StackOverflowError in DomContentUtils |
Sun, 16 Sep, 18:07 |
| Karsten Dello (JIRA) |
[jira] Updated: (NUTCH-555) StackOverflowError in DomContentUtils |
Sun, 16 Sep, 18:09 |
| Karsten Dello (JIRA) |
[jira] Updated: (NUTCH-555) StackOverflowError in DomContentUtils |
Sun, 16 Sep, 18:13 |
| Karsten Dello (JIRA) |
[jira] Updated: (NUTCH-555) StackOverflowError in DomContentUtils |
Sun, 16 Sep, 18:15 |
| Karsten Dello (JIRA) |
[jira] Updated: (NUTCH-555) StackOverflowError in DomContentUtils |
Sun, 16 Sep, 18:15 |
| Karsten Dello (JIRA) |
[jira] Updated: (NUTCH-555) StackOverflowError in DomContentUtils |
Sun, 16 Sep, 18:15 |
| Karsten Dello (JIRA) |
[jira] Updated: (NUTCH-555) StackOverflowError in DomContentUtils |
Sun, 16 Sep, 18:17 |
| Karsten Dello (JIRA) |
[jira] Updated: (NUTCH-555) StackOverflowError in DomContentUtils |
Sun, 16 Sep, 18:17 |
| King Kong (JIRA) |
[jira] Created: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks |
Mon, 17 Sep, 06:34 |
| King Kong (JIRA) |
[jira] Updated: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks |
Mon, 17 Sep, 06:57 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:13 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:14 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:15 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:15 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:16 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:17 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:17 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:18 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:19 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:20 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:20 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:21 |
| g.mar...@ifc.cnr.it |
{Dangerous Content?} Fwd: 100 Messaggi Inoltrati |
Mon, 17 Sep, 17:22 |