| Hudson (JIRA) | [jira] Commented: (NUTCH-494) FindBugs: CrawlDbReader and DeleteDuplicates | Fri, 09 Nov, 05:38 |
| Hudson (JIRA) | [jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm | Fri, 09 Nov, 05:38 |
| Hudson (JIRA) | [jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat | Sat, 10 Nov, 04:42 |
| Hudson (JIRA) | [jira] Commented: (NUTCH-538) Delete unused classes under o.a.n.util | Sat, 10 Nov, 04:42 |
| Hudson (JIRA) | [jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results. | Tue, 13 Nov, 18:35 |
| Hudson (JIRA) | [jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results. | Thu, 15 Nov, 08:54 |
| Hudson (JIRA) | [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility | Fri, 16 Nov, 20:26 |
| Hudson (JIRA) | [jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x | Fri, 16 Nov, 20:26 |
| John Doe | NullPointerException in FetchedSegments.getSummary() | Thu, 08 Nov, 00:27 |
| John H. Lee (JIRA) | [jira] Created: (NUTCH-575) NPE in OpenSearchServlet when summary is null | Thu, 08 Nov, 22:43 |
| John H. Lee (JIRA) | [jira] Updated: (NUTCH-575) NPE in OpenSearchServlet when summary is null | Thu, 08 Nov, 22:45 |
| Joseph Chen (JIRA) | [jira] Created: (NUTCH-571) parse-mp3 plugin doesn't always index album of mp3 | Sat, 03 Nov, 02:25 |
| Joseph Chen (JIRA) | [jira] Created: (NUTCH-579) Feed plugin only indexes one post per feed due to identical digest | Wed, 21 Nov, 07:41 |
| Matt Kangas | Re: [jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results. | Fri, 09 Nov, 20:45 |
| Matt Kangas | Re: [jira] Created: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed | Thu, 29 Nov, 21:07 |
| Nathaniel Powell (JIRA) | [jira] Created: (NUTCH-578) URL fetched with 403 is generated over and over again | Tue, 20 Nov, 21:39 |
| Nathaniel Powell (JIRA) | [jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again | Tue, 20 Nov, 21:41 |
| Nathaniel Powell (JIRA) | [jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again | Tue, 20 Nov, 21:41 |
| Nathaniel Powell (JIRA) | [jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again | Tue, 20 Nov, 21:43 |
| Nathaniel Powell (JIRA) | [jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again | Tue, 20 Nov, 21:43 |
| Nathaniel Powell (JIRA) | [jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again | Tue, 20 Nov, 21:47 |
| Nathaniel Powell (JIRA) | [jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again | Tue, 20 Nov, 21:49 |
| Nathaniel Powell (JIRA) | [jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again | Tue, 20 Nov, 21:51 |
| Nathaniel Powell (JIRA) | [jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again | Tue, 20 Nov, 21:51 |
| Ned Rockson | When is the Clause.getQuery().getBoost == 0? | Thu, 01 Nov, 21:32 |
| Ned Rockson | Tika API | Tue, 06 Nov, 22:47 |
| Ned Rockson | Re: Tika API | Wed, 07 Nov, 01:56 |
| Ned Rockson | Re: Tika API | Wed, 07 Nov, 19:13 |
| Ned Rockson | Usage of mapred-default.xml is deprecated in hadoop0.15.0 | Thu, 08 Nov, 22:20 |
| Ned Rockson | EOF exception while fetching | Fri, 09 Nov, 19:48 |
| Ned Rockson | Nutch trunk js-parser problem with extremely long and meaningless Elements | Fri, 16 Nov, 02:18 |
| Otis Gospodnetic (JIRA) | [jira] Commented: (NUTCH-442) Integrate Solr/Nutch | Sun, 18 Nov, 22:30 |
| Rajasekar Karthik (JIRA) | [jira] Created: (NUTCH-573) Multiple Domains - Query Search | Wed, 07 Nov, 18:59 |
| Rajasekar Karthik (JIRA) | [jira] Created: (NUTCH-576) Different Analyzers Support | Wed, 14 Nov, 15:10 |
| Renaud Richardet (JIRA) | [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility | Thu, 15 Nov, 17:13 |
| Rohan Mehta (JIRA) | [jira] Created: (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly | Wed, 21 Nov, 16:58 |
| Rohan Mehta (JIRA) | [jira] Updated: (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly | Wed, 21 Nov, 17:00 |
| Ruslan Ermilov (JIRA) | [jira] Created: (NUTCH-584) urls missing from fetchlist | Wed, 28 Nov, 15:57 |
| Sagar Naik | Re: takes the URI info, Content, headers, ect into a MYSQL database. | Tue, 13 Nov, 05:51 |
| Sam Xia (JIRA) | [jira] Issue Comment Edited: (NUTCH-356) Plugin repository cache can lead to memory leak | Tue, 06 Nov, 18:55 |
| Sam Xia (JIRA) | [jira] Issue Comment Edited: (NUTCH-356) Plugin repository cache can lead to memory leak | Tue, 06 Nov, 18:57 |
| Sam Xia (JIRA) | [jira] Issue Comment Edited: (NUTCH-356) Plugin repository cache can lead to memory leak | Tue, 06 Nov, 18:57 |
| Sam Xia (JIRA) | [jira] Issue Comment Edited: (NUTCH-356) Plugin repository cache can lead to memory leak | Tue, 06 Nov, 18:57 |
| Sam Xia (JIRA) | [jira] Issue Comment Edited: (NUTCH-356) Plugin repository cache can lead to memory leak | Tue, 06 Nov, 18:59 |
| Sami Siren | Backwards compatibility strategy | Thu, 22 Nov, 17:45 |
| Sami Siren (JIRA) | [jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup. | Fri, 09 Nov, 15:37 |
| Sami Siren (JIRA) | [jira] Updated: (NUTCH-580) Remove deprecated hadoop api calls (FS) | Wed, 21 Nov, 16:48 |
| Sami Siren (JIRA) | [jira] Created: (NUTCH-580) Remove deprecated hadoop api calls (FS) | Wed, 21 Nov, 16:48 |
| Sami Siren (JIRA) | [jira] Created: (NUTCH-582) Add missing type parameters | Wed, 21 Nov, 18:47 |
| Sami Siren (JIRA) | [jira] Updated: (NUTCH-582) Add missing type parameters | Wed, 21 Nov, 18:49 |
| Sebastian Steinmetz | Re: adding dmoz meta data to index. | Wed, 07 Nov, 14:10 |
| Sebastian Steinmetz (JIRA) | [jira] Commented: (NUTCH-479) Support for OR queries | Sat, 10 Nov, 19:58 |
| Susam Pal (JIRA) | [jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server | Thu, 01 Nov, 13:12 |
| Susam Pal (JIRA) | [jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server | Wed, 28 Nov, 18:29 |
| Tim Gautier | Re: some question about development | Wed, 28 Nov, 18:16 |
| Tomislav Poljak (JIRA) | [jira] Commented: (NUTCH-442) Integrate Solr/Nutch | Wed, 28 Nov, 10:14 |
| Xin Zhang | How dose the Nutch-0.9 read the configuration file? | Sun, 04 Nov, 11:30 |
| david euler (JIRA) | [jira] Commented: (NUTCH-540) some problem about the Nutch cache | Tue, 13 Nov, 14:12 |
| david euler (JIRA) | [jira] Commented: (NUTCH-540) some problem about the Nutch cache | Tue, 13 Nov, 14:19 |
| eyal edri | Re: How dose the Nutch-0.9 read the configuration file? | Sun, 04 Nov, 12:23 |
| eyal edri | Need help in updating url in runtime in [Fetcher.java] | Tue, 13 Nov, 15:30 |
| eyal edri | Maintaining source url data (father) during runtime | Sun, 25 Nov, 11:34 |
| eyal edri | Re: Maintaining source url data (father) during runtime | Mon, 26 Nov, 09:48 |
| eyal edri | Re: Maintaining source url data (father) during runtime | Tue, 27 Nov, 08:01 |
| hud...@lucene.zones.apache.org | Build failed in Hudson: Nutch-Nightly #261 | Fri, 09 Nov, 05:36 |
| hud...@lucene.zones.apache.org | Hudson build is back to normal: Nutch-Nightly #262 | Sat, 10 Nov, 04:41 |
| jian chen | Re: Maintaining source url data (father) during runtime | Mon, 26 Nov, 18:12 |
| jqq | Re: How to extract specified information from html? | Sat, 03 Nov, 14:06 |
| karthik085 | Re: plugin analyzer | Fri, 02 Nov, 03:08 |
| karthik085 | Nutch automatically deleting sites from search results | Fri, 02 Nov, 03:27 |
| karthik085 | MD5 vs TextProfile Signature | Wed, 07 Nov, 00:27 |
| karthik085 | db.ignore.internal.links and ranking algorithms | Wed, 07 Nov, 20:32 |
| karthik085 | Re: db.ignore.internal.links and ranking algorithms | Wed, 07 Nov, 21:18 |
| karthik085 | Re: db.ignore.internal.links and ranking algorithms | Thu, 08 Nov, 04:08 |
| misc | Can we add this to nutch? | Fri, 09 Nov, 23:14 |
| misc | Auto complete | Sat, 10 Nov, 01:35 |
| misc | Generator speed | Sat, 10 Nov, 01:46 |
| misc | wiki faq | Sat, 10 Nov, 01:51 |
| n..@bcit | adding dmoz meta data to index. | Tue, 06 Nov, 19:29 |
| pavan kumar donepudi | Parsing ppt with mimetype application/x-mspowerpoint | Thu, 29 Nov, 15:38 |
| qi wu | Re: How to extract specified information from html? | Sat, 03 Nov, 13:56 |
| shaowen yu | Applicant for Nutch Project | Fri, 23 Nov, 06:13 |
| w00_008 | Re: Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic | Wed, 14 Nov, 18:14 |
| xingjian | takes the URI info, Content, headers, ect into a MYSQL database. | Tue, 13 Nov, 05:37 |
| xingjian | Re: takes the URI info, Content, headers, ect into a MYSQL database. | Tue, 13 Nov, 06:41 |
| xingjian | about heritrix crawl,Who will tell me in this Nutch forum?thanks | Fri, 16 Nov, 05:00 |