lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <jyzhou...@yahoo.com>
Subject Re: a complete solution for building a website search with lucene
Date Fri, 08 Jan 2010 10:03:41 GMT
Hi Paul,

Thanks. 
Use Nutch to do crawling. and integrate Lucene to the web application, so that can do search
online.

BTW, Nutch seems to have only Linux version, what my development is on Windows. Am i right?

Zhou

--- On Fri, 8/1/10, Paul Libbrecht <paul@activemath.org> wrote:

From: Paul Libbrecht <paul@activemath.org>
Subject: Re: a complete solution for building a website search with lucene
To: java-user@lucene.apache.org
Date: Friday, 8 January, 2010, 4:27 PM

Zhou,

Lucene is a back-end library, it's very useful for developer but it is not a complete site-search-engine.
A lucene-based site-search-engine is Nutch, it does crawl.
Solr also provides functions close to these with a large amount of thoughts on flexible integration;
crawling methods are rather based on feeds or other acquisition methods (see DIH for example).

paul




Le 08-janv.-10 à 08:08, <jyzhou817@yahoo.com> a écrit :

> Hi ,
> 
> I am new in Lucene.
> 
> To build a web search function, it need to have a backendc indexing function. But, before
that, should run a Crawler? because Lucene index based on Html documents, while Crawler can
change the website pages to Html documents. Am i right?
> 
> If so, please anyone suggest to me a Crawler? like Nutch?
> Thanks
> Zhou
> 
> 
> 
> 
>      New Email names for you!
> Get the Email name you've always wanted on the new @ymail and @rocketmail.
> Hurry before someone else does!
> http://mail.promotions.yahoo.com/newdomains/sg/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




      New Email names for you! 
Get the Email name you&#39;ve always wanted on the new @ymail and @rocketmail. 
Hurry before someone else does!
http://mail.promotions.yahoo.com/newdomains/sg/
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message