lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: a complete solution for building a website search with lucene
Date Mon, 11 Jan 2010 06:12:24 GMT
You should really look at Nutch.
From the website, http://lucene.apache.org/nutch: "Nutch is open source
web-search software. It builds on Lucene Java <http://lucene.apache.org/java/>,
adding web-specifics, such as a crawler, a link-graph database, parsers for
HTML and other document formats, etc."

Sounds like a good place to start, doesn't it? :)

simon
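
Editorial sketch of the division of labour discussed in this thread: a crawler supplies pages, a parser strips the HTML, and an index answers queries. This is a toy, JDK-only stand-in to show what the "index" and "search" steps each do. It is not the Lucene or Nutch API; the class name `SiteSearchSketch`, its methods, and the example.org URLs are all hypothetical, and the regex-based tag stripping is an assumption that would not survive real-world HTML.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy crawl -> parse -> index -> search pipeline. NOT the Lucene or Nutch API. */
public class SiteSearchSketch {
    // term -> URLs of the pages containing it (a minimal inverted index)
    private final Map<String, Set<String>> postings = new LinkedHashMap<>();

    /** Crude "parser": drops tags and collapses whitespace (assumption: simple, well-formed HTML). */
    static String stripHtml(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Splits text into lower-case alphanumeric tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Step 1, index: in a real setup a crawler would supply the (url, html) pairs. */
    public void addPage(String url, String html) {
        for (String term : tokenize(stripHtml(html))) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
        }
    }

    /** Step 2, search: returns the URLs of pages that contain every query term. */
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> hits = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        SiteSearchSketch site = new SiteSearchSketch();
        // Hard-coded pages stand in for crawler output.
        site.addPage("http://example.org/nutch", "<h1>Nutch</h1><p>A crawler built on Lucene</p>");
        site.addPage("http://example.org/solr", "<h1>Solr</h1><p>A search server built on Lucene</p>");
        System.out.println(site.search("lucene crawler"));
    }
}
```

Running `main` prints `[http://example.org/nutch]`, since only that page contains both query terms. In the setup the thread recommends, Nutch would take over the fetching and parsing, and Lucene would replace the hand-rolled map with its own index and query classes.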

On Mon, Jan 11, 2010 at 2:47 AM, <jyzhou817@yahoo.com> wrote:

> Hi,
>
> Have you implemented this kind of web search in your own web application
> development? Please describe it in as much detail as possible, for example:
> 1) index: ?
> 2) search: Lucene
>
> Please advise.
>
> Thanks.
>
>
> --- On *Sat, 9/1/10, Simon Willnauer <simon.willnauer@googlemail.com>* wrote:
>
>
> From: Simon Willnauer <simon.willnauer@googlemail.com>
> Subject: Re: a complete solution for building a website search with lucene
> To: java-user@lucene.apache.org
> Date: Saturday, 9 January, 2010, 6:16 PM
>
> I don't know that much about Nutch, but Hadoop shouldn't really be run
> under Windows in production. If you use Windows for development, this
> should not be a big issue.
> Otis is right: you should use Cygwin together with Hadoop. Look at
> http://wiki.apache.org/hadoop/FAQ for initial info.
>
> simon
>
> On Sat, Jan 9, 2010 at 5:20 AM, Otis Gospodnetic
> <otis_gospodnetic@yahoo.com> wrote:
> > Nutch is written in Java, so Nutch itself *should* work on other
> > non-Linux OSs that the JVM supports.
> > But it does contain some shell scripts, as does Hadoop, which Nutch uses.
> > Oh, I guess Windows people run it under Cygwin?
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > ----- Original Message ----
> >> From: "jyzhou817@yahoo.com" <jyzhou817@yahoo.com>
> >> To: java-user@lucene.apache.org
> >> Sent: Fri, January 8, 2010 5:03:41 AM
> >> Subject: Re: a complete solution for building a website search with lucene
> >>
> >> Hi Paul,
> >>
> >> Thanks.
> >> So I would use Nutch to do the crawling, and integrate Lucene into the
> >> web application so that it can search online.
> >>
> >> BTW, Nutch seems to have only a Linux version, but my development is on
> >> Windows. Am I right?
> >>
> >> Zhou
> >>
> >> --- On Fri, 8/1/10, Paul Libbrecht wrote:
> >>
> >> From: Paul Libbrecht
> >> Subject: Re: a complete solution for building a website search with lucene
> >> To: java-user@lucene.apache.org
> >> Date: Friday, 8 January, 2010, 4:27 PM
> >>
> >> Zhou,
> >>
> >> Lucene is a back-end library; it's very useful for developers, but it is
> >> not a complete site-search engine.
> >> Nutch is a Lucene-based site-search engine, and it does crawl.
> >> Solr also provides functions close to these, with a lot of thought given
> >> to flexible integration; its crawling methods are based rather on feeds
> >> or other acquisition methods (see DIH, for example).
> >>
> >> paul
> >>
> >>
> >>
> >>
> >> On 08-Jan-10, at 08:08, wrote:
> >>
> >> > Hi ,
> >> >
> >> > I am new to Lucene.
> >> >
> >> > To build a web search function, I need a back-end indexing function.
> >> > But before that, should I run a crawler? Lucene builds its index from
> >> > HTML documents, and a crawler can turn the website's pages into HTML
> >> > documents. Am I right?
> >> >
> >> > If so, could anyone suggest a crawler, such as Nutch?
> >> > Thanks
> >> > Zhou
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
