From: Simon Willnauer
Reply-To: simon.willnauer@gmail.com
To: java-user@lucene.apache.org
Subject: Re: a complete solution for building a website search with lucene
Date: Mon, 11 Jan 2010 07:12:24 +0100
In-Reply-To: <109100.17546.qm@web36107.mail.mud.yahoo.com>
References: <109100.17546.qm@web36107.mail.mud.yahoo.com>
Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001485f7249ccc99b9047cdd6b4d

--001485f7249ccc99b9047cdd6b4d
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

You should really look at Nutch. from the website
http://lucene.apache.org/nutch:

Nutch is open source web-search software. It builds on Lucene Java, adding
web-specifics, such as a crawler, a link-graph database, parsers for HTML
and other document formats, etc.

sounds like a good place to start, doesn't it :)

simon

On Mon, Jan 11, 2010 at 2:47 AM, wrote:

> Hi,
>
> Have you implemented such web search in your web application development?
> As detailed as possible. example:
> 1) index: ?
> 2) search: Lucene
>
> Please do advise.
>
> Thanks.
>
>
> --- On *Sat, 9/1/10, Simon Willnauer* wrote:
>
>
> From: Simon Willnauer
> Subject: Re: a complete solution for building a website search with lucene
> To: java-user@lucene.apache.org
> Date: Saturday, 9 January, 2010, 6:16 PM
>
> I don't know that much about nutch but hadoop shouldn't really run
> under windows in production. If you use windows for development this
> should not be a big issue.
> Otis is right you should use cygwin together with hadoop. look at
> http://wiki.apache.org/hadoop/FAQ for initial info.
>
> simon
>
> On Sat, Jan 9, 2010 at 5:20 AM, Otis Gospodnetic wrote:
> > Nutch is written in Java, so Nutch itself *should* work on other
> > non-Linux OSs that the JVM supports.
> > But it does contain some shell scripts, as does Hadoop that Nutch uses.
> > Oh, I guess Windows people run it under Cygwin?
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > ----- Original Message ----
> >> From: "jyzhou817@yahoo.com"
> >> To: java-user@lucene.apache.org
> >> Sent: Fri, January 8, 2010 5:03:41 AM
> >> Subject: Re: a complete solution for building a website search with lucene
> >>
> >> Hi Paul,
> >>
> >> Thanks.
> >> Use Nutch to do crawling. and integrate Lucene to the web application, so that
> >> can do search online.
> >>
> >> BTW, Nutch seems to have only Linux version, what my development is on Windows.
> >> Am I right?
> >>
> >> Zhou
> >>
> >> --- On Fri, 8/1/10, Paul Libbrecht wrote:
> >>
> >> From: Paul Libbrecht
> >> Subject: Re: a complete solution for building a website search with lucene
> >> To: java-user@lucene.apache.org
> >> Date: Friday, 8 January, 2010, 4:27 PM
> >>
> >> Zhou,
> >>
> >> Lucene is a back-end library, it's very useful for developers but it is not a
> >> complete site-search-engine.
> >> A lucene-based site-search-engine is Nutch, it does crawl.
> >> Solr also provides functions close to these with a large amount of thoughts on
> >> flexible integration; crawling methods are rather based on feeds or other
> >> acquisition methods (see DIH for example).
> >>
> >> paul
> >>
> >>
> >> On 08-Jan-10 at 08:08, wrote:
> >>
> >> > Hi,
> >> >
> >> > I am new in Lucene.
> >> >
> >> > To build a web search function, it need to have a backend indexing function.
> >> > But, before that, should run a Crawler? because Lucene index based on Html
> >> > documents, while Crawler can change the website pages to Html documents. Am I
> >> > right?
> >> >
> >> > If so, please anyone suggest to me a Crawler? like Nutch?
> >> > Thanks
> >> > Zhou
> >> >
> >> >
> >> > New Email names for you!
> >> > Get the Email name you've always wanted on the new @ymail and @rocketmail.
> >> > Hurry before someone else does!
> >> > http://mail.promotions.yahoo.com/newdomains/sg/
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>
--001485f7249ccc99b9047cdd6b4d--