From: maarten@sherpa-consulting.be
To: hadoop-user@lucene.apache.org
Date: Mon, 15 Jan 2007 14:09:34 +0100
Subject: Re: Hadoop + Lucene integration: possible? how?
Message-ID: <20070115140934.6rgam6clyb40404k@webmail.webware.be>
In-Reply-To: <45AB7967.1010807@getopt.org>

Thanks Andrzej, let me quickly explain my situation:

I'm developing an application that is partially based on 'tags' (the new
hype lolz). Instead of using an RDBMS for full-text searching of the tag
list / items, I'll be using Lucene. The application will have about
100,000 visitors a day, mostly (or only) searching rather than adding
stuff. At the moment I have no idea what the performance will be like
when all those users hit Lucene. That is why I was looking for a
distributed solution and found Hadoop. I will be adding and removing
index entries, though; is removing possible on Hadoop, given that you
mentioned read-only?

Do you have any idea whether the scenario above can easily be handled by
Lucene alone (best guess), or whether I'll indeed need some kind of DFS?
And if so, do you have any suggestions?
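Concretely, this is roughly the kind of Lucene usage I have in mind for
the tags, just as a minimal sketch against plain Lucene 2.x on local disk
(the field names and the index path are made-up placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class TagSearch {
  public static void main(String[] args) throws Exception {
    // Index one tagged item ('true' creates/overwrites the index at that path).
    IndexWriter writer = new IndexWriter("/tmp/tag-index", new StandardAnalyzer(), true);
    Document doc = new Document();
    doc.add(new Field("id", "item-42", Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("tags", "hadoop lucene distributed", Field.Store.YES, Field.Index.TOKENIZED));
    writer.addDocument(doc);
    writer.close();

    // Look up every item carrying a given tag.
    IndexSearcher searcher = new IndexSearcher("/tmp/tag-index");
    Hits hits = searcher.search(new TermQuery(new Term("tags", "lucene")));
    for (int i = 0; i < hits.length(); i++) {
      System.out.println(hits.doc(i).get("id"));
    }
    searcher.close();
  }
}

So the open question for me is whether a single index like this can take
that search load, or whether it has to be spread over several machines.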
Thanks in advance!

Grtz

Quoting Andrzej Bialecki:

> maarten@sherpa-consulting.be wrote:
>> I'm new to Lucene and Hadoop, but what I can't seem to find in the
>> docs, on the internet... is how (and whether it's possible at all) to
>> use Hadoop as the underlying FS for Lucene?
>>
>> Could anyone explain to me how these can be tied together? Some small
>> code/configuration example would be nice :-)
>
> It's possible to use Hadoop DFS to host a read-only Lucene index and
> use it for searching (Nutch has an implementation of FSDirectory for
> this purpose), but the performance is not stellar... Currently it's
> not (yet) possible to use HDFS for creating Lucene indexes; a minor
> change to the Lucene index format would be required.
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || | |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
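PS: just to check my understanding of the read-only setup you describe,
I've sketched what I think it would look like. I'm assuming Nutch's
FsDirectory class here and guessing at its exact constructor from the
Nutch sources; the DFS path is made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.nutch.indexer.FsDirectory;

public class HdfsTagSearch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up hadoop-site.xml
    FileSystem fs = FileSystem.get(conf);       // connects to the DFS namenode

    // Wrap an index that already lives in DFS; 'false' = open existing, don't create.
    FsDirectory dir = new FsDirectory(fs, new Path("/user/maarten/tag-index"), false, conf);

    // Read-only searching works; writing through this Directory is what doesn't.
    IndexSearcher searcher = new IndexSearcher(dir);
    Hits hits = searcher.search(new TermQuery(new Term("tags", "lucene")));
    System.out.println(hits.length() + " hits");
    searcher.close();
  }
}

Is that the idea, with the index itself built on local disk and only
copied into DFS afterwards?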