From: "Jenny Brown"
To: general@lucene.apache.org
Date: Tue, 16 Dec 2008 13:33:08 -0600
Subject: Re: Why lucene

On Tue, Dec 16, 2008 at 2:09 AM, ayyanar wrote:
> Hi, Kindly share your thoughts on Why lucene and why not SQL?

Possible scenario: you have 200,000 text documents to search, and you need to find every document that contains both "baseball" and "pitchers".

In SQL you would write something like

  WHERE text LIKE '%baseball%' AND text LIKE '%pitchers%'

and the query could take a very long time. A pattern that starts with a wildcard cannot use an ordinary SQL index, so the database has to read through the text of every row.

Lucene can find the matching documents very quickly, because its index is built from the individual words: for each word, it records which documents contain it.

Lucene also lets you write "baseball pitchers"~5 to find only the documents where the two words are close together (within 5 words of each other). Standard SQL has no way to express a proximity search like this. Some databases' full-text extensions add one, but the syntax and support vary by vendor.

The difference becomes more apparent as the document set grows. SQL can search a small number of documents reasonably well, but because a LIKE query has to scan every row, it gets much slower as the number of documents grows. Lucene stays fast, because it only reads the index entries for the query words rather than every document.

SQL is useful for short text fields with limited contents that can be indexed normally (for example, for exact matches). Lucene is better suited to large full texts and very large numbers of documents.

Jenny Brown
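To make the LIKE-versus-word-index contrast above concrete, here is a toy sketch in Python. It is not Lucene's code or API (Lucene is a Java library with a far more elaborate index format). `like_scan` mimics the SQL query by testing every document. The hypothetical `PositionalIndex` class maps each word to the documents and word positions where it occurs, so an AND query becomes a set intersection and a proximity query only compares stored positions. The `near` check is a simplification: it accepts the two words in either order within `max_gap` positions and ignores the order-dependent way Lucene counts phrase slop. The sample documents are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens: a crude stand-in for a Lucene analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Toy inverted index: term -> {doc_id: [word positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        # Record where each word occurs, so later queries never rescan the text.
        for pos, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def all_of(self, *terms):
        """Documents containing every term: intersect the terms' posting lists."""
        doc_sets = [set(self.postings.get(t.lower(), {})) for t in terms]
        return set.intersection(*doc_sets) if doc_sets else set()

    def near(self, a, b, max_gap):
        """Documents where a and b occur at most max_gap positions apart, in either order."""
        a, b = a.lower(), b.lower()
        hits = set()
        for doc in self.all_of(a, b):
            if any(abs(i - j) <= max_gap
                   for i in self.postings[a][doc]
                   for j in self.postings[b][doc]):
                hits.add(doc)
        return hits


def like_scan(docs, *words):
    """Rough equivalent of WHERE text LIKE '%w1%' AND ...: test every document."""
    return {doc_id for doc_id, text in docs.items()
            if all(w.lower() in text.lower() for w in words)}


DOCS = {
    1: "The pitchers dominated as baseball season opened.",
    2: "Baseball pitchers need rest between starts.",
    3: "Baseball fans argue endlessly about rule changes, while relief pitchers just want rest.",
    4: "Football season is over.",
}

index = PositionalIndex()
for doc_id, text in DOCS.items():
    index.add(doc_id, text)

print(sorted(like_scan(DOCS, "baseball", "pitchers")))   # [1, 2, 3]
print(sorted(index.all_of("baseball", "pitchers")))      # [1, 2, 3]
print(sorted(index.near("baseball", "pitchers", 5)))     # [1, 2]
```

Both approaches find the same three documents for the plain AND query, but `like_scan` has to read every document to do it, while the index only touches the entries for "baseball" and "pitchers". The proximity query then drops document 3, where the words are 9 positions apart.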