From: "Marcelo Ochoa"
To: java-user@lucene.apache.org
Date: Wed, 1 Oct 2008 14:02:35 -0300
Subject: Re: Lucene vs. Database
Message-ID: <126142c0810011002k5abf644dib0451de8afe0c339@mail.gmail.com>
References: <19755932.post@talk.nabble.com> <126142c0810010512w188f379cqbb3fa307bf6a2dbd@mail.gmail.com>

Mathieu:

> Crawling a DB is not a good idea. Indexing while writing/deleting is
> clever.

These operations also consume network traffic in architectures like Solr WS.
There is also wasted network traffic when a query is filtered against relational data (slides 15 and 18 of the Google presentation), for example:

    select /*+ DOMAIN_INDEX_SORT */
           p.id, p.first_Name, p.last_Name, p.nationality, p.sex,
           p.type_Document, p.number_Document, p.civil_State,
           p.date_Birth, p.mail, g.organization_id,
           lscore(1) as suma
      from person p
      left join (select * from guest where organization_id = 67) g
        on g.person_id = p.id
     where p.state = 1
       and lcontains(p.first_name, 'rownum:[1 TO 20] AND John~ Doe~', 1) > 0

This kind of filtering (security, for example) is very common in the relational world. With the index outside the database, there are two possible solutions:

1) perform a free-text search to look up all the rowids that match, and send them to the database to be filtered against the other table;
2) get all the rows from the DB and join them in the middle tier with the rows that match the free-text query.

In both cases there will be a lot of network traffic if the cardinality of the free-text query is larger than that of the relational filter. Often the DB optimizer can choose a different execution plan based on how costly the operation on the index is, and this information is known only to the DB.

> Doing it inside the DB is a solution.
> Java users like ORM. Compass plug Lucene indexation in the ORM's
> transaction. If it's wrote or deleted, Lucene is aware.

AFAIK Lucene does not currently support two-phase commit, so what happens if a transaction needs to be rolled back? If you update the Lucene index before a relational delete and the delete is then rolled back, the index will be inconsistent and return phantom reads. Conversely, if you update the Lucene index after a DB operation has been committed and the index update fails, there will be rows that are never returned as hits. In Lucene Domain Index, both the DML operation and the Lucene storage are transactional, so the operation can be safely rolled back.

> Compass is opensource.
Lucene and Lucene Domain Index are too ;) so for Oracle users it is an open source solution. We are also looking for an alternative solution using an open source database, like H2, but not all databases have an API for creating new indexes.

>
> M.

Best regards, Marcelo.
-- 
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Want to integrate Lucene and Oracle?
http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
Is Oracle 11g REST ready?
http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
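
To make the network-traffic argument above a bit more concrete, here is a minimal sketch that only counts how many row ids would cross a process boundary under each approach. It is not Lucene, Solr or Lucene Domain Index code. The class name, the helper methods, the sample id ranges and the counting model itself (one unit per id shipped, with the text index assumed to run as a separate service in the first two cases) are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class JoinTrafficSketch {

    // Hypothetical helper: ids from..to inclusive, standing in for rowids.
    static Set<Integer> range(int from, int to) {
        Set<Integer> ids = new HashSet<>();
        for (int i = from; i <= to; i++) {
            ids.add(i);
        }
        return ids;
    }

    // Rows that satisfy both the free-text query and the relational filter.
    static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new HashSet<>(a);
        out.retainAll(b);
        return out;
    }

    // Solution 1 (assumed model): every text hit is sent to the database,
    // which then filters the ids against the other table.
    static int idsShippedTextSearchFirst(Set<Integer> textHits, Set<Integer> relationalRows) {
        return textHits.size();
    }

    // Solution 2 (assumed model): the relational rows and the text hits
    // both travel to the middle tier, which performs the join.
    static int idsShippedMiddleTierJoin(Set<Integer> textHits, Set<Integer> relationalRows) {
        return textHits.size() + relationalRows.size();
    }

    // Join inside the database (assumed model): only the final rows leave it.
    static int idsShippedInDatabase(Set<Integer> textHits, Set<Integer> relationalRows) {
        return intersect(textHits, relationalRows).size();
    }

    public static void main(String[] args) {
        Set<Integer> textHits = range(1, 1000);         // 1000 rows match the free-text query
        Set<Integer> relationalRows = range(991, 1010); // 20 rows pass the relational filter

        System.out.println("text search first: " + idsShippedTextSearchFirst(textHits, relationalRows));
        System.out.println("middle-tier join:  " + idsShippedMiddleTierJoin(textHits, relationalRows));
        System.out.println("inside the DB:     " + idsShippedInDatabase(textHits, relationalRows));
    }
}
```

With these made-up sets the first two approaches move roughly a thousand ids to produce a ten-row answer, which is the case described above where the free-text cardinality is much larger than the relational filter. Real costs depend on row width, batching and the execution plan the optimizer picks, none of which this sketch models.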