From: "Dmitry" <dmitrytkach1@hotmail.com>
Reply-To: java-user@lucene.apache.org
Subject: Detection of index duplicates in Lucene
Date: Sun, 29 Jul 2007 01:18:48 -0500

We are trying to find out whether there is any existing implementation for Lucene that detects duplicates in an index.

Assume we have a set of documents, where each document is a bunch of words. After we have indexed a document, we need to know that its index entries are unique to that specific document (lexical equivalence). Is there an implementation of such an algorithm that also deals with the two failure cases: the algorithm missing a real duplicate, and the algorithm reporting a false duplicate? With that, we could find all duplicate index entries.

The same algorithm could also be used to detect duplicate documents before indexing. That would save time and improve performance, because we would not have to run the indexing services against a document we have already seen.

Any suggestions would be welcome.

Thanks,
DT
www.ejinz.com Search Engine News
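A minimal sketch of one possible approach, in plain Java rather than Lucene API calls. It assumes "lexical equivalence" means two documents contain the same words with the same counts, ignoring case, punctuation and word order. The class `DuplicateDetector` and its methods `tokens`, `fingerprint`, `jaccard` and `checkAndRegister` are hypothetical names for illustration, not part of Lucene. Each document is reduced to a SHA-256 hash of its sorted bag of words and checked against the fingerprints already seen before it is sent to the indexer; a Jaccard similarity over word sets is included for near-duplicates.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper class for illustration (not part of Lucene).
public class DuplicateDetector {

    // fingerprint -> id of the first document registered with it
    private final Map<String, String> seen = new HashMap<>();

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // SHA-256 of the sorted "term:count" list, so word order and case do not matter.
    static String fingerprint(String text) {
        TreeMap<String, Integer> bag = new TreeMap<>();
        for (String t : tokens(text)) {
            bag.merge(t, 1, Integer::sum);
        }
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, Integer> e : bag.entrySet()) {
            canonical.append(e.getKey()).append(':').append(e.getValue()).append(' ');
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 not available", ex);
        }
    }

    // Jaccard similarity of the two word sets (1.0 = identical vocabulary).
    static double jaccard(String a, String b) {
        Set<String> sa = new HashSet<>(tokens(a));
        Set<String> sb = new HashSet<>(tokens(b));
        if (sa.isEmpty() && sb.isEmpty()) {
            return 1.0;
        }
        Set<String> inter = new HashSet<>(sa);
        inter.retainAll(sb);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        return (double) inter.size() / union.size();
    }

    // Returns the id of an earlier lexically equivalent document, or null if the
    // document is new (it is then registered and should be sent to the indexer).
    public String checkAndRegister(String docId, String text) {
        return seen.putIfAbsent(fingerprint(text), docId);
    }

    public static void main(String[] args) {
        DuplicateDetector detector = new DuplicateDetector();
        String[][] docs = {
            {"a", "Lucene is a search library."},
            {"b", "a LIBRARY, search: Lucene is"},
            {"c", "Lucene is a search engine."}
        };
        for (String[] doc : docs) {
            String original = detector.checkAndRegister(doc[0], doc[1]);
            System.out.println(doc[0] + (original == null
                    ? ": new, index it"
                    : ": duplicate of " + original + ", skip indexing"));
        }
        // Near-duplicate check: "a" and "c" share 4 of 6 distinct words.
        System.out.println(jaccard(docs[0][1], docs[2][1]));
    }
}
```

Running `main` reports "a" and "c" as new and "b" as a duplicate of "a". The exact fingerprint essentially never reports a false duplicate (only on a hash collision), but it misses documents that differ by a single word. The Jaccard score can catch those, and the threshold you pick trades false duplicates against missed ones. In a Lucene setup, one option (not shown here) would be to store the fingerprint in an untokenized field and look it up before adding a new document.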