From: Maurice Yarrow
Date: Thu, 06 Jul 2006 12:24:44 -0700
To: Lucene Users
Subject: Finding docNum of a given indexed file

Hello Lucene community

So, having looked at the API and at numerous email postings and exchanges, I see that updating a particular
document in the index, one that represents a given file that has changed, involves
1) deleting with deleteDocument (of either IndexReader or IndexModifier), and then
2) adding with addDocument (of either IndexWriter or IndexModifier).

Question: Is there any way to directly get the docNum of the document representing an indexed file, given the file or file name?

I see that unique terms are one way to identify a document, but consider an index for a tree of XML files, where two of them differ by only one word, and in one of these that word has changed. That word alone may not uniquely identify the XML file. So:

Could the file name (fully qualified filepath/filename) be used as the search term?

Could the entire file be stringified (one long string, with or without newlines) and that be used as the term (probably not, since it would not be tokenized)?

Can the entire file be tokenized and uniqued, and this list of terms be used? (Once again, this might match more than one file, since several files could happen to contain the same terms in a different order.)

Anyhow, this does seem like something that needs to be done frequently, but is not directly supported. Am I wrong? Please advise how this is best done.

Maurice Yarrow
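
P.S. To make the file-name option concrete, here is a minimal sketch of what I have in mind. It is a toy in-memory model and not Lucene code: the class PathKeyedIndex and all of its methods are hypothetical names made up for illustration. It assumes that the fully qualified path is stored as a single, untokenized key per document, so that the key maps to exactly one docNum and the update is "delete by key, then add".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an index. NOT Lucene's API; every name here is hypothetical. */
public class PathKeyedIndex {
    // Slot i holds the contents of document number i, or null once it is deleted.
    private final List<String> docs = new ArrayList<>();
    // Maps the fully qualified file path (the unique key) to its current docNum.
    private final Map<String, Integer> docNumByPath = new HashMap<>();

    /** Returns the docNum for the given path, or -1 if the path is not indexed. */
    public int docNumFor(String path) {
        Integer n = docNumByPath.get(path);
        return n == null ? -1 : n;
    }

    /** Step 1: delete by unique key. Returns how many documents were deleted (0 or 1). */
    public int deleteByPath(String path) {
        Integer n = docNumByPath.remove(path);
        if (n == null) {
            return 0;
        }
        docs.set(n, null); // mark the old slot as deleted; docNums are not reused
        return 1;
    }

    /** Step 2: add the new version of the file. Returns the newly assigned docNum. */
    public int add(String path, String contents) {
        if (docNumByPath.containsKey(path)) {
            throw new IllegalStateException("path already indexed: " + path);
        }
        docs.add(contents);
        int n = docs.size() - 1;
        docNumByPath.put(path, n);
        return n;
    }

    /** Update = delete the old document for this path (if any), then add the new one. */
    public int update(String path, String contents) {
        deleteByPath(path);
        return add(path, contents);
    }

    /** Returns the stored contents for a docNum, or null if that document was deleted. */
    public String contentsOf(int docNum) {
        return docs.get(docNum);
    }

    public static void main(String[] args) {
        PathKeyedIndex index = new PathKeyedIndex();
        index.update("/data/xml/a.xml", "<doc>red apple</doc>");
        index.update("/data/xml/b.xml", "<doc>green apple</doc>");
        // a.xml changes by one word and now has the same contents as b.xml.
        // The path still identifies it uniquely, even though no word in it does.
        int newDocNum = index.update("/data/xml/a.xml", "<doc>green apple</doc>");
        System.out.println("a.xml is now docNum " + newDocNum);
        System.out.println("b.xml is still docNum " + index.docNumFor("/data/xml/b.xml"));
    }
}
```

The point of the sketch is only that the path, unlike the contents, stays unique when two files end up with identical text. Whether Lucene offers an equivalent lookup or delete by such a key is exactly what I am asking.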