lucene-java-user mailing list archives

From Maurice Yarrow <yar...@best.com>
Subject Finding docNum of a given indexed file
Date Thu, 06 Jul 2006 19:24:44 GMT

Hello Lucene community,

Having looked at the API and at numerous email postings and exchanges,
I see that updating the document in the index that represents a given
file that has changed involves

1) deleting with deleteDocument (of either IndexReader or IndexModifier)

and then

2) adding with addDocument (of either IndexWriter or IndexModifier)

Question:
Is there any way to directly get the docNum of the document representing
an indexed file, given the file or file name?

I see that unique terms are one way to identify this, but consider an index
for a tree of XML files, where two of them differ only by one word, and in
one of these, that word has changed.  However, that word alone may not
uniquely identify the XML file.

So:
Could the file name (fully qualified filepath/filename) be used as the
search term?
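
To make the file-name idea concrete, here is a minimal sketch of what keying on the fully qualified path would look like. It is a toy in-memory model, not the Lucene API: the class PathKeyedIndex and its methods are hypothetical names invented for illustration. It only shows the shape of the approach: the path is kept as one exact, untokenized key, the lookup returns the current docNum, and an update is a delete followed by an add, so the document receives a new docNum.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an index keyed by file path (hypothetical, not Lucene). */
class PathKeyedIndex {
    // Slot position plays the role of docNum; a null slot marks a deleted document.
    private final List<String> contents = new ArrayList<>();
    // The full path is stored as a single exact key, never split into words.
    private final Map<String, Integer> docNumByPath = new HashMap<>();

    /** Returns the current docNum for the path, or -1 if the path is not indexed. */
    int docNumOf(String path) {
        Integer docNum = docNumByPath.get(path);
        return docNum == null ? -1 : docNum;
    }

    /** Deletes the old document for the path (if any), then adds the new one. */
    int update(String path, String content) {
        Integer old = docNumByPath.remove(path);
        if (old != null) {
            contents.set(old, null);          // step 1: delete the stale document
        }
        contents.add(content);                // step 2: add the fresh document
        int docNum = contents.size() - 1;
        docNumByPath.put(path, docNum);
        return docNum;
    }

    /** Returns the stored content, or null if that docNum was deleted. */
    String contentOf(int docNum) {
        return contents.get(docNum);
    }
}
```

In this model the docNum is not stable across an update, which suggests the path rather than the docNum is the value worth holding on to.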

Could the entire file be stringified (one long string, with or without
newlines) and that be used as the term (probably not, since it would not
be tokenized)?

Can the entire file be tokenized and de-duplicated, and this list of
terms be used? (Once again, this list might represent more than one file,
if several files happen to contain the same terms in a different order.)
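
The ambiguity described above can be checked with a few lines of plain Java. This is only a sketch: UniqueTerms and its crude tokenizer (lowercase, split on anything that is not a letter or digit) are assumptions made for illustration and do not reflect how any particular Lucene analyzer behaves.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: reduces a text to its set of distinct terms. */
class UniqueTerms {
    static Set<String> of(String text) {
        Set<String> terms = new TreeSet<>();
        // Crude tokenizer: lowercase, then split on runs of non-alphanumerics.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);             // the set drops duplicates and order
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String first = "<a>red</a><b>blue</b>";
        String second = "<b>blue</b><a>red</a>";   // same terms, different order
        // Both files reduce to the same term set, so the set cannot tell them apart.
        System.out.println(UniqueTerms.of(first).equals(UniqueTerms.of(second)));
    }
}
```

Because a set keeps neither order nor counts, two different files can collapse to the same key, which is the collision described above.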

Anyhow, this does seem like something that needs to be done frequently,
but is not directly supported. Am I wrong? Please advise how this is
best done.

Maurice Yarrow



