lucene-dev mailing list archives

From "Nadav Har'El" <...@math.technion.ac.il>
Subject Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
Date Fri, 07 Jul 2006 13:05:51 GMT
On Thu, Jul 06, 2006, Yonik Seeley wrote about "Re: [jira] Commented: (LUCENE-565) Supporting
deleteDocuments in IndexWriter (Code and Performance Results Provided)":
>..
> When one interleaves adds and deletes, it isn't the case that
> indexreaders and indexwriters need to be opened and closed each
> interleave.

Actually, you do have to do exactly that, because you can't leave both an
IndexReader and an IndexWriter open, deleting documents through one and
adding documents through the other, interleaved: the write lock that both
the IndexReader (when deleting) and the IndexWriter acquire will not allow
that.

Granted, if you buffer either the deletes or the additions in memory and
apply them later in batches, you don't need to open an IndexReader and an
IndexWriter for every single document. But this is also something which is
not trivial to do (correctly and consistently) without writing a bunch of
code.
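
To make the "not trivial" part concrete, here is a minimal sketch of such
buffering, written in Java against a toy in-memory index rather than the
real Lucene classes. Everything in it (ToyIndex, BatchingModifier, the
readerOpens/writerOpens counters, maxBuffered) is a hypothetical name made up
for illustration. The point it tries to show is the ordering problem: if a
flush applies all deletes first and all adds second, then a delete has to
cancel any earlier buffered add of the same document, or that document would
wrongly survive.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, NOT the Lucene API: a "document" is just an id,
// and the counters record how often a reader or writer would be opened.
class ToyIndex {
    final List<String> docs = new ArrayList<>();
    int readerOpens = 0; // stands in for opening an IndexReader to delete
    int writerOpens = 0; // stands in for opening an IndexWriter to add
}

// Buffers interleaved adds and deletes and applies them in batches.
class BatchingModifier {
    private final ToyIndex index;
    private final int maxBuffered; // assumed tuning knob for this sketch
    private final Set<String> pendingDeletes = new LinkedHashSet<>();
    private final List<String> pendingAdds = new ArrayList<>();

    BatchingModifier(ToyIndex index, int maxBuffered) {
        this.index = index;
        this.maxBuffered = maxBuffered;
    }

    void add(String id) {
        pendingAdds.add(id);
        flushIfFull();
    }

    void delete(String id) {
        // Cancel earlier buffered adds of the same id; otherwise flush()
        // (deletes first, then adds) would bring the document back.
        pendingAdds.removeIf(id::equals);
        pendingDeletes.add(id);
        flushIfFull();
    }

    private void flushIfFull() {
        if (pendingAdds.size() + pendingDeletes.size() >= maxBuffered) {
            flush();
        }
    }

    // One "reader" session for all deletes, then one "writer" session
    // for all adds, so the two are never open at the same time.
    void flush() {
        if (!pendingDeletes.isEmpty()) {
            index.readerOpens++;
            index.docs.removeAll(pendingDeletes);
            pendingDeletes.clear();
        }
        if (!pendingAdds.isEmpty()) {
            index.writerOpens++;
            index.docs.addAll(pendingAdds);
            pendingAdds.clear();
        }
    }
}
```

Even this toy version says nothing about what a reader opened in parallel
sees between the delete phase and the add phase of a flush, which is part of
why I would not call it trivial.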

> I was left wondering if the extensive changes to IndexWriter were
> worth it, or if it was best left to something at a higher level (like
> a better IndexModifier, or something like what Solr does).

My guess, based on my own experience as a Lucene newbie and on the large
number of questions I see on the lucene-user list, is that most users don't
understand why they need to concern themselves with the separate IndexReader
and IndexWriter objects. They'd rather have a single "Index" object on which
you can do any operation in any order, efficiently (unlike IndexModifier).
Such an object should be part of the Lucene core, and not left for everyone
to implement themselves in a different way (as happens now).

If we could create a BetterIndexModifier which does this on top of the
lower-level IndexWriter and IndexReader objects, that would be great, but
it's not obvious that it can be done efficiently enough, and it's even less
clear what kind of guarantees such an implementation can make (e.g., can it
guarantee that a parallel IndexReader will not see 0 or 2 versions of the
same document?).

That being said, I'd love to see a patch like Ning's, but one which goes
further and combines the capabilities of an IndexReader and an IndexWriter:
once an IndexWriter can delete a document based on a term it contains, why
stop there - why not give this IndexWriter full reading capabilities, and
allow it to run more sophisticated searches to decide what to delete?
As I mentioned in a previous post, I needed this capability in an
application which indexed emails and attachments, and when an email document
was deleted I also had to delete the attached documents (listed in a field
of the email) from the index.
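
As a rough illustration of that use case, here is a minimal sketch in Java
against a toy in-memory map, not the real Lucene classes. The names
(MailIndex, the "attachments" field, deleteEmailWithAttachments) are
hypothetical and only for this example. What it tries to show is that the
delete needs a read first: the attachment ids must be fetched from the
email document before anything can be deleted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT the Lucene API: each document is a map of
// field name to values, stored under a unique id.
class MailIndex {
    private final Map<String, Map<String, List<String>>> docs = new HashMap<>();

    // Store a document; attachmentIds is empty for non-email documents.
    void add(String id, List<String> attachmentIds) {
        Map<String, List<String>> fields = new HashMap<>();
        fields.put("attachments", new ArrayList<>(attachmentIds));
        docs.put(id, fields);
    }

    boolean contains(String id) {
        return docs.containsKey(id);
    }

    // Delete an email and every document listed in its "attachments"
    // field. Returns the number of documents actually removed.
    int deleteEmailWithAttachments(String emailId) {
        Map<String, List<String>> email = docs.get(emailId); // the read step
        if (email == null) {
            return 0;
        }
        int deleted = 0;
        for (String attachmentId : email.get("attachments")) {
            if (docs.remove(attachmentId) != null) {
                deleted++;
            }
        }
        if (docs.remove(emailId) != null) {
            deleted++;
        }
        return deleted;
    }
}
```

With separate reader and writer objects, the read step and the delete step
above would land on different objects, which is exactly the split I would
like a combined object to hide.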

-- 
Nadav Har'El                        |       Friday, Jul 7 2006, 11 Tammuz 5766
IBM Haifa Research Lab              |-----------------------------------------
                                    |May you live as long as you want - and
http://nadav.harel.org.il           |never want as long as you live.
