lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexReader deleteDocument
Date Mon, 17 Mar 2008 12:08:41 GMT

I think that is quite a ways away.

The possibility of an IndexReader that can access IndexWriter's
in-memory buffered adds/deletes was briefly mentioned on the java-dev
list recently, but it would be a very large change for Lucene.  Various
caches assume an index will not change once opened.

That said, there is work being done to overhaul how FieldCache and  
norms work so as to greatly reduce the cost of 1) initially  
populating the FieldCache, and 2) updating only the portions of the  
FieldCache that were "dirtied" by a re-open.  So I think near term,  
making reopen faster is the priority and really is a necessary first  
step towards someday being able to have a "live" reader.

Mike
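
The point-in-time behaviour discussed in this thread (a reader sees only
what was committed when it was opened, and must reopen to see later
commits) can be modelled with a small toy in plain Java. This is only a
sketch: ToyWriter, ToyReader, and their methods are hypothetical names
made up for illustration. They are not Lucene classes, and a real index
lives in files rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of point-in-time readers. Hypothetical classes, not Lucene's API. */
public class PointInTimeSketch {

    /** Buffers adds in memory and publishes an immutable snapshot on commit(). */
    static final class ToyWriter {
        private final List<String> buffered = new ArrayList<>();
        private List<String> committed = Collections.emptyList();

        /** Buffered only: invisible to every reader until commit(). */
        void add(String doc) {
            buffered.add(doc);
        }

        /** Publishes a new snapshot containing all buffered documents. */
        void commit() {
            List<String> next = new ArrayList<>(committed);
            next.addAll(buffered);
            buffered.clear();
            committed = Collections.unmodifiableList(next);
        }

        /** Opens a reader pinned to the most recent commit. */
        ToyReader openReader() {
            return new ToyReader(this, committed);
        }
    }

    /** Sees exactly the snapshot that was current when it was opened. */
    static final class ToyReader {
        private final ToyWriter source;
        private final List<String> snapshot;

        ToyReader(ToyWriter source, List<String> snapshot) {
            this.source = source;
            this.snapshot = snapshot;
        }

        int numDocs() {
            return snapshot.size();
        }

        /** Returns a new reader on the latest commit; this reader is unchanged. */
        ToyReader reopen() {
            return source.openReader();
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.add("doc1");
        writer.commit();

        ToyReader reader = writer.openReader();
        writer.add("doc2");
        writer.commit();

        System.out.println(reader.numDocs());          // prints 1 (old snapshot)
        System.out.println(reader.reopen().numDocs()); // prints 2 (after reopen)
    }
}
```

In this model a "live" reader would have to look into the writer's
buffered list as well, which is the part described above as a very large
change.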

Cam Bazz wrote:

> Hello Mike,
>
> Is there any hope for making a Lucene index that is fully transparent,
> i.e. the IndexReader seeing all the changes without reopening?
>
> Best.
>
> On Mon, Mar 17, 2008 at 12:35 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>>
>> Oh, sorry, no you still must reopen the IndexReader.  IndexReader
>> still searches only a point in time.
>>
>> Mike
>>
>> Cam Bazz wrote:
>>
>>> yes, I meant the same index.
>>>
>>> I thought with the new changes - the index reader would see the
>>> changes without re-opening.
>>> It would be real real cool to have that.
>>>
>>>
>>> Best.
>>>
>>> -C.B.
>>>
>>> On Mon, Mar 17, 2008 at 12:28 PM, Michael McCandless <
>>> lucene@mikemccandless.com> wrote:
>>>
>>>>
>>>> I'm not sure what you mean by "same thread".  Maybe you meant "same
>>>> index"?
>>>>
>>>> Yes, if the IndexReader reopens.
>>>>
>>>> IndexWriter.commit() makes the changes visible to readers that
>>>> (re)open afterwards, and makes the changes durable against an
>>>> OS/computer crash or power outage.
>>>>
>>>> Mike
>>>>
>>>> Cam Bazz wrote:
>>>>
>>>>> Another and last question;
>>>>>
>>>>> when the user commits, will an indexreader that is reading the
>>>>> same thread see the changes made or not?
>>>>>
>>>>> I thought something was said about this, if my memory serves me
>>>>> correctly.
>>>>>
>>>>> Best.
>>>>>
>>>>> On Mon, Mar 17, 2008 at 11:53 AM, Michael McCandless <
>>>>> lucene@mikemccandless.com> wrote:
>>>>>
>>>>>>
>>>>>> It's a hard drive issue.  When you call fsync, the OS asks the
>>>>>> hard drive to sync.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Cam Bazz wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I understand the issue. But I have not understood: is this a
>>>>>>> hardware-related issue (i.e., a hard disk) or an operating system
>>>>>>> issue?
>>>>>>>
>>>>>>> If I am using Linux, would the OS lie about fsyncing? Could I do
>>>>>>> anything in the kernel to stop it from lying? Or is this just a
>>>>>>> hard-drive-related issue...
>>>>>>>
>>>>>>> Best.
>>>>>>>
>>>>>>> On Mon, Mar 17, 2008 at 11:12 AM, Michael McCandless <
>>>>>>> lucene@mikemccandless.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> When you write to a file, modern OSs by default just buffer those
>>>>>>>> writes in memory rather than actually writing them immediately to
>>>>>>>> disk.  Modern hard drives do the same (so, after the OS flushes to
>>>>>>>> the hard drive, the hard drive actually just buffers the writes,
>>>>>>>> too).  Then, when it's a good time, these buffered writes are
>>>>>>>> spooled to disk in the background.  They do this to get better
>>>>>>>> performance on write.
>>>>>>>>
>>>>>>>> Then, the fsync() call, which is an OS level call, requests that
>>>>>>>> all buffered bytes be flushed to the real underlying storage
>>>>>>>> ("stable storage").  It is not supposed to return until all written
>>>>>>>> bytes are on stable storage.  Lucene relies on this by fsync'ing all
>>>>>>>> referenced files in the index, before deleting the files referenced
>>>>>>>> by previous commits.  So, as of 2.4, this ensures the index will
>>>>>>>> remain consistent even if the OS or computer crashes, or power is
>>>>>>>> cut.
>>>>>>>>
>>>>>>>> Unfortunately, there are apparently some devices which, even when
>>>>>>>> fsync() is called, return immediately even though the bytes are not
>>>>>>>> actually written to stable storage.  If you have such a device that
>>>>>>>> lies then Lucene 2.4 won't be able to guarantee index consistency
>>>>>>>> on crash/power outage.
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> Cam Bazz wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> What do you mean by IO system lying on fsync?
>>>>>>>>>
>>>>>>>>> Best.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 17, 2008 at 10:40 AM, Michael McCandless <
>>>>>>>>> lucene@mikemccandless.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes that's already been committed to trunk as well.
>>>>>>>>>>
>>>>>>>>>> IndexWriter now has a commit() method which syncs all
>>>>>>>>>> referenced files in the index to stable storage (assuming your
>>>>>>>>>> IO system doesn't "lie" on fsync).
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> On Mar 17, 2008, at 4:33 AM, Cam Bazz wrote:
>>>>>>>>>>
>>>>>>>>>>> Nice. Thanks.
>>>>>>>>>>>
>>>>>>>>>>> will the 2.4 have commit improvements that we previously
>>>>>>>>>>> talked about?
>>>>>>>>>>>
>>>>>>>>>>> best regards.
>>>>>>>>>>>
>>>>>>>>>>> -C.B.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 17, 2008 at 10:31 AM, Michael McCandless <
>>>>>>>>>>> lucene@mikemccandless.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The trunk version of Lucene (eventually 2.4) now has
>>>>>>>>>>>> deletion by query, in IndexWriter.
>>>>>>>>>>>>
>>>>>>>>>>>> Mike
>>>>>>>>>>>>
>>>>>>>>>>>> Cam Bazz wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Erick,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Has anyone found a way of deleting a document with a query?
>>>>>>>>>>>>> I understand it can be deleted via terms, but I need to
>>>>>>>>>>>>> delete a document with two terms; that is, the only way I
>>>>>>>>>>>>> can identify my document is by looking at two terms, not
>>>>>>>>>>>>> one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> best.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 14, 2008 at 4:58 PM, Erick Erickson
>>>>>>>>>>>>> <erickerickson@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Doc IDs are assigned at index time and can change over
>>>>>>>>>>>>>> time. That is, deleting a document and optimizing (and
>>>>>>>>>>>>>> other operations) can and will change document IDs. So,
>>>>>>>>>>>>>> yes, you have to do a search (either use a Hits object or
>>>>>>>>>>>>>> one of the HitCollectors) in order to delete by doc ID.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can also delete by terms, see the API.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are other options, but you haven't explained what
>>>>>>>>>>>>>> you're trying to accomplish enough to offer any more
>>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>> Erick
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 12, 2008 at 5:44 PM, varun sood
>>>>>>>>>>>>>> <vsood2@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, I haven't, but I will, even though I would like to
>>>>>>>>>>>>>>> make my own implementation. So any idea of how to get the
>>>>>>>>>>>>>>> "doc num"?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for replying.
>>>>>>>>>>>>>>> Varun
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Mar 12, 2008 at 5:15 PM, Mark Miller
>>>>>>>>>>>>>>> <markrmiller@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have you seen the work that Mark Harwood has done making
>>>>>>>>>>>>>>>> a GWT version of Luke? I think it's in the latest
>>>>>>>>>>>>>>>> release.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> varun sood wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>   I am trying to delete a document without using the
>>>>>>>>>>>>>>>>> Hits object. What is the unique field in the index that
>>>>>>>>>>>>>>>>> I can use to delete the document?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am trying to make a web interface where the index can
>>>>>>>>>>>>>>>>> be modified, a smaller subset of what Luke does but
>>>>>>>>>>>>>>>>> using JSPs and Servlets.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To use deleteDocument(int docNum) I need docNum. How can
>>>>>>>>>>>>>>>>> I get this? Or does it have to come only via Hits?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Varun
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>>
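
The durability discussion quoted above (write, then fsync before
treating the data as safe) can be sketched with the JDK alone. This is
an illustration under assumptions, not Lucene's actual code: the class
DurableWriteSketch and the method writeAndSync are hypothetical names.
FileChannel.force(true) is the standard JDK call that asks the OS to
flush a file's content and metadata to the storage device. As noted in
the thread, a device that "lies" can still acknowledge the flush early,
and no application code can detect that.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of "write, then sync"; not taken from Lucene. */
public class DurableWriteSketch {

    /** Writes data to path and requests a flush to stable storage before returning. */
    static void writeAndSync(Path path, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            // A single write() may be partial, so loop until the buffer is drained.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Until this call the bytes may sit only in OS buffers.
            // force(true) asks for both content and metadata to be flushed.
            channel.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("durable-sketch", ".bin");
        writeAndSync(path, "committed segment data".getBytes(StandardCharsets.UTF_8));
        System.out.println("synced " + Files.size(path) + " bytes to " + path);
        Files.delete(path);
    }
}
```

Only after the sync returns would it be safe to delete older files that
the newly written data replaces, which is the ordering the quoted
message describes for commits.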


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

