lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexReader deleteDocument
Date Mon, 17 Mar 2008 09:12:52 GMT

When you write to a file, modern OSs by default just buffer those  
writes in memory rather than actually writing them immediately to  
disk.  Modern hard drives do the same (so, after the OS flushes to  
the hard drive, the hard drive actually just buffers the writes,  
too).  Then, when it's a good time, these buffered writes are spooled  
to disk in the background.  They do this to get better performance on  
write.

Then, the fsync() call, which is an OS level call, requests that all  
buffered bytes be flushed to the real underlying storage ("stable  
storage").  It is not supposed to return until all written bytes are  
on stable storage.  Lucene relies on this by fsync'ing all referenced  
files in the index, before deleting the files referenced by previous  
commits.  So, as of 2.4, this ensures the index will remain  
consistent even if the OS or computer crashes, or power is cut.

Unfortunately, there are apparently some devices which even when fsync 
() is called, return immediately even though the bytes are not  
actually written to stable storage.  If you have such a device that  
lies then Lucene 2.4 won't be able to guarantee index consistency on  
crash/power outage.

Mike

Cam Bazz wrote:

> Hello,
>
> What do you mean by IO system lying on fsync?
>
> Best.
>
> On Mon, Mar 17, 2008 at 10:40 AM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>>
>> Yes that's already been committed to trunk as well.
>>
>> IndexWriter now has a commit() method which syncs all referenced
>> files in the index to stable storage (assuming your IO system doesn't
>> "lie" on fsync).
>>
>> Mike
>>
>> On Mar 17, 2008, at 4:33 AM, Cam Bazz wrote:
>>
>>> Nice. Thanks.
>>>
>>> will the 2.4 have commit improvements that we previously talked  
>>> about?
>>>
>>> best regards.
>>>
>>> -C.B.
>>>
>>> On Mon, Mar 17, 2008 at 10:31 AM, Michael McCandless <
>>> lucene@mikemccandless.com> wrote:
>>>
>>>>
>>>> The trunk version of Lucene (eventually 2.4) now has deletion by
>>>> query, in IndexWriter.
>>>>
>>>> Mike
>>>>
>>>> Cam Bazz wrote:
>>>>
>>>>> Hello Erick,
>>>>>
>>>>> Has anyone found a way for deleting a document with a query? I
>>>>> understand it
>>>>> can be deleted via terms, but I need to delete a document with two
>>>>> terms,
>>>>> that is the only way I can identify my document is by looking at
>>>>> two terms
>>>>> not one.
>>>>>
>>>>> best.
>>>>>
>>>>> On Fri, Mar 14, 2008 at 4:58 PM, Erick Erickson
>>>>> <erickerickson@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Doc IDs are assigned at index time and can change over time That
>>>>>> is,
>>>>>> deleting
>>>>>> a document and optimizing (and other operations) can and will
>>>>>> change
>>>>>> document IDs. So, yes, you have to do a search (either use a hits
>>>>>> object
>>>>>> or one of the HitCollectors) in order to delete by doc ID.
>>>>>>
>>>>>> You can also delete by terms, see the API.
>>>>>>
>>>>>> There are other options, but you haven't explianed what you're
>>>>>> trying to accomplish enough to offer any more suggestions.
>>>>>>
>>>>>> Best
>>>>>> Erick
>>>>>>
>>>>>> On Wed, Mar 12, 2008 at 5:44 PM, varun sood <vsood2@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> No. I haven't but I will. even though I would like to make my
 
>>>>>>> own
>>>>>>> implementation. So any idea of how to get the "doc num"?
>>>>>>>
>>>>>>> Thanks for replying.
>>>>>>> Varun
>>>>>>>
>>>>>>> On Wed, Mar 12, 2008 at 5:15 PM, Mark Miller
>>>>>>> <markrmiller@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Have you seen the work that Mark Harwood has done making
a GWT
>>>>>>>> version
>>>>>>>> of Luke? I think its in the latest release.
>>>>>>>>
>>>>>>>> varun sood wrote:
>>>>>>>>> Hi,
>>>>>>>>>   I am trying to delete a document without using the
hits
>>>>>>>>> object.
>>>>>>>>> What is the unique field in the index that I can use
to
>>>>>>>>> delete the
>>>>>>>> document?
>>>>>>>>>
>>>>>>>>> I am trying to make a web interface where index can be
 
>>>>>>>>> modified,
>>>>>>> smaller
>>>>>>>>> subset of what Luke does but using JSPs and Servlet.
>>>>>>>>>
>>>>>>>>> to use deleteDocument(int docNum)
>>>>>>>>> I need docNum how can I get this? or does it have to
come
>>>>>>>>> only vis
>>>>>>> Hits?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Varun
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------

>>>>>>>> --
>>>>>>>> --
>>>>>>>> --
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user- 
>>>>>>>> help@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message