lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexReader deleteDocument
Date Mon, 17 Mar 2008 09:53:28 GMT

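It's a hard drive issue.  When you call fsync, the OS asks the hard
drive to sync; a drive that "lies" acknowledges that request before the
bytes have actually reached stable storage.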

Mike

Cam Bazz wrote:

> Hello,
>
> I understand the issue, but I still don't understand: is this a
> hardware-related issue (i.e. the hard disk) or an operating system one?
>
> If I am using Linux, would the OS lie about fsyncing? Could I do
> anything in the kernel to stop it from lying? Or is this just a
> hard-drive-related issue?
>
> Best.
>
> On Mon, Mar 17, 2008 at 11:12 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>
>>
>> When you write to a file, modern OSs by default just buffer those
>> writes in memory rather than actually writing them immediately to
>> disk.  Modern hard drives do the same (so, after the OS flushes to
>> the hard drive, the hard drive actually just buffers the writes,
>> too).  Then, when it's a good time, these buffered writes are spooled
>> to disk in the background.  They do this to get better performance on
>> write.
>>
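A minimal plain-Java sketch of the layering described above, assuming only the JDK (the class `BufferLayers` and the temp-file prefix are made up for illustration): bytes first sit in a `BufferedOutputStream`, `flush()` hands them to the OS, and `FileDescriptor.sync()` is the fsync-style request to push them on to the device. As the rest of the thread explains, a successful `sync()` is only as trustworthy as the drive that acknowledges it.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class BufferLayers {
    // Writes 'text' to a new temp file, pushing it through each buffering layer.
    public static File writeDurably(String text) {
        try {
            File f = File.createTempFile("buffer-layers", ".txt");
            f.deleteOnExit();
            try (FileOutputStream fos = new FileOutputStream(f);
                 BufferedOutputStream out = new BufferedOutputStream(fos)) {
                // Layer 1: the bytes sit in the JVM-side buffer of 'out'.
                out.write(text.getBytes(StandardCharsets.UTF_8));
                // Layer 2: flush() hands them to the OS, which may only cache them in RAM.
                out.flush();
                // Layer 3: sync() asks the OS to force them to the device (fsync-style);
                // a drive that "lies" may still be holding them in its own cache.
                fos.getFD().sync();
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back, e.g. to check the round trip.
    public static String readBack(File f) {
        try {
            return new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```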
>> Then, the fsync() call, which is an OS level call, requests that all
>> buffered bytes be flushed to the real underlying storage ("stable
>> storage").  It is not supposed to return until all written bytes are
>> on stable storage.  Lucene relies on this by fsync'ing all referenced
>> files in the index, before deleting the files referenced by previous
>> commits.  So, as of 2.4, this ensures the index will remain
>> consistent even if the OS or computer crashes, or power is cut.
>>
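The ordering above (sync every file the new commit references, and only then delete files that only older commits used) can be sketched as a toy in plain Java, with `FileChannel.force(true)` standing in for fsync. This is an illustrative model under stated assumptions, not Lucene's implementation: `ToyCommit`, the `commit_<gen>` commit-point files, and their one-file-name-per-line format are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class ToyCommit {
    // fsync-style call: force(true) asks the OS to flush file data and metadata to the device.
    static void fsync(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    // Hypothetical commit of generation 'gen'. 'newFiles' are already written
    // in 'dir'; 'oldFiles' were referenced by the previous commit point.
    public static void commit(Path dir, long gen, List<String> newFiles, List<String> oldFiles) {
        try {
            // 1. Make every file the new commit will reference durable first.
            for (String name : newFiles) {
                fsync(dir.resolve(name));
            }
            // 2. Write a fresh commit point (never overwrite the old one) and sync it.
            Path point = dir.resolve("commit_" + gen);
            Files.write(point, newFiles, StandardCharsets.UTF_8);
            fsync(point);
            // 3. Only now delete what the previous commit alone referenced; a crash
            //    before this step still leaves the old commit point and its files intact.
            for (String name : oldFiles) {
                if (!newFiles.contains(name)) {
                    Files.deleteIfExists(dir.resolve(name));
                }
            }
            Files.deleteIfExists(dir.resolve("commit_" + (gen - 1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the device lies about step 1 or step 2, the deletes in step 3 can leave no fully written commit on disk after a power cut, which is exactly the failure the next paragraph describes.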
>> Unfortunately, there are apparently some devices which, even when
>> fsync() is called, return immediately even though the bytes are not
>> actually written to stable storage.  If you have such a device that
>> lies, then Lucene 2.4 won't be able to guarantee index consistency on
>> a crash or power outage.
>>
>> Mike
>>
>> Cam Bazz wrote:
>>
>>> Hello,
>>>
>>> What do you mean by IO system lying on fsync?
>>>
>>> Best.
>>>
>>> On Mon, Mar 17, 2008 at 10:40 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>>>
>>>>
>>>> Yes, that's already been committed to trunk as well.
>>>>
>>>> IndexWriter now has a commit() method which syncs all referenced
>>>> files in the index to stable storage (assuming your IO system doesn't
>>>> "lie" on fsync).
>>>>
>>>> Mike
>>>>
>>>> On Mar 17, 2008, at 4:33 AM, Cam Bazz wrote:
>>>>
>>>>> Nice. Thanks.
>>>>>
>>>>> Will 2.4 have the commit improvements that we previously talked
>>>>> about?
>>>>>
>>>>> best regards.
>>>>>
>>>>> -C.B.
>>>>>
>>>>> On Mon, Mar 17, 2008 at 10:31 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>>>>>
>>>>>>
>>>>>> The trunk version of Lucene (eventually 2.4) now has deletion by
>>>>>> query, in IndexWriter.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Cam Bazz wrote:
>>>>>>
>>>>>>> Hello Erick,
>>>>>>>
>>>>>>> Has anyone found a way to delete a document with a query? I
>>>>>>> understand it can be deleted via terms, but I need to delete a
>>>>>>> document with two terms: the only way I can identify my document is
>>>>>>> by looking at two terms, not one.
>>>>>>>
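For the two-term case, the idea is an AND of two term matches: delete only documents that contain both terms. Per Mike's reply above, trunk (2.4) IndexWriter gains deletion by query, so with Lucene itself this would be a `BooleanQuery` holding two `TermQuery` clauses, each added with `BooleanClause.Occur.MUST`, passed to `IndexWriter.deleteDocuments(Query)`. The self-contained toy below only models that matching rule over an in-memory list; `ToyIndex` and its field/value maps are hypothetical stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ToyIndex {
    // Each toy "document" maps field name -> value; a null slot marks a deleted doc.
    public final List<Map<String, String>> docs = new ArrayList<>();

    // Adds a document; its list position doubles as a toy doc ID.
    public int add(Map<String, String> doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    // Deletes every live doc where field1 == value1 AND field2 == value2,
    // the same condition a BooleanQuery with two MUST TermQuery clauses expresses.
    // Returns how many docs were deleted.
    public int deleteByTwoTerms(String field1, String value1, String field2, String value2) {
        int deleted = 0;
        for (int id = 0; id < docs.size(); id++) {
            Map<String, String> d = docs.get(id);
            if (d != null && value1.equals(d.get(field1)) && value2.equals(d.get(field2))) {
                docs.set(id, null); // keep the slot so other toy IDs do not shift
                deleted++;
            }
        }
        return deleted;
    }
}
```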
>>>>>>> best.
>>>>>>>
>>>>>>> On Fri, Mar 14, 2008 at 4:58 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>>>>
>>>>>>>> Doc IDs are assigned at index time and can change over time. That
>>>>>>>> is, deleting a document and optimizing (and other operations) can
>>>>>>>> and will change document IDs. So, yes, you have to do a search
>>>>>>>> (either use a Hits object or one of the HitCollectors) in order to
>>>>>>>> delete by doc ID.
>>>>>>>>
>>>>>>>> You can also delete by term; see the API.
>>>>>>>>
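Erick's point that deletes plus optimize renumber documents can be seen in a toy model where a doc ID is just a position in a list. `DocIdShift` below is hypothetical, not Lucene code; it also shows why an application-level unique key (which Lucene can delete by, e.g. via `IndexReader.deleteDocuments(Term)`) is a safer handle than a remembered doc number.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DocIdShift {
    // Toy segment: a doc ID is simply the document's position in this list.
    private final List<String> keys = new ArrayList<>();
    private final Set<Integer> deleted = new HashSet<>();

    // Adds a doc identified by an application-level unique key; returns its doc ID.
    public int add(String uniqueKey) {
        keys.add(uniqueKey);
        return keys.size() - 1;
    }

    // Marks a doc as deleted; its slot (and every other ID) stays put for now.
    public void deleteById(int docId) {
        deleted.add(docId);
    }

    // Toy "optimize": drop deleted docs, so every later doc moves down.
    public void compact() {
        List<String> live = new ArrayList<>();
        for (int id = 0; id < keys.size(); id++) {
            if (!deleted.contains(id)) {
                live.add(keys.get(id));
            }
        }
        keys.clear();
        keys.addAll(live);
        deleted.clear();
    }

    // Finds the current doc ID for a key, or -1 if absent or deleted.
    public int idOf(String uniqueKey) {
        int id = keys.indexOf(uniqueKey);
        return (id >= 0 && !deleted.contains(id)) ? id : -1;
    }
}
```

Adding "a", "b", "c" gives "c" doc ID 2; after deleting "b" and calling `compact()`, `idOf("c")` returns 1, so a saved ID of 2 no longer refers to it, while lookup by key still works.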
>>>>>>>> There are other options, but you haven't explained what you're
>>>>>>>> trying to accomplish well enough for me to offer any more suggestions.
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Erick
>>>>>>>>
>>>>>>>> On Wed, Mar 12, 2008 at 5:44 PM, varun sood <vsood2@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> No, I haven't, but I will, even though I would like to make my
>>>>>>>>> own implementation. So, any idea how to get the "doc num"?
>>>>>>>>>
>>>>>>>>> Thanks for replying.
>>>>>>>>> Varun
>>>>>>>>>
>>>>>>>>> On Wed, Mar 12, 2008 at 5:15 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Have you seen the work that Mark Harwood has done making a GWT
>>>>>>>>>> version of Luke? I think it's in the latest release.
>>>>>>>>>>
>>>>>>>>>> varun sood wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>   I am trying to delete a document without using the Hits object.
>>>>>>>>>>> What is the unique field in the index that I can use to delete
>>>>>>>>>>> the document?
>>>>>>>>>>>
>>>>>>>>>>> I am trying to make a web interface where the index can be
>>>>>>>>>>> modified, a smaller subset of what Luke does, but using JSPs and
>>>>>>>>>>> Servlets.
>>>>>>>>>>>
>>>>>>>>>>> To use deleteDocument(int docNum) I need docNum. How can I get
>>>>>>>>>>> it? Or does it have to come only via Hits?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Varun
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>

