lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexReader deleteDocument
Date Mon, 17 Mar 2008 10:28:14 GMT

I'm not sure what you mean by "same thread".  Maybe you meant "same  
index"?

Yes, if the IndexReader reopens.

IndexWriter.commit() makes the changes visible to readers (once they
reopen), and makes the changes durable across an OS/computer crash or
power outage.
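
A minimal sketch of the durability side, using only the JDK rather than Lucene itself. It follows the ordering described further down this thread: sync the new files first, and only then delete what the previous commit referenced. FileChannel.force(true) is the JDK call that asks the OS to flush a file's data and metadata to the storage device. The class name DurableCommitSketch, the commit_N file naming and the commit() helper are hypothetical, chosen for illustration; they are not Lucene's API or on-disk format, and syncing the directory entry is left out for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustration only: a hypothetical helper, not Lucene's implementation. */
final class DurableCommitSketch {

    /** Hypothetical naming scheme: one file per commit generation. */
    static Path commitFile(Path dir, long generation) {
        return dir.resolve("commit_" + generation);
    }

    /**
     * Writes the new generation, forces it to the storage device, and only
     * then deletes the previous generation. If the machine crashes before
     * the sync completes, the previous generation is still on disk.
     */
    static Path commit(Path dir, long generation, byte[] payload) {
        Path current = commitFile(dir, generation);
        try {
            try (FileChannel ch = FileChannel.open(current,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING,
                    StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(payload);
                // A single write() may be partial, so loop until the buffer is drained.
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                // Ask the OS to flush file data and metadata to the device.
                // This is only as trustworthy as the device underneath it.
                ch.force(true);
            }
            // The new generation has been synced; the old one can now go.
            Files.deleteIfExists(commitFile(dir, generation - 1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }
}
```

Calling DurableCommitSketch.commit(dir, 2, bytes) after generation 1 has been written leaves only commit_2 in the directory. If the drive acknowledges the sync without actually writing the bytes, nothing at this level can compensate.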

Mike

Cam Bazz wrote:

> Another and last question;
>
> when the user commits, will an indexreader that is reading the same  
> thread
> see the changes made or not?
>
> I thought something was said about this, if my memory serves me
> correctly.
>
> Best.
>
> On Mon, Mar 17, 2008 at 11:53 AM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>>
>> It's a hard drive issue.  When you call fsync, the OS asks the hard
>> drive to sync.
>>
>> Mike
>>
>> Cam Bazz wrote:
>>
>>> Hello,
>>>
>>> I understand the issue, but I have not understood whether this is a
>>> hardware issue (i.e., the hard disk) or an operating system issue.
>>>
>>> If I am using Linux, would the OS lie about fsyncing? Could I do
>>> anything in the kernel to stop it from lying? Or is this just a
>>> hard-drive issue?
>>>
>>> Best.
>>>
>>> On Mon, Mar 17, 2008 at 11:12 AM, Michael McCandless <
>>> lucene@mikemccandless.com> wrote:
>>>
>>>>
>>>> When you write to a file, modern OSs by default just buffer those
>>>> writes in memory rather than actually writing them immediately to
>>>> disk.  Modern hard drives do the same (so, after the OS flushes to
>>>> the hard drive, the hard drive actually just buffers the writes,
>>>> too).  Then, when it's a good time, these buffered writes are spooled
>>>> to disk in the background.  They do this to get better performance on
>>>> write.
>>>>
>>>> Then, the fsync() call, which is an OS level call, requests that all
>>>> buffered bytes be flushed to the real underlying storage ("stable
>>>> storage").  It is not supposed to return until all written bytes are
>>>> on stable storage.  Lucene relies on this by fsync'ing all referenced
>>>> files in the index, before deleting the files referenced by previous
>>>> commits.  So, as of 2.4, this ensures the index will remain
>>>> consistent even if the OS or computer crashes, or power is cut.
>>>>
>>>> Unfortunately, there are apparently some devices which, even when
>>>> fsync() is called, return immediately even though the bytes are not
>>>> actually written to stable storage.  If you have such a device that
>>>> lies then Lucene 2.4 won't be able to guarantee index consistency on
>>>> crash/power outage.
>>>>
>>>> Mike
>>>>
>>>> Cam Bazz wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> What do you mean by IO system lying on fsync?
>>>>>
>>>>> Best.
>>>>>
>>>>> On Mon, Mar 17, 2008 at 10:40 AM, Michael McCandless <
>>>>> lucene@mikemccandless.com> wrote:
>>>>>
>>>>>>
>>>>>> Yes that's already been committed to trunk as well.
>>>>>>
>>>>>> IndexWriter now has a commit() method which syncs all referenced
>>>>>> files in the index to stable storage (assuming your IO system doesn't
>>>>>> "lie" on fsync).
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Mar 17, 2008, at 4:33 AM, Cam Bazz wrote:
>>>>>>
>>>>>>> Nice. Thanks.
>>>>>>>
>>>>>>> Will 2.4 have the commit improvements that we previously talked
>>>>>>> about?
>>>>>>>
>>>>>>> best regards.
>>>>>>>
>>>>>>> -C.B.
>>>>>>>
>>>>>>> On Mon, Mar 17, 2008 at 10:31 AM, Michael McCandless <
>>>>>>> lucene@mikemccandless.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> The trunk version of Lucene (eventually 2.4) now has deletion by
>>>>>>>> query, in IndexWriter.
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> Cam Bazz wrote:
>>>>>>>>
>>>>>>>>> Hello Erick,
>>>>>>>>>
>>>>>>>>> Has anyone found a way to delete a document with a query? I
>>>>>>>>> understand that documents can be deleted via terms, but I need to
>>>>>>>>> delete a document by two terms; that is, the only way I can
>>>>>>>>> identify my document is by looking at two terms, not one.
>>>>>>>>>
>>>>>>>>> best.
>>>>>>>>>
>>>>>>>>> On Fri, Mar 14, 2008 at 4:58 PM, Erick Erickson
>>>>>>>>> <erickerickson@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Doc IDs are assigned at index time and can change over time.
>>>>>>>>>> That is, deleting a document and optimizing (and other
>>>>>>>>>> operations) can and will change document IDs. So, yes, you have
>>>>>>>>>> to do a search (either use a Hits object or one of the
>>>>>>>>>> HitCollectors) in order to delete by doc ID.
>>>>>>>>>>
>>>>>>>>>> You can also delete by terms, see the API.
>>>>>>>>>>
>>>>>>>>>> There are other options, but you haven't explained what you're
>>>>>>>>>> trying to accomplish in enough detail for me to offer any more
>>>>>>>>>> suggestions.
>>>>>>>>>>
>>>>>>>>>> Best
>>>>>>>>>> Erick
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 12, 2008 at 5:44 PM, varun sood  
>>>>>>>>>> <vsood2@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> No, I haven't, but I will, even though I would like to make my
>>>>>>>>>>> own implementation. So, any idea how to get the "doc num"?
>>>>>>>>>>>
>>>>>>>>>>> Thanks for replying.
>>>>>>>>>>> Varun
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 12, 2008 at 5:15 PM, Mark Miller
>>>>>>>>>>> <markrmiller@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Have you seen the work that Mark Harwood has done making a GWT
>>>>>>>>>>>> version of Luke? I think it's in the latest release.
>>>>>>>>>>>>
>>>>>>>>>>>> varun sood wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>   I am trying to delete a document without using the Hits
>>>>>>>>>>>>> object. What is the unique field in the index that I can use
>>>>>>>>>>>>> to delete the document?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to make a web interface where the index can be
>>>>>>>>>>>>> modified, a smaller subset of what Luke does, but using JSPs
>>>>>>>>>>>>> and servlets.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To use deleteDocument(int docNum) I need docNum. How can I get
>>>>>>>>>>>>> this? Or does it have to come only via Hits?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Varun
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>>



