lucene-java-user mailing list archives

From "Cam Bazz" <camb...@gmail.com>
Subject Re: IndexReader deleteDocument
Date Mon, 17 Mar 2008 09:16:17 GMT
Hello,

I understand the issue, but I have not understood whether this is a
hardware-related issue (i.e. the hard disk) or an operating system issue.

If I am using Linux, would the OS lie about fsyncing? Could I do anything in
the kernel to stop it from lying? Or is this just a hard-drive-related
issue?

Best.

On Mon, Mar 17, 2008 at 11:12 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> When you write to a file, modern OSs by default just buffer those
> writes in memory rather than actually writing them immediately to
> disk.  Modern hard drives do the same (so, after the OS flushes to
> the hard drive, the hard drive actually just buffers the writes,
> too).  Then, when it's a good time, these buffered writes are spooled
> to disk in the background.  They do this to get better performance on
> write.
>
> Then, the fsync() call, which is an OS level call, requests that all
> buffered bytes be flushed to the real underlying storage ("stable
> storage").  It is not supposed to return until all written bytes are
> on stable storage.  Lucene relies on this by fsync'ing all referenced
> files in the index, before deleting the files referenced by previous
> commits.  So, as of 2.4, this ensures the index will remain
> consistent even if the OS or computer crashes, or power is cut.
>
> Unfortunately, there are apparently some devices which, even when
> fsync() is called, return immediately even though the bytes are not
> actually written to stable storage.  If you have such a device that
> lies then Lucene 2.4 won't be able to guarantee index consistency on
> crash/power outage.
>
> Mike
>
> Cam Bazz wrote:
>
> > Hello,
> >
> > What do you mean by IO system lying on fsync?
> >
> > Best.
> >
> > On Mon, Mar 17, 2008 at 10:40 AM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> >>
> >> Yes that's already been committed to trunk as well.
> >>
> >> IndexWriter now has a commit() method which syncs all referenced
> >> files in the index to stable storage (assuming your IO system doesn't
> >> "lie" on fsync).
> >>
> >> Mike
> >>
> >> On Mar 17, 2008, at 4:33 AM, Cam Bazz wrote:
> >>
> >>> Nice. Thanks.
> >>>
> >>> Will 2.4 have the commit improvements that we previously talked
> >>> about?
> >>>
> >>> best regards.
> >>>
> >>> -C.B.
> >>>
> >>> On Mon, Mar 17, 2008 at 10:31 AM, Michael McCandless <
> >>> lucene@mikemccandless.com> wrote:
> >>>
> >>>>
> >>>> The trunk version of Lucene (eventually 2.4) now has deletion by
> >>>> query, in IndexWriter.
> >>>>
> >>>> Mike
> >>>>
> >>>> Cam Bazz wrote:
> >>>>
> >>>>> Hello Erick,
> >>>>>
> >>>>> Has anyone found a way to delete a document with a query? I
> >>>>> understand it can be deleted via terms, but I need to delete a
> >>>>> document with two terms; that is, the only way I can identify my
> >>>>> document is by looking at two terms, not one.
> >>>>>
> >>>>> best.
> >>>>>
> >>>>> On Fri, Mar 14, 2008 at 4:58 PM, Erick Erickson
> >>>>> <erickerickson@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Doc IDs are assigned at index time and can change over time. That
> >>>>>> is, deleting a document and optimizing (and other operations) can
> >>>>>> and will change document IDs. So, yes, you have to do a search
> >>>>>> (either use a hits object or one of the HitCollectors) in order to
> >>>>>> delete by doc ID.
> >>>>>>
> >>>>>> You can also delete by terms, see the API.
> >>>>>>
> >>>>>> There are other options, but you haven't explained what you're
> >>>>>> trying to accomplish in enough detail to offer any more suggestions.
> >>>>>>
> >>>>>> Best
> >>>>>> Erick
> >>>>>>
> >>>>>> On Wed, Mar 12, 2008 at 5:44 PM, varun sood <vsood2@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> No, I haven't, but I will, even though I would like to make my
> >>>>>>> own implementation. So any idea of how to get the "doc num"?
> >>>>>>>
> >>>>>>> Thanks for replying.
> >>>>>>> Varun
> >>>>>>>
> >>>>>>> On Wed, Mar 12, 2008 at 5:15 PM, Mark Miller
> >>>>>>> <markrmiller@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Have you seen the work that Mark Harwood has done making a GWT
> >>>>>>>> version of Luke? I think it's in the latest release.
> >>>>>>>>
> >>>>>>>> varun sood wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>   I am trying to delete a document without using the hits
> >>>>>>>>> object. What is the unique field in the index that I can use
> >>>>>>>>> to delete the document?
> >>>>>>>>>
> >>>>>>>>> I am trying to make a web interface where the index can be
> >>>>>>>>> modified, a smaller subset of what Luke does but using JSPs
> >>>>>>>>> and Servlets.
> >>>>>>>>>
> >>>>>>>>> To use deleteDocument(int docNum) I need docNum. How can I get
> >>>>>>>>> this? Or does it have to come only via Hits?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Varun
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
> >>
> >>
>
>
>
>
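
The ordering Mike describes in the quoted message (sync the newly written
files to stable storage first, and only then delete the files of the previous
commit) can be sketched in plain Java. This is a rough illustration, not
Lucene's actual code: the class name DurableCommitSketch, the helper methods
and the file names commit_1 / commit_2 are all made up for the example. It
uses FileChannel.force(true), which is the standard Java way to request an
fsync-style flush. As noted in the thread, the guarantee only holds if the
OS and the device actually honor that request. The sketch also does not sync
the containing directory, which a more careful implementation might do.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not taken from the Lucene code base.
class DurableCommitSketch {

    // Writes the bytes and does not return until the OS reports that they
    // have been flushed to the storage device.
    static void writeAndSync(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Request a flush of file content and metadata (fsync-like).
            // A device that "lies" may still be holding the bytes in its cache.
            ch.force(true);
        }
    }

    // Makes the new file durable first, and only then removes the old one,
    // so a crash in between leaves at least one complete file on disk.
    static void commit(Path oldFile, Path newFile, byte[] data) throws IOException {
        writeAndSync(newFile, data);
        Files.deleteIfExists(oldFile);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        Path oldFile = dir.resolve("commit_1");
        Path newFile = dir.resolve("commit_2");

        writeAndSync(oldFile, "old".getBytes(StandardCharsets.UTF_8));
        commit(oldFile, newFile, "new".getBytes(StandardCharsets.UTF_8));

        System.out.println("old exists: " + Files.exists(oldFile));
        System.out.println("new exists: " + Files.exists(newFile));
    }
}
```

If the delete happened before the sync, a power cut between the two steps
could leave neither a complete old file nor a complete new one, which is the
situation the sync-then-delete ordering is meant to avoid.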
