lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cam Bazz" <camb...@gmail.com>
Subject Re: IndexReader deleteDocument
Date Mon, 17 Mar 2008 11:48:21 GMT
Hello Mike,

Is there any hope for making a lucene index that is fully transparent, i.e.
the indexreader seeing all the changes without reopening?

Best.

On Mon, Mar 17, 2008 at 12:35 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> Oh, sorry, no you still must reopen the IndexReader.  IndexReader
> still searches only a point in time.
>
> Mike
>
> Cam Bazz wrote:
>
> > yes, I meant the same index.
> >
> > I thought with the new changes - the index reader would see the
> > changes
> > without re-opening.
> > It would be real real cool to have that.
> >
> >
> > Best.
> >
> > -C.B.
> >
> > On Mon, Mar 17, 2008 at 12:28 PM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> >>
> >> I'm not sure what you mean by "same thread".  Maybe you meant "same
> >> index"?
> >>
> >> Yes, if the IndexReader reopens.
> >>
> >> IndexWriter.commit() makes the changes visible to readers, and makes
> >> the changes durable to os/computer crash or power outage.
> >>
> >> Mike
> >>
> >> Cam Bazz wrote:
> >>
> >>> Another and last question;
> >>>
> >>> when the user commits, will an indexreader that is reading the same
> >>> thread
> >>> see the changes made or not?
> >>>
> >>> I thought something was said about this, if my memory serves me
> >>> correct.
> >>>
> >>> Best.
> >>>
> >>> On Mon, Mar 17, 2008 at 11:53 AM, Michael McCandless <
> >>> lucene@mikemccandless.com> wrote:
> >>>
> >>>>
> >>>> It's a hard drive issue.  When you call fsync, the OS asks the hard
> >>>> drive to sync.
> >>>>
> >>>> Mike
> >>>>
> >>>> Cam Bazz wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I understand the issue. But I have not understood - is this
> >>>>> hardware related
> >>>>> issue - i.e a harddisk? or operating system?
> >>>>>
> >>>>> If I am using linux would the OS lie about fsyncing? could I do
> >>>>> anything in
> >>>>> the kernel to stop it from lying? or is this just a harddrive
> >>>>> related
> >>>>> issue...
> >>>>>
> >>>>> Best.
> >>>>>
> >>>>> On Mon, Mar 17, 2008 at 11:12 AM, Michael McCandless <
> >>>>> lucene@mikemccandless.com> wrote:
> >>>>>
> >>>>>>
> >>>>>> When you write to a file, modern OSs by default just buffer
those
> >>>>>> writes in memory rather than actually writing them immediately
to
> >>>>>> disk.  Modern hard drives do the same (so, after the OS
> >>>>>> flushes to
> >>>>>> the hard drive, the hard drive actually just buffers the writes,
> >>>>>> too).  Then, when it's a good time, these buffered writes are
> >>>>>> spooled
> >>>>>> to disk in the background.  They do this to get better
> >>>>>> performance on
> >>>>>> write.
> >>>>>>
> >>>>>> Then, the fsync() call, which is an OS level call, requests
that
> >>>>>> all
> >>>>>> buffered bytes be flushed to the real underlying storage ("stable
> >>>>>> storage").  It is not supposed to return until all written bytes
> >>>>>> are
> >>>>>> on stable storage.  Lucene relies on this by fsync'ing all
> >>>>>> referenced
> >>>>>> files in the index, before deleting the files referenced by
> >>>>>> previous
> >>>>>> commits.  So, as of 2.4, this ensures the index will remain
> >>>>>> consistent even if the OS or computer crashes, or power is cut.
> >>>>>>
> >>>>>> Unfortunately, there are apparently some devices which even
when
> >>>>>> fsync
> >>>>>> () is called, return immediately even though the bytes are not
> >>>>>> actually written to stable storage.  If you have such a device
> >>>>>> that
> >>>>>> lies then Lucene 2.4 won't be able to guarantee index
> >>>>>> consistency on
> >>>>>> crash/power outage.
> >>>>>>
> >>>>>> Mike
> >>>>>>
> >>>>>> Cam Bazz wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> What do you mean by IO system lying on fsync?
> >>>>>>>
> >>>>>>> Best.
> >>>>>>>
> >>>>>>> On Mon, Mar 17, 2008 at 10:40 AM, Michael McCandless <
> >>>>>>> lucene@mikemccandless.com> wrote:
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Yes that's already been committed to trunk as well.
> >>>>>>>>
> >>>>>>>> IndexWriter now has a commit() method which syncs all
> >>>>>>>> referenced
> >>>>>>>> files in the index to stable storage (assuming your
IO system
> >>>>>>>> doesn't
> >>>>>>>> "lie" on fsync).
> >>>>>>>>
> >>>>>>>> Mike
> >>>>>>>>
> >>>>>>>> On Mar 17, 2008, at 4:33 AM, Cam Bazz wrote:
> >>>>>>>>
> >>>>>>>>> Nice. Thanks.
> >>>>>>>>>
> >>>>>>>>> will the 2.4 have commit improvements that we previously
> >>>>>>>>> talked
> >>>>>>>>> about?
> >>>>>>>>>
> >>>>>>>>> best regards.
> >>>>>>>>>
> >>>>>>>>> -C.B.
> >>>>>>>>>
> >>>>>>>>> On Mon, Mar 17, 2008 at 10:31 AM, Michael McCandless
<
> >>>>>>>>> lucene@mikemccandless.com> wrote:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> The trunk version of Lucene (eventually 2.4)
now has
> >>>>>>>>>> deletion by
> >>>>>>>>>> query, in IndexWriter.
> >>>>>>>>>>
> >>>>>>>>>> Mike
> >>>>>>>>>>
> >>>>>>>>>> Cam Bazz wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hello Erick,
> >>>>>>>>>>>
> >>>>>>>>>>> Has anyone found a way for deleting a document
with a
> >>>>>>>>>>> query? I
> >>>>>>>>>>> understand it
> >>>>>>>>>>> can be deleted via terms, but I need to
delete a document
> >>>>>>>>>>> with two
> >>>>>>>>>>> terms,
> >>>>>>>>>>> that is the only way I can identify my document
is by
> >>>>>>>>>>> looking at
> >>>>>>>>>>> two terms
> >>>>>>>>>>> not one.
> >>>>>>>>>>>
> >>>>>>>>>>> best.
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Mar 14, 2008 at 4:58 PM, Erick Erickson
> >>>>>>>>>>> <erickerickson@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Doc IDs are assigned at index time and
can change over time
> >>>>>>>>>>>> That
> >>>>>>>>>>>> is,
> >>>>>>>>>>>> deleting
> >>>>>>>>>>>> a document and optimizing (and other
operations) can and
> >>>>>>>>>>>> will
> >>>>>>>>>>>> change
> >>>>>>>>>>>> document IDs. So, yes, you have to do
a search (either
> >>>>>>>>>>>> use a
> >>>>>>>>>>>> hits
> >>>>>>>>>>>> object
> >>>>>>>>>>>> or one of the HitCollectors) in order
to delete by doc ID.
> >>>>>>>>>>>>
> >>>>>>>>>>>> You can also delete by terms, see the
API.
> >>>>>>>>>>>>
> >>>>>>>>>>>> There are other options, but you haven't
explianed what
> >>>>>>>>>>>> you're
> >>>>>>>>>>>> trying to accomplish enough to offer
any more suggestions.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best
> >>>>>>>>>>>> Erick
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Mar 12, 2008 at 5:44 PM, varun
sood
> >>>>>>>>>>>> <vsood2@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> No. I haven't but I will. even though
I would like to
> >>>>>>>>>>>>> make my
> >>>>>>>>>>>>> own
> >>>>>>>>>>>>> implementation. So any idea of how
to get the "doc num"?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for replying.
> >>>>>>>>>>>>> Varun
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Mar 12, 2008 at 5:15 PM,
Mark Miller
> >>>>>>>>>>>>> <markrmiller@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Have you seen the work that
Mark Harwood has done
> >>>>>>>>>>>>>> making a
> >>>>>>>>>>>>>> GWT
> >>>>>>>>>>>>>> version
> >>>>>>>>>>>>>> of Luke? I think its in the
latest release.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> varun sood wrote:
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>   I am trying to delete
a document without using the
> >>>>>>>>>>>>>>> hits
> >>>>>>>>>>>>>>> object.
> >>>>>>>>>>>>>>> What is the unique field
in the index that I can use to
> >>>>>>>>>>>>>>> delete the
> >>>>>>>>>>>>>> document?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I am trying to make a web
interface where index can be
> >>>>>>>>>>>>>>> modified,
> >>>>>>>>>>>>> smaller
> >>>>>>>>>>>>>>> subset of what Luke does
but using JSPs and Servlet.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> to use deleteDocument(int
docNum)
> >>>>>>>>>>>>>>> I need docNum how can I
get this? or does it have to
> >>>>>>>>>>>>>>> come
> >>>>>>>>>>>>>>> only vis
> >>>>>>>>>>>>> Hits?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Varun
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ---------------------------------------------------------
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
> >>>>>>>>>>>>>> unsubscribe@lucene.apache.org
> >>>>>>>>>>>>>> For additional commands, e-mail:
java-user-
> >>>>>>>>>>>>>> help@lucene.apache.org
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> -------------------------------------------------------------
> >>>>>>>>>> --
> >>>>>>>>>> --
> >>>>>>>>>> --
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe, e-mail: java-user-
> >>>>>>>>>> unsubscribe@lucene.apache.org
> >>>>>>>>>> For additional commands, e-mail: java-user-
> >>>>>>>>>> help@lucene.apache.org
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ---------------------------------------------------------------
> >>>>>>>> --
> >>>>>>>> --
> >>>>>>>> --
> >>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>>>> For additional commands, e-mail: java-user-
> >>>>>>>> help@lucene.apache.org
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> -----------------------------------------------------------------
> >>>>>> --
> >>>>>> --
> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>> -------------------------------------------------------------------
> >>>> --
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message