From: Michael McCandless
To: java-user@lucene.apache.org
Subject: Re: IndexReader deleteDocument
Date: Mon, 17 Mar 2008 08:08:41 -0400
Message-Id: <8DE86125-1DC1-4BD1-AF6C-233CEA7F4069@mikemccandless.com>
In-Reply-To: <1bcb7c7f0803170448v6f040e69r6f325585ac7d76f7@mail.gmail.com>

I think that is quite a ways away.

This possibility was briefly mentioned on the java-dev list recently,
to create an IndexReader that can access the in-memory buffered
adds/deletes in IndexWriter, but it would be a very large change for
Lucene. Various caches assume an index will not change, once opened.

That said, there is work being done to overhaul how FieldCache and
norms work so as to greatly reduce the cost of 1) initially populating
the FieldCache, and 2) updating only the portions of the FieldCache
that were "dirtied" by a re-open.

So I think near term, making reopen faster is the priority and really
is a necessary first step towards someday being able to have a "live"
reader.
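To make the point-in-time behaviour concrete, here is a toy model in
plain Java. It is only a sketch under stated assumptions: ToyIndex,
ToyReader and all of their methods are hypothetical names invented for
this illustration and are not Lucene's API. It only shows the idea that
a reader keeps the snapshot it was opened on, and that changes become
visible only after a commit followed by a reopen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: hypothetical classes, NOT Lucene's IndexWriter/IndexReader.
class ToyIndex {
    // Last committed state; readers only ever see a committed snapshot.
    private List<String> committed = Collections.emptyList();
    // Buffered adds that no reader can see yet.
    private final List<String> pending = new ArrayList<>();
    // Bumped on every commit that actually changes the index.
    private long version = 0;

    synchronized void add(String doc) {
        pending.add(doc);
    }

    // Publish the buffered adds as a new immutable snapshot.
    synchronized void commit() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> next = new ArrayList<>(committed);
        next.addAll(pending);
        pending.clear();
        committed = Collections.unmodifiableList(next);
        version++;
    }

    // A new reader is bound to the snapshot that is current right now.
    synchronized ToyReader openReader() {
        return new ToyReader(this, committed, version);
    }

    synchronized long currentVersion() {
        return version;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc1");
        index.commit();
        ToyReader reader = index.openReader();

        index.add("doc2");
        index.commit();
        System.out.println(reader.numDocs());   // prints 1: old snapshot

        ToyReader fresh = reader.reopen();
        System.out.println(fresh.numDocs());    // prints 2: sees the commit
    }
}

class ToyReader {
    private final ToyIndex index;
    private final List<String> docs;
    private final long version;

    ToyReader(ToyIndex index, List<String> docs, long version) {
        this.index = index;
        this.docs = docs;
        this.version = version;
    }

    int numDocs() {
        return docs.size();
    }

    // Returns this reader if nothing was committed since it was opened,
    // otherwise a new reader on the latest committed snapshot.
    ToyReader reopen() {
        return index.currentVersion() == version ? this : index.openReader();
    }
}
```

In this model an uncommitted add never reaches any reader, and an
already open reader never changes underneath its caller, which is the
assumption the caches mentioned above depend on.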
Mike

Cam Bazz wrote:

> Hello Mike,
>
> Is there any hope for making a lucene index that is fully transparent, i.e.
> the indexreader seeing all the changes without reopening?
>
> Best.
>
> On Mon, Mar 17, 2008 at 12:35 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
>
>> Oh, sorry, no you still must reopen the IndexReader. IndexReader
>> still searches only a point in time.
>>
>> Mike
>>
>> Cam Bazz wrote:
>>
>>> yes, I meant the same index.
>>>
>>> I thought with the new changes - the index reader would see the changes
>>> without re-opening.
>>> It would be real real cool to have that.
>>>
>>> Best.
>>>
>>> -C.B.
>>>
>>> On Mon, Mar 17, 2008 at 12:28 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
>>>
>>>> I'm not sure what you mean by "same thread". Maybe you meant "same
>>>> index"?
>>>>
>>>> Yes, if the IndexReader reopens.
>>>>
>>>> IndexWriter.commit() makes the changes visible to readers, and makes
>>>> the changes durable to os/computer crash or power outage.
>>>>
>>>> Mike
>>>>
>>>> Cam Bazz wrote:
>>>>
>>>>> Another and last question;
>>>>>
>>>>> when the user commits, will an indexreader that is reading the same thread
>>>>> see the changes made or not?
>>>>>
>>>>> I thought something was said about this, if my memory serves me correct.
>>>>>
>>>>> Best.
>>>>>
>>>>> On Mon, Mar 17, 2008 at 11:53 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>>>>>
>>>>>> It's a hard drive issue. When you call fsync, the OS asks the hard
>>>>>> drive to sync.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Cam Bazz wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I understand the issue. But I have not understood - is this hardware related
>>>>>>> issue - i.e a harddisk? or operating system?
>>>>>>>
>>>>>>> If I am using linux would the OS lie about fsyncing? could I do anything in
>>>>>>> the kernel to stop it from lying?
>>>>>>> or is this just a harddrive related issue...
>>>>>>>
>>>>>>> Best.
>>>>>>>
>>>>>>> On Mon, Mar 17, 2008 at 11:12 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>>>>>>>
>>>>>>>> When you write to a file, modern OSs by default just buffer those
>>>>>>>> writes in memory rather than actually writing them immediately to
>>>>>>>> disk. Modern hard drives do the same (so, after the OS flushes to
>>>>>>>> the hard drive, the hard drive actually just buffers the writes,
>>>>>>>> too). Then, when it's a good time, these buffered writes are spooled
>>>>>>>> to disk in the background. They do this to get better performance on
>>>>>>>> write.
>>>>>>>>
>>>>>>>> Then, the fsync() call, which is an OS level call, requests that all
>>>>>>>> buffered bytes be flushed to the real underlying storage ("stable
>>>>>>>> storage"). It is not supposed to return until all written bytes are
>>>>>>>> on stable storage. Lucene relies on this by fsync'ing all referenced
>>>>>>>> files in the index, before deleting the files referenced by previous
>>>>>>>> commits. So, as of 2.4, this ensures the index will remain
>>>>>>>> consistent even if the OS or computer crashes, or power is cut.
>>>>>>>>
>>>>>>>> Unfortunately, there are apparently some devices which even when
>>>>>>>> fsync() is called, return immediately even though the bytes are not
>>>>>>>> actually written to stable storage. If you have such a device that
>>>>>>>> lies then Lucene 2.4 won't be able to guarantee index consistency on
>>>>>>>> crash/power outage.
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> Cam Bazz wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> What do you mean by IO system lying on fsync?
>>>>>>>>>
>>>>>>>>> Best.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 17, 2008 at 10:40 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes that's already been committed to trunk as well.
>>>>>>>>>>
>>>>>>>>>> IndexWriter now has a commit() method which syncs all referenced
>>>>>>>>>> files in the index to stable storage (assuming your IO system
>>>>>>>>>> doesn't "lie" on fsync).
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> On Mar 17, 2008, at 4:33 AM, Cam Bazz wrote:
>>>>>>>>>>
>>>>>>>>>>> Nice. Thanks.
>>>>>>>>>>>
>>>>>>>>>>> will the 2.4 have commit improvements that we previously talked about?
>>>>>>>>>>>
>>>>>>>>>>> best regards.
>>>>>>>>>>>
>>>>>>>>>>> -C.B.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 17, 2008 at 10:31 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The trunk version of Lucene (eventually 2.4) now has deletion by
>>>>>>>>>>>> query, in IndexWriter.
>>>>>>>>>>>>
>>>>>>>>>>>> Mike
>>>>>>>>>>>>
>>>>>>>>>>>> Cam Bazz wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Erick,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Has anyone found a way for deleting a document with a query? I understand it
>>>>>>>>>>>>> can be deleted via terms, but I need to delete a document with two terms,
>>>>>>>>>>>>> that is the only way I can identify my document is by looking at two terms
>>>>>>>>>>>>> not one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> best.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 14, 2008 at 4:58 PM, Erick Erickson wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Doc IDs are assigned at index time and can change over time. That is,
>>>>>>>>>>>>>> deleting a document and optimizing (and other operations) can and will
>>>>>>>>>>>>>> change document IDs.
>>>>>>>>>>>>>> So, yes, you have to do a search (either use a hits object
>>>>>>>>>>>>>> or one of the HitCollectors) in order to delete by doc ID.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can also delete by terms, see the API.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are other options, but you haven't explained what you're
>>>>>>>>>>>>>> trying to accomplish enough to offer any more suggestions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>> Erick
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 12, 2008 at 5:44 PM, varun sood wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No. I haven't but I will. even though I would like to make my own
>>>>>>>>>>>>>>> implementation. So any idea of how to get the "doc num"?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for replying.
>>>>>>>>>>>>>>> Varun
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Mar 12, 2008 at 5:15 PM, Mark Miller wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have you seen the work that Mark Harwood has done making a GWT
>>>>>>>>>>>>>>>> version of Luke? I think it's in the latest release.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> varun sood wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I am trying to delete a document without using the hits object.
>>>>>>>>>>>>>>>>> What is the unique field in the index that I can use to delete the
>>>>>>>>>>>>>>>>> document?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am trying to make a web interface where index can be modified,
>>>>>>>>>>>>>>>>> smaller subset of what Luke does but using JSPs and Servlet.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> to use deleteDocument(int docNum)
>>>>>>>>>>>>>>>>> I need docNum how can I get this?
>>>>>>>>>>>>>>>>> or does it have to come only via Hits?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Varun
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
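
As a rough illustration of the fsync discussion quoted above, the
sketch below writes a file and then asks the OS to flush it to stable
storage with FileChannel.force(true) from java.nio. DurableWrite and
writeAndSync are hypothetical names chosen for this example, and this
is not a claim about how Lucene itself is implemented, only the general
write-then-sync pattern. As the thread notes, a device that ignores the
flush request can still lose data even when the call returns normally.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Lucene.
class DurableWrite {

    // Write all bytes, then request that the OS push them (and the file
    // metadata) to the underlying storage device before returning.
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            // write() may accept only part of the buffer, so loop.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Without this call the bytes may sit in OS buffers only.
            channel.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fsync-demo", ".bin");
        writeAndSync(tmp, "committed".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.size(tmp));   // prints 9
        Files.delete(tmp);
    }
}
```

The ordering is the important part: only once the new files have been
synced is it safe to delete the files that an earlier commit relied on.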