Message-Id: <6C7CD841-45E1-4CA1-A71A-169651587EA9@mikemccandless.com>
From: Michael McCandless
To: java-dev@lucene.apache.org
Subject: Re: Realtime Search for Social Networks Collaboration
Date: Tue, 9 Sep 2008 12:41:32 -0400

Yonik Seeley wrote:

> On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless wrote:
>> Yonik Seeley wrote:
>>> What about something like term freq?  Would it need to count the
>>> number of docs after the local maxDoc or is there a better way?
>>
>> Good question...
>>
>> I think we'd have to take a full copy of the term -> termFreq on
>> reopen?  I don't see how else to do it (I don't understand your
>> suggestion above).  So, this will clearly add to the cost of reopen.
>
> One could adjust the freq by iterating over the terms documents...
> skipTo(localMaxDoc) and count how many are after that, then subtract
> from the freq.  I didn't say it was a *good* idea :-)

Ahh, OK :)

>>>> For reading stored fields and term vectors, which are now flushed
>>>> immediately to disk, we need to somehow get an IndexInput from the
>>>> IndexOutputs that IndexWriter holds open on these files.  Or,
>>>> maybe, just open new IndexInputs?
>>>
>>> Hmmm, seems like a case of our nice and simple Directory model not
>>> having quite enough features in this case.
>>
>> I think we can simply open IndexInputs on these files.  I believe
>> Java does the right thing on windows, such that if we are already
>> writing to the file, it does not prevent another file handle from
>> opening the file for reading.
>
> Yeah, I think the underlying RandomAccessFile might do the right
> thing, but IndexInput isn't required to see any changes on the fly
> (and current implementations don't) so at a minimum it would be a
> change of IndexInput semantics.  Maybe there would need to be a
> refresh() function added, or we would need to require a specific
> Directory impl?
>
> OR, if all writes are append-only, perhaps we don't ever need to
> invalidate the read buffer and would just need to remove the current
> logic that caches the file length and then let the underlying
> RandomAccessFile do the EOF checking.

All writes to these files are append-only, and when we open the
IndexInput we would never read beyond its current length (once we
flush our IndexOutput), because that's the local maxDocID limit.

Mike
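
To illustrate the append-only argument, here is a minimal sketch in plain java.io. It is not Lucene code and does not use IndexInput or IndexOutput: the class name, readUpTo(), demo() and the file contents are all hypothetical. It assumes the platform lets a second handle open a file for reading while another handle is still writing to it. The reader is given the length recorded at the flush point and never reads past it, so later appends do not affect what it sees.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of reading an append-only file up to a snapshot length. */
final class SnapshotReadSketch {

    /** Reads bytes [0, snapshotLength) through a separate read-only handle. */
    static byte[] readUpTo(Path file, long snapshotLength) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) snapshotLength];
            // Throws EOFException if the file is shorter than the snapshot.
            in.readFully(buf);
            return buf;
        }
    }

    /** Writes, records a snapshot length, appends more, then reads the snapshot. */
    static String demo() {
        try {
            Path file = Files.createTempFile("stored-fields", ".bin");
            try (RandomAccessFile writer = new RandomAccessFile(file.toFile(), "rw")) {
                writer.write("doc0;doc1;".getBytes(StandardCharsets.US_ASCII));
                // Stand-in for the reader's local limit at reopen time.
                long snapshot = writer.length();

                // The writer keeps appending after the snapshot was taken.
                writer.write("doc2;".getBytes(StandardCharsets.US_ASCII));

                // The reader opens its own handle while the writer is still open.
                byte[] seen = readUpTo(file, snapshot);
                return new String(seen, StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "doc0;doc1;"
    }
}
```

Because existing bytes are never rewritten, nothing the reader has already buffered can become stale; the only thing it must respect is the snapshot length it was handed.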