Message-ID: <4885099F.8020703@apache.org>
Date: Mon, 21 Jul 2008 15:11:43 -0700
From: Doug Cutting
To: java-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary
In-Reply-To: <609174.59760.qm@web27101.mail.ukl.yahoo.com>

This also reminds me of the "pulsing" technique described in:

http://citeseer.ist.psu.edu/cutting90optimizations.html

Doug

eks dev wrote:
> It seems someone else had the same idea to "inline" very short postings into the term dictionary (even for an in-memory index) and save one pointer (and one seek, in a disk setup)... nice reading
>
> http://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf
>
> ----- Original Message ----
>> From: Eks Dev (JIRA)
>> To: java-dev@lucene.apache.org
>> Sent: Sunday, 20 July, 2008 1:02:31 PM
>> Subject: [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary
>>
>> [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077 ]
>>
>> Eks Dev commented on LUCENE-1278:
>> ---------------------------------
>>
>> In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM), I
>> think it is worth mentioning that I am working on LUCENE-1340, that is, storing
>> postings without additional frq info.
>>
>> Correct me if I am wrong, the only difference is that this approach with *.frq
>> needs one seek more... at the same time, this could potentially increase term
>> dict size, so we lose some locality.
>>
>> Your last proposal sounds interesting, "inline short postings" into term
>> dict, so for short postings (about the size of the offset pointer into *.frq) with
>> tf==1 (that is always the case if you use omitTf(true) from LUCENE-1340) we
>> spare one seek()... this could be a lot. Also, there is no need to store
>> postings into *.frq (this complicates maintenance I guess)
>>
>>> Add optional storing of document numbers in term dictionary
>>> -----------------------------------------------------------
>>>
>>>                 Key: LUCENE-1278
>>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
>>>             Project: Lucene - Java
>>>          Issue Type: New Feature
>>>          Components: Index
>>>    Affects Versions: 2.3.1
>>>            Reporter: Jason Rutherglen
>>>            Priority: Minor
>>>         Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch,
>>>                      lucene.1278.5.5.2008.patch, lucene.1278.5.7.2008.patch,
>>>                      lucene.1278.5.7.2008.test.patch, TestTermEnumDocs.java
>>>
>>> Add optional storing of document numbers in term dictionary. String index
>>> field cache and range filter creation will be faster.
>>> Example read code:
>>> {noformat}
>>> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
>>> do {
>>>   Term term = termEnum.term();
>>>   if (term == null || term.field() != field) break;
>>>   int[] docs = termEnum.docs();
>>> } while (termEnum.next());
>>> {noformat}
>>> Example write code:
>>> {noformat}
>>> Document document = new Document();
>>> document.add(new Field("tag", "dog", Field.Store.YES,
>>>     Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
>>> indexWriter.addDocument(document);
>>> {noformat}
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
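
The "inline short postings" idea discussed in this thread can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class InlinePostingsSketch, the INLINE_MAX_DOCS threshold, and the list that stands in for the *.frq file are all hypothetical. None of it is Lucene API or code from the LUCENE-1278 patches. It only shows why a term with a very short postings list can be answered from the dictionary entry alone, while a longer list costs one extra lookup (a seek, in a disk setup).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a term dictionary that inlines short postings. Not Lucene code. */
final class InlinePostingsSketch {
    /** Hypothetical threshold: lists with at most this many docs live in the dictionary entry. */
    static final int INLINE_MAX_DOCS = 1;

    /** One dictionary entry: either inlined doc ids, or a pointer into the postings store. */
    static final class TermEntry {
        final int[] inlinedDocs;  // non-null when the postings are inlined
        final int postingsOffset; // index into postingsStore, or -1 when inlined

        TermEntry(int[] inlinedDocs, int postingsOffset) {
            this.inlinedDocs = inlinedDocs;
            this.postingsOffset = postingsOffset;
        }
    }

    private final Map<String, TermEntry> termDict = new TreeMap<>();
    // Stands in for the separate *.frq file; each access is counted as one "seek".
    private final List<int[]> postingsStore = new ArrayList<>();
    private int seeks = 0;

    /** Stores short lists in the dictionary entry and longer ones in the postings store. */
    void addTerm(String term, int[] docs) {
        if (docs.length <= INLINE_MAX_DOCS) {
            termDict.put(term, new TermEntry(docs.clone(), -1));
        } else {
            postingsStore.add(docs.clone());
            termDict.put(term, new TermEntry(null, postingsStore.size() - 1));
        }
    }

    /** Returns the doc ids for a term; only non-inlined terms touch the postings store. */
    int[] docs(String term) {
        TermEntry entry = termDict.get(term);
        if (entry == null) {
            return new int[0];
        }
        if (entry.inlinedDocs != null) {
            return entry.inlinedDocs; // answered from the dictionary, no seek
        }
        seeks++; // simulated seek into the postings store
        return postingsStore.get(entry.postingsOffset);
    }

    int seeks() {
        return seeks;
    }

    public static void main(String[] args) {
        InlinePostingsSketch index = new InlinePostingsSketch();
        index.addTerm("dog", new int[] {7});       // short list: inlined
        index.addTerm("cat", new int[] {1, 4, 9}); // longer list: stored separately
        System.out.println("dog: " + index.docs("dog").length + " doc(s), seeks=" + index.seeks());
        System.out.println("cat: " + index.docs("cat").length + " doc(s), seeks=" + index.seeks());
    }
}
```

The trade-off mentioned in the quoted comment shows up directly in the threshold: raising INLINE_MAX_DOCS avoids more seeks but makes dictionary entries larger, which is the loss of locality the comment is concerned about.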