From: Michael McCandless
Date: Thu, 6 Jun 2013 06:24:14 -0400
Subject: Re: postings lists deduplication
To: "Lucene/Solr dev"

Neat idea!

Would this idea allow a single term to point to (the union of) N other
postings lists? It seems like that's necessary, e.g. to handle the
exact/inexact case.

And then, to produce the Docs/AndPositionsEnum, you'd need to do a merge
sort across those N postings lists?

Such a thing might also be doable as a runtime-only wrapper around the
postings API (FieldsProducer), if you could do the reverse expansion at
runtime (e.g. stem -> all of its surface forms).

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jun 6, 2013 at 3:51 AM, Dmitry Kan wrote:
> Robert Muir and I discussed what Robert eventually named "postings
> lists deduplication" at the bbuzz 2013 conference in Berlin.
>
> The idea is to allow multiple terms to point to the same postings list to
> save space.
>
> The application / impact of this is positive for synonyms, exact / inexact
> terms, leading wildcard support via storing reversed terms, etc.
>
> At the moment, when supporting exact (unstemmed) and inexact (stemmed)
> searches, we store both the unstemmed and the stemmed variant of a word
> form, and that leads to index bloat. For example, we had to remove the
> leading wildcard support via reversing a token at index and query time
> because of the same index size considerations.
>
> Would you like a jira for this?
>
> Thanks,
>
> Dmitry Kan
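
As a rough illustration of the merge Mike asks about above, here is a
minimal sketch in plain Java (no Lucene APIs) of taking the union of N
sorted postings lists with a priority queue. The class PostingsUnion, the
Cursor helper, and the int[] representation of a postings list are all
assumptions made for this example. A real implementation would sit behind
Lucene's postings enums and would also have to merge positions, which this
sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: unions N sorted doc ID lists.
final class PostingsUnion {

    // Cursor over one postings list, modeled here as ascending doc IDs.
    private static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int[] docs) { this.docs = docs; }

        int current() { return docs[pos]; }

        // Moves to the next doc ID; returns false once the list is exhausted.
        boolean advance() { return ++pos < docs.length; }
    }

    // K-way merge: repeatedly take the cursor with the smallest current doc ID.
    // Assumes each input list is sorted ascending and doc IDs are non-negative.
    static int[] union(List<int[]> postings) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                (a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] p : postings) {
            if (p.length > 0) {
                heap.add(new Cursor(p));
            }
        }

        List<Integer> out = new ArrayList<>();
        int last = -1; // sentinel: no doc ID emitted yet
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            int doc = c.current();
            if (doc != last) { // skip doc IDs already emitted from another list
                out.add(doc);
                last = doc;
            }
            if (c.advance()) {
                heap.add(c); // re-insert with its new current doc ID
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

For example, if the surface forms of one stem had postings {1, 4, 9} and
{2, 4, 7}, PostingsUnion.union would return 1, 2, 4, 7, 9. The heap keeps
each step at O(log N), so the cost of the union grows with the total number
of postings rather than with N times that number.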