From: Olivier Binda
Date: Mon, 02 Jun 2014 09:22:19 +0200
To: java-user@lucene.apache.org
Subject: Re: remapping docIds in a read only offline built index

Hello,

I'm still interested in an answer to the following question: in a 1-segment read-only index (built offline once and then frozen), is it possible to remap the docIds?

I may have a working, but not optimal, answer to my original problem: I could use a MultiReader over 3 indexes to get the following composite index:

docId : document
-------------------------
1 : bookA
2 : bookB
...
M : linkA
M+1 : linkB
...
N+1 : sentenceA
N+2 : sentenceB
...
300000 : sentenceZZZ

I expect this solution to be slower than a single index in which each docId equals the order in which I added the documents.

On 05/12/2014 06:01 PM, Olivier Binda wrote:
> In a 1-segment (parallel) read-only index that is built offline once
> (and then frozen), is it possible to remap the docIds as the last step,
> i.e. to have the exact same index, except that the docIds are all equal
> to the order in which the docs were added to the index?
>
> Say I have the read-only index
>
> docId : document
> 1 : bookB
> 2 : sentenceB
> 3 : linkA
> 4 : linkC
> 5 : sentenceC
> 6 : sentenceA
> 7 : bookA
> ...
> 300000 : linkD
>
> I would like to have instead the read-only index
>
> docId : document
> 1 : bookA
> 2 : bookB
> ...
> M : linkA
> M+1 : linkB
> ...
> N+1 : sentenceA
> N+2 : sentenceB
> ...
> 300000 : sentenceZZZ
>
> This would allow me to reduce the amount of RAM needed to cache the
> type of each document:
>
> -> without remapping, I need at least ceil(log2(types)) * documents bits,
> here 2 * 300000 bits
>
> -> with remapping, I only need to remember the ints M and N
>
> Also, if I need to cache 1 byte of metadata for each book:
>
> -> without remapping, I would need 1 byte * documents,
> here 300000 bytes
>
> -> with remapping, I would only need 1 byte * books,
> here M - 1 bytes
>
> I tried building such an index with LogMergePolicy, NoMergePolicy and a
> larger RAM buffer, but (maybe I did something wrong) the docIds were
> always reshuffled (maybe because my index was big and I was over a
> threshold).
>
> Best regards,
> Olivier
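The MultiReader workaround works because a composite reader in Lucene gives each sub-reader a contiguous docId range starting at a docBase equal to the total maxDoc() of the sub-readers before it. The sketch below models only that arithmetic in plain Java, without Lucene on the classpath; the class name CompositeDocIds and the example sizes are hypothetical. Unlike the tables above, the sketch uses 0-based docIds, as Lucene does.

```java
import java.util.Arrays;

/**
 * Hypothetical model of how a composite reader (such as Lucene's MultiReader)
 * lays out docIds: sub-reader i owns the contiguous 0-based range
 * [docBase[i], docBase[i] + maxDoc[i]).
 */
final class CompositeDocIds {
    private final int[] docBase; // docBase[i] = sum of maxDoc of sub-readers 0..i-1
    private final int maxDoc;    // total number of docs across all sub-readers

    CompositeDocIds(int... subMaxDocs) {
        docBase = new int[subMaxDocs.length];
        int sum = 0;
        for (int i = 0; i < subMaxDocs.length; i++) {
            docBase[i] = sum;
            sum += subMaxDocs[i];
        }
        maxDoc = sum;
    }

    int maxDoc() {
        return maxDoc;
    }

    /** Global docId of local doc `localDoc` inside sub-reader `sub`. */
    int globalDoc(int sub, int localDoc) {
        return docBase[sub] + localDoc;
    }

    /** Index of the sub-reader that owns `globalDoc`, found by binary search. */
    int subIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc) {
            throw new IllegalArgumentException("docId out of range: " + globalDoc);
        }
        int pos = Arrays.binarySearch(docBase, globalDoc);
        if (pos < 0) {
            // Miss: binarySearch returned -(insertionPoint) - 1, and the owner is
            // the last sub-reader whose docBase is below globalDoc.
            return -pos - 2;
        }
        // Empty sub-readers share their docBase with the next one; skip forward
        // to the last sub-reader starting at this docBase, which is the owner.
        while (pos + 1 < docBase.length && docBase[pos + 1] == globalDoc) {
            pos++;
        }
        return pos;
    }
}
```

If the books, links and sentences live in three separate sub-indexes, in that order, the value returned by subIndex already identifies the document type, so in this layout the type lookup needs only the small docBase array rather than a per-document table.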
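The memory comparison in the quoted message can be made concrete with a small stdlib-only sketch. Assuming the remapped layout (books at docIds 1..M-1, links at M..N, sentences from N+1 on, 1-based as in the tables), a document's type is two comparisons against M and N, and per-book metadata fits in a byte[] of length M - 1. The baseline without remapping packs 2 bits per document into a long[], which for 300000 documents is 600000 bits, i.e. 75000 bytes. The names DocType, RangeTypes and PackedTypes are hypothetical and not part of Lucene.

```java
/** The three document kinds from the example index (hypothetical). */
enum DocType { BOOK, LINK, SENTENCE }

/** With remapping: types are contiguous ranges, so two ints are enough. */
final class RangeTypes {
    private final int m;           // first link docId (books are 1..m-1)
    private final int n;           // last link docId (sentences start at n+1)
    private final byte[] bookMeta; // one byte per book, indexed by docId - 1

    RangeTypes(int m, int n) {
        this.m = m;
        this.n = n;
        this.bookMeta = new byte[m - 1];
    }

    DocType typeOf(int docId) {
        if (docId < m) return DocType.BOOK;
        if (docId <= n) return DocType.LINK;
        return DocType.SENTENCE;
    }

    void setBookMeta(int docId, byte value) {
        checkBook(docId);
        bookMeta[docId - 1] = value;
    }

    byte bookMeta(int docId) {
        checkBook(docId);
        return bookMeta[docId - 1];
    }

    int bookMetaBytes() {
        return bookMeta.length;
    }

    private void checkBook(int docId) {
        if (docId < 1 || docId >= m) throw new IllegalArgumentException("not a book: " + docId);
    }
}

/** Without remapping: 2 bits per document, 32 documents per long. */
final class PackedTypes {
    private final long[] bits;

    PackedTypes(int maxDoc) {
        // docIds 1..maxDoc map to slots 0..maxDoc-1
        bits = new long[(maxDoc + 31) / 32];
    }

    void set(int docId, DocType type) {
        int slot = docId - 1;
        int word = slot >>> 5;        // slot / 32
        int shift = (slot & 31) << 1; // 2 * (slot % 32)
        bits[word] &= ~(3L << shift); // clear the previous 2-bit value
        bits[word] |= ((long) type.ordinal()) << shift;
    }

    DocType get(int docId) {
        int slot = docId - 1;
        int word = slot >>> 5;
        int shift = (slot & 31) << 1;
        return DocType.values()[(int) ((bits[word] >>> shift) & 3L)];
    }

    long sizeInBytes() {
        return (long) bits.length * Long.BYTES;
    }
}
```

The range version costs two ints plus M - 1 bytes of book metadata no matter how many links and sentences there are, while the packed version grows with the total document count. Both rely on docIds never changing, which holds for the frozen 1-segment index described above.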