From: Karl Wettin
To: java-user@lucene.apache.org
Subject: Re: Extract the text that was indexed
Date: Wed, 31 Dec 2008 01:30:05 +0100

On 30 Dec 2008, at 17:13, Lebiram wrote:

Hi Lebiram,

contrib/miscellaneous contains a couple of tools that might be of help.

> Just wanted to reconstruct a new index based on an existing
> index (but turning off norms), that's all.

If you want to create an identical index but without norms, use
FieldNormModifier in contrib/miscellaneous.

> However, as it is nearly impossible to extract the terms of
> unstored fields, we might think of other ways.

Not impossible, just time-consuming. The easiest way is to reconstruct
the token stream of each field using the term frequency vector.
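To illustrate the reconstruction idea, here is a minimal sketch in plain
Java. It does not use the Lucene API: the term vector is modelled as a
simple map from term to token positions, and the names TokenStreamSketch
and reconstruct are hypothetical. It also assumes that position
information is available for every term; with frequencies alone the
original token order cannot be recovered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TokenStreamSketch {

    /**
     * Rebuilds the token order of one field from a term vector given as
     * term -> positions. The map is a hypothetical stand-in for the data
     * a real term vector with positions would supply.
     */
    public static List<String> reconstruct(Map<String, int[]> termPositions) {
        // Group terms by position. Several terms may share a position
        // (e.g. synonyms), so each position holds a list of terms.
        TreeMap<Integer, List<String>> byPosition = new TreeMap<>();
        for (Map.Entry<String, int[]> entry : termPositions.entrySet()) {
            for (int position : entry.getValue()) {
                byPosition.computeIfAbsent(position, k -> new ArrayList<>())
                          .add(entry.getKey());
            }
        }
        // TreeMap iterates in ascending key order, i.e. in token order.
        List<String> tokens = new ArrayList<>();
        for (List<String> termsAtPosition : byPosition.values()) {
            tokens.addAll(termsAtPosition);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Term vector for the text "to be or not to be".
        Map<String, int[]> vector = new TreeMap<>();
        vector.put("be", new int[] {1, 5});
        vector.put("not", new int[] {3});
        vector.put("or", new int[] {2});
        vector.put("to", new int[] {0, 4});
        System.out.println(reconstruct(vector));
    }
}
```

Running main prints [to, be, or, not, to, be]. In a real index the
recovered terms are whatever the analyzer produced at indexing time
(lowercased, stemmed, stop words removed, and so on), not the original
text.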
If you haven't stored the term vector, there is a class called
TermVectorAccessor in contrib/miscellaneous that lets you visit the
term vector even though it is not stored, i.e. it constructs it by
enumerating the inverted index.

Remember that if you reconstruct a token stream via the term vector, no
payloads will be available. If you use payloads, it would be simple to
patch TermVectorAccessor to set the payloads in the tokens. Feel free
to post such a patch in Jira; it would be a nice addition to that code.


        karl