Date: Tue, 1 Sep 2009 23:06:32 -0700 (PDT)
From: "Uwe Schindler (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-1881) Non-stored fields are not copied in writer.addDocument()?
[ https://issues.apache.org/jira/browse/LUCENE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750284#action_12750284 ]

Uwe Schindler commented on LUCENE-1881:
---------------------------------------

There is no practical solution for this: indexing is a one-way action and is not reversible. This is why we offer "stored" fields as a place to keep the original or additional information for the indexed documents (e.g. the original strings that were indexed).

Lucene works with an "inverted index" ([http://en.wikipedia.org/wiki/Inverted_index]). During inversion of the non-stored (indexed-only) fields, the fields are tokenized, which is not reversible because stop words are removed, terms are normalized, and so on. The resulting terms are stored in a global list of all unique terms, and for each term the index only keeps references to the ids of the documents that contain it (one-way, from term -> document id).

For your problem you would need the list of terms for one document, which is not easily possible. You can iterate over all terms and their documents and try to rebuild the terms of a single document, but you never get back the original indexed content, and it is very slow. Look at the tool "Luke", a GUI for Lucene that has some code to do this.

The only way to add your already indexed content to another index is IndexWriter.addIndexes(). In this case the documents stay searchable, but they cannot be modified.

> Non-stored fields are not copied in writer.addDocument()?
> ---------------------------------------------------------
>
> Key: LUCENE-1881
> URL: https://issues.apache.org/jira/browse/LUCENE-1881
> Project: Lucene - Java
> Issue Type: Bug
> Components: Store
> Affects Versions: 2.4.1
> Environment: Linux
> Reporter: Wai Wong
> Assignee: Hoss Man
> Priority: Critical
>
> We would like to modify the properties of stored documents. The method is to use IndexReader to open all files, modify some fields, and copy each document via IndexWriter.addDocument() to another index. But all fields that were created using Field.Store.NO are no longer available for searching.
> Sample code in JSP is attached:
> <%@ page language="java" import="org.apache.lucene.analysis.standard.StandardAnalyzer;"%>
> <%@ page language="java" import="org.apache.lucene.document.*;"%>
> <%@ page language="java" import="org.apache.lucene.index.*;"%>
> <%@ page language="java" import="org.apache.lucene.search.*;"%>
> <%@ page contentType="text/html; charset=utf8" %>
> <%
> // create for testing
> IndexWriter writer = new IndexWriter("/opt/wwwroot/351/Index/test", new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
> Document doc = new Document();
> doc.add(new Field("A", "1234", Field.Store.NO , Field.Index.NOT_ANALYZED));
> doc.add(new Field("B", "abcd", Field.Store.NO , Field.Index.NOT_ANALYZED));
> writer.addDocument(doc);
> writer.close();
> // check ok
> Query q = new TermQuery(new Term("A", "1234"));
> Searcher s = new IndexSearcher("/opt/wwwroot/351/Index/test");
> Hits h = s.search(q);
> out.println("# of document found is " + h.length()); // it is ok
> // update the document to change or remove a field
> IndexReader r = IndexReader.open("/opt/wwwroot/351/Index/test");
> doc = r.document(0);
> r.deleteDocument(0);
> r.close();
> doc.removeField("B");
> writer = new IndexWriter("/opt/wwwroot/351/Index/test1", new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
> writer.addDocument(doc);
> writer.optimize();
> writer.close();
> // test again
> s = new IndexSearcher("/opt/wwwroot/351/Index/test1");
> h = s.search(q);
> out.println("# of document found is now " + h.length());
> r = IndexReader.open("/opt/wwwroot/351/Index/test1");
> out.println("max Doc is " + r.maxDoc());
> %>
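The explanation in the comment can be illustrated with a toy, in-memory model written in plain Java with no Lucene dependency. This is a minimal sketch under loud assumptions: `ToyInvertedIndex`, `ToyField` and all of their methods are hypothetical names, the analyzer is a simplistic lowercase/split/stop-word filter, and none of it is Lucene's actual implementation. It shows three things: a document read back from the index only carries its stored fields, so re-adding it drops the indexed-only field `A`; merging at the postings level (loosely what `IndexWriter.addIndexes()` does) keeps `A` searchable; and scanning every posting list recovers only normalized terms such as `title:art`, never the original string.

```java
import java.util.*;

/**
 * Toy in-memory model, NOT Lucene's API: every name here is hypothetical.
 * It only mimics the distinction between stored and indexed-only fields.
 */
public class ToyInvertedIndex {

    /** Hypothetical field: "stored" mirrors Field.Store.YES, "indexed" mirrors indexing. */
    public record ToyField(String name, String value, boolean stored, boolean indexed) {}

    private static final Set<String> STOP_WORDS = Set.of("a", "an", "of", "the");

    // Term dictionary "field:term" -> postings (doc ids); a TreeMap keeps terms sorted.
    private final SortedMap<String, SortedSet<Integer>> postings = new TreeMap<>();
    // Stored values per doc id; indexed-only fields never get an entry here.
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    /** Lowercases, splits and drops stop words: the original text cannot be recovered. */
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Inverts the indexed fields, keeps the stored ones, and returns the new doc id. */
    public int addDocument(List<ToyField> fields) {
        int docId = storedFields.size();
        Map<String, String> stored = new LinkedHashMap<>();
        for (ToyField f : fields) {
            if (f.indexed()) {
                for (String term : analyze(f.value())) {
                    postings.computeIfAbsent(f.name() + ":" + term, k -> new TreeSet<>()).add(docId);
                }
            }
            if (f.stored()) {
                stored.put(f.name(), f.value());
            }
        }
        storedFields.add(stored);
        return docId;
    }

    /** Analogous to IndexReader.document(): only stored fields come back. */
    public List<ToyField> document(int docId) {
        List<ToyField> result = new ArrayList<>();
        storedFields.get(docId).forEach((name, value) -> result.add(new ToyField(name, value, true, true)));
        return result;
    }

    /** Exact-term lookup, like a TermQuery (the term is not analyzed again). */
    public SortedSet<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.emptySortedSet());
    }

    /** Rebuilds one doc's terms by scanning every posting list: slow, and only normalized terms. */
    public List<String> reconstructTerms(int docId) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, SortedSet<Integer>> e : postings.entrySet()) {
            if (e.getValue().contains(docId)) {
                terms.add(e.getKey());
            }
        }
        return terms;
    }

    /** Merges postings and stored fields with shifted doc ids (loosely like IndexWriter.addIndexes()). */
    public void addIndexes(ToyInvertedIndex other) {
        int base = storedFields.size();
        other.postings.forEach((key, docs) -> {
            SortedSet<Integer> target = postings.computeIfAbsent(key, k -> new TreeSet<>());
            for (int d : docs) {
                target.add(base + d);
            }
        });
        for (Map<String, String> s : other.storedFields) {
            storedFields.add(new LinkedHashMap<>(s));
        }
    }

    public static void main(String[] args) {
        ToyInvertedIndex source = new ToyInvertedIndex();
        source.addDocument(List.of(
                new ToyField("A", "1234", false, true),                  // like Field.Store.NO
                new ToyField("title", "The Art of Indexing", true, true)));
        System.out.println(source.search("A", "1234"));                // [0]

        // Copy as in the report: read the document back and add it to a new index.
        ToyInvertedIndex copy = new ToyInvertedIndex();
        copy.addDocument(source.document(0));
        System.out.println(copy.search("A", "1234"));                  // [] (A was never stored)

        // Merge at the postings level instead: A stays searchable.
        ToyInvertedIndex merged = new ToyInvertedIndex();
        merged.addIndexes(source);
        System.out.println(merged.search("A", "1234"));                // [0]

        System.out.println(source.reconstructTerms(0));                // [A:1234, title:art, title:indexing]
    }
}
```

Keying all postings by `field:term` in a single sorted map is only a shortcut for the sketch; Lucene's real term dictionary and postings format are considerably more elaborate, but the one-way direction (term -> document ids) is the same.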