Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-dev@lucene.apache.org
Message-ID: <1730071292.1251871592937.JavaMail.jira@brutus>
Date: Tue, 1 Sep 2009 23:06:32 -0700 (PDT)
From: "Uwe Schindler (JIRA)" <jira@apache.org>
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-1881) Non-stored fields are not copied in
 writer.addDocument()?
In-Reply-To: <1252182110.1251807753055.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/LUCENE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750284#action_12750284 ] 

Uwe Schindler commented on LUCENE-1881:
---------------------------------------

There is no practical solution for this, indexing is a one-way action and not reversible. Because of this we offer "stored" fields as a store for the orginal or additional information to the indexed documents (e.g. for storing the original strings indexed).

Lucene works with an "inverted index" ([http://en.wikipedia.org/wiki/Inverted_index]). During inversion of these non-stored fields (indexed ones), the fields are tokenized (which is a non-reversible action, because stop-words are removed, terms are normalized and so on) and these terms are stored in a global unique list off all terms. The index then only contains the references to the document ids (one-way from term -> document id). For your problem you need to get the list of terms for one document which is not easily possible (there is some possibility to iterate over all terms/docs and try to rebuild the terms for a document, but you never get back the old indexed contents and its very slow. Look into the tool "Luke" for this, which is a GUI for Lucene that has some code to do this).

You can only add your already indexed contents to another index using IndexWriter.addIndexes(). In this case they stay searchable but cannot be modified.

> Non-stored fields are not copied in writer.addDocument()?
> ---------------------------------------------------------
>
>                 Key: LUCENE-1881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1881
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 2.4.1
>         Environment: Linux
>            Reporter: Wai Wong
>            Assignee: Hoss Man
>            Priority: Critical
>
> We would like to modified stored documents properties.  The method is to use IndexReader to open all files, modified some fields, and copy the document via addDocument() of IndexWriter to another index.  But all fields that are created using Field.Store.NO are no longer available for searching.
> Sample code in jsp is attached:
> <%@ page language="java" import="org.apache.lucene.analysis.standard.StandardAnalyzer;"%>
> <%@ page language="java" import="org.apache.lucene.document.*;"%>
> <%@ page language="java" import="org.apache.lucene.index.*;"%>
> <%@ page language="java" import="org.apache.lucene.search.*;"%>
> <%@ page contentType="text/html; charset=utf8" %>
> <%
>     // create for testing
>     IndexWriter writer = new IndexWriter("/opt/wwwroot/351/Index/test", new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
>     Document doc = new Document();
>     doc.add(new Field("A", "1234", Field.Store.NO , Field.Index.NOT_ANALYZED));
>     doc.add(new Field("B", "abcd", Field.Store.NO , Field.Index.NOT_ANALYZED));
>     writer.addDocument(doc);
>     writer.close();
>     // check ok
>     Query q = new TermQuery(new Term("A", "1234"));
>     Searcher s = new IndexSearcher("/opt/wwwroot/351/Index/test");
>     Hits h = s.search(q);
>     out.println("# of document found is " + h.length());        // it is ok
>     // update the document to change or remove a field
>     IndexReader r = IndexReader.open("/opt/wwwroot/351/Index/test");
>     doc = r.document(0);
>     r.deleteDocument(0);
>     r.close();
>     doc.removeField("B");
>     writer = new IndexWriter("/opt/wwwroot/351/Index/test1", new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
>     writer.addDocument(doc);
>     writer.optimize();
>     writer.close();
>     // test again
>     s = new IndexSearcher("/opt/wwwroot/351/Index/test1");
>     h = s.search(q);
>     out.println("<P># of document found is now " + h.length());
>     r = IndexReader.open("/opt/wwwroot/351/Index/test1");
>     out.println("<P> max Doc is " + r.maxDoc());
> %>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org