lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1881) Non-stored fields are not copied in writer.addDocument()?
Date Wed, 02 Sep 2009 06:06:32 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750284#action_12750284 ]

Uwe Schindler commented on LUCENE-1881:
---------------------------------------

There is no practical solution for this: indexing is a one-way action and is not reversible.
Because of this, we offer "stored" fields as a place to keep the original or additional
information alongside the indexed documents (e.g. for storing the original strings that were indexed).

Lucene works with an "inverted index" ([http://en.wikipedia.org/wiki/Inverted_index]). During
inversion of these non-stored (indexed-only) fields, the fields are tokenized (which is a
non-reversible action, because stop words are removed, terms are normalized and so on) and
the resulting terms are stored in a single, globally unique list of all terms. The index then
only contains the references to the document ids (one-way from term -> document id). For your
problem you need to get the list of terms for one document, which is not easily possible. (It
is possible to iterate over all terms/docs and try to rebuild the terms for a document, but you
never get back the original indexed contents and it is very slow. Look into the tool "Luke"
for this, which is a GUI for Lucene that has some code to do this.)
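
To make the one-way nature concrete, here is a minimal sketch of a toy inverted index in plain Java. It is not Lucene code: the class InvertedIndexSketch, its methods, and the stop-word list are all hypothetical names chosen for illustration, and the analysis step is only a rough stand-in for what a real analyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of an inverted index; NOT Lucene code. All names are hypothetical.
class InvertedIndexSketch {
    // Illustrative stop-word list (an assumption, not Lucene's list).
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("the", "a", "is", "of"));

    // term -> sorted set of ids of the documents containing that term
    private final Map<String, TreeSet<Integer>> postings =
            new TreeMap<String, TreeSet<Integer>>();
    private int nextDocId = 0;

    // Lossy analysis: lower-cases, splits on non-alphanumerics, drops stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Indexes the text and returns the new document id; the text itself is discarded.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : analyze(text)) {
            TreeSet<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new TreeSet<Integer>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
        return docId;
    }

    // Fast direction: term -> document ids.
    Set<Integer> search(String term) {
        TreeSet<Integer> docs = postings.get(term);
        return docs == null ? new TreeSet<Integer>() : docs;
    }

    // Slow direction: scan every term and keep those whose postings contain docId.
    // The result is a sorted set of terms, not the original string.
    Set<String> termsOf(int docId) {
        Set<String> terms = new TreeSet<String>();
        for (Map.Entry<String, TreeSet<Integer>> e : postings.entrySet()) {
            if (e.getValue().contains(docId)) {
                terms.add(e.getKey());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        int id = index.addDocument("The Quick fox is the QUICK one");
        System.out.println(index.search("quick")); // prints [0]
        System.out.println(index.termsOf(id));     // prints [fox, one, quick]
    }
}
```

In this model, case, word order, repetitions and stop words are all gone after addDocument(), and termsOf() has to walk the whole term list to recover even the surviving terms.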

You can only add your already indexed contents to another index using IndexWriter.addIndexes().
In this case they stay searchable but cannot be modified.
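
As a rough illustration of why an index-level copy keeps non-stored fields searchable, the sketch below merges term -> document id postings directly, without ever looking at field text. It is only a toy model under assumed names (PostingsMergeSketch, addIndex, and the "field:value" key encoding are hypothetical) and says nothing about how IndexWriter.addIndexes() is actually implemented.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of index-level merging; NOT Lucene's implementation. Names are hypothetical.
class PostingsMergeSketch {
    // Copies every posting from source into target, shifting document ids by
    // docIdOffset (the number of documents already present in target).
    static void addIndex(Map<String, TreeSet<Integer>> target,
                         Map<String, TreeSet<Integer>> source,
                         int docIdOffset) {
        for (Map.Entry<String, TreeSet<Integer>> e : source.entrySet()) {
            TreeSet<Integer> docs = target.get(e.getKey());
            if (docs == null) {
                docs = new TreeSet<Integer>();
                target.put(e.getKey(), docs);
            }
            for (int docId : e.getValue()) {
                docs.add(docId + docIdOffset);
            }
        }
    }

    public static void main(String[] args) {
        // Source: one document (id 0) with the two non-stored terms from the report.
        Map<String, TreeSet<Integer>> source = new TreeMap<String, TreeSet<Integer>>();
        source.put("A:1234", new TreeSet<Integer>(Collections.singleton(0)));
        source.put("B:abcd", new TreeSet<Integer>(Collections.singleton(0)));

        // Target: already holds one unrelated document (id 0).
        Map<String, TreeSet<Integer>> target = new TreeMap<String, TreeSet<Integer>>();
        target.put("A:9999", new TreeSet<Integer>(Collections.singleton(0)));

        addIndex(target, source, 1);
        System.out.println(target.get("A:1234")); // prints [1]
    }
}
```

The terms arrive in the target exactly as they were indexed, which is why the copied document is still found by a term lookup; the model has no step at which a single field could be changed or dropped.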

> Non-stored fields are not copied in writer.addDocument()?
> ---------------------------------------------------------
>
>                 Key: LUCENE-1881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1881
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 2.4.1
>         Environment: Linux
>            Reporter: Wai Wong
>            Assignee: Hoss Man
>            Priority: Critical
>
> We would like to modify the properties of stored documents.  The method is to use IndexReader
> to open all files, modify some fields, and copy the document via addDocument() of IndexWriter
> to another index.  But all fields that were created using Field.Store.NO are no longer available
> for searching.
> Sample code in jsp is attached:
> <%@ page language="java" import="org.apache.lucene.analysis.standard.StandardAnalyzer;"%>
> <%@ page language="java" import="org.apache.lucene.document.*;"%>
> <%@ page language="java" import="org.apache.lucene.index.*;"%>
> <%@ page language="java" import="org.apache.lucene.search.*;"%>
> <%@ page contentType="text/html; charset=utf8" %>
> <%
>     // create for testing
>     IndexWriter writer = new IndexWriter("/opt/wwwroot/351/Index/test", new StandardAnalyzer(),
>             true, IndexWriter.MaxFieldLength.LIMITED);
>     Document doc = new Document();
>     doc.add(new Field("A", "1234", Field.Store.NO , Field.Index.NOT_ANALYZED));
>     doc.add(new Field("B", "abcd", Field.Store.NO , Field.Index.NOT_ANALYZED));
>     writer.addDocument(doc);
>     writer.close();
>     // check ok
>     Query q = new TermQuery(new Term("A", "1234"));
>     Searcher s = new IndexSearcher("/opt/wwwroot/351/Index/test");
>     Hits h = s.search(q);
>     out.println("# of document found is " + h.length());        // it is ok
>     // update the document to change or remove a field
>     IndexReader r = IndexReader.open("/opt/wwwroot/351/Index/test");
>     doc = r.document(0);
>     r.deleteDocument(0);
>     r.close();
>     doc.removeField("B");
>     writer = new IndexWriter("/opt/wwwroot/351/Index/test1", new StandardAnalyzer(),
>             true, IndexWriter.MaxFieldLength.LIMITED);
>     writer.addDocument(doc);
>     writer.optimize();
>     writer.close();
>     // test again
>     s = new IndexSearcher("/opt/wwwroot/351/Index/test1");
>     h = s.search(q);
>     out.println("<P># of document found is now " + h.length());
>     r = IndexReader.open("/opt/wwwroot/351/Index/test1");
>     out.println("<P> max Doc is " + r.maxDoc());
> %>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

