lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: DocValues and SearcherManager
Date Mon, 23 Oct 2017 10:23:02 GMT
Hmm, the document you get back from a reader is NOT the same document you
had indexed: it contains only the stored fields. So you cannot retrieve a doc
from a reader, tweak it, re-index it, and expect everything to survive.

In particular, your doc values field "id" is not stored, so when you
retrieve it from the reader, there is no id field, and so when you then
replace that document, the new document has no id field.

You could just add a StoredField("id", id) and then the id is there, but it
will be a simple stored field, not a doc values field.  You must then build
a new Document instance for indexing, where you convert that id back into a
doc values field.
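
A minimal sketch of that rebuild step, assuming Lucene 4.10.x and the field
names from the quoted test ("docId", "state", "id"). `DocRebuilder` and its
methods are hypothetical names used only for illustration, not Lucene API. The
id is written twice under the same name: once as a stored field, so it comes
back from the reader, and once as a doc values field, which has to be re-added
on every update.

```java
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.util.BytesRef;

/** Hypothetical helper (not part of Lucene): builds complete documents for (re-)indexing. */
public final class DocRebuilder {

    // Same field type as in the quoted test: stored, indexed as a single token.
    private static final FieldType KEYWORD = new FieldType(TextField.TYPE_STORED);
    static {
        KEYWORD.setTokenized(false);
        KEYWORD.freeze();
    }

    private DocRebuilder() {}

    /** Builds a fresh document that carries the id as a stored field AND as doc values. */
    public static Document build(String docId, String state, BytesRef id) {
        Document doc = new Document();
        doc.add(new Field("docId", docId, KEYWORD));
        doc.add(new Field("state", state, KEYWORD));
        doc.add(new StoredField("id", id));          // comes back from searcher.doc(...)
        doc.add(new BinaryDocValuesField("id", id)); // does not come back; re-add each time
        return doc;
    }

    /** Rebuilds a document retrieved from a reader, changing only its state. */
    public static Document withState(Document retrieved, String newState) {
        BytesRef id = retrieved.getBinaryValue("id"); // read from the stored copy
        if (id == null) {
            throw new IllegalStateException("retrieved document has no stored 'id' field");
        }
        return build(retrieved.get("docId"), newState, id);
    }
}
```

With such a helper, the update in the quoted test would become something like
`indexWriter.updateDocument(term, DocRebuilder.withState(indexedDoc, "processed"))`
instead of re-indexing `indexedDoc` itself.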

Mike McCandless

http://blog.mikemccandless.com

On Fri, Oct 20, 2017 at 5:28 AM, Chris and Helen Bamford <chris@bammers.net>
wrote:

> Hi,
>
> I am using Lucene 4.10.3 and have a problem retrieving a DocValue field of
> a document using SearcherManager after I have updated a stored field value.
>
> The document has two key values: 'state' (stored Field) and 'id'
> (BinaryDocValue).
>
> After the document is indexed, it undergoes the following chain of events:
>
>  - it is retrieved from the index by 'state' (using a Searcher obtained by
> SearcherManager.maybeRefresh() & searcherManager.acquire())
>  - the 'state' field's value is changed and the document is updated using
> the IndexWriter from the SearcherManager (indexWriter.updateDocument(Term,
> Document))
>
> This all works fine.
>
> The problem comes when I want to match on the docValue 'id' reusing the
> same Searcher (SearcherManager.maybeRefresh() +
> searcherManager.acquire()), which does not work.
>
> I'm no expert but it seems that when the document is retrieved by 'state'
> it has only stored fields in the list, so when updated it ends up calling
> FieldInfos.addOrUpdate that discards FieldInfo of the docValue field 'id'
> from the list. Afterwards it is impossible to retrieve the docValue using
> the same searcher (searcherManager.maybeRefresh() +
> searcherManager.acquire()).
>
> If a new reader is obtained the docValue match/update is possible but this
> is a performance critical piece of code and I was hoping to reuse the same
> SearcherManager.
>
> The unit test here shows the problem:
>
> public class SearcherManagerFailureTest {
>     private static final String indexPath = "/tmp/mytestindex";
>
>     private IndexWriter indexWriter;
>     private SearcherManager searcherManager;
>     public Directory directory;
>
>     @Before
>     public void beforeTest() throws Exception {
>         // Setup
>         directory = FSDirectory.open(new File(indexPath));
>         IndexWriterConfig idxCfg = new IndexWriterConfig(Version.LUCENE_4_10_3,
>                 new WhitespaceAnalyzer());
>         idxCfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
>         indexWriter = new IndexWriter(directory, idxCfg);
>         searcherManager = new SearcherManager(indexWriter, true, new SearcherFactory());
>     }
>
>     @After
>     public void afterTest() throws Exception {
>         if (indexWriter != null) {
>             indexWriter.commit();
>             indexWriter.close();
>         }
>
>         if (searcherManager != null) {
>             searcherManager.close();
>         }
>
>         if (directory != null) {
>             directory.close();
>         }
>
>         for(File file: new File(indexPath).listFiles())
>             if (!file.isDirectory())
>                 file.delete();
>     }
>
>     @Test
>     public void TestSearcherManagerFails() throws Exception{
>
>         //Indexing
>         Document doc = new Document();
>         FieldType ft = new FieldType(TextField.TYPE_STORED);
>         ft.setTokenized(false);
>         doc.add(new Field("docId",  "doc1",   ft));
>         doc.add(new Field("state",   "added",   ft));
>         doc.add(new BinaryDocValuesField("id", new BytesRef("first")));
>         indexWriter.addDocument(doc);
>         indexWriter.commit();
>
>         //Search by state
>         searcherManager.maybeRefresh();
>         IndexSearcher searcher = searcherManager.acquire();
>         TopDocs topDocs = searcher.search(new TermQuery(new Term("state", "added")), null, 1);
>         Document indexedDoc = searcher.doc(topDocs.scoreDocs[0].doc);
>         //Every acquire() should be paired with a release()
>         searcherManager.release(searcher);
>
>         //Update document
>         String docId = indexedDoc.get("docId");
>         Term term = new Term("docId", docId);
>         Field stateField = (Field) indexedDoc.getField("state");
>         stateField.setStringValue("processed");
>         indexWriter.updateDocument(term, indexedDoc);
>
>         //Try get docValue
>         searcherManager.maybeRefresh();
>         IndexSearcher newSearcher = searcherManager.acquire();
>
>         BinaryDocValues docValues = MultiDocValues.getBinaryValues(newSearcher.getIndexReader(), "id");
>         Assert.assertEquals(null, docValues);
>         searcherManager.release(newSearcher);
>
>         Directory newDirectory = FSDirectory.open(new File(indexPath));
>         DirectoryReader newReader = DirectoryReader.open(newDirectory);
>         BinaryDocValues docValues2 = MultiDocValues.getBinaryValues(newReader, "id");
>         Assert.assertNotSame(null, docValues2);
>         newReader.close();
>
>         if(newDirectory != null){
>             newDirectory.close();
>         }
>     }
> }
>
>
> Can anyone advise?
>
> Thanks
>
> - Chris
>
>
