lucene-java-user mailing list archives

From Sebastian Marius Kirsch <skir...@sebastian-kirsch.org>
Subject Re: managing docids for ParallelReader
Date Sat, 04 Jun 2005 23:06:50 GMT
Dear Doug,

thanks for your message.

On Fri, Jun 03, 2005 at 09:37:01AM -0700, Doug Cutting wrote:
> Sebastian Marius Kirsch wrote:
> >I took up your suggestion to use a ParallelReader for adding more
> >fields to existing documents. I now have two indexes with the same
> >number of documents, but different fields.
> Does search work using the ParallelReader?

I have to admit that I didn't test *that* yet, sorry. I thought that
merging the index into another one would be a good test anyway.

> >One field is duplicated (the id field.)
> Why is this duplicated?  Just curious.  That shouldn't cause a problem.

Just as a precaution, so that I can tell afterwards whether two
indexes are in sync or not. (Iterate over the documents in both
indexes and check whether the id fields match.)
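
A minimal sketch of that sync check, as a plain-Java model with no Lucene
dependency. The class and method names (SyncCheck, firstMismatch) are
made up for illustration. It assumes the id values have already been
read out of each index in docid order, presumably via
IndexReader.document(int) and Document.get(String).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class SyncCheck {

    /**
     * Compares two id lists position by position (position = docid).
     * Returns the first position where they disagree, or -1 if both
     * lists have the same length and the same id at every position.
     * Hypothetical helper, not part of Lucene.
     */
    static int firstMismatch(List<String> idsA, List<String> idsB) {
        int common = Math.min(idsA.size(), idsB.size());
        for (int i = 0; i < common; i++) {
            // Objects.equals tolerates a missing (null) id on either side.
            if (!Objects.equals(idsA.get(i), idsB.get(i))) {
                return i;
            }
        }
        // Same prefix, but one index has more documents than the other.
        return idsA.size() == idsB.size() ? -1 : common;
    }

    public static void main(String[] args) {
        List<String> firstIndex = Arrays.asList("doc-1", "doc-2", "doc-3");
        List<String> secondIndex = Arrays.asList("doc-1", "doc-2", "doc-3");
        int mismatch = firstMismatch(firstIndex, secondIndex);
        System.out.println(mismatch < 0
                ? "indexes are in sync"
                : "first mismatch at docid " + mismatch);
    }
}
```

Comparing by position rather than by set membership matters here, since
a ParallelReader pairs up documents purely by docid.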

> Why are you merging?  Why not just search using the ParallelReader? 
> Again, just curious.  This should work.

Interoperability. That way, I can later hand the index over to another
application that knows nothing about parallel indexes, and don't have
to make sure that this other application combines the indexes the
right way.

(Oh, and I can use Luke to check the index for plausibility. That's an
important point for me.)

> This could be a bug.  I have not tested merging with a ParallelReader. 
> Can you please try adding a test case to TestParallelReader that 
> demonstrates this?

I have attached the diff for the test case, and the output of the test
run.

I have played around with the code for a couple of hours, but cannot
find a fix for this. If I change ParallelTermPositions.seek(TermEnum
termEnum) to check whether termEnum.term() is null and, in that case,
hand the null over to the correct IndexReader instead of calling
.seek(termEnum.term()), I get a different error: a
NullPointerException in
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:140).
Apparently, TermPositions are not made for seeking to a null.

On the other hand, I don't know where the null is coming from in the
first case. It comes from the termEnum of one of the underlying
IndexReaders, and if that's a problem, it should be a problem outside
of ParallelReader too.
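
To make the failure mode concrete, here is a plain-Java model of a
per-field dispatch that dereferences a null term. This is only a sketch
under assumptions: the types and names (NullTermGuard, Term, readerFor,
readerForOrNull) are made up, and I have not confirmed that this is what
ParallelReader.java:318 actually does. The guarded variant treats a null
term as "nothing to seek to" instead of forwarding it. Whether that is
the right behaviour for SegmentMerger is an open question.

```java
import java.util.HashMap;
import java.util.Map;

public class NullTermGuard {

    /** Stand-in for a term: a (field, text) pair. Not Lucene's Term class. */
    static final class Term {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Maps a field name to the name of the sub-index that holds it. */
    private final Map<String, String> fieldToReader = new HashMap<>();

    void addField(String field, String readerName) {
        fieldToReader.put(field, readerName);
    }

    /** Unguarded dispatch: dereferences the term, so null throws an NPE. */
    String readerFor(Term term) {
        return fieldToReader.get(term.field);
    }

    /** Guarded dispatch: a null term selects no reader at all. */
    String readerForOrNull(Term term) {
        return term == null ? null : fieldToReader.get(term.field);
    }

    public static void main(String[] args) {
        NullTermGuard dispatch = new NullTermGuard();
        dispatch.addField("id", "index1");
        dispatch.addField("body", "index2");

        System.out.println(dispatch.readerFor(new Term("body", "lucene")));
        System.out.println(dispatch.readerForOrNull(null));
        try {
            dispatch.readerFor(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on null term");
        }
    }
}
```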

I'm confused (in case you couldn't tell that yet.) I'll try to find
out more tomorrow.

Regards, Sebastian

PS: I'm rather new to this whole Java thing; I tried to import Lucene
into Eclipse for easier debugging, but failed. If any of the
developers use Eclipse, I'd be grateful for some hints on how to set
this up. Thanks for bearing with me.

$ svn diff
Index: src/test/org/apache/lucene/index/TestParallelReader.java
===================================================================
--- src/test/org/apache/lucene/index/TestParallelReader.java    (revision 179785)
+++ src/test/org/apache/lucene/index/TestParallelReader.java    (working copy)
@@ -57,6 +57,13 @@
 
   }
 
+  public void testMerge() throws Exception {
+    Directory dir = new RAMDirectory();
+    IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), true);
+    w.addIndexes(new IndexReader[] { ((IndexSearcher) parallel).getIndexReader() });
+    w.close();
+  }
+  
   private void queryTest(Query query) throws IOException {
     Hits parallelHits = parallel.search(query);
     Hits singleHits = single.search(query);
$ ant -Dtestcase=TestParallelReader test
Buildfile: build.xml
[...]
test:
    [mkdir] Created dir: /Users/skirsch/text/lectures/da/thirdparty/lucene-trunk/build/test
    [junit] Testsuite: org.apache.lucene.index.TestParallelReader
    [junit] Tests run: 2, Failures: 0, Errors: 1, Time elapsed: 1.993 sec

    [junit] Testcase: testMerge(org.apache.lucene.index.TestParallelReader):   Caused an ERROR
    [junit] null
    [junit] java.lang.NullPointerException
    [junit]     at org.apache.lucene.index.ParallelReader$ParallelTermPositions.seek(ParallelReader.java:318)
    [junit]     at org.apache.lucene.index.ParallelReader$ParallelTermDocs.seek(ParallelReader.java:294)
    [junit]     at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:325)
    [junit]     at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:296)
    [junit]     at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:270)
    [junit]     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:234)
    [junit]     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
    [junit]     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:596)
    [junit]     at org.apache.lucene.index.TestParallelReader.testMerge(TestParallelReader.java:63)
    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)


    [junit] Test org.apache.lucene.index.TestParallelReader FAILED

BUILD FAILED
/Users/skirsch/text/lectures/da/thirdparty/lucene-trunk/common-build.xml:188: Tests failed!

Total time: 16 seconds
$

-- 
Sebastian Kirsch <skirsch@sebastian-kirsch.org> [http://www.sebastian-kirsch.org/]

NOTE: New email address! Please update your address book.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

