lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daniel B. Davis" <...@smart.net>
Subject Intermediate indexing before final
Date Sun, 15 Feb 2004 20:55:36 GMT

I am a newbie to Lucene, and have been learning by experiment
and from the demos.   A problem has arisen in indexing
a document after creation, and before indexing in the
permanent index. It is being indexed to this small lookaside
index in order to determine whether it is
"sponsored" [i.e. contains any word that causes it to
be included in one of the 'sponsored' document levels.]
(A separate letter deals with the larger issues of
sponsorship.)  If it is sponsored, then a setBoost for
the document will be issued, with a level-dependent
value.

The code in question arises from within IndexHTML
near:
	doc = new HTMLDocument(file);
	writer.addDocument(doc);

In the case at issue, this code has been changed to:
	doc = new HTMLDocument(file);
         int boost = sponsoredValue(doc);
         doc.setBoost(boost);
	writer.addDocument(doc);

The sponsoredValue method never returns.

The exception occurs after a longish delay in
eclipse, about 2-3 seconds.  The document used is:
           http://www.w3.org/TR/xquery
stored as a local file. The same document indexes
correctly when the call to sponsoredValue and setBoost
are removed.

HTMLDocument was modified in minor ways.  HTMLParser
is destined for modification, but is still vanilla.

Note that altering RAMDirectory to FSDirectory makes
no difference and does not change the behavior.

I greatly Appreciate any help, thank you all.

  -------------------------------------------------

the Document doc:
       url: Keyword, string
       file: Unindexed, string
       modified: Keyword, string
       uid: as in HTMLdemo, string
       contents: Text, reader
       title: Text, string
       metadata: Text, string

the code:

   private static RAMDirectory ramDir = null;
   private static IndexWriter ramWriter = null;
   private static IndexReader ramReader = null;
   private static IndexSearcher ramSearcher = null;

   public int sponsoredValue(Document doc) {
       .
       .
       .
       ramDir = new RAMDirectory();
       ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
+-->  ramWriter.addDocument(doc);
|     ramWriter.close();
|     ramWriter = null;
|     ramReader = IndexReader.open(ramDir);
|     ramSearcher = new IndexSearcher(ramReader);
|     .
|     .
|     .
|     }
|
the Exception:

java.io.IOException: Pipe closed
	at java.io.PipedInputStream.receive(Unknown Source)
	at java.io.PipedInputStream.receive(Unknown Source)
	at java.io.PipedOutputStream.write(Unknown Source)
	at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
	at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(Unknown Source)
	at sun.nio.cs.StreamEncoder.write(Unknown Source)
	at sun.nio.cs.StreamEncoder.write(Unknown Source)
	at java.io.OutputStreamWriter.write(Unknown Source)
	at java.io.Writer.write(Unknown Source)
	at org.apache.lucene.demo.html.HTMLParser.addText(HTMLParser.java:141)
	at org.apache.lucene.demo.html.HTMLParser.HTMLDocument(HTMLParser.java:200)
	at org.apache.lucene.demo.html.ParserThread.run(ParserThread.java:69)







---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message