lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 27354] - When indexing large files (like the j2sdk API), get java.io.IOException: Pipe closed
Date Tue, 02 Mar 2004 16:31:39 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27354>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

------- Additional Comments From Eric.Isakson@sas.com  2004-03-02 16:31 -------
This has to do with the interaction between HTMLParser and DocumentWriter. 
HTMLParser does:

  public Reader getReader() throws IOException {
    if (pipeIn == null) {
      pipeInStream = new MyPipedInputStream();
      pipeOutStream = new PipedOutputStream(pipeInStream);
      pipeIn = new InputStreamReader(pipeInStream);
      pipeOut = new OutputStreamWriter(pipeOutStream);

      Thread thread = new ParserThread(this);
      thread.start();                             // start parsing
    }

    return pipeIn;
  }


When you create your Document, you do something like
doc.add(Field.Text("content", parser.getReader())) to supply the reader to the
document.

When the DocumentWriter finally uses the reader in invertDocument(Document), it 
does:

          // Tokenize field and add to postingTable
          TokenStream stream = analyzer.tokenStream(fieldName, reader);
          try {
            for (Token t = stream.next(); t != null; t = stream.next()) {
              position += (t.getPositionIncrement() - 1);
              addPosition(fieldName, t.termText(), position++);
              if (++length > maxFieldLength) break;
            }
          } finally {
            stream.close();
          }

If DocumentWriter breaks out of the loop because length > maxFieldLength, the 
stream.close() in the finally block closes the reader before the parser thread 
is done with it, and the next time the parser thread writes to the pipe, you 
get the "Pipe closed" IOException.
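
The failure mode can be reproduced with plain java.io pipes, without Lucene. 
This is only a minimal sketch: the class and method names are made up for 
illustration, and it runs on a single thread for determinism, whereas in the 
demo the writer is the parser thread.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical demo class, not part of Lucene.
class PipeClosedDemo {

    // Closes the read end early, then writes again.
    // Returns the message of the resulting IOException.
    static String writeAfterReaderClose() {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);

            out.write('a');   // producer writes some data
            in.read();        // consumer reads part of it
            in.close();       // consumer stops early (like the maxFieldLength break)

            out.write('b');   // producer keeps going and fails here
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAfterReaderClose()); // prints "Pipe closed"
    }
}
```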

Since DocumentWriter doesn't know it is dealing with a pipe being written to on 
some other thread, it can't really deal with this.

At the end of org.apache.lucene.demo.html.ParserThread.run() the stack trace is 
dumped:

    } catch (IOException e) {
      e.printStackTrace();
    }

and since this isn't a more specific exception, just an IOException, we can't 
catch it and silently ignore the case of the pipe being closed.

The message is a bit annoying. I suppose to "fix" this, you could modify 
HTMLParser.MyPipedInputStream and override the close method to do whatever is 
necessary to gracefully shut down the parse (or you might have to extend the 
InputStreamReader and override its close, not sure). I'm not really sure what 
one needs to do to gracefully shut down that parse thread. I also see this 
message in my logs and at present have just been ignoring it. If I get some 
free time I'll try and look at this, OTOH, if the description helps someone 
else see the answer to fixing this, have at it.
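
One possible direction for the close() idea, as a rough sketch rather than a 
tested patch: have the piped input stream remember that the consumer closed it, 
so the writing thread can tell an early close apart from a real I/O failure and 
stop without dumping a stack trace. ConsumerAwarePipe, wasClosedByConsumer() 
and produce() are hypothetical names standing in for 
HTMLParser.MyPipedInputStream and the ParserThread loop. How the real parser 
thread would get at the flag is left open.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical sketch, not the actual Lucene demo code.
class QuietPipeDemo {

    // Stand-in for HTMLParser.MyPipedInputStream: records a consumer-side close.
    static class ConsumerAwarePipe extends PipedInputStream {
        private volatile boolean closedByConsumer = false;

        @Override
        public void close() throws IOException {
            closedByConsumer = true;   // set before the pipe is actually closed
            super.close();
        }

        boolean wasClosedByConsumer() {
            return closedByConsumer;
        }
    }

    // Stand-in for the parser thread's write loop.
    static String produce(ConsumerAwarePipe in, PipedOutputStream out, int count) {
        try {
            for (int i = 0; i < count; i++) {
                out.write('x');
            }
            out.close();
            return "finished";
        } catch (IOException e) {
            if (in.wasClosedByConsumer()) {
                return "stopped quietly";          // expected: consumer gave up early
            }
            return "unexpected: " + e.getMessage(); // a real failure, still reported
        }
    }

    static String run() {
        try {
            ConsumerAwarePipe in = new ConsumerAwarePipe();
            PipedOutputStream out = new PipedOutputStream(in);
            String[] result = new String[1];

            // Far more data than the pipe buffer holds, so the producer must block.
            Thread producer = new Thread(() -> result[0] = produce(in, out, 100_000));
            producer.start();

            in.read();    // consume a little
            in.close();   // then stop early, as DocumentWriter does
            producer.join();
            return result[0];
        } catch (IOException | InterruptedException e) {
            return "error: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "stopped quietly"
    }
}
```

The producer may take up to about a second to notice the close, because a 
writer blocked on a full pipe only rechecks the pipe state periodically.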
