lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From blaplante <blapla...@netwebapps.com>
Subject Corrupt index problem
Date Tue, 06 May 2003 15:43:05 GMT
I am not sure if I might be doing something wrong but the result of
searching the index produces a summary field that looks like the following.

%PDF-1.2 %âãÏÓ 10 0 obj << /Length 11 0 R /Filter /FlateDecode >> stream
H?µWÛnÜHýÿC=&?X®«.oÛñeá
¾?ÝN0?_äî²­?ºå?Ôvü÷KÖM¥NË3?Ì"Ð??Å"yxxøy¾'?HDJT?Èüh?üÓ>?½?Jò$?ðß÷{4¡?sø¹
ûø[2øýB>Ü\ÍæÇGäúâdþ}vuLð?

I am using: 
lucene-1.2.jar
PDFBox-0.6.2.jar

I wrote a class that formulates a dynamic url and uses https to connect via
the web and retrieve an InputStream using a GET_METHOD. I am passing the
InputStream off the HTMLParser(InputStream stream) constructor and calling
parser.getReader() to store the content into the index and I am calling
parser.getSummary() to store the summary into the index. When I used the
demo that came with lucene and did a local index on my own machine the
HTMLParser(File file) constructor worked just fine. If anyone has idea's as
to what I might be doing wrong I would appreciate the heads up.

Thanks

Bryan LaPlante

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message