lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 33397] - problem searching or indexing larger text files
Date Fri, 04 Feb 2005 00:54:51 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=33397>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=33397

------- Additional Comments From lmiller@encryptx.com  2005-02-04 01:54 -------
I'm having a problem where indexing doesn't cover the entire file for larger 
text files. I have attached two text files, each with the word "bozo" at the 
end of the file. After indexing the smaller text file, a search shows a hit 
for the word bozo. After indexing the larger text file, a search does not 
find any hits for the word bozo.

I originally discovered this problem when reading from a socket. The data on 
the socket came from a file, and the socket's input stream was passed in to 
addDocument(). During the IndexWriter's addDocument() call, it stopped reading 
from the socket before all of the contents had been read. I wrote this little 
test program to read straight from a file, and I am seeing the same problem. I 
don't see any errors - it just seems to stop reading.

The smaller text file is ~82 KB, and the larger one is around 86 KB. Here is 
the output I get from running the test program (I have attached the source 
code and both of my test files):

Enter in file name to read:
c:\temp\smaller.txt
c:\temp\smaller.txt is 83649 bytes.
Enter in text to search for:
bozo
Found 1 hits.

Enter in file name to read:
c:\temp\larger.txt
c:\temp\larger.txt is 87183 bytes.
Enter in text to search for:
bozo
Found 0 hits.
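
For anyone who wants to try this without the attachments, here is a minimal
sketch that generates two stand-in files with the byte sizes shown above and
the word "bozo" as the final word. It is not the attached test program and it
does not call Lucene. The class name TestFileMaker, the helper buildContent,
the filler text, and the temp directory prefix are all made up for
illustration, and the real attachments almost certainly contain different
text (and a different number of words) than this filler.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: builds stand-in files for smaller.txt and larger.txt.
public class TestFileMaker {

    /** Returns exactly totalBytes of ASCII filler text whose last word is marker. */
    static byte[] buildContent(int totalBytes, String marker) {
        String tail = " " + marker;
        if (totalBytes < tail.length()) {
            throw new IllegalArgumentException("totalBytes too small for marker");
        }
        String filler = "lorem ipsum dolor sit amet "; // assumed filler text
        int bodyLength = totalBytes - tail.length();
        StringBuilder sb = new StringBuilder(totalBytes + filler.length());
        while (sb.length() < bodyLength) {
            sb.append(filler);
        }
        sb.setLength(bodyLength); // trim filler so the total size is exact
        sb.append(tail);          // marker is preceded by a space, so it is its own word
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("bug33397"); // assumed location
        int[] sizes = {83649, 87183};                      // sizes from the output above
        String[] names = {"smaller.txt", "larger.txt"};
        for (int i = 0; i < sizes.length; i++) {
            Path file = dir.resolve(names[i]);
            Files.write(file, buildContent(sizes[i], "bozo"));
            System.out.println(file + " is " + Files.size(file) + " bytes.");
        }
    }
}
```

The generated files match the originals only in size and in where the marker
word sits, so results from indexing them may not match the output above.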

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

