lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Per Lindberg" <>
Subject OutOfMemoryError tokenizing a boring text file
Date Fri, 31 Aug 2007 14:15:56 GMT
I'm creating a tokenized "content" Field from a plain text file
using an InputStreamReader and new Field("content", in);

The text file is large, 20 MB, and contains zillions lines,
each with the the same 100-character token.

That causes an OutOfMemoryError.

Given that all tokens are the *same*,
why should this cause an OutOfMemoryError?
Shouldn't StandardAnalyzer just chug along
and just note "ho hum, this token is the same"?
That shouldn't take too much memory.

Or have I missed something?

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message