Hi Ryan,
I have found a simple mistake in the WordExtractor code of
textmining. Have a look at the following snippet from the
extractText method:
while (runIt.hasNext())
{
  CHPX chpx = (CHPX)runIt.next();
  boolean deleted = isDeleted(chpx.getGrpprl());
  if (deleted)
  {
    continue;
  }
  int runStart = chpx.getStart();
  int runEnd = chpx.getEnd();
  // possibility of raising exceptions
  while (runStart >= currentTextEnd)
  {
    // because of this :(
    currentPiece = (TextPiece) textIt.next();
    currentTextStart = currentPiece.getStart();
    currentTextEnd = currentPiece.getEnd();
  }
---------------------------------------------------
----------------------------------------------------
The line
  while (runStart >= currentTextEnd)
is the mistake. It should be
  while (runStart >= currentTextEnd && textIt.hasNext())
Otherwise the parser may throw an exception for certain documents,
because textIt.next() gets called after the last text piece has
already been consumed. I hit this problem with at least 2 out of
100 documents.
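
For illustration, here is a minimal, self-contained sketch of the
difference between the two loop conditions. It does not use the
textmining classes: GuardedAdvance, Piece, advanceUnguarded and
advanceGuarded are hypothetical names made up for this example, and
it assumes a standard java.util.Iterator, whose next() throws
NoSuchElementException once it is exhausted.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo class; not part of textmining or POI.
class GuardedAdvance {

    // Hypothetical stand-in for a text piece covering [start, end).
    static final class Piece {
        final int start;
        final int end;

        Piece(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Original loop shape: keeps calling next() until a piece ends
    // after runStart, even when no pieces are left.
    static int advanceUnguarded(Iterator<Piece> textIt, int runStart) {
        int currentTextEnd = 0;
        while (runStart >= currentTextEnd) {
            // Throws NoSuchElementException once textIt is exhausted.
            currentTextEnd = textIt.next().end;
        }
        return currentTextEnd;
    }

    // Proposed loop shape: also stops when the iterator is exhausted.
    static int advanceGuarded(Iterator<Piece> textIt, int runStart) {
        int currentTextEnd = 0;
        while (runStart >= currentTextEnd && textIt.hasNext()) {
            currentTextEnd = textIt.next().end;
        }
        return currentTextEnd;
    }

    public static void main(String[] args) {
        List<Piece> pieces = Arrays.asList(new Piece(0, 10), new Piece(10, 20));

        // A run that starts past the last piece no longer throws.
        System.out.println(advanceGuarded(pieces.iterator(), 25));

        try {
            advanceUnguarded(pieces.iterator(), 25);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded loop ran off the end");
        }
    }
}
```

Note that when the guarded loop stops because the pieces ran out,
runStart can still lie beyond currentTextEnd, so the code after the
loop probably needs to cope with that case as well.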
Regards
Sudhakar