lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doron Cohen <DOR...@il.ibm.com>
Subject Re: Index Rows as Documents? Help me design a solution
Date Tue, 25 Jul 2006 21:23:18 GMT
Few comments -

> (from first posting in this thread)
> The indexing was taking much more than minutes for a 1 MB log file. ...
> I would expect to be able to index at least a of GB of logs within 1 or 2
minutes.

1-2 minutes per GB would be 30-60 GB/Hour, which for a single machine/jvm
is a lot - well at least I did not see Lucene index this fast.

> doc.add(new Field("msisdn", columns[0], Field.Store.YES,
Field.Index.TOKENIZED));
> doc.add(new Field("messageid", columns[2], Field.Store.YES,
Field.Index.TOKENIZED));

Is it really required to analyze the text for these fields - "msisdn" , "
messageid"?

> doc.add(new Field("line", line, Field.Store.YES, Field.Index.NO));

This is storing the original text of all input lines that are indexed -
quite an overhead.

- Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message