lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Streeton" <mike.stree...@ardentia.co.uk>
Subject RE: Index Rows as Documents? Help me design a solution
Date Wed, 26 Jul 2006 08:09:56 GMT
The only way you might get the performance you want is to have multiple
IndexWriters writing to different indexes and then addAll are the end.
You would obviously have to handle the multi threading and distribution
of the parts of the log to each writer.

Mike

www.ardentia.com the home of NetSearch

-----Original Message-----
From: Doron Cohen [mailto:DORONC@il.ibm.com] 
Sent: 25 July 2006 22:23
To: java-user@lucene.apache.org
Subject: Re: Index Rows as Documents? Help me design a solution

Few comments -

> (from first posting in this thread)
> The indexing was taking much more than minutes for a 1 MB log file.
...
> I would expect to be able to index at least a of GB of logs within 1
or 2
minutes.

1-2 minutes per GB would be 30-60 GB/Hour, which for a single
machine/jvm
is a lot - well at least I did not see Lucene index this fast.

> doc.add(new Field("msisdn", columns[0], Field.Store.YES,
Field.Index.TOKENIZED));
> doc.add(new Field("messageid", columns[2], Field.Store.YES,
Field.Index.TOKENIZED));

Is it really required to analyze the text for these fields - "msisdn" ,
"
messageid"?

> doc.add(new Field("line", line, Field.Store.YES, Field.Index.NO));

This is storing the original text of all input lines that are indexed -
quite an overhead.

- Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message