lucene-java-user mailing list archives

From Danil ŢORIN <torin...@gmail.com>
Subject Re: Questions about Lucene usage recommendations
Date Mon, 27 Sep 2010 12:52:56 GMT
Lucene 2.1 is really old... you should be able to migrate to Lucene 2.9
without changing your code (almost a jar drop-in, but be careful with
analyzers), and there could be huge improvements if you use Lucene
properly.

A few questions:
- What does "all data to be indexed is stored in DB fields" mean? You
should store in Lucene everything you need, so that at search time you
don't need to hit the DB.
- What does "indexing is done right after every modification" mean? Do
you index just the changed document, or reindex all 1.4M docs?
- Do you sort on some field, or just use basic relevance?
- How big is each document, and what do you do with it? Maybe
highlighting on large documents causes this?
- What's in the document? If it's like a book, and almost every word
matches every document, there could be some issues.
- Is Lucene the bottleneck? Maybe you are calling from a remote
server and marshaling + network + unmarshaling is slow?

The usual Lucene patterns are (assuming that you have moved to Lucene 2.9):
- Avoid using NFS (not necessarily a performance bottleneck, and
definitely not something that would cause a 2-minute query, but just to
be on the safe side).
- Keep the writer open and add documents to the index (no need to
rebuild everything).
- Keep your readers open and call reopen() once in a while (you may
even go for realtime search if you want to).
- In your case, I don't think optimize will do any good; the segments
look good to me, so don't worry about cfs file size.
- There are ways to limit cfs file size and play with setMaxXXX, but I
don't think that's the cause of your 2-minute query.
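
To make the "keep the writer open / reopen the reader" pattern above
concrete, here is a minimal sketch. It assumes the Lucene 2.9 jar is on
the classpath; the class name LongLivedIndex and the field names "id" and
"body" are made up for illustration, and it uses a RAMDirectory only so
that it runs without touching disk. It is not how the application in
question works (its source is not available), just one way the pattern
could look.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

/** Hypothetical helper: one long-lived writer and one long-lived reader. */
public class LongLivedIndex {
    private final IndexWriter writer;
    private IndexReader reader;

    public LongLivedIndex(Directory dir) throws IOException {
        // Opened once and kept open; creates the index if it does not exist.
        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_29),
                                 IndexWriter.MaxFieldLength.UNLIMITED);
        writer.commit(); // ensure a commit point exists before opening a reader
        reader = IndexReader.open(dir, true); // read-only reader, kept open
    }

    /** Index only the changed document; no full rebuild. */
    public synchronized void update(String id, String body) throws IOException {
        Document doc = new Document();
        // "id" is stored and indexed as a single token so it can serve as a key.
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        // "body" is stored too, so search results need no DB lookup.
        doc.add(new Field("body", body, Field.Store.YES, Field.Index.ANALYZED));
        // Deletes any existing document with this id, then adds the new one.
        writer.updateDocument(new Term("id", id), doc);
        writer.commit();
    }

    /** Refresh the reader cheaply instead of opening a new one per search. */
    public synchronized IndexSearcher searcher() throws IOException {
        IndexReader fresh = reader.reopen(); // same instance if nothing changed
        if (fresh != reader) {
            // Simplification: a multi-threaded application must make sure no
            // search is still running on the old reader before closing it.
            reader.close();
            reader = fresh;
        }
        return new IndexSearcher(reader);
    }

    public synchronized void close() throws IOException {
        reader.close();
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        LongLivedIndex index = new LongLivedIndex(new RAMDirectory());
        index.update("1", "dark chocolate");
        index.update("2", "vanilla");
        index.update("1", "milk chocolate"); // replaces the first version of doc 1
        TermQuery query = new TermQuery(new Term("body", "chocolate"));
        int hits = index.searcher().search(query, 10).totalHits;
        System.out.println("hits for chocolate: " + hits);
        index.close();
    }
}
```

With around 150 modified documents per day, committing after each update
as done here should be affordable; with a much higher update rate you
would probably commit in batches instead.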

On Mon, Sep 27, 2010 at 14:35, Pawlak Michel (DCTI)
<michel.pawlak@etat.ge.ch> wrote:
> Hello,
>
> We have an application which is using lucene and we have strong
> performance issues (on bad days, some searches take more than 2
> minutes). I'm new to the Lucene component, thus I'm not sure Lucene is
> correctly used and thus would like to have some information on lucene
> usage recommendations. This would help locate the problem (application
> code / lucene configuration / hardware / all) It would be great if a
> project committer / specialist could answer those questions.
>
> First some facts about the application :
> - Lucene version being used : 2.1.0 (february 2007...)
> - around 1.4M "documents" to be indexed.
> - Db size (all data to be indexed is stored in DB fields) : 3.5 GB
> - Index file size on disk : 1.6 GB (note that one cfs file is 780M,
> another one is 600M, the rest consists of smaller files)
> - single indexer, multiple readers (6 readers)
> - around 150 documents are modified per day
> - indexing is done right after every modification
> - simple searches can take ages (for instance searching for "chocolate"
> could take more than 2 minutes)
> - I do not have access to source code (yes that's the funny part)
>
> My questions :
> - Is this version of Lucene still supported ?
> - What are the main reasons, if any, one should use the latest version
> of lucene instead of 2.1.0 ? (for instance : performance, stability,
> critical fixes, support, etc.) (the answer may sound obvious, but I
> would like to have an official answer)
> - Is there any recommendation concerning storage any Lucene user should
> know (not benchmarks, but recommendations such as "better use physical
> HDD", "do not use NFS if possible", "if your cfs files are greater than
> XYZ, better use this kind of storage", "if you have more than XYZ
> searches per second, better..." etc)
> - Is there any recommendation concerning cfs file size ?
> - Is there a way to limit the size of cfs files ?
> - What is the impact on search performance if cfs file size is limited ?
> - How often should optimization occur ? (every day, week, month ?)
> - I saw that IndexWriter has methods such as setMaxFieldLength()
> setMergeFactor() setMaxBufferedDocs() setMaxMergeDocs() Can you briefly
> explain how these can affect performance ?
> - Is there any other recommendation "dummies" should be informed of, and
> every expert has to know ? For instance a list of lucene patterns /
> anti patterns which may affect performance.
>
> If my questions are not precise enough, do not hesitate to ask for
> details. If you see an obvious problem do not hesitate to tell me.
>
> A big thank you in advance for your help,
>
> Best regards,
>
> Michel
>
>
>
