lucene-lucene-net-user mailing list archives

From Nitin Shiralkar <nit...@coreobjects.com>
Subject RE: Lucene Scalability Options
Date Fri, 09 Jan 2009 08:47:37 GMT
Digy,

It will be difficult to split the index into groups because of the way we build and search
it. We keep adding new documents and also update existing documents quite frequently. Also,
our searches need to run against the entire set.

We are not facing any search performance problems as of now; I just wanted to check whether
there are any known performance or scalability issues after crossing 100 GB. Another question
on the same topic: I am not sure whether the 100 GB size of our index is genuine or is due to
some failures that have left redundant segments/files behind. I saw a few TMP files, which I
have deleted. But apart from that, I am not sure how to identify redundant or junk files in
the Lucene index folder.

Following is the list of files we have in the Lucene index folder:

\\LuceneIndexTest\_d8by.prx
\\LuceneIndexTest\_d8by.tii
\\LuceneIndexTest\_d8by.tis
\\LuceneIndexTest\_d8c9.fdt
\\LuceneIndexTest\_d8c9.fdx
\\LuceneIndexTest\_d8c9.fnm
\\LuceneIndexTest\_d8ca.fdt
\\LuceneIndexTest\_d8ca.fdx
\\LuceneIndexTest\_d8ca.fnm
\\LuceneIndexTest\_dl4h.fdt
\\LuceneIndexTest\_dl4h.fdx
\\LuceneIndexTest\_dl4h.fnm
\\LuceneIndexTest\_dl48.fdt
\\LuceneIndexTest\_dl48.fdx
\\LuceneIndexTest\_dl48.fnm
\\LuceneIndexTest\_dl48.frq
\\LuceneIndexTest\_dl48.prx
\\LuceneIndexTest\_dl48.tii
\\LuceneIndexTest\_dl48.tis
\\LuceneIndexTest\_fdbs.fdt
\\LuceneIndexTest\_fdbs.fdx
\\LuceneIndexTest\_fdbs.fnm
\\LuceneIndexTest\_fdbs.frq
\\LuceneIndexTest\_fdbs.prx
\\LuceneIndexTest\_fdbs.tii
\\LuceneIndexTest\_fdbs.tis
\\LuceneIndexTest\_fhz5.fdt
\\LuceneIndexTest\_fhz5.fdx
\\LuceneIndexTest\_fhz5.fnm
\\LuceneIndexTest\_fhz5.frq
\\LuceneIndexTest\_fhz5.prx
\\LuceneIndexTest\_fhz5.tii
\\LuceneIndexTest\_fhz5.tis
\\LuceneIndexTest\_fkla.fdt
\\LuceneIndexTest\_fkla.fdx
\\LuceneIndexTest\_fkla.fnm
\\LuceneIndexTest\_fkla.frq
\\LuceneIndexTest\_fkla.prx
\\LuceneIndexTest\_fkla.tii
\\LuceneIndexTest\_fkla.tis
\\LuceneIndexTest\_fmo5.fdt
\\LuceneIndexTest\_fmo5.fdx
\\LuceneIndexTest\_fmo5.fnm
\\LuceneIndexTest\_fmo5.frq
\\LuceneIndexTest\_fmo5.prx
\\LuceneIndexTest\_fmo5.tii
\\LuceneIndexTest\_fmo5.tis
\\LuceneIndexTest\_fmo6.fdt
\\LuceneIndexTest\_fmo6.fdx
\\LuceneIndexTest\_fmo6.fnm
\\LuceneIndexTest\_fmo6.frq
\\LuceneIndexTest\_fmo6.prx
\\LuceneIndexTest\_fmo6.tii
\\LuceneIndexTest\_fmo6.tis
\\LuceneIndexTest\_fmo7.fdt
\\LuceneIndexTest\_fmo7.fdx
\\LuceneIndexTest\_fmo7.fnm
\\LuceneIndexTest\_fmo7.frq
\\LuceneIndexTest\_fmo7.prx
\\LuceneIndexTest\_fmo7.tii
\\LuceneIndexTest\_fmo7.tis
\\LuceneIndexTest\_fmo9.fdt
\\LuceneIndexTest\_fmo9.fdx
\\LuceneIndexTest\_fmo9.fnm
\\LuceneIndexTest\_fmoa.fdt
\\LuceneIndexTest\_fmoa.fdx
\\LuceneIndexTest\_fmoa.fnm
\\LuceneIndexTest\_fmod.fdt
\\LuceneIndexTest\_fmod.fdx
\\LuceneIndexTest\_fmod.fnm
\\LuceneIndexTest\_fmoe.fdt
\\LuceneIndexTest\_fmoe.fdx
\\LuceneIndexTest\_fmoe.fnm
\\LuceneIndexTest\_fmof.fdt
\\LuceneIndexTest\_fmof.fdx
\\LuceneIndexTest\_fmof.fnm
\\LuceneIndexTest\_fmog.fdt
\\LuceneIndexTest\_fmog.fdx
\\LuceneIndexTest\_fmog.fnm
\\LuceneIndexTest\_fmoh.fdt
\\LuceneIndexTest\_fmoh.fdx
\\LuceneIndexTest\_fmoh.fnm
\\LuceneIndexTest\_foq9.fdt
\\LuceneIndexTest\_foq9.fdx
\\LuceneIndexTest\_foq9.fnm
\\LuceneIndexTest\_foq9.frq
\\LuceneIndexTest\_foq9.prx
\\LuceneIndexTest\_foq9.tii
\\LuceneIndexTest\_foq9.tis
\\LuceneIndexTest\_fq23.fdt
\\LuceneIndexTest\_fq23.fdx
\\LuceneIndexTest\_fq23.fnm
\\LuceneIndexTest\_fq23.frq
\\LuceneIndexTest\_fq23.prx
\\LuceneIndexTest\_fq23.tii
\\LuceneIndexTest\_fq23.tis
\\LuceneIndexTest\_hr8w.fdt
\\LuceneIndexTest\_hr8w.fdx
\\LuceneIndexTest\_hr8w.fnm
\\LuceneIndexTest\_hr8x.fdt
\\LuceneIndexTest\_hr8x.fdx
\\LuceneIndexTest\_hr8x.fnm
\\LuceneIndexTest\_k6jf.cfs
\\LuceneIndexTest\_kwhl.cfs
\\LuceneIndexTest\deletable
\\LuceneIndexTest\segments
\\LuceneIndexTest\_d8by.fdt
\\LuceneIndexTest\_d8by.fdx
\\LuceneIndexTest\_d8by.fnm
\\LuceneIndexTest\_d8by.frq

Any inputs on junk/redundant files in the above list?
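
One way to look at a listing like this is to group the files by segment name and see which
segments carry only part of the usual per-segment file set. The following is a minimal
Python sketch, not an authoritative check: it assumes that a non-compound segment in this
Lucene version normally has the .fnm, .fdt, .fdx, .tis, .tii, .frq and .prx files, and the
names REQUIRED, group_by_segment and incomplete_segments are made up for illustration.

```python
from collections import defaultdict

# Assumption: files a non-compound segment normally has
# (field infos, stored fields, term dictionary, postings).
REQUIRED = {"fnm", "fdt", "fdx", "tis", "tii", "frq", "prx"}


def group_by_segment(paths):
    """Map each segment name (e.g. '_d8by') to the set of extensions present."""
    groups = defaultdict(set)
    for path in paths:
        # Take the last path component, accepting either slash style.
        name = path.strip().replace("\\", "/").rsplit("/", 1)[-1]
        stem, dot, ext = name.rpartition(".")
        # Skip files such as 'segments' and 'deletable' that have no extension.
        if dot and stem.startswith("_"):
            groups[stem].add(ext.lower())
    return dict(groups)


def incomplete_segments(paths):
    """Return {segment: sorted missing extensions} for non-compound segments."""
    result = {}
    for segment, extensions in group_by_segment(paths).items():
        if "cfs" in extensions:
            continue  # a compound file bundles the per-segment files
        missing = REQUIRED - extensions
        if missing:
            result[segment] = sorted(missing)
    return result


if __name__ == "__main__":
    sample = [
        r"\\LuceneIndexTest\_d8c9.fdt",
        r"\\LuceneIndexTest\_d8c9.fdx",
        r"\\LuceneIndexTest\_d8c9.fnm",
        r"\\LuceneIndexTest\_k6jf.cfs",
        r"\\LuceneIndexTest\segments",
    ]
    for segment, missing in incomplete_segments(sample).items():
        print(segment, "is missing", ", ".join(missing))
```

Applied to the listing above, this grouping singles out the twelve segment names that have
only .fdt, .fdx and .fnm files (_d8c9, _d8ca, _dl4h, _fmo9, _fmoa, _fmod, _fmoe, _fmof,
_fmog, _fmoh, _hr8w, _hr8x). That only makes them candidates for a closer look; which
segments are actually live is recorded in the segments file, so nothing should be deleted
on the basis of this heuristic alone.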



-----Original Message-----
From: Digy [mailto:digydigy@gmail.com]
Sent: Tuesday, December 30, 2008 2:37 AM
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

Hi Nitin,

* I haven't heard about that 100 GB limit, but I tried Lucene.Net once with a
300 GB index. The first searches (with a fresh IndexSearcher) took
~20 sec (because of caching), but the next searches performed quite well
(varying from ~50 msec to 3 sec).

* If you deal with such large indexes, it is better to group the indexes
according to some criteria (e.g., index of December, index of November,
etc.) and not to use an index when it is not needed in the search. Of
course, keeping smaller indexes on multiple machines, searching them in
parallel and then merging the results would be a good solution too, but it
would require more complex coding.

You may also want to see some tricks about search speed optimizations (
http://wiki.apache.org/jakarta-lucene/ImproveSearchingSpeed ) and the
project Solr ( http://lucene.apache.org/solr/features.html ).
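
The merging step of such a parallel search can be sketched as follows. This is only an
illustration in Python, not Lucene.Net code: it assumes each index returns its hits as
(score, document id) pairs already sorted by descending score, and it assumes the scores
are comparable across indexes, which may not hold because each index has its own term
statistics. The function name merge_top_hits and the sample data are hypothetical.

```python
import heapq
from itertools import islice


def merge_top_hits(per_index_hits, k):
    """Merge per-index hit lists (each sorted by descending score) into a global top k."""
    # heapq.merge expects ascending order, so negate the score in the key.
    merged = heapq.merge(*per_index_hits, key=lambda hit: -hit[0])
    return list(islice(merged, k))


if __name__ == "__main__":
    # Hypothetical results from two monthly indexes.
    december = [(0.9, "dec-12"), (0.4, "dec-3")]
    november = [(0.7, "nov-8"), (0.5, "nov-1")]
    print(merge_top_hits([december, november], 3))
```

Because the merge is lazy, only the first k hits are pulled from the per-index lists, so
each index needs to return no more than its own top k.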

* You can get the official releases of Lucene.Net from
https://svn.apache.org/repos/asf/incubator/lucene.net/site/download and the
current version from svn trunk
https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/src/Lucene.Net/



DIGY.







-----Original Message-----
From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
Sent: Saturday, December 27, 2008 6:41 AM
To: lucene-net-user@incubator.apache.org
Subject: Lucene Scalability Options

Hi All,

We are using the Lucene.NET v2.0 library in our project. Our index has grown
to ~80 GB in the last year. We expect it to grow beyond 100 GB in the next
six months. I read somewhere, a long time ago, about Lucene performance
issues after crossing the 100 GB mark.


- Are there any specific issues that we might run into after 100 GB?

- Is there any known impact on search performance?

- Are there any scalability features that we can consider implementing?
  Clustering, etc.?

Any inputs would be valuable. I would also like to know the latest stable
Lucene.NET release that we can migrate to; a download link would be
useful.


Thanks & regards,

Nitin Shiralkar

