lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chew Yee Chuang" <yeechu...@tecforte.com>
Subject RE: FW: Lucene indexing vs RDBMS insertion.
Date Tue, 19 Jun 2007 02:20:58 GMT
Thanks for the sharing and suggestion. 
Yes Chris, the index is to be partitioned by date time, and old index will
not be access so frequent. 

I also did consider indexing in parallel to different index as well Erick.
But I can only put all index in ONE machine and there is only ONE machine to
process the job (both searching and indexing). 

I haven’t try out the addIndexes to combine indexes but I did tried out
MultiSearcher for 3 millions of documents in 3 separate indexes and it does
not have much different compare to a search in 1 index. Could you mind to
share your experience in addIndexes, how is the performance for that ? In my
situation,  the indexes may be use for searching at the same time. Another
worry for indexing is optimization process, I have tried it out and it take
quite some time to optimize my index, e.g indexing for 5000 is about
10seconds (MergeFactor = 1000, and recreate IndexWriter every 5000
documents), but to optimize, it will take around ONE minute. So do you guys
have any suggestion on optimization process as well ? when should I run the
optimization process ? and is it a lot of different with searching in a
optimized index compare to a un-optimize ? 

------
eChuang, Chew
 
-----Original Message-----
From: Chris Lu [mailto:chris.lu@gmail.com] 
Sent: Tuesday, June 19, 2007 2:19 AM
To: java-user@lucene.apache.org
Subject: Re: FW: Lucene indexing vs RDBMS insertion.

Definitely very aggressive.

Currently my experience is that, together with database access,
DBSight can do 3 million in 2 hours, with Pentium D 3.4Hz. Seems you
definitely need some good hardware, and a fast hard drive for this. I
feel the hard drive is actually the bottleneck for large indexes.
Partitioning your data and pruning the old data should be also
considered.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_m
inutes


On 6/18/07, Chew Yee Chuang <yeechuang@tecforte.com> wrote:
> Thanks for your suggestion Erick. I'm planning to test the indexing soon.
> For your information, currently the system is inserting into RDBMS which
is
> around 1000 records per seconds. Thus, if lucene in place, I would expect
it
> will index that much of documents per seconds as well (Our target is 3.6
> millions of document to be indexed in 1 hour). Beside of that, I'm
planning
> to queue the record so lucene will have enough time to index it. Anyway,
> thanks for your suggestion and will come back to you once I tested the
> solution.
>
> Thanks,
> eChuang, Chew
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Friday, June 15, 2007 11:11 PM
> To: java-user@lucene.apache.org
> Subject: Re: FW: Lucene indexing vs RDBMS insertion.
>
> From my perspective, this is an irrelevant question. The real question
> is "is Lucene indexing fast enough for my application?". Which nobody
> can answer for you, you have to experiment.
>
> If you're building an index that's only updated every 6 months,
> Lucene is certainly "fast enough". If you're recreating the
> index every 6 seconds, it's a different question.
>
> So, I recommend that you create a test application that does
> nothing except read your source, do whatever parsing you
> need to do and does NOT index it at all. Record the time it
> takes.
>
> Then try the same thing WITH indexing and record the difference.
>
> Then, to get a sense of the dimension of the problem, try
> substituting inserting into the RDBMS instead of the Lucene
> index.
>
> Once you have numbers, you can make better decisions
> And people can give you better advice,  especially if you
> include more detail of your design.
>
> Best
> Erick
>
> On 6/15/07, Chew Yee Chuang <yeechuang@tecforte.com> wrote:
> >
> > Hi, I'm  a new user to Lucene, and heard that it is a powerful tool for
> > full
> > text search and I'm planning to use it in my project for data storage
> > purpose. Before the implementation, I could like to know whether there
is
> > performance issue on Lucene indexing process. I have no doubt on the
> > retrieving and searching feature in Lucene but the indexing process. I
> > have
> > tested my current system to insert 1000 records in RDBMS storage it took
> > about 1 seconds. Thus, if I change my solution to Lucene, can Lucene
> > indexing process perform faster than RDBMS ? I have go through some of
the
> > article talking about the "MergeFactor" and "MaxMergeDocs" parameter for
> > fine tune the indexing process, but no comparison between Lucene
indexing
> > process and RDBMS insertion. Thus, hope someone who have experience in
> > Lucene can provide this information or some article that discuss between
> > Lucene and RDBMS.
> >
> >
> >
> > I really appreciate any help in this. Thanks
> >
> >
> > No virus found in this outgoing message.
> > Checked by AVG Free Edition.
> > Version: 7.5.472 / Virus Database: 269.8.16/849 - Release Date:
6/14/2007
> > 12:44 PM
> >
> >
>
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.5.472 / Virus Database: 269.9.0/852 - Release Date: 6/17/2007
> 8:23 AM
>
>
> No virus found in this outgoing message.
> Checked by AVG Free Edition.
> Version: 7.5.472 / Virus Database: 269.9.0/852 - Release Date: 6/17/2007
> 8:23 AM
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


No virus found in this incoming message.
Checked by AVG Free Edition. 
Version: 7.5.472 / Virus Database: 269.9.0/852 - Release Date: 6/17/2007
8:23 AM
 

No virus found in this outgoing message.
Checked by AVG Free Edition. 
Version: 7.5.472 / Virus Database: 269.9.0/853 - Release Date: 6/18/2007
3:02 PM
 



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message