lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: FW: Lucene indexing vs RDBMS insertion.
Date Mon, 18 Jun 2007 13:11:11 GMT
I'll certainly be interested to see whether you can hit that number; it's
pretty aggressive.
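For reference, the target stated in the quoted message below (3.6 million documents in one hour) works out to the same rate as the current RDBMS insertion. That leaves roughly one millisecond per document for a single indexing stream, sustained over the whole hour:

```latex
\frac{3.6 \times 10^{6}\ \text{documents}}{1\ \text{h}}
  = \frac{3.6 \times 10^{6}\ \text{documents}}{3600\ \text{s}}
  = 1000\ \text{documents/s}
\quad\Rightarrow\quad
\frac{1\ \text{s}}{1000\ \text{documents}} = 1\ \text{ms per document}
```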

That said, you can also consider indexing in parallel and combining the
results. That is, you can have N machines, each indexing its own subset of
the data into a separate index. At the end, you can combine those indexes
with IndexWriter.addIndexes. I don't know if that helps in your situation,
but it's a possibility.
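A minimal sketch of the split-index-merge idea, assuming threads on one machine rather than N machines, and using a toy in-memory inverted index (term to document ids) as a stand-in for Lucene. The class and method names (`ParallelIndexSketch`, `indexShard`, `merge`, `indexInParallel`) are hypothetical and nothing here is Lucene's API; in a real setup each worker would write its own index directory and the final step would be a call to `IndexWriter.addIndexes`, whose exact signature depends on the Lucene version.

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelIndexSketch {

    // Toy inverted index: term -> sorted set of document ids.
    // Hypothetical stand-in for a Lucene index; this is NOT Lucene's API.
    static Map<String, SortedSet<Integer>> indexShard(List<String> docs, int firstId) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            int docId = firstId + i; // global id, so shards never collide
            for (String term : docs.get(i).toLowerCase().split("\\W+")) {
                if (term.isEmpty()) continue;
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Combine step, analogous in spirit to IndexWriter.addIndexes:
    // union the postings of every shard into one index.
    static Map<String, SortedSet<Integer>> merge(List<Map<String, SortedSet<Integer>>> shards) {
        Map<String, SortedSet<Integer>> merged = new HashMap<>();
        for (Map<String, SortedSet<Integer>> shard : shards) {
            shard.forEach((term, ids) ->
                merged.computeIfAbsent(term, t -> new TreeSet<>()).addAll(ids));
        }
        return merged;
    }

    // Splits docs into n contiguous slices, indexes each slice on its own
    // thread, then merges the shard indexes once at the end. Requires n >= 1.
    static Map<String, SortedSet<Integer>> indexInParallel(List<String> docs, int n)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Map<String, SortedSet<Integer>>>> futures = new ArrayList<>();
            int chunk = (docs.size() + n - 1) / n; // ceil(size / n)
            for (int start = 0; start < docs.size(); start += chunk) {
                final int s = start;
                final List<String> slice = docs.subList(s, Math.min(s + chunk, docs.size()));
                futures.add(pool.submit(() -> indexShard(slice, s)));
            }
            List<Map<String, SortedSet<Integer>>> shards = new ArrayList<>();
            for (Future<Map<String, SortedSet<Integer>>> f : futures) {
                shards.add(f.get()); // waits for each shard to finish
            }
            return merge(shards);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of("Lucene indexing", "RDBMS insertion",
                                    "Lucene vs RDBMS", "indexing speed");
        Map<String, SortedSet<Integer>> index = indexInParallel(docs, 2);
        System.out.println(index.get("lucene")); // [0, 2]
        System.out.println(index.get("rdbms"));  // [1, 2]
    }
}
```

The point of the sketch is only the shape of the work: each shard is built independently, and the combine step runs once at the end, so the expensive part scales with the number of workers.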

Best
Erick

On 6/18/07, Chew Yee Chuang <yeechuang@tecforte.com> wrote:
>
> Thanks for your suggestion, Erick. I'm planning to test the indexing soon.
> For your information, the current system inserts into the RDBMS at around
> 1000 records per second. So if Lucene is put in place, I would expect it
> to index that many documents per second as well (our target is 3.6
> million documents indexed in 1 hour). Besides that, I'm planning to queue
> the records so Lucene will have enough time to index them. Anyway, thanks
> for your suggestion; I will come back to you once I have tested the
> solution.
>
> Thanks,
> eChuang, Chew
>
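The plan to queue records can be sketched as a bounded producer/consumer pipeline. This is a minimal illustration under stated assumptions: a single indexing thread, records as plain strings, and a hypothetical `indexer` callback standing in for whatever actually adds a document to Lucene. `IndexingQueueSketch` and `runPipeline` are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class IndexingQueueSketch {

    // Unique sentinel object (compared by identity) that tells the consumer
    // no more records will arrive.
    private static final String POISON_PILL = new String("<eof>");

    /**
     * Feeds records through a bounded queue (capacity >= 1) to one indexing
     * thread. 'indexer' is a hypothetical stand-in for the real indexing call.
     * Error handling in the indexer is omitted for brevity.
     */
    static void runPipeline(List<String> records, int capacity, Consumer<String> indexer)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String record = queue.take();     // blocks until a record is available
                    if (record == POISON_PILL) break; // identity check on the sentinel
                    indexer.accept(record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (String r : records) {
            queue.put(r); // blocks while the queue is full, throttling the producer
        }
        queue.put(POISON_PILL);
        consumer.join(); // after join, everything the consumer wrote is visible here
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> indexed = new ArrayList<>();
        runPipeline(List.of("rec-1", "rec-2", "rec-3"), 2, indexed::add);
        System.out.println(indexed); // [rec-1, rec-2, rec-3]
    }
}
```

A queue only absorbs bursts. If the indexer's sustained rate is below the arrival rate (here, about 1000 records per second), the queue fills up and `put` blocks the producer, so the indexing side still has to keep up on average.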
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Friday, June 15, 2007 11:11 PM
> To: java-user@lucene.apache.org
> Subject: Re: FW: Lucene indexing vs RDBMS insertion.
>
> From my perspective, this is an irrelevant question. The real question
> is "Is Lucene indexing fast enough for my application?", which nobody
> can answer for you; you have to experiment.
>
> If you're building an index that's only updated every 6 months,
> Lucene is certainly "fast enough". If you're recreating the
> index every 6 seconds, it's a different question.
>
> So, I recommend that you create a test application that does
> nothing except read your source and do whatever parsing you
> need; it should NOT index anything at all. Record the time it
> takes.
>
> Then try the same thing WITH indexing and record the difference.
>
> Then, to get a sense of the dimension of the problem, try
> inserting into the RDBMS instead of the Lucene index.
>
> Once you have numbers, you can make better decisions,
> and people can give you better advice, especially if you
> include more detail about your design.
>
> Best
> Erick
>
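The three-step experiment above (parse only, parse plus Lucene indexing, parse plus RDBMS insertion) can be framed as one timing harness with a pluggable sink. A rough sketch, assuming records arrive as strings; `IngestBenchmark`, `timeRun`, and the synthetic `id=...,msg=...` records are hypothetical, and the real Lucene-backed or JDBC-backed sinks are deliberately left out (an in-memory list stands in for them).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class IngestBenchmark {

    /**
     * Times one pass over the source: parse every raw record, then hand the
     * result to 'sink'. Returns elapsed milliseconds. Use a no-op sink for
     * the baseline, then an indexing or RDBMS sink for the comparison runs.
     */
    static <T> double timeRun(List<String> rawRecords, Function<String, T> parser, Consumer<T> sink) {
        long start = System.nanoTime();
        for (String raw : rawRecords) {
            sink.accept(parser.apply(raw));
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Synthetic source data (hypothetical record format).
        List<String> source = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            source.add("id=" + i + ",msg=hello world " + i);
        }
        Function<String, String[]> parser = raw -> raw.split(",");

        // Warm-up pass so JIT compilation does not distort the first measurement.
        timeRun(source, parser, fields -> { });

        // Step 1: parse only, no indexing.
        double baseline = timeRun(source, parser, fields -> { });

        // Steps 2 and 3: replace this stand-in sink with real Lucene indexing
        // or RDBMS insertion code and rerun.
        List<String[]> stored = new ArrayList<>();
        double withSink = timeRun(source, parser, stored::add);

        System.out.printf("parse only : %.1f ms%n", baseline);
        System.out.printf("with sink  : %.1f ms (extra %.1f ms)%n", withSink, withSink - baseline);
        System.out.printf("throughput : %.0f records/s%n", source.size() / (withSink / 1000.0));
    }
}
```

Subtracting the baseline from each run isolates the cost of the sink itself. A single timed pass is noisy; repeating each run several times and comparing medians gives more trustworthy numbers.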
> On 6/15/07, Chew Yee Chuang <yeechuang@tecforte.com> wrote:
> >
> > Hi, I'm a new user of Lucene. I've heard that it is a powerful tool for
> > full-text search, and I'm planning to use it in my project for data
> > storage. Before the implementation, I would like to know whether there is
> > a performance issue with the Lucene indexing process. I have no doubt about
> > the retrieval and searching features in Lucene, only the indexing process.
> > I tested my current system: inserting 1000 records into RDBMS storage took
> > about 1 second. So if I change my solution to Lucene, can the Lucene
> > indexing process perform faster than the RDBMS? I have gone through some
> > articles about the "MergeFactor" and "MaxMergeDocs" parameters for
> > fine-tuning the indexing process, but found no comparison between Lucene
> > indexing and RDBMS insertion. I hope someone who has experience with
> > Lucene can provide this information, or point me to an article that
> > compares Lucene and an RDBMS.
> >
> >
> >
> > I'd really appreciate any help with this. Thanks
> >
> >
>
