lucene-java-user mailing list archives

From Dan Funk <fu...@BATTELLE.ORG>
Subject Re: Index Sizes
Date Mon, 16 May 2005 20:48:22 GMT
Lucene is an excellent choice. 

If I were you I would not store the un-searched fields in the index.  
There's no clear benefit. Where you store the data depends on your needs 
- I use flat files for what I'm doing - as I need them just for 
display.  If you need the functionality of a relational database, then 
that's a perfectly acceptable solution as well.
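
To illustrate the split described above, here is a minimal sketch in plain
Java. It deliberately does not use Lucene's API: the class and method names
(SplitStoreSketch, add, search, display) are hypothetical, and the in-memory
map is only a stand-in for flat files or a relational database. The point is
that only the searchable text feeds the index, while display-only fields are
looked up by record id after the search.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch, not Lucene's API: searchable text goes into a small
// inverted index; display-only fields live in a separate store keyed by id.
class SplitStoreSketch {
    // term -> sorted set of record ids whose searchable text contains the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // record id -> display-only fields (stand-in for flat files or a database)
    private final Map<Integer, Map<String, String>> displayStore = new HashMap<>();

    // Index only the searchable text; keep display fields out of the index.
    void add(int id, String searchableText, Map<String, String> displayFields) {
        for (String term : searchableText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        displayStore.put(id, displayFields);
    }

    // Return the ids of records containing the term, in ascending order.
    List<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    // Fetch display fields only for the hits that are actually shown.
    Map<String, String> display(int id) {
        return displayStore.get(id);
    }
}
```

In this sketch a search for a value that exists only in a display field finds
nothing, which is the intended trade-off: the index stays small, and the
display data is fetched separately for just the records being rendered.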

I have roughly 100,000 records currently - where each "record" is a full 
text page from a book.  I use a dual-processor Pentium 4.  I get large 
result sets (10,000 hits) back in under 1/10 of a second.  I've given no 
thought whatsoever to keeping my code tight, controlled, or efficient 
- and I'm still getting great results.

Richard Krenek wrote:

>Unfortunately our indexes will be performance sensitive. Is Lucene
>still a good choice?  What kind of hardware are you using?
>
>Also, what are the performance implications of having the additional
>80 fields in the index just for display purposes?
>
>Thanks,
>Richard Krenek
>
>
>
>On 5/13/05, Vince Taluskie <vgtaluskie@gmail.com> wrote:
>  
>
>>Yes, you'll be fine with 100 million. I've got a couple of
>>non-performance-sensitive indexes that are more than double that (280M)
>>with about 20 searchable fields as well.  We get results back in the
>>10-20 second range, which is fine for our end users.
>> 
>> Vince
>>
>>
>>On 5/13/05, Richard Krenek <richard.krenek@gmail.com> wrote:
>>    
>>
>>>Hypothetically I have 100 million records. Each record has 100+
>>>fields. Only 20 of those fields need to be searched on; all of them
>>>(including the 20) are needed for display purposes.
>>>Would it be best to just add the 20 fields to the index and keep the 
>>>rest in a relational database? What effect does all that fluff data
>>>have on the index size and search speeds? Does it matter that some of
>>>the fluff data is repeated a lot (certain fields might just contain
>>>the state a person lives in, the color of their hair, number of
>>>fingers, etc.)?
>>>Our indexes are going to be very big; 100 million+ is not an
>>>exaggeration. Will Lucene handle this OK? I have created indexes in the
>>>8-30 million range, but never this big in the number of documents and
>>>also the number of fields.
>>>
>>>Thanks for any info you can provide.
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>-- 
>>@work
>>  vince.taluskie (at) cexp.com
>>  Corporate Express; Technical Architect
>>  Phone:   303 664 2660
>>
>>@home
>>  vince (at) taluskie.com
>>  Louisville, CO
>>  http://www.taluskie.com
>>
>

-- 
Dan Funk
Software Engineer

Information Technology Solutions
Battelle Charlottesville Operations
1000 Research Park Boulevard, Suite 105
Charlottesville, Virginia 22911

434.984.0951 x244
434.984.0947 (fax)
FunkD@Battelle.org


