lucene-solr-user mailing list archives

From lee carroll <lee.a.carr...@googlemail.com>
Subject Re: multiple document types in a core
Date Mon, 24 Oct 2011 10:05:15 GMT
Hi Erick,

You're right, I think. On resources we gain a little on:
disk (a production implementation with live data would be 500 mb saved
in disk usage on each slave and master)
some reduction in network traffic on replication (we do a full
re-index every 24 hours at present)

On design we gain a little by being able to support searches at
various document levels (perform a destination search or hotel search
and return documents at the "correct" level for the search without the
need to perform field collapsing).

But in the cold light of day I don't think we gain huge amounts.
(leaving aside the index replication of a full index)

cheers lee c



On 23 October 2011 19:05, Erick Erickson <erickerickson@gmail.com> wrote:
> Yes, stored fields are placed verbatim for every doc. But I wonder
> at the utility of trying to share stored information. The stored
> info is put in certain files in the index, see:
> http://lucene.apache.org/java/3_0_2/fileformats.html#file-names
>
> and the files that store data are pretty much irrelevant to searching,
> the data in them is only referenced when assembling the document
> for return. So by adding this complexity you'll be saving a bit
> on file transfers when replicating your index, but not much else.
>
> Is it worth it? If so, why?
>
> Best
> Erick
>
> On Mon, Oct 17, 2011 at 11:07 AM, lee carroll
> <lee.a.carroll@googlemail.com> wrote:
>> Just as a follow up
>>
>> it looks like stored fields are stored verbatim for every doc.
>>
>> hotel index and store dest attributes
>> index size: 131M
>> number of records 49147
>>
>> hotel index only dest attributes
>>
>> index size: 111M
>> number of records 49147
>>
>>
>> ~400 chars(bytes) of destination data * 49147 (number of hotel docs) = ~19m
>>
>> basically everything is being stored
>>
>> No difference in time to index (very rough and not scientific :-) )
>>
>> So it does seem an OK strategy to denormalise docs with indexed fields
>> but normalise with stored fields?
>> Or have I missed some problems with this?
>>
>> cheers lee c
>>
>>
>>
>> On 16 October 2011 11:54, lee carroll <lee.a.carroll@googlemail.com> wrote:
>>> Hi Chris thanks for the response
>>>
>>>> It's an inverted index, so *terms* exist once (per segment) and those terms
>>>> "point" to the documents -- so having the same terms (in the same fields)
>>>> for multiple types of documents in one index is going to take up less
>>>> overall space than having distinct collections for each type of document.
>>>
>>> I'm not asking about the indexed terms but rather the stored values.
>>> By having two doc types, are we gaining anything by "storing"
>>> attributes only for that doc type?
>>>
>>> cheers lee c
>>>
>>
>
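
The size comparison quoted above can be checked with a few lines of arithmetic. This is only a rough sketch using the figures from the thread; it assumes one byte per character and 1 MB = 1,000,000 bytes, neither of which the thread states, and the variable and function names are made up for illustration.

```python
# Figures as quoted in the thread; index sizes are in megabytes.
HOTEL_DOCS = 49_147            # number of hotel records
DEST_BYTES_PER_DOC = 400       # "~400 chars(bytes)" of destination data per hotel
SIZE_STORED_MB = 131           # destination attributes indexed and stored
SIZE_INDEX_ONLY_MB = 111       # destination attributes indexed only

BYTES_PER_MB = 1_000_000       # assumption: decimal megabytes


def estimated_stored_mb(docs: int, bytes_per_doc: int) -> float:
    """Estimate the space used if the stored value is repeated verbatim per doc."""
    return docs * bytes_per_doc / BYTES_PER_MB


estimate = estimated_stored_mb(HOTEL_DOCS, DEST_BYTES_PER_DOC)
observed = SIZE_STORED_MB - SIZE_INDEX_ONLY_MB

print(f"estimated stored overhead: {estimate:.1f} MB")
print(f"observed size difference:  {observed} MB")
```

The estimate and the observed difference land within about a megabyte of each other, which is consistent with the conclusion in the thread that stored values are written out once per document rather than shared.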
