lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: The best strategy to "How store multiple fields of same document"
Date Thu, 31 Jul 2008 19:43:48 GMT
Haven't a clue <G>.

Erick

On Thu, Jul 31, 2008 at 11:29 AM, Sergey Kabashnyuk <ksmmlist@gmail.com> wrote:

> Thank you Erick.
>
> I'm talking about more than 10,000 documents, 95% of which have fewer than
> 10 fields. The maximum number of fields per document is unlimited,
> but in practice it's no more than 20.
>
>
> I'm interested: does Lucene have any internal optimization
> that depends on the field count or field sizes, as databases do?
> I mean, to determine the position of row X in the index:
>
> positionX = sum(fieldsize[1]+...fieldsize[i])*(X-1)
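The formula above can be read as the usual fixed-width row lookup: if every
row has the same field sizes, the offset of row X is the row size times
(X - 1). A minimal sketch of that arithmetic only, with made-up field sizes;
the class and method names are hypothetical, and it makes no claim about how
Lucene lays out its index:

```java
public class FixedWidthRows {
    // Byte offset of the 1-based row x, assuming every row stores the
    // same fields with the same fixed sizes (an assumption of the formula).
    static long rowOffset(int[] fieldSizes, long x) {
        long rowSize = 0;
        for (int size : fieldSizes) {
            rowSize += size; // sum(fieldsize[1] + ... + fieldsize[i])
        }
        return rowSize * (x - 1);
    }

    public static void main(String[] args) {
        int[] fieldSizes = {4, 8, 20}; // illustrative sizes, 32 bytes per row
        System.out.println(rowOffset(fieldSizes, 1)); // 0
        System.out.println(rowOffset(fieldSizes, 3)); // 64
    }
}
```

The shortcut only holds while field sizes are identical for every row; with a
varying number of fields per document, as described above, a single
multiplication cannot locate a row.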
>
>
> Sergey Kabashnyuk
> eXo Platform SAS
>
>
>> I'd go with option 1 unless and until you could demonstrate performance
>> problems. Speaking of which, you'd get a more informed answer if you
>> provided a bit more data, like how many fields are we talking, how many
>> documents, etc. If you're indexing 10,000 documents, go with the simplest.
>> If you're indexing 1,000,000,000 documents, more thought is required <G>..
>> Do you expect 3 fields/doc or 30,000 fields/doc?
>>
>> But the reason I'd go with <1> is that your second option has some issues.
>> 1> how to tokenize? You'll probably have to write a custom one or risk
>>    getting tokens "name" "value" rather than "name@value".
>> 2> Forming queries is, I believe, equally complex in both cases, so
>>    choose the conceptually simplest one. Let's say you have
>>    to search on foo1:val1  and foo2:val2. In the first case this is
>>    simple +foo1:val1 +foo2:val2. For your second case, you get
>>    +bigfield:foo1@val1 +bigfield:foo2@val2. There's not much
>>    difference between the two.
>> 3> Back to my initial comment about resource usage: we don't
>>    have enough data to answer whether it makes any difference.
>>    But even if we did, you'd find the response a variation of
>>    "you'll have to try it and see" since there are so many
>>    variables.
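Points 1 and 2 above can be illustrated without Lucene itself. The sketch
below uses plain string splitting as a stand-in for an analyzer (it is not
Lucene's tokenizer API) to show how "name@value" can break into two tokens,
and builds the two query strings from the example. The class and method
names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropertyTerms {
    // Stand-in for an analyzer that breaks at every non-letter, non-digit character.
    static List<String> splitOnNonAlphanumerics(String text) {
        return nonEmpty(text.split("[^\\p{L}\\p{N}]+"));
    }

    // Stand-in for an analyzer that breaks only at whitespace.
    static List<String> splitOnWhitespace(String text) {
        return nonEmpty(text.split("\\s+"));
    }

    private static List<String> nonEmpty(String[] parts) {
        List<String> tokens = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Option 1: one field per property, e.g. +foo1:val1 +foo2:val2
    static String queryPerField(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    // Option 2: one shared field holding "name@value" terms.
    static String querySingleField(String field, Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(field).append(':')
              .append(e.getKey()).append('@').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(splitOnNonAlphanumerics("name@value")); // [name, value]
        System.out.println(splitOnWhitespace("name@value"));       // [name@value]

        Map<String, String> props = new LinkedHashMap<>();
        props.put("foo1", "val1");
        props.put("foo2", "val2");
        System.out.println(queryPerField(props));                // +foo1:val1 +foo2:val2
        System.out.println(querySingleField("bigfield", props)); // +bigfield:foo1@val1 +bigfield:foo2@val2
    }
}
```

Either way the query is one required clause per property, so the difference
lies in how the combined term is tokenized at index time, not in query
construction.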
>>
>> But I'll repeat that I always go with the simplest approach unless and
>> until I'm certain there's a problem...
>>
>> Best
>> Erick
>>
>> On Thu, Jul 31, 2008 at 10:36 AM, Sergey Kabashnyuk <ksmmlist@gmail.com> wrote:
>>
>>> The best strategy.
>>>
>>> Hello.
>>> I want to ask your opinion on how to
>>> store multiple fields of the same document.
>>>
>>> I see two possibilities:
>>> 1. Multiple fields in the document.
>>> 2. One field, for example named PROPERTIES, with multiple instances,
>>>    each value combined with its name, for example "name@value".
>>>
>>> Which choice is best for search speed and resource usage?
>>>
>>> Thanks.
>>>
>>> Sergey Kabashnyuk
>>> eXo Platform SAS
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
