lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: The best strategy to "How store multiple fields of same document"
Date Thu, 31 Jul 2008 14:50:03 GMT
I'd go with option 1 unless and until you can demonstrate performance
problems. Speaking of which, you'd get a more informed answer if you
provided a bit more data, such as how many fields and how many
documents we're talking about. If you're indexing 10,000 documents, go
with the simplest approach. If you're indexing 1,000,000,000 documents,
more thought is required <G>.
Do you expect 3 fields/doc or 30,000 fields/doc?

But the reason I'd go with <1> is that your second option has some issues.
1> How do you tokenize? You'll probably have to write a custom tokenizer
    or risk getting the tokens "name" and "value" rather than "name@value".
2> Forming queries is, I believe, equally complex in both cases, so
    choose the conceptually simplest one. Let's say you have
    to search on foo1:val1 and foo2:val2. In the first case this is
    simply +foo1:val1 +foo2:val2. In your second case, you get
    +bigfield:foo1@val1 +bigfield:foo2@val2. There's not much
    difference between the two.
3> Back to my initial comment about resource usage: we don't
    have enough data to answer whether it makes any difference.
    But even if we did, you'd find the response a variation of
    "you'll have to try it and see" since there are so many
    variables.
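
To make points 1> and 2> concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: splitOnNonAlphanumeric and splitOnWhitespace are hypothetical stand-ins for two tokenizing strategies, and perFieldQuery and combinedFieldQuery are made-up helpers that only build the query strings discussed above. How a real analyzer treats "name@value" depends on the analyzer you pick, so treat this as an illustration of the risk, not of any particular analyzer's behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldLayoutSketch {

    // Hypothetical stand-in for a tokenizer that breaks on anything
    // that is not a letter or digit, so '@' acts as a separator.
    static List<String> splitOnNonAlphanumeric(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Hypothetical stand-in for a tokenizer that breaks only on whitespace,
    // so "name@value" survives as a single token.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Option 1: one required clause per real field.
    static String perFieldQuery(String[][] pairs) {
        StringBuilder sb = new StringBuilder();
        for (String[] p : pairs) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(p[0]).append(':').append(p[1]);
        }
        return sb.toString();
    }

    // Option 2: one required clause per "name@value" term in a single field.
    static String combinedFieldQuery(String field, String[][] pairs) {
        StringBuilder sb = new StringBuilder();
        for (String[] p : pairs) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(field).append(':')
              .append(p[0]).append('@').append(p[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String combined = "foo1@val1 foo2@val2";
        String[][] pairs = { {"foo1", "val1"}, {"foo2", "val2"} };

        System.out.println(splitOnNonAlphanumeric(combined));
        System.out.println(splitOnWhitespace(combined));
        System.out.println(perFieldQuery(pairs));
        System.out.println(combinedFieldQuery("bigfield", pairs));
    }
}
```

The two query strings come out the same length and shape, which is the point of 2>. The only real difference is that option 2 makes you responsible for keeping "name@value" intact at both index and query time.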

But I'll repeat that I always go with the simplest approach unless and
until I'm certain there's a problem...

Best
Erick

On Thu, Jul 31, 2008 at 10:36 AM, Sergey Kabashnyuk <ksmmlist@gmail.com> wrote:

> The best strategy.
>
> Hello.
> I want to ask your opinion on how to
> store multiple fields of the same document.
>
> I see two possibilities:
> 1. Multiple fields in the document.
> 2. One field, for example named PROPERTIES, with multiple instances,
>  and values combined with the name, for example "name@value".
>
> Which choice is best for search speed and resource usage?
>
> Thanks.
>
> Sergey Kabashnyuk
> eXo Platform SAS
