lucene-solr-user mailing list archives

From <ha.p...@arvatosystems.com>
Subject RE: multivalued fields or multiple fields?
Date Fri, 10 Apr 2015 21:19:01 GMT
We are also evaluating an option in which each document has 600K fields (scalability tested with
1M fields). Index/reindex/query performance is acceptable (~2.5h to index 130K docs using
1 machine, query time <20ms); however, atomic updates took a lot of memory and time. Hope
this helps.

-Ha Pham

-----Original Message-----
From: david.w.smiley@gmail.com [mailto:david.w.smiley@gmail.com] 
Sent: Friday, April 10, 2015 10:34 AM
To: solr-user@lucene.apache.org; Marcelo Valle
Subject: Re: multivalued fields or multiple fields?

I don't at all think a massive number of fields is helpful here.

I added an answer on Stack Overflow since you started this question/conversation there.  I'll
paste it here for those who don't want to follow the link:

> Use highlighting. @Jokin first mentioned it and I feel this is the best
> answer without hacking on Solr. Try either the PostingsHighlighter or
> the FastVectorHighlighter, not the default/standard highlighter.
> Unfortunately both of them internally execute a wildcard query against
> all UIDS in this field. FVH has the *opportunity* internally to be
> smarter about that but it's not implemented that way.
>
> Note: if it's within scope to write a little Java to add to Solr, the ideal
> answer would be to add term vectors (just the terms data in the 
> term-vector, no offsets/positions) and then write a "DocTransformer" 
> to grab the term vector terms; seek to the prefix, then iterate on 
> those that have that prefix. Pretty darned fast.
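
The "seek to the prefix, then iterate" step in the note above can be illustrated outside of Solr. What follows is a minimal Python sketch, not Lucene or Solr code: it assumes the terms of one document's field are already available as a sorted list (as the terms of a term vector would be), and the function name terms_with_prefix is hypothetical.

```python
from bisect import bisect_left


def terms_with_prefix(sorted_terms, prefix):
    """Return all terms starting with `prefix` from a lexicographically sorted list."""
    # "Seek": binary search for the first term >= prefix.
    i = bisect_left(sorted_terms, prefix)
    matches = []
    # "Iterate": terms sharing the prefix are contiguous in sorted order,
    # so stop at the first term that no longer starts with it.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


# Sample values in the uuid_score format from the original question.
terms = sorted(["122_900", "123_456", "123_789", "1234_111", "124_001"])
print(terms_with_prefix(terms, "123_"))
```

The cost is one binary search plus one step per matching term, so the other values in the field are never visited.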


~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley

On Fri, Apr 10, 2015 at 5:46 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemilita@bloomberg.net>
wrote:

> I have a model where I store a field called `uuid_scores` in a 
> document and save values with the following format:
> 123_456 - where 123 is uuid and 456 is the score.
>
> To retrieve scores for uuid 123, I search all documents where 
> uuid_scores field starts with 123_ and then I read only the values 
> that start with 123_ in the answer.
>
> The problem is I can have about 100k values in this multi valued 
> field, so it can be hard to retrieve just what I want, as stated in 
> http://stackoverflow.com/questions/29535197/how-to-filter-values-returned-on-a-multivalued-field-in-solr
>
> Someone suggested using 1 field per uuid. So instead of just 1 
> multi valued field, I would have about 100k fields, with names like 
> score_123 (and value [456]).
>
> Is there a problem in having so many fields in Solr? What are the 
> advantages / disadvantages if:
>
> * I have to update this document later, adding one more value
> * I need fast inserts when inserting the document for the first time
> * I need to query everything related to 1 uuid, so what would be the
>   faster option to search?
>
> Thanks
> -Marcelo
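
For concreteness, the two document layouts compared in the message above might look like the following. This is only a sketch using plain Python dicts in the shape of JSON documents; the id field, the sample values, and the helper names are assumptions for illustration, while uuid_scores and the score_<uuid> naming come from the message.

```python
def as_multivalued(doc_id, scores):
    """Layout 1: a single multi-valued field holding 'uuid_score' strings."""
    return {
        "id": doc_id,  # assumed unique key field
        "uuid_scores": [f"{uuid}_{score}" for uuid, score in scores.items()],
    }


def as_field_per_uuid(doc_id, scores):
    """Layout 2: one field per uuid, named score_<uuid>."""
    doc = {"id": doc_id}
    for uuid, score in scores.items():
        doc[f"score_{uuid}"] = [score]
    return doc


def scores_for(doc, uuid):
    """Client-side filtering needed with layout 1: keep only values for one uuid."""
    prefix = f"{uuid}_"
    return [v[len(prefix):] for v in doc["uuid_scores"] if v.startswith(prefix)]


scores = {"123": "456", "124": "001"}
multi = as_multivalued("doc1", scores)
per_uuid = as_field_per_uuid("doc1", scores)
print(scores_for(multi, "123"), per_uuid["score_123"])
```

With layout 1 the client scans every returned value to find the ones for a given uuid; with layout 2 the lookup is a direct field access, at the price of a very large number of distinct field names.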