lucene-solr-user mailing list archives

From Jeff Wartes <jwar...@whitepages.com>
Subject Re: Distance sort on a multi-value field
Date Fri, 23 Aug 2013 03:30:14 GMT

This is actually pretty far afield from my original subject, but it turns
out that I also had issues with NRT and multi-field geospatial
performance in Solr 4, so I'll follow that up.


I've been testing and working with David's SOLR-5170 patch ever since he
posted it, and I pushed it into production with only some cosmetic changes
a few hours ago. 
I have a relatively low update and query rate for this particular query
type (something like 2 updates/sec, 10 queries/sec) but a short
autosoftcommit time (5 sec). Based on the data so far, this patch looks
like it's brought my average response time down from 4 seconds to about
50ms.

Very nice!



On 8/20/13 7:37 PM, "David Smiley (@MITRE.org)" <DSMILEY@mitre.org> wrote:

>The distance sorting code in SOLR-2155 is roughly equivalent to the code
>that RPT uses (RPT has its lineage in SOLR-2155 after all).  I just
>reviewed it to double-check.  It's possible the behavior is slightly
>better in SOLR-2155 because the cache (a Solr cache) contains normal
>hard-references whereas RPT has one based on weak references, which will
>linger longer.  But I think the likelihood of OOM is the same.
>
>Anyway, the current best option is
>https://issues.apache.org/jira/browse/SOLR-5170 which I posted a few days
>ago.
>
>~ David
>
>
>Billnbell wrote
>> We have been using 2155 for over 6 months in production with over 2M
>> hits every 10 minutes. No OOM yet.
>> 
>> 2155 seems great, and would this issue be any worse than 2155?
>> 
>> 
>> 
>> On Wed, Aug 14, 2013 at 4:08 PM, Jeff Wartes <jwartes@> wrote:
>> 
>>>
>>> Hm, "Give me all the stores that only have branches in this area" might
>>> be a plausible use case for farthest distance.
>>> That's essentially a "contains" question though, so maybe that's
>>> already supported? I guess it depends on how contains/intersects/etc
>>> handle multi-values. I feel like multi-value interaction really deserves
>>> its own section in the documentation.
>>>
>>>
>>> I'm aware of the memory issue, but it seems like if you want to sort
>>> multi-valued points, it's either this or try to pull in the 2155 patch.
>>> In general I'd rather go with the thing that's being maintained.
>>>
>>>
>>> Thanks for the code pointer. You're right, that doesn't look like
>>> something I can easily use for more general aggregate scoring control.
>>> Ah well.
>>>
>>>
>>>
>>> On 8/14/13 12:35 PM, "Smiley, David W." <dsmiley@> wrote:
>>>
>>> >
>>> >
>>> >On 8/14/13 2:26 PM, "Jeff Wartes" <jwartes@> wrote:
>>> >
>>> >>
>>> >>I'm still pondering aggregate-type operations for scoring multi-valued
>>> >>fields (original thread: http://goo.gl/zOX53f ), and it occurred to me
>>> >>that distance-sort with SpatialRecursivePrefixTreeFieldType must be
>>> >>doing something like that.
>>> >
>>> >It isn't.
>>> >
>>> >>
>>> >>Somewhat surprisingly I don't see this in the documentation anywhere,
>>> >>but I presume the example query (from
>>> >>http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4):
>>> >>"q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}"
>>> >>
>>> >>assigns the distance/score based on the *closest* lat/long if the
>>> >>sfield is a multi-valued field.
>>> >
>>> >Yes it does.
>>> >
>>> >>
>>> >>That's a reasonable default, but it's a bit arbitrary. Can I sort based
>>> >>on the *furthest* lat/long in the document? Or the average distance?
>>> >>
>>> >>Anyone know more about how this works and could give me some pointers?
>>> >
>>> >I considered briefly supporting the farthest distance but dismissed it
>>> >as I saw no real use-case.  I didn't think of the average distance;
>>> >that's plausible.  Anyway, your best bet is to dig into the code.  The
>>> >relevant part is ShapeFieldCacheDistanceValueSource.
>>> >
>>> >FYI something to keep in mind:
>>> >https://issues.apache.org/jira/browse/LUCENE-4698
>>> >
>>> >~ David
>>> >
>>>
>>>
>> 
>> 
>> -- 
>> Bill Bell
>> billnbell@
>> cell 720-256-8076
>
>
>
>
>
>-----
> Author: 
>http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp4084666p4085797.html
>Sent from the Solr - User mailing list archive at Nabble.com.
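
For anyone skimming the quoted thread, the closest / farthest / average
distinction can be illustrated outside of Solr. The following is only a
rough sketch and is not Solr's or RPT's code (the real logic lives in the
Java class ShapeFieldCacheDistanceValueSource). It assumes a plain haversine
distance and a mean Earth radius; the function names and the sample store
coordinates are hypothetical, and only the query point comes from the
example query above.

```python
import math

# Mean Earth radius in km; an assumption made for this sketch only.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def aggregate_distances(query_pt, doc_points):
    """Closest, farthest, and average distance from query_pt to a
    document's multi-valued list of (lat, lon) points."""
    if not doc_points:
        raise ValueError("document has no points")
    dists = [haversine_km(query_pt[0], query_pt[1], lat, lon)
             for lat, lon in doc_points]
    return {
        "closest": min(dists),    # what distance sort uses, per the thread
        "farthest": max(dists),   # the "all branches in this area" case
        "average": sum(dists) / len(dists),
    }


if __name__ == "__main__":
    query = (54.729696, -98.525391)   # pt= value from the example query
    # Hypothetical branch locations for a single multi-valued document.
    store_points = [(54.8, -98.5), (55.5, -97.0), (53.9, -99.9)]
    print(aggregate_distances(query, store_points))
```

Sorting on "closest" matches the behavior David confirms in the thread;
the other two aggregates are the ones that would need custom code.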

