lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Smiley <david.w.smi...@gmail.com>
Subject Re: Spatial Search based on the amount of docs, not the distance
Date Wed, 28 Jun 2017 18:24:55 GMT
Deniz didn't mention document-to-document distance sort but he/she didn't say it wasn't that
case either.

Any way, FYI at the Lucene level with LatLonPoint there is some sophisticated BKD search code
to efficiently return the top N distance ordered documents (where you supply N).  Although
as far as I recall, it also has no filtering mechanism, so if you have any other filters (keyword/time/whatever),
it wouldn't work.

I once did this feature on an RPT index for a client and I got the open-source permission
but I haven't gotten around to properly adding it to Solr.  I might approach it a bit differently
now.

~ David

> On Jun 22, 2017, at 8:34 PM, Tim Casey <tcasey@gmail.com> wrote:
> 
> deniz,
> 
> I was going to add something here.  The reason what you want is probably
> hard to do is because you are asking solr, which stores a document, to
> return documents using an attribute of document pairs.  As only a though
> exercise, if you stored record pairs as a single document, you could
> probably query it directly.  That is, if you have d1 and d2 and you are
> querying  around d1 and ordering by distance, then you could get this
> directly from a document representing a record pair.  I don't think this is
> practical, because it is an n^2 store.
> 
> Since the n^2 problem is there, people are going to suggest some heuristic
> which avoids this problem.  What Erick is suggesting is down this path.
> Query around a point and sort by distance taking the top K results.  The
> result is taking a linear slice of the n^2 distance attribute.
> 
> tim
> 
> 
> 
> On Wed, Jun 21, 2017 at 7:50 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
> 
>> Would it serve to sort by distance? True, if you matched a zillion
>> documents within a 1km radius you'd still perform the distance calcs, but
>> the result would be a manageable number.
>> 
>> I have to ask "Why to you care?". Is this an efficiency question (i.e. you
>> want to keep Solr from having to do expensive work) or is it a question of
>> having to get hits at all? It's at least possible that the solution for one
>> is not the solution for the other.
>> 
>> Best,
>> Erick
>> 
>> On Wed, Jun 21, 2017 at 5:32 PM, deniz <denizdurmus87@gmail.com> wrote:
>> 
>>> it is for sure possible to use d value for limiting the distance,
>> however,
>>> it
>>> might not be very efficient, as some of the coords may not have any docs
>>> around for a large value of d... so it is hard to determine a default
>> value
>>> for d.
>>> 
>>> though it sounds like havinga default d and gradual increments on its
>> value
>>> might be a work around for top K results...
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -----
>>> Zeki ama calismiyor... Calissa yapar...
>>> --
>>> View this message in context: http://lucene.472066.n3.
>>> nabble.com/Spatial-Search-based-on-the-amount-of-docs-not-the-distance-
>>> tp4342108p4342258.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 


Mime
View raw message