spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] zhengruifeng commented on issue #26948: [SPARK-30120][ML] LSH approxNearestNeighbors should use BoundedPriorityQueue when numNearestNeighbors is small
Date Mon, 23 Dec 2019 04:53:18 GMT
zhengruifeng commented on issue #26948: [SPARK-30120][ML] LSH approxNearestNeighbors should
use BoundedPriorityQueue when numNearestNeighbors is small
URL: https://github.com/apache/spark/pull/26948#issuecomment-568354256
 
 
   The overall logic is similar to the `recall` & `ranking` stages used in many classification
scenarios.
   recall: In the computation of `modelSubset`, more candidates than NN are selected. This
holds even though the comment before 3.0.0 said `Compute threshold to get **exact** k elements.`
and the comment in current master says `Compute threshold to get around k elements.`
   Obtaining exactly K elements was never implied, since a method based on a threshold will
select at least K elements.
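
   As an illustration of the point above, here is a minimal plain-Python sketch (not the Spark code) of threshold-based selection. The function name `select_by_threshold` and the sample distances are hypothetical, and the threshold here is the exact k-th smallest distance, whereas a real implementation may only approximate it. When several items tie at the threshold, more than k are kept.

```python
def select_by_threshold(hash_dists, k):
    """Keep every index whose hash distance is <= the k-th smallest distance."""
    # Exact k-th smallest value; an approximate quantile would behave similarly.
    threshold = sorted(hash_dists)[k - 1]
    return [i for i, d in enumerate(hash_dists) if d <= threshold]


# Hypothetical hash distances: three items tie at 0.5.
hash_dists = [0.0, 0.5, 0.5, 0.5, 1.0]
selected = select_by_threshold(hash_dists, 2)
print(selected)  # indices 0..3: four items are kept although k = 2
```

   The filter cannot tell tied items apart, so it keeps all of them, and the result has at least k elements rather than exactly k.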
   
   ranking: Then, to get the final top-K items, the candidates filtered by `hashDist` above
are ranked by `keyDist`.
   
   I guess that in the first part more candidates than NN are needed, no matter which selection
method is used.
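
   A rough sketch of why the first stage should keep more than NN candidates, written in plain Python rather than against the Spark API. The item data and the helper names `recall` and `rank` are made up for illustration. `heapq.nsmallest` stands in for a bounded priority queue in the ranking step.

```python
import heapq

# Hypothetical candidates: (item id, hash distance, key distance).
items = [
    ("a", 0.0, 0.9),
    ("b", 0.25, 0.2),
    ("c", 0.5, 0.1),
    ("d", 0.5, 0.8),
]


def recall(items, n_candidates):
    """Stage 1: keep items whose hash distance is within the n-th smallest one."""
    threshold = sorted(h for _, h, _ in items)[n_candidates - 1]
    return [it for it in items if it[1] <= threshold]


def rank(candidates, k):
    """Stage 2: return the k candidates with the smallest key distance."""
    return heapq.nsmallest(k, candidates, key=lambda it: it[2])


k = 2
# Recall only k candidates: "c" is dropped before ranking can see it.
tight = [name for name, _, _ in rank(recall(items, k), k)]
# Recall more than k candidates: "c" survives and ranks first by key distance.
wide = [name for name, _, _ in rank(recall(items, 3), k)]
print(tight, wide)
```

   In this toy data the item closest by key distance ("c") is only third by hash distance, so a first stage limited to exactly k candidates would lose it regardless of how those k are picked.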
   
   


