lucene-dev mailing list archives

From "Ignacio Vera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8452) BKD-based shape indexing benchmarks
Date Tue, 14 Aug 2018 09:45:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579547#comment-16579547 ]

Ignacio Vera commented on LUCENE-8452:
--------------------------------------

{quote}we could make them independent, eg. by indexing (x1, y1, x2 - x1, y2 - y1, x3 - x1,
y3 - y1) instead of (x1, y1, x2, y2, x3, y3)
{quote}
+1, it looks promising...
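
To make the quoted idea concrete, here is a minimal sketch of the relative encoding, assuming integer-encoded coordinates. The class and method names ({{RelativeTriangleEncoding}}, {{encode}}, {{decode}}) are hypothetical and not part of Lucene; this only illustrates the arithmetic, not how {{LatLonShape}} actually stores triangles.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Lucene code.
public class RelativeTriangleEncoding {

    // Encode a triangle as (x1, y1, x2 - x1, y2 - y1, x3 - x1, y3 - y1).
    // Offsets are widened to long so the subtraction cannot overflow.
    static long[] encode(int x1, int y1, int x2, int y2, int x3, int y3) {
        return new long[] {
            x1, y1,
            (long) x2 - x1, (long) y2 - y1,
            (long) x3 - x1, (long) y3 - y1
        };
    }

    // Recover the absolute vertices by adding the offsets back to (x1, y1).
    static int[] decode(long[] e) {
        return new int[] {
            (int) e[0], (int) e[1],
            (int) (e[0] + e[2]), (int) (e[1] + e[3]),
            (int) (e[0] + e[4]), (int) (e[1] + e[5])
        };
    }

    public static void main(String[] args) {
        long[] encoded = encode(10, 20, 13, 18, 11, 25);
        System.out.println(Arrays.toString(encoded));         // [10, 20, 3, -2, 1, 5]
        System.out.println(Arrays.toString(decode(encoded))); // [10, 20, 13, 18, 11, 25]
    }
}
```

For small triangles the last four values stay close to zero wherever the triangle sits, which is how I read "independent" in the quote above.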

 
{quote}already have a WKT parser for {{LatLonShape}} - lines and polygons - that I can commit
to luceneutil separately if interested
{quote}
Is there any reason not to add this utility to Lucene? It looks to me like it would be very useful.
(Note: we currently have the class {{SimpleGeoJSONPolygonParser}}, which I believe was
added to support the GeoBenchmark for points.)

 
{quote}I can extract a smaller set (e.g., 60M shapes to complement the 60M points in geobench)
{quote}
 

Awesome, but note that with 60M shapes we might not be able to compare the performance with
spatial trees, because indexing the data would probably take too long.

 

> BKD-based shape indexing benchmarks
> -----------------------------------
>
>                 Key: LUCENE-8452
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8452
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/sandbox
>            Reporter: Ignacio Vera
>            Priority: Major
>         Attachments: BKDperf.pdf, Lake.png, Park.png, River.png
>
>
> Initial benchmarking of the new BKD-based shape indexing suggests that searches can be
> somewhat under-performing. I am opening this ticket to share the findings and to start a discussion
> on how to speed up the solution.
>  
> The first benchmark uses the current benchmark in luceneutil for indexing
> points and searching by bounding box. We would expect {{LatLonShape}} to be slower than {{LatLonPoint}} but
> still to perform well. The results of running this benchmark on my computer look
> like this:
>  
> LatLonPoint:
> 89.717239531 sec to index
> INDEX SIZE: 0.5087761553004384 GB
> READER MB: 0.6098232269287109
> maxDoc=60844404
> totHits=221118844
> BEST M hits/sec: 72.91056132596746
> BEST QPS: 74.19031323419311 
>  
> LatLonShape:
> 89.388678805 sec to index
> INDEX SIZE: 1.3028179928660393 GB
> READER MB: 0.8827085494995117
> maxDoc=60844404
> totHits=221118844
> BEST M hits/sec: 1.0053836784184809
> BEST QPS: 1.0230305276205143
>  
> A second benchmark indexes around 10 million 4-sided polygons and
> around 3 million points. Searches are performed using bounding boxes. The results are compared
> with spatial-tree alternatives. The spatial trees use a composite strategy, precision=0.001 degrees
> and distErrPct=0.25:
>  
> s2 (Geo3d):
> 1191.732124301 sec to index part 0
> INDEX SIZE: 3.2086284114047885 GB
> READER MB: 19.453557014465332
> maxDoc=12949519
> totHits=705758537
> BEST M hits/sec: 13.311369588840462
> BEST QPS: 4.243743434150063
>  
> quad (JTS):
> 3252.62925159 sec to index part 0
> INDEX SIZE: 4.5238002222031355 GB
> READER MB: 41.15725612640381
> maxDoc=12949519
> totHits=705758357
> BEST M hits/sec: 35.54591930673003
> BEST QPS: 11.332252412866938
>  
> LatLonShape:
> 30.32712009 sec to index part 0
> INDEX SIZE: 0.5627057952806354 GB
> READER MB: 0.29498958587646484
> maxDoc=12949519
> totHits=705758228
> BEST M hits/sec: 3.4130465326433357
> BEST QPS: 1.0880999177593018
>  
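
The figures quoted above are easier to compare as ratios. The sketch below only divides numbers copied from the two benchmark listings; the class name {{BenchmarkRatios}} and its helper are hypothetical, and no new measurements are implied.

```java
// Hypothetical helper; the constants are copied from the benchmark output quoted above.
public class BenchmarkRatios {

    // How many times larger a is than b.
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // First benchmark: 60M points, bounding-box search.
        double pointQps = 74.19031323419311;
        double shapeQps = 1.0230305276205143;
        System.out.printf("LatLonPoint vs LatLonShape QPS: %.1fx%n", ratio(pointQps, shapeQps));

        // Second benchmark: polygons and points, bounding-box search.
        double s2Qps = 4.243743434150063;
        double quadQps = 11.332252412866938;
        double shapeQps2 = 1.0880999177593018;
        System.out.printf("s2 vs LatLonShape QPS: %.1fx%n", ratio(s2Qps, shapeQps2));
        System.out.printf("quad vs LatLonShape QPS: %.1fx%n", ratio(quadQps, shapeQps2));

        // Indexing time, in seconds, for the second benchmark.
        double s2Index = 1191.732124301;
        double quadIndex = 3252.62925159;
        double shapeIndex = 30.32712009;
        System.out.printf("s2 vs LatLonShape index time: %.1fx%n", ratio(s2Index, shapeIndex));
        System.out.printf("quad vs LatLonShape index time: %.1fx%n", ratio(quadIndex, shapeIndex));
    }
}
```

In short, {{LatLonShape}} indexes far faster and produces a much smaller index than the spatial-tree alternatives, but its query throughput is well below both them and {{LatLonPoint}}.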



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

