lucene-java-user mailing list archives

From "drazen.nis" <d.niko...@youngculture.com>
Subject [SPATIAL] Spatial search runs forever
Date Tue, 16 Aug 2011 08:00:30 GMT
Hello,

We recently introduced distance searching and sorting into an existing
Lucene index, using the Spatial contrib for Lucene 2.9.4. The index holds
100K+ documents, of which only 20K initially had latitude/longitude and
_tier_* fields. At that point spatial queries ran quite OK.

After we enriched the index with geo coordinates for most of the documents,
all queries that use the spatial distance filter plus sorting started to run
forever. The implementation details are below.
Do you have any idea what could cause this problem?


Environment Details
------------------
Lucene 2.9

Java 1.6.0_14
JAVA_OPTS=-Xms8000M -Xmx8000M -server -XX:-UseParallelOldGC
-XX:+PrintCommandLineFlags -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -XX:+DisableExplicitGC -Xloggc:gc.log

CentOS release 5.5 (Final)
8 cores server (physical machine)
18GB RAM
RAID5 HDD
(on this machine only Apache Web Server is running at the moment) 


Implementation Details
------------------
The implementation is based on the blog post
http://develop.nydi.ch/2010/10/lucene-spatial-example/. While a spatial
query executes, processor usage rises to the maximum and stays there
for hours. A thread dump shows the following:

"searchers-thread-63" prio=10 tid=0x00000000488e4800 nid=0x3dab runnable [0x0000000046789000]
   java.lang.Thread.State: RUNNABLE
	at java.util.HashMap.put(HashMap.java:374)
	at org.apache.lucene.spatial.tier.LatLongDistanceFilter$1.match(LatLongDistanceFilter.java:97)
	at org.apache.lucene.search.FilteredDocIdSet$1.match(FilteredDocIdSet.java:73)
	at org.apache.lucene.search.FilteredDocIdSetIterator.advance(FilteredDocIdSetIterator.java:87)
	at org.apache.lucene.util.OpenBitSetDISI.inPlaceAnd(OpenBitSetDISI.java:66)
	at org.apache.lucene.misc.ChainedFilter.doChain(ChainedFilter.java:253)
	at org.apache.lucene.misc.ChainedFilter.getDocIdSet(ChainedFilter.java:177)
	at org.apache.lucene.misc.ChainedFilter.getDocIdSet(ChainedFilter.java:104)
	at org.apache.lucene.search.IndexSearcher.searchWithFilter(IndexSearcher.java:277)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:258)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:240)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:181)
	at org.apache.lucene.search.Searcher.search(Searcher.java:90)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector.executeSearch(NewLuceneConnector.java:730)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector.access$000(NewLuceneConnector.java:33)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector$2.run(NewLuceneConnector.java:884)
	at javolution.context.ConcurrentContext$Default.executeAction(ConcurrentContext.java:358)
	at javolution.context.ConcurrentContext.execute(ConcurrentContext.java:271)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector.newSearchByGroupsImpl(NewLuceneConnector.java:879)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector.newSearchByGroupsImpl(NewLuceneConnector.java:782)
	at com.yc.cyclone.isystem.search.grouping.TopicGroupingSearch$1.call(TopicGroupingSearch.java:667)
	at com.yc.cyclone.isystem.search.grouping.TopicGroupingSearch$1.call(TopicGroupingSearch.java:662)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
	at com.yc.cyclone.services.concurrency.WorkerThread.run(WorkerThread.java:49)

Interestingly, even though the processor was at 100% usage the whole
time, other (non-spatial) searches and indexing tasks were processed by
Lucene without any problem and without a noticeable performance decrease.

We execute multiple queries in parallel (they differ in one search
parameter), and they all reuse the same filter, which in this case is:
new ChainedFilter(new Filter[] {nonSpatialQueryFilter,
distanceQueryBuilder.getFilter()}, ChainedFilter.AND);
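
One thing that may be worth ruling out, given the HashMap.put frame at the top of the thread dump and the fact that the same filter is reused by parallel queries: java.util.HashMap is documented as unsafe for concurrent modification, so a filter that writes into a HashMap during matching should not be shared between threads. Whether that is the cause here is only an assumption. The sketch below uses no Lucene classes; StatefulDistanceFilter and runQueries are hypothetical stand-ins that only illustrate the pattern of building one filter instance per parallel task.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerTaskFilterSketch {

    /** Hypothetical stand-in for a filter that records distances in a plain HashMap. */
    static final class StatefulDistanceFilter {
        // HashMap is not thread-safe, so only one thread may write to this map.
        private final Map<Integer, Double> distances = new HashMap<Integer, Double>();

        boolean match(int docId) {
            double distance = docId * 0.001; // placeholder, not a real distance calculation
            distances.put(docId, distance);
            return distance < 10.0;
        }

        int cachedCount() {
            return distances.size();
        }
    }

    /** Runs `tasks` parallel "queries" over `docs` documents, one filter per task. */
    static List<Integer> runQueries(int tasks, final int docs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (int t = 0; t < tasks; t++) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        // A new filter per task: no two threads write to the same map.
                        StatefulDistanceFilter filter = new StatefulDistanceFilter();
                        for (int doc = 0; doc < docs; doc++) {
                            filter.match(doc);
                        }
                        return filter.cachedCount();
                    }
                }));
            }
            List<Integer> counts = new ArrayList<Integer>();
            for (Future<Integer> f : futures) {
                counts.add(f.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runQueries(4, 1000)); // prints [1000, 1000, 1000, 1000]
    }
}
```

In the real code this would presumably mean creating the distance filter (and the ChainedFilter wrapping it) inside each parallel search task instead of once up front, at the cost of recomputing distances per query.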

For sorting we use:
new DistanceFieldComparatorSource(distanceQueryBuilder.getDistanceFilter());


Here is one entry from the index (spatial fields):

field       value
_tier_10    0.0
_tier_11    1.0001
_tier_12    2.0003
_tier_13    4.0006
_tier_14    9.00013
_tier_15    18.00027
_tier_7     0.0
_tier_8     0.0
_tier_9     0.0
lat         47.61242
lng         8.54002

Note that those fields are indexed as numeric fields; I used
NumericUtils.prefixCodedToDouble(field.stringValue()) to print the values.

There are also documents which do not have those fields indexed.


Thank you.

Best Regards,
Drazen
