lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: lucene 2.9.0RC4 slower than 2.4.1?
Date Wed, 16 Sep 2009 17:22:38 GMT
Never mind. I see advance wasn't around in 2.4. This is part of the
DocIdSetIterator changes.
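
For anyone lining up the two profiles, here is a minimal, self-contained sketch of how the 2.9 iteration contract maps onto the 2.4 one. It is not Lucene's code: SketchDocIdSetIterator, ArrayDocIdIterator and IteratorDemo are hypothetical names made up for illustration. The only things taken from Lucene are the method renames (2.4 next()/skipTo(int)/doc() became 2.9 nextDoc()/advance(int)/docID()) and the NO_MORE_DOCS sentinel.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the 2.9-style iteration contract; NOT Lucene's actual class. */
abstract class SketchDocIdSetIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Current doc id. Rough 2.4 counterpart: doc(). */
    abstract int docID();

    /** Moves to the next doc and returns it. Rough 2.4 counterpart: next(), then doc(). */
    abstract int nextDoc();

    /** Moves to the first doc >= target beyond the current one. Rough 2.4 counterpart: skipTo(target), then doc(). */
    abstract int advance(int target);
}

/** Hypothetical iterator backed by a sorted array of doc ids. */
class ArrayDocIdIterator extends SketchDocIdSetIterator {
    private final int[] docs;
    private int pos = -1; // -1 means iteration has not started yet

    ArrayDocIdIterator(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    private int current() {
        if (pos < 0) {
            return -1; // this sketch reports -1 before the first nextDoc()/advance()
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    @Override
    int docID() {
        return current();
    }

    @Override
    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return current();
    }

    @Override
    int advance(int target) {
        // Always step at least once, then keep stepping while below target.
        do {
            if (pos < docs.length) {
                pos++;
            }
        } while (pos < docs.length && docs[pos] < target);
        return current();
    }
}

class IteratorDemo {
    /** Drains an iterator with the 2.9-style loop and returns the doc ids seen. */
    static List<Integer> collect(SketchDocIdSetIterator it) {
        List<Integer> seen = new ArrayList<Integer>();
        for (int d = it.nextDoc(); d != SketchDocIdSetIterator.NO_MORE_DOCS; d = it.nextDoc()) {
            seen.add(d);
        }
        return seen;
    }
}
```

Because the method names differ between the two versions, a profile of 2.4 has to be read against the renamed methods in 2.9 (doc() against docID(), skipTo against advance) before the call counts can be compared at all.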

Anyway - either these are just not comparable runs, or there is a major
bug (which seems unlikely).

Just to keep pointing out the obvious:

2.4 calls doc() 195,000 times
2.9 calls docID() 1.4 million times

That just doesn't jibe.

Mark Miller wrote:
> Notice that while DisjunctionScorer.advance and
> DisjunctionScorer.advanceAfterCurrent appear to be called
> in 2.9, in 2.4, I am only seeing DisjunctionScorer.advanceAfterCurrent
> called.
>
> Can someone explain that?
>
> Mark Miller wrote:
>   
>> Something is very odd about this if they both cover the same search and
>> the environment for both is identical. Even if one search was done twice,
>> and we divide the numbers for the new API by 2 - it's still *very* odd.
>>
>> With 2.4, ScorerDocQueue.topDoc is called half a million times.
>> With 2.9, it's called over 4 million times.
>>
>> Huh?
>>
>> Thomas Becker wrote:
>>   
>>     
>>> No it's only a single segment. But two calls. One doing a getHitsCount first
and
>>> the other doing the actual search. I'll paste both methods below if someone's
>>> interested.
>>>
>>> Will dig into lucene's sources and compare 2.4 search behaviour for my case with
>>> 2.9 tomorrow. It was about time to get more into lucene-core sources anyhow. :)
>>>
>>> See you tomorrow guys and thanks a lot again! It's a pleasure.
>>>
>>> 	public int getHitsCount(String query, Filter filter) throws LuceneServiceException {
>>> 		log.debug("getHitsCount('{}', {})", query, filter);
>>> 		if (StringUtils.isBlank(query)) {
>>> 			log.warn("getHitsCount: empty lucene query");
>>> 			return 0;
>>> 		}
>>> 		long startTimeMillis = System.currentTimeMillis();
>>> 		int count = 0;
>>>
>>> 		if (indexSearcher == null) {
>>> 			return 0;
>>> 		}
>>>
>>> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>> 		Query q = null;
>>> 		try {
>>> 			q = createQuery(query);
>>> 			TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
>>> 			indexSearcher.search(q, filter, tsdc);
>>> 			count = tsdc.getTotalHits();
>>> 			log.info("getHitsCount: count = {}",count);
>>> 		} catch (ParseException ex) {
>>> 			throw new LuceneServiceException("invalid lucene query:" + query, ex);
>>> 		} catch (IOException e) {
>>> 			throw new LuceneServiceException(" indexSearcher could be corrupted", e);
>>> 		} finally {
>>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>>> 			if (durationMillis > slowQueryLimit) {
>>> 				log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
>>> 			}
>>> 			log.debug("getHitsCount: query took {} ms", durationMillis);
>>> 		}
>>> 		return count;
>>> 	}
>>>
>>> 	public List<Document> search(String query, Filter filter, Sort sort, int from, int size) throws LuceneServiceException {
>>> 		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { indexAlias, query, filter, sort, from, size });
>>> 		long startTimeMillis = System.currentTimeMillis();
>>>
>>> 		List<Document> docs = new ArrayList<Document>();
>>> 		if (indexSearcher == null) {
>>> 			return docs;
>>> 		}
>>> 		Query q = null;
>>> 		try {
>>> 			if (query == null) {
>>> 				log.warn("search: lucene query is null...");
>>> 				return docs;
>>> 			}
>>> 			q = createQuery(query);
>>> 			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>> 			if (size < 0 || size > maxNumHits) {
>>> 				// set hard limit for numHits
>>> 				size = maxNumHits;
>>> 				if (log.isDebugEnabled())
>>> 					log.debug("search: Size set to hardlimit: {} for query: {} with filter: {}", new Object[] { size, query, filter });
>>> 			}
>>> 			TopFieldCollector collector = TopFieldCollector.create(sort, size + from, true, false, false, true);
>>> 			indexSearcher.search(q, filter, collector);
>>> 			if(size > collector.getTotalHits())
>>> 				size = collector.getTotalHits();
>>> 			if (size > 100000)
>>> 				log.info("search: size: {} bigger than 100.000 for query: {} with filter: {}", new Object[] { size, query, filter });
>>> 			TopDocs td = collector.topDocs(from, size);
>>> 			ScoreDoc[] scoreDocs = td.scoreDocs;
>>> 			for (ScoreDoc scoreDoc : scoreDocs) {
>>> 				docs.add(indexSearcher.doc(scoreDoc.doc));
>>> 			}
>>> 		} catch (ParseException e) {
>>> 			log.warn("search: ParseException: {}", e.getMessage());
>>> 			if (log.isDebugEnabled())
>>> 				log.warn("search: ParseException: ", e);
>>> 			return Collections.emptyList();
>>> 		} catch (IOException e) {
>>> 			log.warn("search: IOException: ", e);
>>> 			return Collections.emptyList();
>>> 		} finally {
>>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>>> 			if (durationMillis > slowQueryLimit) {
>>> 				log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
>>> 						new Object[] { durationMillis, query, indexSearcher.getIndexReader().directory() });
>>> 			}
>>> 			log.debug("search: query took {} ms", durationMillis);
>>> 		}
>>> 		return docs;
>>> 	}
>>>
>>>
>>> Uwe Schindler wrote:
>>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>>>>>>
>>>>>> Have to verify that the last one is not by accident more than one
>>>>>> request. Will do the run again and then post the required info.
>>>>>
>>>>> The last figure shows that IndexSearcher.searchWithFilter was called
>>>>> twice, in contrast to the first figure, where IndexSearcher.search was
>>>>> called only once.
>>>>>
>>>> I forgot, searchWithFilter is called per segment in 2.9. If it was only
>>>> one search, you must have two segments and therefore no optimized index for
>>>> this to be correct?
>>>>
>>>> Uwe
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>


-- 
- Mark

http://www.lucidimagination.com



