lucene-java-user mailing list archives

From Jason Calabrese <m...@jasoncalabrese.com>
Subject Re: Getting count of documents matching a query?
Date Fri, 07 Apr 2006 18:06:57 GMT
I just wrote some simple code to test this.

I ran the test with 3 queries:
- A 3-term boolean query
- A single term query with over 5000 hits
- A single term query with 0 hits

For each query I ran 4 tests of 10,000 searches each:
1) using hits.length to get the counts and the standard similarity
2) using hits.length to get the counts and a custom similarity
3) using HitCollector to get the counts and the standard similarity
4) using HitCollector to get the counts and a custom similarity
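
Each of the four runs above amounts to timing a loop of repeated counting searches. A minimal sketch of such a loop follows. It is not the attached test class: `CountTimer`, `Result`, and the stand-in search are hypothetical names, and in a real run the lambda would wrap a Lucene call (hits.length or a counting HitCollector).

```java
import java.util.function.IntSupplier;

public class CountTimer {
    /** Elapsed wall-clock time for the whole loop and the last count returned. */
    static final class Result {
        final long millis;
        final int count;

        Result(long millis, int count) {
            this.millis = millis;
            this.count = count;
        }
    }

    /**
     * Runs the counting strategy the given number of times and times the loop.
     * The supplier is a placeholder for whatever search-and-count call is
     * being measured.
     */
    static Result time(IntSupplier countingSearch, int iterations) {
        int count = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            count = countingSearch.getAsInt();
        }
        return new Result(System.currentTimeMillis() - start, count);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a search that matches 47 documents.
        IntSupplier fakeSearch = () -> 47;
        Result r = time(fakeSearch, 10_000);
        System.out.println("time (mills) " + r.millis + ", count=" + r.count);
    }
}
```

Timing the whole loop rather than each search keeps clock overhead out of the measurement, at the cost of hiding per-search variation.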

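The custom similarity returns 0 from every method.
The results are somewhat surprising: the speedup doesn't look large enough
to justify changing our application. Note also that with the custom
similarity, hits.length reports a count of 0 for queries that do match,
while the HitCollector still reports the correct count.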

Here are the results (the test class is also attached):

time (mills) 14095, useHC=false, standardSimilarity=true, count=47, 
query=abstract_recent:(genetically modified organism)
time (mills) 15406, useHC=false, standardSimilarity=false, count=0, 
query=abstract_recent:(genetically modified organism)
time (mills) 13768, useHC=true, standardSimilarity=true, count=47, 
query=abstract_recent:(genetically modified organism)
time (mills) 14404, useHC=true, standardSimilarity=false, count=47, 
query=abstract_recent:(genetically modified organism)


time (mills) 6790, useHC=false, standardSimilarity=true, count=5776, 
query=lname:smith
time (mills) 4901, useHC=false, standardSimilarity=false, count=0, 
query=lname:smith
time (mills) 5209, useHC=true, standardSimilarity=true, count=5776, 
query=lname:smith
time (mills) 5578, useHC=true, standardSimilarity=false, count=5776, 
query=lname:smith


time (mills) 47, useHC=false, standardSimilarity=true, count=0, 
query=lname:dfdsalkfjdsalkjflsa
time (mills) 37, useHC=false, standardSimilarity=false, count=0, 
query=lname:dfdsalkfjdsalkjflsa
time (mills) 41, useHC=true, standardSimilarity=true, count=0, 
query=lname:dfdsalkfjdsalkjflsa
time (mills) 198, useHC=true, standardSimilarity=false, count=0, 
query=lname:dfdsalkfjdsalkjflsa




On Thursday 06 April 2006 15:19, Chris Hostetter wrote:
> : I need the count, and don't need the docs at this point. If I had a
> : simple query, (e.g. "book") I can use docFreq(), and it's lightning
> : fast. If I just run it as a query it's much slower. I'm just
> : wondering if I did a custom scorer / similarity / hitcollector, how
> : much faster than a query could I get it? Or is there a better way?
>
> A custom HitCollector would be the first big win, something like this
> would probably work...
>
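>    // one-element array: a mutable counter the anonymous class can capture
>    final int[] count = new int[1];
>    searcher.search(query, new HitCollector() {
>        public void collect(int doc, float score) {
>            count[0]++;  // count every matching doc; the score is ignored
>        }
>    });
>    return count[0];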
>
> Other ways you might be able to shave time would be...
>
>   * if your query can be represented as simple set logic (you
>     don't seem to be concerned with score) then implementing it as a
>     Filter may be faster because it won't do any score calculation, just a
>     simple match/no-match (which is what you seem to want) ... but it will
>     definitely take up more memory than a query
>
>   * if you customize your similarity so that every function returns 0 or 1
>     you might shave a little bit of time off by skipping some of the math
>     equations ... but I really doubt it.
>
>
>
>
> -Hoss
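
The set-logic idea in the Filter suggestion can be illustrated with plain java.util.BitSet, which is what a Filter in this version of Lucene hands back (one bit per document). The sketch below is only an illustration under stated assumptions: `BitSetCount`, `countAll`, and the hand-built bit sets are hypothetical, and no Lucene API is called. In practice each bit set would come from a Filter or from walking a term's postings.

```java
import java.util.BitSet;

public class BitSetCount {
    /** Counts the document ids that are set in every clause (logical AND). */
    static int countAll(int maxDoc, BitSet... clauses) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);        // start with every document id
        for (BitSet clause : clauses) {
            result.and(clause);       // keep only ids that also match this clause
        }
        return result.cardinality();  // number of set bits = number of matches
    }

    public static void main(String[] args) {
        int maxDoc = 8; // hypothetical index with document ids 0..7

        // Hypothetical per-term match sets for a 3-term AND query.
        BitSet first = new BitSet(maxDoc);
        first.set(1); first.set(3); first.set(5); first.set(6);
        BitSet second = new BitSet(maxDoc);
        second.set(3); second.set(5); second.set(7);
        BitSet third = new BitSet(maxDoc);
        third.set(0); third.set(3); third.set(5); third.set(6);

        // Only ids 3 and 5 appear in all three sets.
        System.out.println("count=" + countAll(maxDoc, first, second, third)); // prints count=2
    }
}
```

No scores are computed at any point, which is where the time saving would come from. Each clause costs one bit per document in the index regardless of how few documents match, which is the memory trade-off mentioned above.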
