lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kent Gibson <kentgib...@yahoo.com>
Subject Re: raw hit count
Date Sun, 30 Nov 2003 17:55:11 GMT
Thanks a mil erik, I tried to make my own filter class
with a modified bit method as per below:
if (doc == interestingDoc)
{
bits.set(doc); // set bit for hit		
}

but this baby continues to always return 1!

so then I looked at indexreader, like you said and
ended up with something like this, its probably a
messy way of doing it, but I am happy.

Term term = new Term("body", "mercedes");
IndexReader ir = IndexReader.open(indexPath);
TermDocs termdocs = ir.termDocs(term);

int id = hits.id(i);

while (termdocs.next())
{
	if (termdocs.doc() == id)
	{
		System.out.println(
"Document number "
	+ termdocs.doc()
	+ " Freq: "
	+ termdocs.freq());
	}

}

it only works for single words, but I reckon I just
send split up the query and then make multiple scans. 

cheers

kent

--- Erik Hatcher <erik@ehatchersolutions.com> wrote:
> On Sunday, November 30, 2003, at 11:13  AM, Kent
> Gibson wrote:
> > as per Erik's idea I tried with the BitSet as
> follows:
> >
> > QueryFilter qf = new QueryFilter(query);
> > IndexReader ir = IndexReader.open(indexPath);
> > Searcher searcher2 = new IndexSearcher(ir);
> >
> > // get the bit set for the query
> > BitSet bits = qf.bits(ir);
> 
> I did not mean to imply for you to call the bits
> method in this manner. 
>   In fact, you should not call it - the
> IndexSearcher calls it under the 
> covers.  I was implying that you could write your
> own Filter subclass 
> that lit up a single-bit corresponding to the
> document you're 
> interested in.
> 
> > However I always get a result of 1, which I
> suppose is
> > has to do with this overlap thingy.
> 
> No, not related with respect to a filter - two
> different concepts.
> 
> > Is there not a simple way to just get some word
> > statistics out of a file?
> 
> Look at the Lucene index format (from Lucene's main
> web page).  Term 
> frequencies are part of the statistics gathered, of
> course.  You can 
> get at the values there using IndexReader.  This may
> be a lot 
> lower-level than you desire, but what Lucene stores
> is there for you.
> 
> 	Erik
> 
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> 


__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message