lucene-dev mailing list archives

From: markharw...@yahoo.co.uk
Subject: Re: Proposal: extracting term-level stats from query process
Date: Wed, 17 Mar 2004 08:26:38 GMT
Doug,
To save any duplicated effort on your part: I've started work on the RAMDirectory alternative
you suggested last week:

>> It would be interesting to write an in-memory version of IndexReader and IndexWriter
>> that don't serialize anything to bytes.

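
As a rough illustration of the idea quoted above, here is a minimal sketch of an index that keeps its postings as ordinary Java objects (maps and lists) instead of encoding them into the byte files a RAMDirectory holds. The class `InMemoryIndex` and its methods are hypothetical, not Lucene API; documents are assumed to arrive already tokenized, and, matching the design described later in this message, nothing is thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical throw-away in-memory index: postings live as plain Java objects
 * rather than being serialized to bytes. Not thread-safe, by design.
 */
final class InMemoryIndex {
    // field -> (term text -> (doc id -> positions)); TreeMap keeps terms sorted
    private final Map<String, TreeMap<String, TreeMap<Integer, List<Integer>>>> postings =
            new HashMap<>();
    private int numDocs = 0;

    /** Adds one already-tokenized field value as a new document; returns its doc id. */
    int addDocument(String field, List<String> tokens) {
        int docId = numDocs++;
        TreeMap<String, TreeMap<Integer, List<Integer>>> terms =
                postings.computeIfAbsent(field, f -> new TreeMap<>());
        for (int pos = 0; pos < tokens.size(); pos++) {
            terms.computeIfAbsent(tokens.get(pos), t -> new TreeMap<>())
                 .computeIfAbsent(docId, d -> new ArrayList<>())
                 .add(pos);
        }
        return docId;
    }

    /** Number of documents whose given field contains the term. */
    int docFreq(String field, String text) {
        TreeMap<String, TreeMap<Integer, List<Integer>>> terms = postings.get(field);
        if (terms == null) return 0;
        TreeMap<Integer, List<Integer>> docs = terms.get(text);
        return docs == null ? 0 : docs.size();
    }

    /** Positions of the term within one document, or an empty list if absent. */
    List<Integer> positions(String field, String text, int docId) {
        TreeMap<String, TreeMap<Integer, List<Integer>>> terms = postings.get(field);
        if (terms == null || !terms.containsKey(text)) return new ArrayList<>();
        return terms.get(text).getOrDefault(docId, new ArrayList<>());
    }

    int numDocs() { return numDocs; }
}
```

Because nothing is encoded or decoded, adding a document is just a few map insertions, which is one plausible reason such a structure could index faster than writing segment files into a RAMDirectory.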
My current implementation benchmarks at twice the indexing speed of RAMDirectory but is
slower at querying; I'm working on this. Fortunately, querying is much cheaper than
indexing, so overall it still proves quicker than a RAMDirectory for the combined job of
indexing and then querying that one-time analysis of search results requires.

It would be useful if the Lucene Term class could be made to implement the "Comparable" interface;
I think this could be added without breaking anything. For now I've had to write my own
"ComparableTerm" class simply to put terms into TreeMaps.
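
A TreeMap needs its keys to be Comparable (or needs a Comparator), which is why a wrapper like "ComparableTerm" is needed. Below is a minimal sketch of what such a class might look like; the body is a hypothetical reconstruction, not Mark's actual code. To stay self-contained it holds its own field and text strings rather than wrapping org.apache.lucene.index.Term, and it orders by field name first, then by term text (assumed here to match Lucene's term order).

```java
/**
 * Hypothetical "ComparableTerm": a (field, text) pair ordered by field name,
 * then by term text, so it can be used directly as a TreeMap key.
 */
final class ComparableTerm implements Comparable<ComparableTerm> {
    private final String field;
    private final String text;

    ComparableTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    String field() { return field; }
    String text() { return text; }

    @Override
    public int compareTo(ComparableTerm other) {
        int byField = field.compareTo(other.field);
        return byField != 0 ? byField : text.compareTo(other.text);
    }

    // equals and hashCode are kept consistent with compareTo
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ComparableTerm)) return false;
        ComparableTerm t = (ComparableTerm) o;
        return field.equals(t.field) && text.equals(t.text);
    }

    @Override
    public int hashCode() { return 31 * field.hashCode() + text.hashCode(); }

    @Override
    public String toString() { return field + ":" + text; }
}
```

With this in place, a `TreeMap<ComparableTerm, Integer>` iterates its keys grouped by field and alphabetically within each field. Keeping `equals` consistent with `compareTo` matters because TreeMap uses only `compareTo` for lookups, while other collections use `equals`/`hashCode`.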

The current design rationale is: no thread safety, no ability to merge with other indexes,
etc. It is a pure throw-away index, typically used once in a single query thread to analyse
search results. To support this scenario it also offers some new methods that are useful
for refining searches, e.g.:
  String getMostCommonUnstemmedForm(Term t)
  float getRelativeSignificance(Term t, IndexReader corpusReader)
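
The two methods above could plausibly work along the lines sketched below. This is a hypothetical sketch, not the actual implementation: terms are plain strings instead of Lucene Term objects, and the corpus statistics that `getRelativeSignificance` would read from an IndexReader are passed in directly as a document frequency and a document count. The significance formula (the term's share of result documents divided by its share of corpus documents) is an assumed definition chosen for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical per-result-set statistics for a throw-away index.
 * Strings stand in for Lucene Terms; corpus stats are passed in as numbers.
 */
final class ResultSetStats {
    // stemmed term -> (original surface form -> occurrence count)
    private final Map<String, Map<String, Integer>> surfaceForms = new HashMap<>();
    // stemmed term -> number of result documents containing it
    private final Map<String, Integer> docFreq = new HashMap<>();
    private int numDocs = 0;

    /** Records one result document as parallel arrays of stemmed and original tokens. */
    void addDocument(String[] stemmed, String[] unstemmed) {
        if (stemmed.length != unstemmed.length) {
            throw new IllegalArgumentException("token arrays differ in length");
        }
        numDocs++;
        Set<String> seenInDoc = new HashSet<>();
        for (int i = 0; i < stemmed.length; i++) {
            surfaceForms.computeIfAbsent(stemmed[i], s -> new HashMap<>())
                        .merge(unstemmed[i], 1, Integer::sum);
            if (seenInDoc.add(stemmed[i])) docFreq.merge(stemmed[i], 1, Integer::sum);
        }
    }

    /** Most frequent original spelling of a stemmed term (ties: alphabetical), or null. */
    String getMostCommonUnstemmedForm(String stemmed) {
        Map<String, Integer> forms = surfaceForms.get(stemmed);
        if (forms == null) return null;
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : forms.entrySet()) {
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    /** Assumed score: (share of result docs with term) / (share of corpus docs with term). */
    float getRelativeSignificance(String stemmed, int corpusDocFreq, int corpusNumDocs) {
        if (numDocs == 0 || corpusDocFreq == 0 || corpusNumDocs == 0) return 0f;
        float localShare = (float) docFreq.getOrDefault(stemmed, 0) / numDocs;
        float corpusShare = (float) corpusDocFreq / corpusNumDocs;
        return localShare / corpusShare;
    }
}
```

Under this definition a score well above 1 marks a term that is over-represented in the results relative to the corpus, making it a candidate for refining the query. The most common unstemmed form gives a readable spelling to show the user in place of the stem.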

Cheers
Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

