lucene-dev mailing list archives

From Grant Ingersoll <gsing...@syr.edu>
Subject Re: Benchmarking results
Date Fri, 07 Apr 2006 12:41:59 GMT


Marvin Humphrey wrote:
>
> On Apr 4, 2006, at 10:23 AM, Tatu Saloranta wrote:
>> So in this case, what would give more comparable results (assuming
>> you are interested in measuring likely server-side
>> usage scenario, which is usually what Lucene is used for)
>
> Actually, I think the benchmark results illustrate that everyone 
> should be at least mildly concerned about where the Term Vector data 
> gets stored.  KinoSearch only writes that data once.  Lucene, however, 
> has to read/write that data during each merge, and the more streams 
> you have, the more complex the merge.  It stands to reason that 
> storing term vector data with the stored fields data would speed up 
> the merge process.
>
This seems like a good idea, especially combined with the lazy 
loading/retrieve-specified-fields approach that we are proposing, so 
that we aren't getting the term vector every time we retrieve a 
document.  We could deprecate the IndexReader.getTermVector methods and 
move term vector access to the Field.  I'm not sure what all the issues 
are, but it makes sense, since the TV data does not change.
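
To make the idea concrete, here is a rough sketch of term vector access 
hanging off a lazily loaded field.  Everything in it is hypothetical: 
LazyField, TermVector, and the Supplier standing in for the disk read are 
made-up names for illustration, not existing Lucene classes or a worked-out 
design.

```java
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are Lucene classes.
public class LazyTermVectorSketch {

    /** Minimal term vector: terms and their frequencies, in parallel arrays. */
    static final class TermVector {
        final String[] terms;
        final int[] freqs;

        TermVector(String[] terms, int[] freqs) {
            this.terms = terms;
            this.freqs = freqs;
        }
    }

    /** A stored field that reads its term vector only when first asked. */
    static final class LazyField {
        private final String name;
        private final String value;
        private final Supplier<TermVector> loader; // stands in for a disk read
        private TermVector cached;                 // null until first request

        LazyField(String name, String value, Supplier<TermVector> loader) {
            this.name = name;
            this.value = value;
            this.loader = loader;
        }

        String name() {
            return name;
        }

        String stringValue() {
            return value;
        }

        /** Loads on first call, then reuses the cached copy (not thread-safe). */
        TermVector getTermVector() {
            if (cached == null) {
                cached = loader.get();
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        int[] loads = {0}; // counts simulated disk reads
        LazyField body = new LazyField("body", "the quick fox", () -> {
            loads[0]++;
            return new TermVector(new String[] {"fox", "quick", "the"},
                                  new int[] {1, 1, 1});
        });

        // Reading the stored value does not touch the term vector.
        System.out.println("value=" + body.stringValue() + " loads=" + loads[0]);

        // The first request triggers the load; the second reuses the cache.
        TermVector tv = body.getTermVector();
        body.getTermVector();
        System.out.println("terms=" + tv.terms.length + " loads=" + loads[0]);
    }
}
```

The point is only that retrieving a document's stored value would not pay 
for the term vector unless the caller asks for it.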


> Are there any other significant applications? 
Clustering.  Corpora analysis/browsing.  Most likely others.

-- 

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

