lucene-dev mailing list archives

From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: TermVector support - first release
Date Sat, 20 Oct 2001 02:32:57 GMT
Dmitry,

Wow!  This looks great!

I was preparing a response to your questions of last weekend, but it seems
like you figured out a lot of it on your own.  I've attached that response
anyway, in case you're still interested.

Once we get 1.2 out the door I'd like to make you a committer (providing
others approve) so that you can commit these changes yourself.  I'd also
still like to review them a bit more, but some of that can happen after
they are committed.

My biggest question is about the field-orientation of this.  I had imagined
this to be more document oriented, that there would be a single
TermFreqVector per document, rather than one per field.  That would simplify
things a bit, and make it a bit more efficient.  Of course, one could always
construct the full-document freq vector by combining the field vectors, but
the question is, do folks need the field-specific vectors?
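
To make the "combining the field vectors" step concrete, here is a minimal
sketch in plain Java. It is an illustration under stated assumptions, not the
API from the patch: the class name FieldVectorMerge, the method combine, and
the modeling of each field's vector as a Map from term to frequency are all
hypothetical, and no Lucene classes are used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not part of Lucene or of the submitted patch.
final class FieldVectorMerge {

    // Sums per-field term frequencies into one document-level vector.
    // Outer key: field name. Inner map: term -> frequency within that field.
    static Map<String, Integer> combine(Map<String, Map<String, Integer>> fieldVectors) {
        Map<String, Integer> docVector = new TreeMap<>(); // sorted by term
        for (Map<String, Integer> fieldVector : fieldVectors.values()) {
            for (Map.Entry<String, Integer> e : fieldVector.entrySet()) {
                // Add this field's count to whatever has been seen so far.
                docVector.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return docVector;
    }

    public static void main(String[] args) {
        Map<String, Integer> title = new HashMap<>();
        title.put("lucene", 1);
        title.put("search", 1);

        Map<String, Integer> body = new HashMap<>();
        body.put("lucene", 2);
        body.put("index", 1);

        Map<String, Map<String, Integer>> fields = new HashMap<>();
        fields.put("title", title);
        fields.put("body", body);

        System.out.println(combine(fields));
    }
}
```

The merge only goes one way: once the counts are summed, the per-field
breakdown cannot be recovered from the document vector, which is why the
question of whether anyone needs the field-specific vectors matters.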

Overall, Bravo!

Doug

> -----Original Message-----
> From: Dmitry Serebrennikov [mailto:dmitrys@earthlink.net]
> Sent: Thursday, October 18, 2001 1:56 PM
> To: lucene-dev@jakarta.apache.org
> Subject: TermVector support - first release
> 
> 
> Greetings, everyone!
> 
> I have the first version of the term vector support ready to go. I'm
> attaching a file with release notes that explain briefly what the new
> capabilities are and what changes were made to make them happen. There
> are some limitations that are also described. The zip file contains new
> files, to be added. The txt file is the result of cvs diff -u against
> the current CVS repository.
> 
> I am really interested in feedback. First, do the APIs work for your
> needs? Also, does everything work? What kind of performance are you
> seeing? Are there things that could be done better (especially in terms
> of file structures and reading of those files; I think this is where the
> next layer of optimizations should come from)?
> 
> These changes are pretty risky, so I don't think they should go into
> 1.2. But I've been using them for the past few days and I didn't have
> to touch the files at all, so I think they are pretty stable.
> 
> Have fun, everyone.
> Dmitry.
> 
> 

