lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Grant Ingersoll" <GSIng...@syr.edu>
Subject Re: Adding to the termFreqVector
Date Wed, 01 Jun 2005 20:15:56 GMT
I don't think you need to parse the toString, you have the
TermFreqVector object which lets you access the appropriate pieces of
information (string, freq).  You could then turn around and delete/index
the new document based on the vector with the increments.  I don't know
whether it would scale or not, but at some point you have to do the
work, right?  Lucene is quite fast, you might be surprised what
"scales".

Where is the information coming from that allows you to increment "red"
below?  Is the information known at indexing?  Perhaps your token
filters could just add the value  onto the token stream when you index. 
Of course, this _may_ screw up position information, but you haven't
said anything about needing that, so I will assume you don't

-Grant


>>> ryan@skow.org 5/31/2005 6:02:53 PM >>>

Adding new terms and re-indexing the document is the desired behavior.

One (non-scalable) solution would be to parse the toString of the
termFreqVector (freq {myTermField: red/2, green/1, blue/1}) and create
a
new string representation of the expanded terms:  (red red green blue)

This obviously isn't a good solution.  Finding a way to simply do a
document.addTerm("red"); and then re-indexing would be ideal.



> Is your intent to persist the changed vector somehow or just use it
in
> your application for the immediate search?
>
> TermFreqVector is an interface, so if you aren't persisting, I would
> write a wrapper class around the one that is returned by Lucene that
has
> add/set methods on it for manipulating the underlying vector and
pass
> that around in your application.  Other option, is to get the source
and
> modify the TermFreqVector for your needs.
>
> Persistance is a bit harder, but would probably involve manipulating
> the document and then re-indexing it so that it's new vector has the
> updated frequencies by adding some dummy terms onto the document.
>
> Is that what you are looking for?
>
>>>> ryan@skow.org 5/30/2005 12:37:54 PM >>>
>
> How would one go about adding additional terms to a field which is
not
>
> stored literally, but instead has a termFreqVector?  For example:
>
> If DocumentA was indexed originally with:
>     myTermField: red green blue
>
> termFreqVector would look like:
>    freq {myTermField: red/1, green/1, blue/1}
>
> Now, I'd like to add some more terms (red, yellow) and desire the
> termFreqVector to look like this:
>    freq {myTermField: red/2, green/1, blue/1, yellow/1}
>
> It would seem like there would be a covenant way of accomplishing
this,
>
> but I must be missing something.
>
> Any advice would be greatly appreciated!
>
>
>
---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org 
> For additional commands, e-mail: java-user-help@lucene.apache.org 
>
>
>
---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org 
> For additional commands, e-mail: java-user-help@lucene.apache.org 
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org 
For additional commands, e-mail: java-user-help@lucene.apache.org 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message