lucene-dev mailing list archives

From Dave Kor <dave...@yahoo.com>
Subject Re: TermVector retrieval implementation questions
Date Tue, 16 Oct 2001 01:33:36 GMT

--- Dmitry Serebrennikov <dmitrys@earthlink.net> wrote:
> >That's something new. Unindexed fields such as keyword
> >fields won't have term ids? I hope you can clarify
> >further...
> >
> I believe keywords are indexed, just not tokenized. So the entire
> field is treated as a single term. This is typically used for
> storing fields like "price" or "id" or what-not that is more of a
> typical database-style one field - one value situation.

Okay... It seems that I have forgotten my Lucene
terminology; I kept thinking indexed == tokenized.
That led me to say that unindexed fields would
also be in tvs when I was actually referring to
untokenized fields. My apologies.
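
To make the indexed-versus-tokenized distinction concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class FieldTermsSketch, the method termsFor, and the simple split-on-non-word-characters tokenizer are all assumptions made for illustration. It only shows the idea from the quoted explanation, namely that an untokenized (keyword-style) field contributes its whole value as a single term, while a tokenized field contributes one term per word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: hypothetical names, does not use or mirror Lucene's API.
class FieldTermsSketch {

    // Returns the terms a field value would contribute to the index.
    // Tokenized: split into lower-cased words (an assumed, simplistic analyzer).
    // Untokenized (keyword-style): the entire value is one term, unchanged.
    static List<String> termsFor(String value, boolean tokenized) {
        List<String> terms = new ArrayList<>();
        if (!tokenized) {
            terms.add(value);
            return terms;
        }
        for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // A free-text field yields several terms.
        System.out.println(termsFor("Red Widget 42", true));
        // An "id"-style field yields exactly one term.
        System.out.println(termsFor("SKU-00042", false));
    }
}
```

In this model both kinds of field are indexed; only the number of terms per value differs, which is the point of the correction above.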


> >Hmm.. will there be a way we can convert/add
> >vectorization to the old segments? The users may want
> >some kind of migration path to the new format other
> >than reindexing the entire index.
> >
> Yes and no....

Like Doug, my schedule is a little too tight to
properly think it through right now, so I'll reply
again on Friday as promised earlier.


> Yes, I know. Me too. Interestingly enough, indexing seems to be
> completely IO-bound. I was watching CPU monitor last night as I was
> running some simple indexing and CPU never hit higher than 5%
> utilization. I didn't have a chance to compare this to a previous
> version yet. Does anyone know if this is expected behavior or is it
> because I managed to break something?

I once did a test of generating 5 million documents of
random lengths with randomly chosen words from a set
of 25000 elements (pre-loaded into memory).
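
A test corpus of this kind could be generated along the following lines. This is only a sketch under stated assumptions: the message does not say how the words were produced, what the document length range was, or how words were drawn, so the synthetic "w0", "w1", ... vocabulary, the 5 to 15 word length range, the uniform sampling, and the fixed seed are all hypothetical choices. The class and method names are likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator for synthetic test documents; not the original test code.
class RandomCorpusSketch {

    // Builds an in-memory vocabulary of placeholder words: "w0", "w1", ...
    static List<String> vocabulary(int size) {
        List<String> words = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            words.add("w" + i);
        }
        return words;
    }

    // Builds one document whose length is uniform in [minWords, maxWords]
    // and whose words are drawn uniformly from the vocabulary (assumed distribution).
    static String randomDocument(List<String> vocab, Random rng, int minWords, int maxWords) {
        int length = minWords + rng.nextInt(maxWords - minWords + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(vocab.get(rng.nextInt(vocab.size())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> vocab = vocabulary(25000); // vocabulary size taken from the message
        Random rng = new Random(42L);           // fixed seed so runs are repeatable
        for (int i = 0; i < 3; i++) {
            System.out.println(randomDocument(vocab, rng, 5, 15));
        }
    }
}
```

Each generated string would then be handed to the indexer as one document; generating documents on the fly like this keeps the generator itself out of the disk I/O being measured.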

Indexing took about 5+ hours and I too noticed that
most of the time, CPU usage was extremely low. Disk
activity pretty much took up the most time, especially
during segment merging, which seems to happen
periodically at roughly one minute intervals.
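
One rough way to check an IO-bound impression from inside the JVM, rather than from a system CPU monitor, is to compare the CPU time a thread consumed with the wall-clock time that elapsed. The sketch below uses the standard java.lang.management.ThreadMXBean API for this. The class CpuVsWallSketch and the method cpuFraction are hypothetical names, the measurement covers only the calling thread, and the Thread.sleep call in main is merely a stand-in for time spent waiting on disk.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: estimates how much of a task's elapsed time was CPU time.
class CpuVsWallSketch {

    // Returns cpuTime / wallTime for the task, measured on the current thread only.
    // Returns -1.0 if the JVM cannot report per-thread CPU time.
    static double cpuFraction(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return -1.0;
        }
        long cpuStart = bean.getCurrentThreadCpuTime(); // nanoseconds
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = bean.getCurrentThreadCpuTime() - cpuStart;
        return wall == 0 ? -1.0 : (double) cpu / wall;
    }

    public static void main(String[] args) {
        // Stand-in for an I/O wait: the thread is idle, so the fraction should be small.
        double fraction = cpuFraction(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("CPU fraction of elapsed time: " + fraction);
    }
}
```

A fraction close to zero around an indexing call would be consistent with the thread mostly waiting on disk, though it says nothing about work done on other threads.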



