lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Fraschetti <frasche...@gmail.com>
Subject Re: lucene docs in bulk read?
Date Tue, 01 Feb 2005 22:12:54 GMT
Definitely a good idea on the one line idea... that could possibly
save a good amount of time. I'm using .stringValue ... in reality, I
hadn't ever even considered readerValue ... is there a strong
performance difference between the two? or is it simply on the
functionality side?

The basic post processing is a grouping of results... because of the
time and space issues of my indexing process I am unable efficiently
go back and reindex a document if I have found a duplicate (my search
engine deals with multiple documents over time) .. so my post
processing groups results in the top 5000 hits which are the same,
except over different dates... But I need to grab the minimal data in
order to do this... the URL of the original page, the date of the doc,
etc... so that I can use only 1 doc, but if I find a duplicate, I can
simple add the new date to already existing doc. I am only reading a
few fields, but on a large scale of many documents, it hurts my timing
quite a bit.

-Chris


On Tue, 1 Feb 2005 21:33:13 +0100, Kelvin Tan <kelvin-lists@relevanz.com> wrote:
> Please see inline.
> 
> On Tue, 1 Feb 2005 09:27:26 -0800, Chris Fraschetti wrote:
> > Well all my fields are strings when I index them. They're all very
> > short strings, dates, hashes, etc. The largest field has a cap of
> > 256 chars and there is only one of them, the rest are all fairly
> > small.
> >
> > Can you explain what you meant by 'string or reader' ?
> 
> Sorry, I meant to ask if you're using String fields (field.stringValue()) or reader fields
(field.readerValue()).
> 
> Can you elaborate on the post-processing you need to do? Have you thought about concatenating
the fields you require into a single non-indexed field (Field.UnIndexed) for simple retrieval?
It'll increase the size of your index, but should be faster to retrieve them all at one go.
> 
> Kelvin
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 


-- 
___________________________________________________
Chris Fraschetti
e fraschetti@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message