lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Order docIds to reduce disk seeks
Date Tue, 18 Nov 2014 17:40:20 GMT
Even if you sort all hits by docID it's likely too slow to visit every
single one and load the stored document ...

Try to find another way to solve your problem, making use of the inverted index?

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J <Stuart.Rose@pnnl.gov> wrote:
> Hi Vijay,
>
> ...sorting the documents you need to retrieve by docID order first...
>
> means sorting them by their 'document number', which is the value in the
> 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve' the
> document from the index. If you write a comparator to sort the elements in the
> ScoreDoc[] by their doc field, then that will put them in 'docID order', and the
> reader will always be skipping forward to the next doc, which will probably
> reduce its seek time.
>
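A minimal sketch of the comparator described above, in plain Java. The `Hit` class is a hypothetical stand-in for Lucene's `ScoreDoc` (which exposes public `doc` and `score` fields) so that the example runs without Lucene on the classpath. `DocIdOrder` and `sortByDocId` are likewise illustrative names, not part of any Lucene API.

```java
import java.util.Arrays;
import java.util.Comparator;

class DocIdOrder {
    /** Hypothetical stand-in for Lucene's ScoreDoc: a doc number plus its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Sorts the hits in place by ascending document number ("docID order"). */
    static void sortByDocId(Hit[] hits) {
        Arrays.sort(hits, new Comparator<Hit>() {
            @Override
            public int compare(Hit a, Hit b) {
                // Integer.compare avoids the overflow risk of "a.doc - b.doc".
                return Integer.compare(a.doc, b.doc);
            }
        });
    }

    public static void main(String[] args) {
        // Hits as they might come back from a search: ordered by score, not by doc.
        Hit[] hits = {
            new Hit(42, 3.0f),
            new Hit(7, 2.0f),
            new Hit(19, 1.0f)
        };
        sortByDocId(hits);
        for (Hit h : hits) {
            System.out.println(h.doc + " " + h.score); // docs now appear as 7, 19, 42
        }
    }
}
```

With a real `ScoreDoc[]`, the same comparison on the `doc` field would run before the loop that calls `indexSearcher.doc(scoreDoc.doc)`. Sorting in place discards the relevance order, so copy the array first if the ranking is still needed.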
> Regards,
> Stuart
>
>
>
> -----Original Message-----
> From: Vijay B [mailto:vijay.nipuna@gmail.com]
> Sent: Monday, November 17, 2014 9:16 AM
> To: java-user@lucene.apache.org
> Subject: Order docIds to reduce disk seeks
>
> Could someone point me to how to order docIds as described at
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>
> "Limit usage of stored fields and term vectors. Retrieving these from the
> index is quite costly. Typically you should only retrieve these for the
> current "page" the user will see, not for all documents in the full result
> set. For each document retrieved, Lucene must seek to a different location in
> various files. Try sorting the documents you need to retrieve by docID order
> first."
>
> To give some background:
>
> We are using plain vanilla Lucene (version 4.2.1) for our application. We
> index our documents using stored fields. We add two fields for each document:
> UUID: a 9-digit number representing the internal id, and
> doc_text: the document text (approx. 7K to 20K in size).
> In our search code, we use a BooleanQuery to retrieve documents by UUID, then
> fetch the document text and use it for other processing. We are noticing slow
> response times with these searches. I understand that stored field retrieval
> is slow and should be limited, but it is mandatory for our app.
>
>
> Current code:
>
> TopScoreDocCollector collector =
>     TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);
>
> dirReader = DirectoryReader.open(FSDirectory.open(......));
> IndexSearcher indexSearcher = new IndexSearcher(dirReader);
> indexSearcher.search(query, collector);
> ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
>
> for (ScoreDoc scoreDoc : scoreDocs) {
>     // these calls take a lot of time
>     Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
>     String text = luceneDoc.get("doc_text");
>
>     // process text
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


