lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter M Cipollone" <...@bihvhar.com>
Subject Re: Getting a field value from a large indexed document is slow.
Date Fri, 14 May 2004 15:30:13 GMT
Paul,

It might be worth your while to store the file itself outside lucene, and
only store the filename in the stored data.  This is generally how
relational databases deal with LOBs, and will work with Lucene, too.  You
will also save yourself hours when it comes time to merge indices or
optimize, since those operations are, in effect, large copy operations.

Regards,
Pete

----- Original Message ----- 
From: "Paul Williams" <paul@valinf.com>
To: "'Lucene Users List'" <lucene-user@jakarta.apache.org>
Sent: Friday, May 14, 2004 11:22 AM
Subject: Getting a field value from a large indexed document is slow.


> Hi,
>
> I hope someone can help!
> I am using Lucene to make a searching repository of electronic documents.
> (MS Office, PDF's etc.). Some of these document can contain a large amount
> of text (about 500K of text in some cases) which is indexed to make it
> searchable.
>
> Doing the search and getting the hits found is not effected by the size of
> the document found.
>
> But when I try and access a field (my document id) in the document
>
>  i.e.
>
> // Create Lucene Doc with value
> Document doc = hits.doc(i);
>
> String number = doc.get("Field10");
>
>
> The creation of the Lucene document can take up to a second per hit. I
don't
> actually use any of the other fields apart from getting my ID value from
> field10.
>
> So my question is:-
>
> Is there a smarter way of getting out the 'Field10' value without it
> populating all the rest of the fields in the Lucene document and therefore
> reduce the time taken for this action.
>
>
> Paul
>
> DISCLAIMER:
>  The information in this message is confidential and may be legally
> privileged. It is intended solely for the addressee.  Access to this
message
> by anyone else is unauthorised.  If you are not the intended recipient,
any
> disclosure, copying, or distribution of the message,  or any action or
> omission taken by you in reliance on it, is prohibited and may be
unlawful.
> Please immediately contact the sender if you have received this message in
> error.
>  Thank you.
>  Valid Information Systems Limited.   Address:  Morline House,  160 London
> Road,  Barking, Essex, IG11 8BB.
> http://www.valinf.com Tel: +44 (0) 20 8215 1414 Fax: +44 (0) 20 8215 2040
> -----------------------------------------
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message