lucene-java-user mailing list archives

From Anshum <ansh...@gmail.com>
Subject Re: large document with multiple fields performance
Date Tue, 08 Sep 2009 17:07:48 GMT
Hey Steve,

I'd suggest you go with the 20-field (non-normalized) model. I've used much
larger models and they work just fine. There wouldn't be any point in
increasing the complexity.
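
To make the difference between the two layouts concrete, here is a minimal
sketch in plain Java. It does not use Lucene at all: the class
FieldModelSketch, its methods, and the sample comments are all hypothetical,
and the linear scan says nothing about real index performance. It only shows
the shape of the results: the wide model gives one hit per project, while the
normalized model can match several rows for the same project, so the
collecting side has to de-duplicate the projectIDs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the two document layouts; no Lucene involved.
public class FieldModelSketch {

    // Wide model: projectID -> (comment field name -> comment text).
    public static Map<String, Map<String, String>> sampleWide() {
        Map<String, Map<String, String>> docs = new LinkedHashMap<>();
        Map<String, String> p1 = new LinkedHashMap<>();
        p1.put("generalComment", "Drilling permit approved");
        p1.put("claimsComment", "Permit covers two claims");
        docs.put("P1", p1);
        Map<String, String> p2 = new LinkedHashMap<>();
        p2.put("environmentalComment", "Baseline water study complete");
        docs.put("P2", p2);
        return docs;
    }

    // Normalized model: one {projectID, commentType, comment} row per comment.
    public static List<String[]> normalize(Map<String, Map<String, String>> wide) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> doc : wide.entrySet()) {
            for (Map.Entry<String, String> field : doc.getValue().entrySet()) {
                rows.add(new String[] {doc.getKey(), field.getKey(), field.getValue()});
            }
        }
        return rows;
    }

    // Case-insensitive whole-word match, a crude stand-in for an analyzer.
    public static boolean containsTerm(String text, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Wide model: check every comment field, report each project at most once.
    public static Set<String> searchWide(Map<String, Map<String, String>> wide, String term) {
        Set<String> projectIds = new TreeSet<>();
        for (Map.Entry<String, Map<String, String>> doc : wide.entrySet()) {
            for (String text : doc.getValue().values()) {
                if (containsTerm(text, term)) {
                    projectIds.add(doc.getKey());
                    break; // one matching field is enough for this project
                }
            }
        }
        return projectIds;
    }

    // Normalized model: a single field to check, but several rows may match
    // for the same project, so the Set is what removes the duplicates.
    public static Set<String> searchNormalized(List<String[]> rows, String term) {
        Set<String> projectIds = new TreeSet<>();
        for (String[] row : rows) {
            if (containsTerm(row[2], term)) {
                projectIds.add(row[0]);
            }
        }
        return projectIds;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> wide = sampleWide();
        System.out.println(searchWide(wide, "permit"));
        System.out.println(searchNormalized(normalize(wide), "permit"));
    }
}
```

Both calls report the same set of projects for the sample data. With the
normalized layout that only holds because the result is collected into a Set,
which is the extra step your Collector would need.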
Hope that clarifies things a little at least :)
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Tue, Sep 8, 2009 at 6:57 PM, Stephen Greene
<SGreene@metalseconomics.com> wrote:

> Hi Anshum,
>
> Thank you for your reply. I have two options I am considering.
> One would be:
> Document {
>        String projectID;
>        String generalComment;
>        String workHistoryComment;
>        String environmentalComment;
>        String claimsComment;
>        ...
> }
>
> And the document may contain upwards of 20 comment fields.
>
> The other option would be to normalize the data
> Document {
>        String projectID;
>        String commentType;
>        String comment;
> }
>
> I will need to return only the projectID for all found documents. I have
> implemented a custom Collector to capture the projectID for each
> document. Then it occurred to me that I might be better served by the
> normalized document model. But I am wondering which method will have
> better performance: possibly returning 20 documents per hit, or having
> to search 20 fields per document? (This also has implications for the
> query: each search term will always search all fields, which is
> somewhat easier in the normalized example than creating 20 "or"
> queries.)
>
> Thanks,
>
> Steve
>
> -----Original Message-----
> From: Anshum [mailto:anshumg@gmail.com]
> Sent: Tuesday, September 08, 2009 9:47 AM
> To: java-user@lucene.apache.org
> Subject: Re: large document with multiple fields performance
>
> Hi Stephen,
> Could you clarify the requirement a bit more? Do you intend to have data
> in the index as:
> Document{
>  String Comment;
>  String CommentId;
>  String ProjectId;
> }
>
> How do you intend to index it, as in the doc structure? Is there a primary
> key there? What would you search on? What would you want to have as the
> result?
> All said and done, it's not really an overhead as long as the number of
> fields is within normal bounds.
>
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>
> On Tue, Sep 8, 2009 at 5:27 PM, Stephen Greene
> <SGreene@metalseconomics.com> wrote:
>
> > Hello,
> >
> >
> >
> > I am new to Lucene and building an application which requires documents
> > with many fields to be searched.
> >
> > A "project" id is being stored (not_analyzed) and all matching project
> > ids will be returned to be used to join other data from a database.
> >
> > Will it provide better performance to store each comment field in a
> > separate document with the project ID and a comment ID or to store all
> > the comments for a single project in a single document with multiple
> > fields?
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Steve Greene
> >
> >
>
