lucene-java-user mailing list archives

From "Stephen Greene" <SGre...@metalseconomics.com>
Subject RE: large document with multiple fields performance
Date Mon, 14 Sep 2009 01:48:51 GMT
Hi Anshum,

Thanks for your insight. I will stick with the 20 fields.
I realized that I had neglected to mention that, in a separate query, I
will search on the primary key and a search term to return details about
how many hits come from each field. Is it safe to assume that this will
also not be a problem, and that implementing a custom hit collector will
do the trick?
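
To make concrete what "how many hits come from each field" means for a
single project, here is a minimal plain-Java sketch. It does not use Lucene
at all: FieldHitCounter, countHitsPerField, and the sample comment text are
hypothetical names and data, and the tokenization (lowercase, split on
non-alphanumerics) is a simplifying assumption, not what any particular
Lucene analyzer does.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts term occurrences per field
// for one in-memory "document" represented as fieldName -> text.
public class FieldHitCounter {

    // Returns fieldName -> number of whole-token, case-insensitive matches.
    // Fields with zero matches are left out of the result.
    public static Map<String, Integer> countHitsPerField(
            Map<String, String> fields, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            int matches = 0;
            // Assumed tokenization: split on runs of non-letter, non-digit chars.
            String text = entry.getValue().toLowerCase(Locale.ROOT);
            for (String token : text.split("[^\\p{L}\\p{N}]+")) {
                if (token.equals(needle)) {
                    matches++;
                }
            }
            if (matches > 0) {
                counts.put(entry.getKey(), matches);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data is invented purely for illustration.
        Map<String, String> project = new LinkedHashMap<>();
        project.put("generalComment", "Copper project; copper grades improving.");
        project.put("workHistoryComment", "Drilling completed in 2008.");
        project.put("environmentalComment", "No copper runoff detected.");
        System.out.println(countHitsPerField(project, "copper"));
    }
}
```

In a real index the counts would come from the search engine rather than
from re-scanning the text, so treat this only as a statement of the
expected result shape: one count per comment field that matched.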

Thanks again,

Steve 

-----Original Message-----
From: Anshum [mailto:anshumg@gmail.com] 
Sent: Tuesday, September 08, 2009 2:08 PM
To: java-user@lucene.apache.org
Subject: Re: large document with multiple fields performance

Hey Steve,

I'd suggest you go with the 20-field (non-normalized) model. I've used
much larger models and they work just fine. There wouldn't be any point
in increasing the complexity.
Hope that clarifies things a little at least :)
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Tue, Sep 8, 2009 at 6:57 PM, Stephen Greene
<SGreene@metalseconomics.com> wrote:

> Hi Anshum,
>
> Thank you for your reply. I have two options I am considering.
> One would be:
> Document {
>        String projectID;
>        String generalComment;
>        String workHistoryComment;
>        String environmentalComment;
>        String claimsComment;
>        ...
> }
>
> And the document may contain upwards of 20 comment fields.
>
> The other option would be to normalize the data:
> Document {
>        String projectID;
>        String commentType;
>        String comment;
> }
>
> I will need to return only the projectID for all found documents. I have
> implemented a custom Collector to capture the projectID for each
> document. Then it occurred to me that I might be better served by the
> normalized document model. But I am wondering which method will have
> better performance: possibly returning 20 documents per hit, or having
> to search 20 fields per document? (This also has implications for the
> query: since each search term will always search all fields, this is
> somewhat easier in the normalized example than creating 20 "or"
> queries.)
>
> Thanks,
>
> Steve
>
> -----Original Message-----
> From: Anshum [mailto:anshumg@gmail.com]
> Sent: Tuesday, September 08, 2009 9:47 AM
> To: java-user@lucene.apache.org
> Subject: Re: large document with multiple fields performance
>
> Hi Stephen,
> Could you clarify the requirement a bit more? Do you intend to have data
> in the index as:
> Document{
>  String Comment;
>  String CommentId;
>  String ProjectId;
> }
>
> How do you intend to index it, as in the doc structure? Is there a
> primary key there? What would you search on? What would you want to have
> as the result?
> All said and done, it's not really an overhead as long as the number of
> fields is within normal bounds.
>
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>
> On Tue, Sep 8, 2009 at 5:27 PM, Stephen Greene
> <SGreene@metalseconomics.com> wrote:
>
> > Hello,
> >
> >
> >
> > I am new to Lucene and am building an application which requires
> > documents with many fields to be searched.
> >
> > A "project" id is being stored (not_analyzed) and all matching project
> > ids will be returned to be used to join other data from a database.
> >
> > Will it provide better performance to store each comment field in a
> > separate document with the project ID and a comment ID, or to store all
> > the comments for a single project in a single document with multiple
> > fields?
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Steve Greene
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


