lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Documents With large number of fields
Date Mon, 14 May 2012 14:44:12 GMT
Indexing should be fine, depending on your total document count. I think 
the potential issue is the FieldCache at query time. For string values it 
should grow roughly linearly with the number of documents, the number of 
fields, and the number of unique terms per field. So, do two tests: index 
1,000 docs and then 2,000 docs. In each case, check Java memory usage 
after a simple query, then after a query that facets on a significant 
number of these fields, and then after a couple more queries faceting on 
a high number of distinct fields. Multiply those memory-use increments to 
scale up to your expected range of documents; that should give you a 
semi-decent estimate of the memory the JVM will need. Estimating the CPU 
requirement would be more complex, but memory has to work out first. 
Likewise, the delta in index size between 1,000 and 2,000 docs gives you 
a number to scale up, roughly, to total index size, depending on the 
relative uniqueness of field values.
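
To make the extrapolation concrete, here is a minimal back-of-the-envelope 
sketch (illustrative only, not from the original message; every number is 
a hypothetical placeholder you would replace with your own measurements):

public class FieldCacheEstimate {
    public static void main(String[] args) {
        // Heap in use (MB) after running the faceted test queries,
        // measured once with 1,000 docs indexed and once with 2,000.
        double heapAt1000 = 220.0;   // hypothetical measurement
        double heapAt2000 = 260.0;   // hypothetical measurement

        // Per-1,000-doc increment, assuming roughly linear scaling
        // in document count as described above.
        double deltaPer1000 = heapAt2000 - heapAt1000;

        // Fixed overhead that is not per-document.
        double baseline = heapAt1000 - deltaPer1000;

        long expectedDocs = 50000000L;   // hypothetical target corpus
        double estimateMb = baseline + deltaPer1000 * (expectedDocs / 1000.0);

        System.out.printf("Estimated heap for %,d docs: %,.0f MB%n",
                expectedDocs, estimateMb);
    }
}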

-- Jack Krupansky

-----Original Message----- 
From: Keswani, Nitin - BLS CTR
Sent: Monday, May 14, 2012 10:27 AM
To: solr-user@lucene.apache.org
Subject: RE: Documents With large number of fields

Unfortunately I never got any response. However, I did a POC with a 
document containing 400 fields and loaded around 1,000 docs onto my local 
machine. I didn't see any issue, but then again the document set was very 
small. Hopefully, as mentioned below, providing enough memory should help 
alleviate any performance issues.
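
For anyone who wants to reproduce a POC like this, a rough SolrJ sketch 
along these lines would do it (the URL, field names, and cardinalities 
are all made up for illustration; it assumes a schema with dynamic "*_s" 
string fields):

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Rough POC loader: 1,000 docs x 400 fields each.
public class ManyFieldsPoc {
    public static void main(String[] args)
            throws IOException, SolrServerException {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int docId = 0; docId < 1000; docId++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(docId));
            for (int f = 0; f < 400; f++) {
                // Low-cardinality values so the fields are facetable.
                doc.addField("attr" + f + "_s", "value" + (docId % 20));
            }
            batch.add(doc);
        }
        server.add(batch);
        server.commit();
    }
}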

Thanks.

Regards,

Nitin Keswani


-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com]
Sent: Sunday, May 13, 2012 10:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Documents With large number of fields

I didn't see any response. There was a similar issue recently, where 
someone had 400 faceted fields with 50-70 facets per query and was 
running out of memory due to the accumulation of FieldCache entries for 
those faceted fields, but that was on a 3 GB system.

It probably could be done, assuming a fair number of 64-bit sharded 
machines.
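
One mitigation worth noting (my addition, not something from the thread): 
for low-cardinality fields, facet.method=enum walks the terms and uses 
the filterCache instead of building FieldCache entries, which can 
sidestep exactly this kind of accumulation. A rough SolrJ sketch, with 
made-up field names:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EnumFacetDemo {
    public static void main(String[] args) throws SolrServerException {
        // Hypothetical Solr instance and field names.
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        query.setFacetMinCount(1);
        // enum trades FieldCache heap for filterCache entries;
        // best suited to fields with few unique values.
        query.set("facet.method", "enum");
        for (int f = 0; f < 50; f++) {
            query.addFacetField("attr" + f + "_s");  // 50 faceted fields
        }

        QueryResponse response = server.query(query);
        System.out.println(response.getFacetFields().size()
                + " facet fields returned");
    }
}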

-- Jack Krupansky

-----Original Message-----
From: Darren Govoni
Sent: Sunday, May 13, 2012 7:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Documents With large number of fields

Was there a response to this?

On Fri, 2012-05-04 at 10:27 -0400, Keswani, Nitin - BLS CTR wrote:
> Hi,
>
> My data model consist of different types of data. Each data type has
> its own characteristics
>
> If I include the unique characteristics of each type of data, my
> single Solr Document could end up containing 300-400 fields.
>
> In order to drill down into this data set I would have to provide
> faceting on most of these fields so that I can drill down to a very
> small set of documents.
>
> Here are some of the questions:
>
> 1) What's the best approach when dealing with documents with a large
> number of fields? Should I keep a single document with a large number
> of fields, or split my document into a number of smaller documents,
> each consisting of some of the fields?
>
> 2) From an operational point of view, what's the drawback of having a
> single document with a very large number of fields? Can Solr support
> documents with a large number of fields (say 300 to 400)?
>
>
> Thanks.
>
> Regards,
>
> Nitin Keswani
>

