lucene-solr-user mailing list archives

From Tirthankar Chatterjee <tchatter...@commvault.com>
Subject RE: Document Size for Indexing
Date Tue, 06 Sep 2011 18:59:27 GMT
Hi Simon,
Thanks for your reply and for looking into this.

Yes, we are using Tika/SolrJ as the client process. We are indexing with the JVM max heap
set to 8 GB, on a 64-bit VM with the server option enabled.

We have a mixed set of emails and documents ranging from a few KB to 700 MB. We see failures
on the large files.

We are indexing 100 such documents per batch. I will get you the stack trace soon.
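
With files ranging from a few KB to 700 MB, a fixed batch of 100 documents can vary enormously in how much memory it holds at once. One possible mitigation, sketched here with plain JDK classes, is to close a batch by cumulative byte size as well as by document count. The class name `SizeAwareBatcher`, the method `partition`, and the 512 MB cap are hypothetical and chosen for illustration only; they are not part of Tika or SolrJ, and the sizes are raw file sizes, not the size of the extracted text.

```java
import java.util.ArrayList;
import java.util.List;

public class SizeAwareBatcher {
    /**
     * Groups document sizes (in bytes) into batches. A batch is closed when
     * adding the next document would exceed maxBytes, or when it already
     * holds maxDocs documents. A single document larger than maxBytes still
     * gets a batch of its own.
     */
    public static List<List<Long>> partition(List<Long> sizes, long maxBytes, int maxDocs) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : sizes) {
            boolean full = !current.isEmpty()
                    && (currentBytes + size > maxBytes || current.size() >= maxDocs);
            if (full) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Illustrative sizes only: 10 MB, 700 MB, 5 MB, 300 MB, 300 MB.
        List<Long> sizes = List.of(10 * mb, 700 * mb, 5 * mb, 300 * mb, 300 * mb);
        // Assumed cap of 512 MB per batch, with at most 100 documents per batch.
        System.out.println(partition(sizes, 512 * mb, 100).size()); // prints 4
    }
}
```

In a real client the list elements would be the files themselves, and each closed batch would be sent and committed before the next one is built, so that only one batch is held in memory at a time.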

Thanks,
Tirthankar




-----Original Message-----
From: simon [mailto:mtnest46@gmail.com] 
Sent: Wednesday, August 31, 2011 3:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Document Size for Indexing

So, if I understand you correctly, you are using Tika/SolrJ together in a Solr client process
that talks to your Solr server? What is the heap size? Can you give us a stack trace from the
OOM exception?
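
One quick way to answer the heap-size question from inside the client process is to ask the JVM what it actually sees, since the configured value and the effective value can differ. This is a minimal sketch using only standard `java.lang.Runtime` calls; the class name `HeapReport` and its helper methods are hypothetical.

```java
public class HeapReport {
    /** Converts a byte count to whole mebibytes. */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Describes the heap limits the running JVM reports. */
    public static String describe() {
        Runtime rt = Runtime.getRuntime();
        return "max heap (MiB): " + toMiB(rt.maxMemory())
                + ", currently allocated (MiB): " + toMiB(rt.totalMemory())
                + ", free within allocated (MiB): " + toMiB(rt.freeMemory())
                + ", os.arch: " + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        // Run inside the same JVM, with the same options, as the indexing client.
        System.out.println(describe());
    }
}
```

Logging this line once at client startup, and again just before a large file is extracted, shows how close the process is to its limit when the failure occurs.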

-Simon

On Wed, Aug 31, 2011 at 10:58 AM, Tirthankar Chatterjee <tchatterjee@commvault.com> wrote:

> We are using a 64-bit JVM, and we run out of memory in the extraction
> phase, where Tika assigns the extracted content to a SolrInputDocument
> in the pipeline, which is then held in memory.
>
> We are using the released 3.1 version of Solr.
>
> Thanks,
> Tirthankar
>
> -----Original Message-----
> From: simon [mailto:mtnest46@gmail.com]
> Sent: Tuesday, August 30, 2011 1:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Document Size for Indexing
>
> What issues exactly?
>
> Are you using 32-bit Java? That will restrict the JVM heap size to
> about 2 GB at most.
>
> -Simon
>
> On Tue, Aug 30, 2011 at 11:26 AM, Tirthankar Chatterjee
> <tchatterjee@commvault.com> wrote:
>
> > Hi,
> >
> > I have a machine (Windows 2008 R2) with 16 GB RAM, and I am having
> > issues indexing 1/2GB files. How do we avoid creating a
> > SolrInputDocument? Or is there any way to use the Lucene IndexWriter
> > classes directly?
> >
> > What would be the best approach? We need some suggestions.
> >
> > Thanks,
> > Tirthankar
> >
> >
> > ******************Legal Disclaimer***************************
> > "This communication may contain confidential and privileged material 
> > for the sole use of the intended recipient. Any unauthorized review, 
> > use or distribution by others is strictly prohibited. If you have 
> > received the message in error, please advise the sender by reply 
> > email and delete the message. Thank you."
> > *********************************************************
>