manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 15:15:04 GMT
Also, please send the file logs/manifoldcf.log as well -- as a text file.

Karl


On Fri, Jul 18, 2014 at 11:12 AM, Karl Wright <daddywri@gmail.com> wrote:

> Could you please get a thread dump and send that to me?  Please send as a
> text file not a screen shot.
>
> To get a thread dump, find the process ID of the agents process, and use
> the JDK's jstack utility to obtain the dump.
>
> Thanks,
> Karl
>
>
>
> On Fri, Jul 18, 2014 at 11:08 AM, Ameya Aware <ameya.aware@gmail.com>
> wrote:
>
>> Yeah, I thought so too: 4000 documents should not cause this.
>>
>> I am using the filesystem connector to crawl my entire C drive, and the
>> output connection is null.
>>
>> There are no errors logged in MCF. MCF has been stuck on the same screen
>> for half an hour.
>>
>> Attaching some snapshots for your reference.
>>
>>
>> Thanks,
>> Ameya
>>
>>
>>
>>
>> On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Ameya,
>>>
>>> 4000 documents is nothing at all.  We have load tests which I run on
>>> every release that include more than 100000 documents on a crawl.
>>>
>>> Can you be more specific about the case that you say "hung up"?
>>> Specifically:
>>>
>>> (1) What kind of crawl is this?  SharePoint?  Web?
>>> (2) Are there any errors in the manifoldcf log?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <ameya.aware@gmail.com>
>>> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> I spent some time going through the PostgreSQL 9.3 manual.
>>>> I configured PostgreSQL for MCF and saw a significant improvement in
>>>> performance.
>>>>
>>>> I ran it yesterday for some 4000 documents. When I started running it
>>>> again today, the performance was very poor, and after 200 documents it
>>>> hung.
>>>>
>>>> Is it because of the periodic maintenance it needs?  Also, I would like to
>>>> know where and how exactly the VACUUM FULL command needs to be used.
>>>>
>>>> Thanks,
>>>> Ameya
>>>>
>>>>
>>>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>> It is fine; I am running Postgresql 9.3 here.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <ameya.aware@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is PostgreSQL version 9.3 OK? I already have it on my machine,
>>>>>> though the documentation says "ManifoldCF has been tested
>>>>>> against version 8.3.7, 8.4.5 and 9.1 of PostgreSQL."
>>>>>>
>>>>>> Ameya
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If you haven't configured MCF to use PostgreSQL, then you are using
>>>>>>> Derby, which is not recommended for production use.
>>>>>>>
>>>>>>> Instructions on how to set up MCF to use PostgreSQL are available on
>>>>>>> the MCF site on the how-to-build-and-deploy page.  Configuring PostgreSQL
>>>>>>> for millions or tens of millions of documents will require someone to learn
>>>>>>> about PostgreSQL and how to administer it.  The how-to-build-and-deploy
>>>>>>> page provides some (old) guidelines and hints, but if I were you I'd read
>>>>>>> the PostgreSQL manual for the version you install.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <ameya.aware@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ooh ok.
>>>>>>>>
>>>>>>>> Actually, I have never configured PostgreSQL. I am simply using
>>>>>>>> the binary distribution of MCF to configure file system connectors
>>>>>>>> to connect to Solr.
>>>>>>>>
>>>>>>>> Do I need to configure PostgreSQL? How can I proceed from here to
>>>>>>>> check performance measurements?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ameya
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yes.  Also have a look at the how-to-build-and-deploy page for
>>>>>>>>> hints on how to configure PostgreSQL for maximum performance.
>>>>>>>>>
>>>>>>>>> ManifoldCF's performance is almost entirely based on the
>>>>>>>>> database.  If you are using PostgreSQL, which is the fastest ManifoldCF
>>>>>>>>> choice, you should be able to see in the logs when queries take a long
>>>>>>>>> time, or when indexes are automatically rebuilt.  Could you provide any
>>>>>>>>> information as to what your overall system setup looks like?
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <ameya.aware@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>
>>>>>>>>>> This page?
>>>>>>>>>>
>>>>>>>>>> Ameya
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>
>>>>>>>>>>> Have you read the performance page?
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> Sent from my Windows Phone
>>>>>>>>>>> ------------------------------
>>>>>>>>>>> From: Ameya Aware
>>>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>>>> Subject: Performance issues
>>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> I have millions of documents to crawl and send to Solr.
>>>>>>>>>>>
>>>>>>>>>>> But when I run it for thousands of documents, it takes too much
>>>>>>>>>>> time, or sometimes it even hangs.
>>>>>>>>>>>
>>>>>>>>>>> So what could be done to reduce the processing time?
>>>>>>>>>>>
>>>>>>>>>>> Also, I do not need the content of the documents, I just need
>>>>>>>>>>> the metadata, so can I skip reading and fetching the content, and
>>>>>>>>>>> will that improve performance?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Ameya
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
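
For readers following the thread-dump request quoted above (find the process ID of the agents process, then run the JDK's jstack against it), here is a minimal sketch of those two steps in Python. It assumes the JDK tools `jps` and `jstack` are on the PATH. The helper names `find_pid` and `thread_dump_to_file`, the output file name, and the `start.jar` name fragment are illustrative assumptions, not anything stated in the thread; substitute whatever identifies the agents process in your own setup.

```python
import subprocess
from typing import Optional


def find_pid(jps_output: str, name_fragment: str) -> Optional[int]:
    """Return the PID of the first `jps -l` line whose name contains name_fragment."""
    for line in jps_output.splitlines():
        parts = line.split(None, 1)  # "<pid> <main class or jar path>"
        if len(parts) == 2 and parts[0].isdigit() and name_fragment in parts[1]:
            return int(parts[0])
    return None


def thread_dump_to_file(name_fragment: str, out_path: str) -> None:
    """Locate a JVM by name fragment and write its jstack output to a text file."""
    # `jps -l` lists running JVMs with their main class or jar path.
    jps = subprocess.run(["jps", "-l"], capture_output=True, text=True, check=True)
    pid = find_pid(jps.stdout, name_fragment)
    if pid is None:
        raise RuntimeError(f"no JVM matching {name_fragment!r} found")
    # `jstack <pid>` prints the thread dump of that JVM to stdout.
    dump = subprocess.run(["jstack", str(pid)], capture_output=True, text=True, check=True)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(dump.stdout)


if __name__ == "__main__":
    # Assumption: the process to dump was started from a jar named start.jar.
    thread_dump_to_file("start.jar", "threaddump.txt")
```

The resulting threaddump.txt is plain text, so it can be attached to a reply as-is rather than sent as a screenshot.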
