manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 19:10:41 GMT
Hi Ameya,

Rebuilding will of course set your properties back to the build defaults.
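One way to avoid losing your settings is to back the file up before rebuilding and restore it afterward. The sketch below uses a placeholder file name and contents (the real properties.xml is XML, and its path depends on your layout):

```shell
# Placeholder demonstration of the backup/restore pattern; the file name and
# contents here stand in for your real properties.xml.
echo "database=postgresql" > properties.xml   # your customized copy
cp properties.xml properties.xml.bak          # back it up before 'ant build'
echo "database=derby" > properties.xml        # the rebuild resets it to defaults
cp properties.xml.bak properties.xml          # restore your settings afterward
cat properties.xml
```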

Karl



On Fri, Jul 18, 2014 at 3:08 PM, Ameya Aware <ameya.aware@gmail.com> wrote:

> Hi
>
> Am I not supposed to run the 'ant build' command after changing the
> properties.xml file?
>
> Because that is what set my configured PostgreSQL back to Derby.
>
> Ameya
>
>
> On Fri, Jul 18, 2014 at 2:27 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Yes.
>> Karl
>>
>>
>> On Fri, Jul 18, 2014 at 2:26 PM, Ameya Aware <ameya.aware@gmail.com>
>> wrote:
>>
>>> So for Hop filters tab:
>>> [image: Inline image 1]
>>>
>>> are you suggesting to choose the 3rd option, i.e. "Keep unreachable
>>> documents, forever"?
>>>
>>>
>>> Thanks,
>>> Ameya
>>>
>>>
>>> On Fri, Jul 18, 2014 at 2:15 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Something else you should be aware of: Hop-count filtering is very
>>>> expensive.  If you are using a connector that uses it, and you don't need
>>>> it, you should consider disabling it.  Pick the bottom radio button on the
>>>> Hop Count tab to do that.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 1:34 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Ameya,
>>>>>
>>>>> If you are still using Derby, which apparently you are according to
>>>>> the stack trace, then a pause of 420 seconds is likely because Derby got
>>>>> itself stuck.  Derby is like that, which is why we don't recommend it for
>>>>> production.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 18, 2014 at 1:31 PM, Ameya Aware <ameya.aware@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> No Karl,
>>>>>>
>>>>>> I did not do VACUUM here.
>>>>>>
>>>>>> Why would queries stop after running for about 420 sec?  Is it
>>>>>> because of the errors coming in?
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2014 at 12:32 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Ameya,
>>>>>>>
>>>>>>> For future reference, when you see stuff like this in the log:
>>>>>>>
>>>>>>> >>>>>>
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Found a long-running query (458934 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') - Found a long-running query (420965 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 0: 'D'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') - Found a long-running query (421120 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') - Found a long-running query (420985 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') - Found a long-running query (421173 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 0: 'D'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 0: 'D'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 0: 'D'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 1: '-1'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 0: 'D'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 2: '1405692432586'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 1: '-1'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '22') - Found a long-running query (421052 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 1: '-1'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 1: '-1'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 2: '1405692432586'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 0: 'D'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 2: '1405692432586'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 3: '9ABFEB709B646CD0C84B4B7B6300E2C9BD5E3477'
>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 1: '-1'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '39') -   Parameter 4: 'B'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 3: 'A932EC77CEF156EA26A4239F12BAB365E6B4F58D'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 1: '-1'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 3: '9DFF75EBE13D0AAE8AFF025E992C68AB203ED1CB'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 2: '1405692432586'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 4: 'B'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 2: '1405692432586'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 3: '023FDBD3638711F4E55A918B862A064161B0892A'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 4: 'B'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 4: 'B'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 2: '1405692432586'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 3: '0158B8EDFEE3DDB10113B6D6E378D5FBF165E1FD'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 3: 'FD9641C67D0C1EC22B5F05671513D4DD71B4582C'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 4: 'B'
>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 4: 'B'
>>>>>>> <<<<<<
>>>>>>>
>>>>>>> ... it means that MANY queries basically stopped running for about
>>>>>>> 420 seconds.  I bet you did a VACUUM then, right?
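For what it's worth, plain VACUUM and VACUUM FULL behave very differently in this respect. A sketch (the database and user names below are placeholders; substitute your own):

```shell
# Placeholder commands -- substitute your actual database and user names.
# Plain VACUUM runs concurrently with normal queries and is safe during a
# crawl; VACUUM FULL rewrites tables under an exclusive lock and stalls
# every query against them, so run it only while the agents process is down.
psql -U mcfuser -d mcfdb -c "VACUUM ANALYZE;"
psql -U mcfuser -d mcfdb -c "VACUUM FULL;"
```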
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 18, 2014 at 12:30 PM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Ameya,
>>>>>>>>
>>>>>>>> The log file is full of errors of all sorts.  For example:
>>>>>>>>
>>>>>>>> >>>>>
>>>>>>>>  WARN 2014-07-17 17:32:38,709 (Worker thread '41') - IO exception
>>>>>>>> during indexing
>>>>>>>> file:/C:/Program%20Files/eclipse/configuration/org.eclipse.osgi/.manager/.tmp2043698995563843992.instance:
>>>>>>>> The process cannot access the file because another process has
>>>>>>>> locked a portion of the file
>>>>>>>> java.io.IOException: The process cannot access the file because
>>>>>>>> another process has locked a portion of the file
>>>>>>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>>>>>>     at java.io.FileInputStream.read(Unknown Source)
>>>>>>>>     at
>>>>>>>> org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:91)
>>>>>>>>     at
>>>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
>>>>>>>>     at
>>>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
>>>>>>>>     at
>>>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187)
>>>>>>>>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
>>>>>>>> Source)
>>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.execchain.RequestEntityExecHandler.invoke(RequestEntityExecHandler.java:77)
>>>>>>>>     at com.sun.proxy.$Proxy0.writeTo(Unknown Source)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:155)
>>>>>>>>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
>>>>>>>> Source)
>>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.conn.CPoolProxy.invoke(CPoolProxy.java:138)
>>>>>>>>     at com.sun.proxy.$Proxy1.sendRequestEntity(Unknown Source)
>>>>>>>>     at
>>>>>>>> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
>>>>>>>>     at
>>>>>>>> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>>>>>>>>     at
>>>>>>>> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>>>>>>>>     at
>>>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:292)
>>>>>>>>     at
>>>>>>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>>>>>>>>     at
>>>>>>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>>>>>>>     at
>>>>>>>> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
>>>>>>>> <<<<<
>>>>>>>>
>>>>>>>> This error occurs because you are trying to index a file on Windows
>>>>>>>> that is open in another application.  If you do this kind of thing,
>>>>>>>> ManifoldCF will requeue the document and will try it again later --
>>>>>>>> say, in 5 minutes -- and keep retrying it for many hours before it
>>>>>>>> gives up.
>>>>>>>>
>>>>>>>> I suspect that you are not seeing "hangs", but rather situations
>>>>>>>> where MCF is simply waiting for a problem to resolve.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 11:27 AM, Ameya Aware <
>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Attaching log file
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 18, 2014 at 11:15 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Also, please send the file logs/manifoldcf.log as well -- as a
>>>>>>>>>> text file.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 18, 2014 at 11:12 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Could you please get a thread dump and send that to me?  Please
>>>>>>>>>>> send it as a text file, not a screen shot.
>>>>>>>>>>>
>>>>>>>>>>> To get a thread dump, get the process ID of the agents process,
>>>>>>>>>>> and use the JDK's jstack utility to obtain the dump.
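As a concrete sketch (both utilities ship with the JDK; the PID shown is a placeholder for whatever jps reports for your agents process):

```shell
# Find the agents process among the running JVMs, then dump its threads.
jps -l                          # lists Java PIDs alongside their main classes
jstack 12345 > threaddump.txt   # 12345 is a placeholder for the agents PID
```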
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:08 AM, Ameya Aware <
>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yeah, I thought so too; 4000 documents should not be an issue.
>>>>>>>>>>>>
>>>>>>>>>>>> I am using the file system connector to crawl all of my C drive,
>>>>>>>>>>>> and the output connection is null.
>>>>>>>>>>>>
>>>>>>>>>>>> There are no errors in the MCF log.  MCF has been stuck at the
>>>>>>>>>>>> same screen for half an hour.
>>>>>>>>>>>>
>>>>>>>>>>>> Attaching some snapshots for your reference.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Ameya
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <
>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4000 documents is nothing at all.  We have load tests, which I
>>>>>>>>>>>>> run on every release, that include more than 100000 documents in
>>>>>>>>>>>>> a crawl.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you be more specific about the case where you say it "hung
>>>>>>>>>>>>> up"?  Specifically:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (1) What kind of crawl is this?  SharePoint?  Web?
>>>>>>>>>>>>> (2) Are there any errors in the manifoldcf log?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <
>>>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I spent some time going through the PostgreSQL 9.3 manual.
>>>>>>>>>>>>>> I configured PostgreSQL for MCF and saw a significant
>>>>>>>>>>>>>> improvement in performance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I ran it yesterday for some 4000 documents.  When I started
>>>>>>>>>>>>>> running it again today, the performance was very poor, and
>>>>>>>>>>>>>> after 200 documents it hung up.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is it because of the periodic maintenance it needs?  Also, I
>>>>>>>>>>>>>> would like to know where and how exactly the VACUUM FULL
>>>>>>>>>>>>>> command needs to be used.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <
>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is fine; I am running Postgresql
9.3 here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <
>>>>>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is PostgreSQL version 9.3 good?  I already have it on my
>>>>>>>>>>>>>>>> machine, though the documentation says "ManifoldCF has been
>>>>>>>>>>>>>>>> tested against version 8.3.7, 8.4.5 and 9.1 of PostgreSQL."
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <
>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you haven't configured MCF to use PostgreSQL, then you
>>>>>>>>>>>>>>>>> are using Derby, which is not recommended for production use.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Instructions on how to set up MCF to use PostgreSQL are
>>>>>>>>>>>>>>>>> available on the MCF site on the how-to-build-and-deploy
>>>>>>>>>>>>>>>>> page.  Configuring PostgreSQL for millions or tens of
>>>>>>>>>>>>>>>>> millions of documents will require someone to learn about
>>>>>>>>>>>>>>>>> PostgreSQL and how to administer it.  The
>>>>>>>>>>>>>>>>> how-to-build-and-deploy page provides some (old) guidelines
>>>>>>>>>>>>>>>>> and hints, but if I were you I'd read the PostgreSQL manual
>>>>>>>>>>>>>>>>> for the version you install.
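For reference, the database switch lives in properties.xml. A sketch of the relevant lines follows (property names quoted from memory; verify them against the how-to-build-and-deploy page for your version):

```xml
<!-- Sketch only: point MCF at PostgreSQL instead of the default Derby.
     Verify these property names, and add connection credentials, per the
     how-to-build-and-deploy documentation for your MCF version. -->
<property name="org.apache.manifoldcf.databaseimplementationclass"
          value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
<property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
<property name="org.apache.manifoldcf.dbsuperuserpassword" value="*****"/>
```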
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <
>>>>>>>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ooh ok.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Actually, I have never configured PostgreSQL yet.  I am
>>>>>>>>>>>>>>>>>> simply using the binary distribution of MCF to configure
>>>>>>>>>>>>>>>>>> file system connectors to connect to Solr.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Do I need to configure PostgreSQL?  How can I proceed from
>>>>>>>>>>>>>>>>>> here to check performance measurements?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <
>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes.  Also have a look at the how-to-build-and-deploy page
>>>>>>>>>>>>>>>>>>> for hints on how to configure PostgreSQL for maximum
>>>>>>>>>>>>>>>>>>> performance.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ManifoldCF's performance is almost entirely based on the
>>>>>>>>>>>>>>>>>>> database.  If you are using PostgreSQL, which is the
>>>>>>>>>>>>>>>>>>> fastest ManifoldCF choice, you should be able to see in
>>>>>>>>>>>>>>>>>>> the logs when queries take a long time, or when indexes
>>>>>>>>>>>>>>>>>>> are automatically rebuilt.  Could you provide any
>>>>>>>>>>>>>>>>>>> information as to what your overall system setup looks
>>>>>>>>>>>>>>>>>>> like?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <
>>>>>>>>>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This page?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Have you read the performance page?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sent from my Windows Phone
>>>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>>>> From: Ameya Aware
>>>>>>>>>>>>>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>>>>>>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>>>>>>>>>>>>>> Subject: Performance issues
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I have millions of documents to crawl and send to
>>>>>>>>>>>>>>>>>>>>> Solr.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> But when I run it for thousands of documents, it takes
>>>>>>>>>>>>>>>>>>>>> too much time, or sometimes it even hangs up.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So what could I do to reduce the crawl time?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Also, I do not need the content of the documents, just
>>>>>>>>>>>>>>>>>>>>> the metadata.  Can I skip reading and fetching the
>>>>>>>>>>>>>>>>>>>>> content, and will that improve performance?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
