manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 18:27:25 GMT
Yes.
Karl


On Fri, Jul 18, 2014 at 2:26 PM, Ameya Aware <ameya.aware@gmail.com> wrote:

> So for the Hop filters tab:
> [image: Inline image 1]
>
> are you suggesting choosing the 3rd option, i.e. "Keep unreachable
> documents, forever"?
>
>
> Thanks,
> Ameya
>
>
> On Fri, Jul 18, 2014 at 2:15 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Something else you should be aware of: Hop-count filtering is very
>> expensive.  If you are using a connector that uses it, and you don't need
>> it, you should consider disabling it.  Pick the bottom radio button on the
>> Hop Count tab to do that.
>>
>> Thanks,
>> Karl
>>
>>
>>
>>
>> On Fri, Jul 18, 2014 at 1:34 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Ameya,
>>>
>>> If you are still using Derby, which apparently you are according to the
>>> stack trace, then a pause of 420 seconds is likely because Derby got itself
>>> stuck.  Derby is like that, which is why we don't recommend it for
>>> production.
>>>
>>> Karl
>>>
>>>
>>>
>>> On Fri, Jul 18, 2014 at 1:31 PM, Ameya Aware <ameya.aware@gmail.com>
>>> wrote:
>>>
>>>> No, Karl.
>>>>
>>>> I did not do a VACUUM here.
>>>>
>>>> Why would the queries stop after running for about 420 sec?  Is it
>>>> because of the errors coming in?
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 12:32 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Ameya,
>>>>>
>>>>> For future reference, when you see stuff like this in the log:
>>>>>
>>>>> >>>>>>
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Found a
>>>>> long-running query (458934 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>> t1.isnew=?))]
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') - Found a
>>>>> long-running query (420965 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>> t1.isnew=?))]
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 0: 'D'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') - Found a
>>>>> long-running query (421120 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>> t1.isnew=?))]
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') - Found a
>>>>> long-running query (420985 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>> t1.isnew=?))]
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') - Found a
>>>>> long-running query (421173 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>> t1.isnew=?))]
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 0: 'D'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 0: 'D'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 0: 'D'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 1:
>>>>> '-1'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 0: 'D'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 2:
>>>>> '1405692432586'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 1:
>>>>> '-1'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '22') - Found a
>>>>> long-running query (421052 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>> t1.isnew=?))]
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 1:
>>>>> '-1'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 1: '-1'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 2:
>>>>> '1405692432586'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 0: 'D'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 2:
>>>>> '1405692432586'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 3:
>>>>> '9ABFEB709B646CD0C84B4B7B6300E2C9BD5E3477'
>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 1:
>>>>> '-1'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '39') -   Parameter 4: 'B'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 3:
>>>>> 'A932EC77CEF156EA26A4239F12BAB365E6B4F58D'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 1:
>>>>> '-1'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 3:
>>>>> '9DFF75EBE13D0AAE8AFF025E992C68AB203ED1CB'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 2:
>>>>> '1405692432586'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 4: 'B'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 2:
>>>>> '1405692432586'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 3:
>>>>> '023FDBD3638711F4E55A918B862A064161B0892A'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 4: 'B'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 4: 'B'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 2:
>>>>> '1405692432586'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 3:
>>>>> '0158B8EDFEE3DDB10113B6D6E378D5FBF165E1FD'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 3:
>>>>> 'FD9641C67D0C1EC22B5F05671513D4DD71B4582C'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 4: 'B'
>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 4: 'B'
>>>>> <<<<<<
>>>>>
>>>>> ... it means that MANY queries basically stopped running for about 420
>>>>> seconds.  I bet you did a VACUUM then, right?
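>>>>>
>>>>> If it happens again on PostgreSQL, a quick generic check (not
>>>>> MCF-specific; just standard PostgreSQL) is to look for blocked queries
>>>>> in pg_stat_activity:
>>>>>
>>>>>   -- PostgreSQL 9.3: 'waiting' is true for queries blocked on a lock
>>>>>   SELECT pid, waiting, query FROM pg_stat_activity WHERE state <> 'idle';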
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 18, 2014 at 12:30 PM, Karl Wright <daddywri@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ameya,
>>>>>>
>>>>>> The log file is full of errors of all sorts.  For example:
>>>>>>
>>>>>> >>>>>
>>>>>>  WARN 2014-07-17 17:32:38,709 (Worker thread '41') - IO exception
>>>>>> during indexing
>>>>>> file:/C:/Program%20Files/eclipse/configuration/org.eclipse.osgi/.manager/.tmp2043698995563843992.instance:
>>>>>> The process cannot access the file because another process has locked a
>>>>>> portion of the file
>>>>>> java.io.IOException: The process cannot access the file because
>>>>>> another process has locked a portion of the file
>>>>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>>>>     at java.io.FileInputStream.read(Unknown Source)
>>>>>>     at
>>>>>> org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:91)
>>>>>>     at
>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
>>>>>>     at
>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
>>>>>>     at
>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187)
>>>>>>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>     at
>>>>>> org.apache.http.impl.execchain.RequestEntityExecHandler.invoke(RequestEntityExecHandler.java:77)
>>>>>>     at com.sun.proxy.$Proxy0.writeTo(Unknown Source)
>>>>>>     at
>>>>>> org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:155)
>>>>>>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>     at
>>>>>> org.apache.http.impl.conn.CPoolProxy.invoke(CPoolProxy.java:138)
>>>>>>     at com.sun.proxy.$Proxy1.sendRequestEntity(Unknown Source)
>>>>>>     at
>>>>>> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
>>>>>>     at
>>>>>> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
>>>>>>     at
>>>>>> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
>>>>>>     at
>>>>>> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
>>>>>>     at
>>>>>> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
>>>>>>     at
>>>>>> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
>>>>>>     at
>>>>>> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>>>>>     at
>>>>>> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>>>>>>     at
>>>>>> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>>>>>>     at
>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:292)
>>>>>>     at
>>>>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>>>>>>     at
>>>>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>>>>>     at
>>>>>> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
>>>>>> <<<<<
>>>>>>
>>>>>> This error occurs because you are trying to index a file on Windows
>>>>>> that is open by an application.  If you do this kind of thing, ManifoldCF
>>>>>> will requeue the document and will try it again later -- say, in 5
>>>>>> minutes, and keep retrying it for many hours before it gives up.
>>>>>>
>>>>>> I suspect that you are not seeing "hangs", but rather situations
>>>>>> where MCF is simply waiting for a problem to resolve.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2014 at 11:27 AM, Ameya Aware <ameya.aware@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Attaching log file
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 18, 2014 at 11:15 AM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Also, please send the file logs/manifoldcf.log as well -- as a text
>>>>>>>> file.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 11:12 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Could you please get a thread dump and send that to me?  Please send
>>>>>>>>> it as a text file, not a screen shot.
>>>>>>>>>
>>>>>>>>> To get a thread dump, get the process ID of the agents process, and
>>>>>>>>> use the JDK's jstack utility to obtain the dump.
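>>>>>>>>>
>>>>>>>>> For example (the PID below is made up; use whatever jps reports for
>>>>>>>>> the agents process):
>>>>>>>>>
>>>>>>>>>   jps -l                          # list Java processes with their main class
>>>>>>>>>   jstack 12345 > threaddump.txt   # dump all thread stacks to a text file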
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 18, 2014 at 11:08 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yeah, I thought so; it should not be an issue with 4000 documents.
>>>>>>>>>>
>>>>>>>>>> I am using the file system connector to crawl all of my C drive, and
>>>>>>>>>> the output connection is null.
>>>>>>>>>>
>>>>>>>>>> There are no errors in the MCF logs.  MCF has been at a standstill
>>>>>>>>>> on the same screen for half an hour.
>>>>>>>>>>
>>>>>>>>>> Attaching some snapshots for your reference.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Ameya
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>
>>>>>>>>>>> 4000 documents is nothing at all.  We have load tests, which I run
>>>>>>>>>>> on every release, that include more than 100000 documents in a crawl.
>>>>>>>>>>>
>>>>>>>>>>> Can you be more specific about the case where you say it "hung up"?
>>>>>>>>>>> Specifically:
>>>>>>>>>>>
>>>>>>>>>>> (1) What kind of crawl is this?  SharePoint?  Web?
>>>>>>>>>>> (2) Are there any errors in the manifoldcf log?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>
>>>>>>>>>>>> I spent some time going through the PostgreSQL 9.3 manual.
>>>>>>>>>>>> I configured PostgreSQL for MCF and saw a significant improvement
>>>>>>>>>>>> in performance.
>>>>>>>>>>>>
>>>>>>>>>>>> I ran it yesterday for some 4000 documents.  When I started running
>>>>>>>>>>>> it again today, the performance was very poor, and after 200
>>>>>>>>>>>> documents it hung up.
>>>>>>>>>>>>
>>>>>>>>>>>> Is it because of the periodic maintenance it needs?  Also, I would
>>>>>>>>>>>> like to know where and how exactly the VACUUM FULL command needs to
>>>>>>>>>>>> be used.
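>>>>>>>>>>>>
>>>>>>>>>>>> For example, is it just something like this from psql, run against
>>>>>>>>>>>> the MCF database (the database name below is only a guess)?
>>>>>>>>>>>>
>>>>>>>>>>>>   $ psql -U postgres manifoldcf
>>>>>>>>>>>>   manifoldcf=# VACUUM FULL;   -- rewrites tables; takes exclusive locks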
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Ameya
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It is fine; I am running PostgreSQL 9.3 here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is PostgreSQL version 9.3 good?  I already have it on my machine,
>>>>>>>>>>>>>> though the documentation says "ManifoldCF has been tested against
>>>>>>>>>>>>>> version 8.3.7, 8.4.5 and 9.1 of PostgreSQL."
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you haven't configured MCF to use PostgreSQL, then you are
>>>>>>>>>>>>>>> using Derby, which is not recommended for production use.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Instructions on how to set up MCF to use PostgreSQL are available
>>>>>>>>>>>>>>> on the MCF site, on the how-to-build-and-deploy page.  Configuring
>>>>>>>>>>>>>>> PostgreSQL for millions or tens of millions of documents will
>>>>>>>>>>>>>>> require someone to learn about PostgreSQL and how to administer
>>>>>>>>>>>>>>> it.  The how-to-build-and-deploy page provides some (old)
>>>>>>>>>>>>>>> guidelines and hints, but if I were you I'd read the PostgreSQL
>>>>>>>>>>>>>>> manual for the version you install.
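>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As a rough sketch (check the how-to-build-and-deploy page for the
>>>>>>>>>>>>>>> exact property names), pointing MCF at PostgreSQL means editing
>>>>>>>>>>>>>>> properties.xml along these lines:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   <property name="org.apache.manifoldcf.databaseimplementationclass"
>>>>>>>>>>>>>>>             value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
>>>>>>>>>>>>>>>   <property name="org.apache.manifoldcf.database.name" value="dbname"/>
>>>>>>>>>>>>>>>   <property name="org.apache.manifoldcf.database.username" value="manifoldcf"/>
>>>>>>>>>>>>>>>   <property name="org.apache.manifoldcf.database.password" value="secret"/>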
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ooh ok.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Actually, I have never configured PostgreSQL yet.  I am simply
>>>>>>>>>>>>>>>> using the binary distribution of MCF to configure file system
>>>>>>>>>>>>>>>> connectors to connect to Solr.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do I need to configure PostgreSQL?  How can I proceed from here
>>>>>>>>>>>>>>>> to check performance measurements?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes.  Also have a look at the how-to-build-and-deploy page for
>>>>>>>>>>>>>>>>> hints on how to configure PostgreSQL for maximum performance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ManifoldCF's performance is almost entirely based on the
>>>>>>>>>>>>>>>>> database.  If you are using PostgreSQL, which is the fastest
>>>>>>>>>>>>>>>>> choice for ManifoldCF, you should be able to see in the logs
>>>>>>>>>>>>>>>>> when queries take a long time, or when indexes are automatically
>>>>>>>>>>>>>>>>> rebuilt.  Could you provide any information as to what your
>>>>>>>>>>>>>>>>> overall system setup looks like?
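>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Many of those hints are postgresql.conf settings; for example,
>>>>>>>>>>>>>>>>> values of this general shape (illustrative only -- tune for your
>>>>>>>>>>>>>>>>> own machine and PostgreSQL version):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   shared_buffers = 1024MB        # buffer cache for the crawl tables
>>>>>>>>>>>>>>>>>   checkpoint_segments = 300      # fewer, larger checkpoints (pre-9.5 setting)
>>>>>>>>>>>>>>>>>   maintenance_work_mem = 256MB   # speeds up VACUUM and index rebuilds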
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This page?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Have you read the performance page?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sent from my Windows Phone
>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>> From: Ameya Aware
>>>>>>>>>>>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>>>>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>>>>>>>>>>>> Subject: Performance issues
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have millions of documents to crawl and send to Solr.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> But when I run it for thousands of documents, it takes too
>>>>>>>>>>>>>>>>>>> much time, or sometimes it even hangs.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So what could be a way to reduce the crawl time?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Also, I do not need the content of the documents, just the
>>>>>>>>>>>>>>>>>>> metadata.  Can I skip reading and fetching the content, and
>>>>>>>>>>>>>>>>>>> will that improve performance?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
