manifoldcf-user mailing list archives

From "Fuad Efendi" <f...@efendi.ca>
Subject RE: SOLR
Date Tue, 15 Mar 2011 04:37:20 GMT
Hi Karl,

My only guess is that we submit the URI of a document to SOLR Cell, and SOLR Cell
retrieves it from the Internet itself (probably using HttpClient, and maybe with its
own robot signature?).
The same would apply even in the case of RSS...
Only this can explain why I have "navigation" and "login" in the SOLR index...

Am I right?


Thanks



-----Original Message-----
From: Fuad Efendi [mailto:fuad@efendi.ca] 
Sent: March-15-11 12:26 AM
To: connectors-user@incubator.apache.org
Subject: RE: SOLR

UPDATE:
SOLR 1.4.1 (June 2010) works fine with ManifoldCF trunk.
SOLR trunk doesn't work, and I suspect bugs in Tika...

But it is strange :)

Looking at SOLR, each document contains a huge array of "links",
including many links to the Yahoo login page... something weird (it doesn't
look like RSS)... but it is searchable.


-----Original Message-----
From: Fuad Efendi [mailto:fuad@efendi.ca]
Sent: March-14-11 10:50 PM
To: 'connectors-user@incubator.apache.org'
Subject: RE: SOLR


I just noticed:
Currently, the default for ManifoldCF is /update/extract, which corresponds to
the SOLR Cell request handler.

So...
It is EXTREMELY generic...
http://wiki.apache.org/solr/ExtractingRequestHandler
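
One way to see what the /update/extract handler pulls out of a document is to
send it the document bytes with extractOnly=true, which returns the extracted
content instead of indexing it. The following is only a minimal sketch: the
base URL, the document id, and the helper names build_extract_url and
post_document are assumptions made for illustration, and it does not reproduce
how ManifoldCF itself talks to SOLR.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL of a local SOLR instance; adjust for your installation.
SOLR_BASE = "http://localhost:8983/solr"


def build_extract_url(base, doc_id, extract_only=True):
    """Build a URL for the /update/extract (SOLR Cell) handler.

    Hypothetical helper: literal.id sets the document id, and
    extractOnly=true asks the handler to return the extracted content
    rather than index it.
    """
    params = {"literal.id": doc_id}
    if extract_only:
        params["extractOnly"] = "true"
    return base.rstrip("/") + "/update/extract?" + urlencode(params)


def post_document(url, body, content_type="text/html"):
    """POST raw document bytes to the given URL and return the response text.

    Hypothetical helper: it needs a running SOLR instance, so it is not
    called when this file is run directly.
    """
    request = Request(url, data=body, headers={"Content-Type": content_type})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URL; no network access happens here.
    print(build_extract_url(SOLR_BASE, "doc1"))
```

With a SOLR instance running, calling post_document(build_extract_url(SOLR_BASE,
"doc1"), page_bytes) on the bytes of a crawled page should show whether
navigation and login links are already present in the extracted output.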


