nutch-user mailing list archives

From Nicholas Roberts <nicho...@themediasociety.org>
Subject Drupal Integration with Nutch via CSIRO's Arch ?
Date Thu, 29 Dec 2011 06:31:36 GMT
hi Arkadi

just poking around the website for Arch and am really excited by the
potential

am wondering if there are possible integration points with Drupal ?

thinking possible integration points could be via these Drupal contrib
modules:

Nutch http://drupal.org/project/nutch
Apache Solr Integration http://drupal.org/project/apachesolr
Search API http://drupal.org/project/search_api

will look into it myself and try to work something out..

cheers

-N


On Thu, Dec 22, 2011 at 6:23 PM, <Arkadi.Kosmynin@csiro.au> wrote:

> Hi,
>
> This can be done using an index filter. For a source code example see this:
>
> http://www.atnf.csiro.au/computing/software/arch/
>
> Please see class au.csiro.cass.arch.filters.Index.
>
> If you are trying to implement a corporate search engine or a search
> hosting service for multiple web sites, it is quite likely that Arch can do
> everything you need. We've just released a version based on Nutch 1.4.
>
> Regards,
>
> Arkadi
>
>
>
>
> > -----Original Message-----
> > From: abhayd [mailto:ajdabholkar@hotmail.com]
> > Sent: Friday, 23 December 2011 6:21 AM
> > To: nutch-user@lucene.apache.org
> > Subject: nutch solr index process to add tag when indexing solr
> >
> > hi
> > We use Nutch to crawl sites, and the index data is pushed using the
> > solrindex command.
> >
> > We have three sites that we crawl using nutch
> > http://xyz1.com/,http://xyz2.com/,http://xyz3.com/
> >
> > We create one crawldb for each site and use a single Solr core to
> > consolidate the three sites. When we send data from each crawldb to
> > Solr, we want to tag each site's docs with source info.
> >
> > so
> > doc1|xyz1|
> > doc23|xyz2|
> >
> > I don't see any way to do this in Nutch. Any help?
> >
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/nutch-
> > solr-index-process-to-add-tag-when-indexing-solr-tp3607311p3607311.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
>
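
To make the index-filter suggestion in the quoted reply concrete, here is a
minimal, self-contained sketch of the tagging logic only. It is not the code
from Arch's au.csiro.cass.arch.filters.Index class, which is not shown in this
thread. The class name SourceTagger, the field name "source", the "unknown"
fallback, and the host-to-tag table are all assumptions made for illustration.
In a real Nutch plugin, logic like this would presumably sit inside the filter
method of a class implementing org.apache.nutch.indexer.IndexingFilter, which
would add the field to the document being indexed.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Sketch only: derives a per-site "source" tag from a document URL.
 * All names here are hypothetical; none come from Nutch or Arch.
 */
final class SourceTagger {

    // Hypothetical host-to-tag table for the three sites in the question.
    private static final Map<String, String> SOURCE_BY_HOST = Map.of(
            "xyz1.com", "xyz1",
            "xyz2.com", "xyz2",
            "xyz3.com", "xyz3");

    /** Returns the tag for a URL, or "unknown" if the host is not listed. */
    static String tagFor(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return "unknown"; // malformed URL
        }
        if (host == null) {
            return "unknown"; // no host component (e.g. a relative URL)
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4); // treat www.xyz1.com like xyz1.com
        }
        return SOURCE_BY_HOST.getOrDefault(host, "unknown");
    }

    /** Returns a copy of the document fields with a "source" field added. */
    static Map<String, String> tag(String url, Map<String, String> fields) {
        Map<String, String> tagged = new LinkedHashMap<>(fields);
        tagged.put("source", tagFor(url)); // assumed field name
        return tagged;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");
        System.out.println(tag("http://xyz1.com/index.html", doc));
    }
}
```

With this approach the single Solr core would also need a matching field
(here "source") declared in its schema, so that documents from the three
crawls can be filtered by site at query time.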
