lucene-solr-user mailing list archives

From "Palmer, Eric" <epal...@richmond.edu>
Subject Re: Replacing Google Mini Search Appliance with Solr?
Date Wed, 30 Oct 2013 20:37:05 GMT
Thanks for the link

Sent from my iPhone

On Oct 30, 2013, at 4:06 PM, "Rajani Maski" <rajinimaski@gmail.com> wrote:

> Hi Eric,
> 
>  I have also developed mini-applications replacing GSA for some of our
> clients using Apache Nutch + Solr to crawl multilingual sites and enable
> multilingual search. Nutch+Solr is very stable, and the Nutch mailing list
> provides good support.
> 
> Reference link to start:
> https://sites.google.com/site/profilerajanimaski/webcrawlers/apache-nutch
> 
> Thanks
> Rajani
> 
> 
> 
> 
> On Thu, Oct 31, 2013 at 12:27 AM, Palmer, Eric <epalmer@richmond.edu> wrote:
> 
>> Markus and Jason
>> 
>> thanks for the info.
>> 
>> I will start to research Nutch.  I agree that writing a crawler is a rabbit
>> hole.
>> 
>> 
>> --
>> Eric Palmer
>> 
>> Web Services
>> U of Richmond
>> 
>> To report technical issues, obtain technical support or make requests for
>> enhancements please visit
>> http://web.richmond.edu/contact/technical-support.html
>> 
>> 
>> 
>> 
>> 
>> On 10/30/13 2:53 PM, "Jason Hellman" <jhellman@innoventsolutions.com>
>> wrote:
>> 
>>> Nutch is an excellent option.  It should feel very comfortable for people
>>> migrating away from the Google appliances.
>>> 
>>> Apache Droids is another possible approach, and I've found people
>>> using Heritrix or ManifoldCF for various use cases (and usually in
>>> combination with other use cases where the extra overhead was worth the
>>> trouble).
>>> 
>>> I think the simplest approach will be Nutch... it's absolutely worth taking a
>>> shot at it.
>>> 
>>> DO NOT write a crawler!  That is a rabbit hole you do not want to peer
>>> down into :)
>>> 
>>> 
>>> 
>>> On Oct 30, 2013, at 10:54 AM, Markus Jelsma <markus.jelsma@openindex.io>
>>> wrote:
>>> 
>>>> Hi Eric,
>>>> 
>>>> We have also helped a government institution to replace their
>>>> expensive GSA with open source software. In our case we use Apache Nutch
>>>> 1.7 to crawl the websites and index to Apache Solr. It is very
>>>> effective, robust and scales easily with Hadoop if you have to. Nutch
>>>> may not be the easiest tool for the job but is very stable, feature rich
>>>> and has an active community here at Apache.
>>>> 
>>>> Cheers,
>>>> 
>>>> -----Original message-----
>>>>> From:Palmer, Eric <epalmer@richmond.edu>
>>>>> Sent: Wednesday 30th October 2013 18:48
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Replacing Google Mini Search Appliance with Solr?
>>>>> 
>>>>> Hello all,
>>>>> 
>>>>> Been lurking on the list for a while.
>>>>> 
>>>>> Our two Google Mini search appliances, used to index our public web
>>>>> sites, are at end of life and need replacing. Google is no longer
>>>>> selling the Mini appliances and buying the big appliance is not cost
>>>>> effective.
>>>>> 
>>>>> http://search.richmond.edu/
>>>>> 
>>>>> We would run a Solr replacement on Linux (CentOS, Red Hat, or similar) with
>>>>> OpenJDK or Oracle Java.
>>>>> 
>>>>> Background
>>>>> ==========
>>>>> ~130 sites
>>>>> only ~12,000 pages (at a depth of 3)
>>>>> probably ~40,000 pages if we go to a depth of 4
>>>>> 
>>>>> We use key matches a lot. In solr terms these are elevated documents
>>>>> (elevations)
>>>>> 
>>>>> We would code a search query form in php and wrap it into our design
>>>>> (http://www.richmond.edu)
>>>>> 
>>>>> I have played with and love lucidworks and know that their $ solution
>>>>> works for our use cases but the cost model is not attractive for such a
>>>>> small collection.
>>>>> 
>>>>> So with Solr, what are my open source options, and what are people's
>>>>> experiences crawling and indexing web sites with Solr + a crawler? I
>>>>> understand Solr does not include a crawler, so getting one working
>>>>> would be the first step.
>>>>> 
>>>>> We can code in Java, PHP, Python etc. if we have to, but we don't want
>>>>> to write a crawler if we can avoid it.
>>>>> 
>>>>> thanks in advance for any information.
>>>>> 
>>>>> --
>>>>> Eric Palmer
>>>>> Web Services
>>>>> U of Richmond
>> 
>> 
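
The "key matches" mentioned in the original question correspond to Solr's
QueryElevationComponent, which reads its pinned results from an elevate.xml
file. As a minimal sketch only, assuming document ids are page URLs (an
assumption about the index schema) and using a hypothetical example.edu URL
and a hypothetical helper name build_elevate_xml, a list of GSA key matches
could be converted roughly like this:

```python
import xml.etree.ElementTree as ET


def build_elevate_xml(key_matches):
    """Return elevate.xml content for a {query text: [document ids]} mapping.

    build_elevate_xml is a hypothetical helper written for this sketch;
    it is not part of Solr or Nutch.
    """
    root = ET.Element("elevate")
    for query_text, doc_ids in key_matches.items():
        # One <query> element per key-match phrase.
        query = ET.SubElement(root, "query", {"text": query_text})
        for doc_id in doc_ids:
            # Each <doc> is pinned to the top of the results for that query,
            # in the order listed.
            ET.SubElement(query, "doc", {"id": doc_id})
    return ET.tostring(root, encoding="unicode")


# Hypothetical key match: the query and URL are placeholders.
key_matches = {"admissions": ["http://example.edu/admissions/"]}
elevate_xml = build_elevate_xml(key_matches)
print(elevate_xml)
```

The generated file would still need to be placed where the elevation
component is configured to look for it, and the component registered in
solrconfig.xml; the details depend on the Solr version in use.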
