lucene-general mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Which one is better - Lucene OR Google Search Appliance
Date Fri, 28 Nov 2008 09:10:52 GMT

On Nov 28, 2008, at 3:08 AM, Mike_SearchGuru wrote:
> OK basically we have 8 million PDFs to index and we have good
> technical people in our company.

It's not too difficult, especially now with Tika and the Solr+Tika
integration, to toss PDFs into a Lucene index and make them
searchable.  Some programming/technical skills are needed, but we're
not talking about anything very complicated.  Here's an in-progress
effort to make Solr trivially handle the ingestion/indexing of many
rich document types, including PDFs:

   <https://issues.apache.org/jira/browse/SOLR-284>
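
For a sense of what this looks like from the client side, here is a
minimal Python sketch.  It rests on several assumptions: that the
extracting handler is mounted at /update/extract (the path later Solr
releases use for this feature; check your own solrconfig.xml), that
Solr is reachable at a base URL such as http://localhost:8983/solr, and
that the document id is supplied through the literal.id parameter.  The
function names extract_url and index_pdf are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def extract_url(solr_base, doc_id, commit=False):
    """Build the request URL for the extracting handler.

    The /update/extract path is an assumption, not something the
    in-progress patch guarantees.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def index_pdf(solr_base, pdf_path, doc_id, commit=False):
    """POST the raw PDF bytes; Solr passes them to Tika for extraction."""
    with open(pdf_path, "rb") as f:
        body = f.read()
    req = Request(
        extract_url(solr_base, doc_id, commit),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urlopen(req) as resp:
        return resp.status  # HTTP status code of Solr's response
```

With millions of files you would call index_pdf once per document with
commit left off, and issue a single commit at the end rather than one
per document.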

> question is: is Lucene slower than GSA in terms of indexing PDFs?

Doubtful.  At least for 8M docs, you're not likely to see a  
significant advantage of one over the other.

> are there any costs for licenses if used commercially. If yes then
> what are the costs?

Lucene, Solr, and Tika are Apache projects, entirely open source under
the Apache License.  There is no license fee, including for commercial
use.  The indirect costs would be your time and effort to roll your
own solution.

> what are the downsides of Lucene as opposed to GSA. these are my
> questions and if you can answer them then it will be great help.

The downsides would be that it's a solution you must support more
internally with technical staff, and you'd be missing an admin UI to
manage the thing.

Solr + SOLR-284, if you've got some technical staff with a bit of Java  
know-how, is the solution I'd personally opt for :)  (but I'm biased)

Note that there is also commercially available support for Lucene  
technologies, should you desire to have the best of both worlds...  
open source under the covers, and immediate access to technical  
support whenever you need it.  (again, I'm biased here being involved  
in such a company)

	Erik

