lucene-solr-user mailing list archives

From Svein Parnas <sv...@trank.no>
Subject Re: SOLR X FAST
Date Wed, 12 Dec 2007 10:48:27 GMT

On Dec 12, 2007, at 2:50 AM, Nuno Leitao wrote:

>
> FAST uses two pipelines - an ingestion pipeline (for document  
> feeding) and a query pipeline which are fully programmable (i.e.,  
> you can customize it fully). At ingestion time you typically prepare  
> documents for indexing (tokenize, character normalize, lemmatize,  
> clean up text, perform entity extraction for facets, perform static  
> boosting for certain documents, etc.), while at query time you can  
> expand synonyms, and do other general query side tasks (not unlike  
> Solr).
>
> Horizontal scalability means the ability to cluster your search  
> engine across a large number of servers, so you can scale up on the  
> number of documents, queries, crawls, etc.
>
> There are FAST deployments out there which run on dozens, in some  
> cases hundreds of nodes serving multiple terabyte size indexes and  
> achieving hundreds of queries per second.
>
> Yet again, if your requirements are relatively simple then Lucene  
> might do the job just fine.
>
> Hope this helps.

With FAST, you will also get things like:
- categorization
- clustering
- more flexible collapsing / grouping
- more scalable facets (navigators) - at least for multivalued fields
- gigabytes of poorly documented software
- operations from hell
- a huge number of bugs
- high bills, both for software and hardware.

As for the linguistic features (named entity extraction, dictionary-based
lemmatization and so on) and things like categorization and clustering,
don't expect them to work too well unless you put a huge amount of work
into them. Some of the features are really primitive.

To sum up: if Solr meets your needs, I would highly recommend Solr. If
you need some additional features and have the know-how, integrate
other products with Solr. If you really need the scalability, go for
FAST or some other commercial software.

As for document preprocessing and connectors for Solr, if you need them,
you could have a look at OpenPipe, http://openpipe.berlios.de/ (not
yet announced).
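
To give a rough idea of what "document preprocessing" in front of Solr
can look like, here is a minimal sketch in Python. It is not OpenPipe's
API; the stage functions, field names and helper names are all made up
for illustration. Each stage takes a document (a dict of string fields)
and returns a cleaned-up copy, and the result is serialized as a Solr
XML <add> message of the kind you would post to Solr's update handler.

```python
import unicodedata
import xml.etree.ElementTree as ET


def normalize_chars(doc):
    # Character normalization: fold compatibility forms (e.g. full-width
    # letters) into their canonical equivalents using Unicode NFKC.
    return {name: unicodedata.normalize("NFKC", value) for name, value in doc.items()}


def clean_whitespace(doc):
    # Text cleanup: collapse runs of whitespace and trim both ends.
    return {name: " ".join(value.split()) for name, value in doc.items()}


# Hypothetical stage list; a real pipeline would add further stages
# (entity extraction, lemmatization, ...) in the same style.
STAGES = [normalize_chars, clean_whitespace]


def run_pipeline(doc, stages):
    # Apply each stage in order; every stage returns a new dict.
    for stage in stages:
        doc = stage(doc)
    return doc


def to_solr_add_xml(docs):
    # Build a Solr XML update message: <add><doc><field name="...">...</field></doc></add>
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    raw = {"id": "1", "title": "  Hello   world "}
    print(to_solr_add_xml([run_pipeline(raw, STAGES)]))
```

The sketch only builds the XML string and leaves the HTTP post to you,
so you can try it without a running Solr instance.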

Svein

