lucene-general mailing list archives

From Derek Poh <d...@globalsources.com>
Subject Re: Does solr 4.8.1 support these features?
Date Tue, 10 Jun 2014 07:26:01 GMT
Hi Mark

Thank you for taking the time to reply and for including references.

Regarding 3. Configure and define the relevance ranking and matching
logic of the returned results.

Can each search handler be configured to
- search on a few fields
- assign a numeric rank to each field, such that a match on the
field with the highest rank will rank the document higher in the returned
search results
- use the rank of each field as a tie-breaker as well?
E.g.
Category = 3
SPPKeyWord = 2
KeySpecification = 1

A document that matches on field Category will be ranked higher in the
results than a document that matches on SPPKeyWord.
A document that matches only on field KeySpecification will rank the
lowest in the results.


On 6/10/2014 12:27 AM, Mark Bennett wrote:
> Hello Derek,
>
> See answers inline.
>
> --
> Mark Bennett / LucidWorks: Search & Big Data / mark.bennett@lucidworks.com
> Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513
>
> On Jun 9, 2014, at 12:00 AM, Derek Poh <dpoh@globalsources.com> wrote:
>
>> My company is actively looking at alternative search engine applications to replace
>> our current Endeca application.
>>
>> I have no experience with or knowledge of Solr and Lucene.
>> Please bear with me; I would like to find out whether the following features are
>> available in Solr.
>>
>> 1. Aggregate results (rollups).
>> E.g. from a list of product search results (each with a supplier id field), can the
>> results be aggregated by supplier id with the original result ordering retained?
> Yes it can:
> http://wiki.apache.org/solr/FieldCollapsing
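
A minimal sketch of what a field-collapsing (result grouping) request might look like when built from Python. The field name `supplier_id`, the `text:budget` query, and the host and collection name are assumptions for illustration; `group`, `group.field`, and `group.limit` are the grouping parameters described on the wiki page above.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port, and collection name to your setup.
BASE = "http://localhost:8983/solr/collection_name"

def grouped_query_url(base_url, query, group_field, per_group=1):
    """Build a /select URL that collapses results on one field."""
    params = {
        "q": query,
        "group": "true",            # turn result grouping on
        "group.field": group_field, # collapse documents sharing this value
        "group.limit": per_group,   # documents returned per group
    }
    return f"{base_url}/select?{urlencode(params)}"

# "supplier_id" is a hypothetical field name for the supplier id.
url = grouped_query_url(BASE, "text:budget", "supplier_id")
print(url)
```

Groups should come back in the order of their best-ranked document unless a different sort is requested, which is roughly the "original ordering retained" behaviour asked about.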
>
>> 2. Filter/Navigator, counts.
>> List a field's possible values and their counts from the indexed data and from
>> the returned results.
>> The field's values can be sorted by the value's description or by the value's count in
>> the returned results.
> Yes, Solr calls these "Facets" and offers several types:
> http://wiki.apache.org/solr/SimpleFacetParameters
> http://wiki.apache.org/solr/HierarchicalFaceting
>
>> E.g. the field "Business Type" below, with its possible values and the count for each
>> value (in brackets). Can the field be returned in the results with its values sorted
>> either by description or by count?
>> Business Type
>>     Manufacturer (15269)
>>     Exporter (12493)
>>     Trading Company (5541)
>>     Agent (1324)
>>     Wholesaler (1202)
>>     Importer (682)
>>     Buying Office (394)
>>     Distributor (278)
>>     Other (157)
>>     Retailer (116)
>>     Consultant (54)
> Absolutely, and Solr is very fast and accurate.
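
As a rough illustration of the facet request behind a list like the one above, the sketch below builds a query that returns only facet counts for one field. The field name `BusinessType` and the host and collection name are assumptions; `facet.sort=count` orders values by count and `facet.sort=index` orders them by the indexed value.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port, and collection name to your setup.
BASE = "http://localhost:8983/solr/collection_name"

def facet_query_url(base_url, query, facet_field, sort_by="count"):
    """Build a /select URL that returns facet counts for one field."""
    if sort_by not in ("count", "index"):
        raise ValueError("sort_by must be 'count' or 'index'")
    params = {
        "q": query,
        "rows": 0,                   # skip the documents, return counts only
        "facet": "true",
        "facet.field": facet_field,
        "facet.sort": sort_by,       # "count" = by count, "index" = by value
        "facet.mincount": 1,         # hide values with no matching documents
    }
    return f"{base_url}/select?{urlencode(params)}"

# "BusinessType" is a hypothetical field name for the "Business Type" field.
url = facet_query_url(BASE, "*:*", "BusinessType", sort_by="index")
print(url)
```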
>
>> 3. Configure and define the relevance ranking and matching logic of the returned results.
> Yes, though not by that name.
> Step 1:
> Configure default edismax parameters in your solrconfig.xml
>
> Step 2:
> Create additional search handlers in solrconfig.xml; each search handler can have
> its own edismax configuration.
>
> Normally the format of the search URL is:
>      http://localhost:8983/solr/collection_name/select?q=text:budget
>
> You would replace the "select" with the name of the search handler that has the edismax
> config you want.
>
> With multiple search handlers, you'd use something like:
>      http://localhost:8983/solr/collection_name/search_freshest?q=text:budget
>      http://localhost:8983/solr/collection_name/search_most_popular?q=text:budget
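
The handler-per-configuration idea can be sketched as a small URL builder. This is only an illustration under stated assumptions: the handler names are the ones from the example URLs above, the field names and weights are the ones from the Category/SPPKeyWord/KeySpecification example, and the edismax settings are passed as request parameters (`defType`, `qf`) rather than as handler defaults in solrconfig.xml, where the same values could equally be set.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port, and collection name to your setup.
BASE = "http://localhost:8983/solr/collection_name"

def handler_url(base_url, handler, query, field_weights=None):
    """Build a URL for a named search handler, optionally with edismax boosts."""
    params = {"q": query}
    if field_weights:
        # Highest weight first, written as field^boost pairs for edismax "qf".
        ordered = sorted(field_weights.items(), key=lambda kv: kv[1], reverse=True)
        params["defType"] = "edismax"
        params["qf"] = " ".join(f"{field}^{weight}" for field, weight in ordered)
    return f"{base_url}/{handler}?{urlencode(params)}"

# Field names and weights are taken from the example in this thread.
weights = {"KeySpecification": 1, "Category": 3, "SPPKeyWord": 2}
url = handler_url(BASE, "search_freshest", "budget", weights)
print(url)
```

Note that boosts in `qf` scale each field's contribution to the score; they make a Category match more likely to rank first, but they do not by themselves guarantee a strict field-by-field ordering.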
>
>> 4. Define and configure the thesaurus (1-way or 2-way), stemming and stop words.
> Yes, Solr is very good about this; you have both options.
>
> Also, Solr lets you choose:
> * Index time, or query time, or both
> * Use expansion or reduction
>
> You can even have more than one thesaurus file and have them each handled differently.
>
> For example:
> * Use an english_language thesaurus, which rarely changes, and expand that at index time
> * Use your company_synonyms, which may change frequently, and expand them at search time.
>
> I'll let you find these in the wiki, http://wiki.apache.org
>
>> 5. Multi-language support for Simplified Chinese and Spanish.
> Yes!
>
> And for Simplified Chinese, please make sure to use the SmartCN analyzer, and not the
> simplistic "CJK"; SmartCN actually looks for Chinese-language word breaks using statistical
> methods, and therefore should give better results.
>
>> 6. Scalability.
>> At present, we are indexing 4 million records and the number is expected to increase
>> more than tenfold in the near future.
> 40 million documents can normally be handled on a single machine, assuming it has enough
> RAM and doesn't have a lot of other stuff running.
> You might want a second machine for failover.
>
> When people use multiple machines, then the way to do that is via SolrCloud.
>
>> 7. Search results debugging. E.g. why a record was matched or why a record was ranked as
>> such.
> Yes.
>
> You typically add &debugQuery=true&debug.explain.structured=true to the URL.
>
> The output is a bit technical, it takes some practice to understand.
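
For illustration, a debug request with the two parameters above could be built as follows. The host, collection name, and query are assumptions, and `wt=json` is added only to make the response easier to read from a script.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port, and collection name to your setup.
BASE = "http://localhost:8983/solr/collection_name"

def debug_query_url(base_url, query, handler="select"):
    """Build a search URL that asks Solr to explain matching and scoring."""
    params = {
        "q": query,
        "debugQuery": "true",                # include debug info in the response
        "debug.explain.structured": "true",  # nested explanation, not plain text
        "wt": "json",                        # JSON response format
    }
    return f"{base_url}/{handler}?{urlencode(params)}"

url = debug_query_url(BASE, "text:budget")
print(url)
```

The per-document score explanations should then appear in the debug section of the response, keyed by document id.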
>
> There's also a graphical relevancy debugger with a free eval period:
> http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/
>
>> Derek
>>
>> ----------------------
>> CONFIDENTIALITY NOTICE
>> This e-mail (including any attachments) may contain confidential and/or privileged
>> information. If you are not the intended recipient or have received this e-mail in error,
>> please inform the sender immediately and delete this e-mail (including any attachments) from
>> your computer, and you must not use, disclose to anyone else or copy this e-mail (including
>> any attachments), whether in whole or in part.
>> This e-mail and any reply to it may be monitored for security, legal, regulatory
>> compliance and/or other appropriate reasons.
>
>

