lucene-solr-user mailing list archives

From Em <mailformailingli...@yahoo.de>
Subject Re: Taxonomy in SOLR
Date Mon, 24 Jan 2011 17:45:35 GMT

Thank you for the advice, Erick!

I will take a look at extending the StandardRequestHandler for such
use cases.


Erick Erickson wrote:
> 
> I wasn't thinking about this for adding information to the *request*.
> Rather, in this case the autocomplete uses an Ajax call that just uses
> the TermsComponent to get the autocomplete data and display it. This is
> just textual, so adding it to the request is client-side magic.
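A minimal sketch of the request such an Ajax autocomplete call might send. It assumes a Solr instance at a hypothetical http://localhost:8983/solr with a /terms handler wired to the TermsComponent (as in the example configuration shipped with Solr), and uses the english_taxon_label field from this thread. terms.fl, terms.prefix and terms.limit are TermsComponent parameters; the base URL and handler path are assumptions.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def terms_autocomplete_url(prefix, field="english_taxon_label", limit=10):
    """Build the URL an Ajax autocomplete widget could call.

    terms.fl picks the field, terms.prefix keeps only indexed terms that
    start with what the user typed, terms.limit caps the number returned.
    """
    params = {
        "terms.fl": field,
        "terms.prefix": prefix,
        "terms.limit": limit,
        "wt": "json",
    }
    return f"{SOLR_BASE}/terms?{urlencode(params)}"


print(terms_autocomplete_url("Ber"))
```

Because terms.prefix matches indexed terms, whether "ber" also finds "Berlin" depends on how the field is analyzed (for example, lowercased or left as an untokenized string).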
> 
> If you want your app to have access to the meta-data for other purposes,
> you'd just query and cache it from the app. You could use that to build
> up the links you embed in the page for new queries if you choose; no
> custom handlers are necessary.
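The "query and cache it from the app" approach could look roughly like this sketch. The fetch callable is a hypothetical stand-in for whatever code actually queries Solr for the meta-data, and the 300-second TTL is an arbitrary assumption.

```python
import time


class MetaDataCache:
    """Application-side cache for meta-data fetched from Solr.

    fetch is any zero-argument callable (hypothetical: e.g. a function
    that queries the special meta-data document). Its result is reused
    until ttl_seconds have elapsed, then fetched again.
    """

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._value = None
        self._expires_at = float("-inf")  # forces a fetch on first get()

    def get(self):
        now = self._clock()
        if now >= self._expires_at:
            # Expired or never loaded: hit Solr once and remember the result.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

Keeping the cache in the application avoids any custom request handler: every page render reads from memory, and Solr is only queried once per TTL window.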
> 
> Otherwise, I guess you'd create a custom request handler; that seems
> like a reasonable place.
> 
> Best
> Erick
> 
> On Mon, Jan 24, 2011 at 11:03 AM, Em <mailformailinglists@yahoo.de> wrote:
> 
>>
>> Hi Erick,
>>
>> in some use cases I really think that your suggestion of using unique
>> documents for meta-information is a good approach to solving some issues.
>> However, there is a hurdle for me and maybe you can help me clear it:
>>
>> What is the best way to get such meta-data?
>> I see three possible approaches:
>> 1st: get it in another request
>> 2nd: get it with a requestHandler
>> 3rd: get it with a searchComponent
>>
>> I think the 2nd and 3rd are the cleanest ways.
>> But to decide between them I run into two problems:
>> RequestHandler: Should I extend the StandardRequestHandler to do what I
>> need? If so, I could just query my index for the needed information and
>> add it to the request before passing it on to the SearchComponents.
>>
>> SearchComponent: The problem with the SearchComponent is distributed
>> search and how to test it. However, if this is the cleanest way to go,
>> it is the way one should take.
>>
>> What would you do if you wanted to add some meta-information to your
>> request that was not given by the user?
>>
>> Regards,
>> Em
>>
>>
>> Erick Erickson wrote:
>> >
>> > First, the redundancy is certainly there, but that's what Solr does: it
>> > handles large amounts of data. 4 million documents is actually a pretty
>> > small corpus by Solr standards, so you may well be able to do exactly
>> > what you propose with acceptable performance/size. I'd advise just
>> > trying it with, say, 200,000 docs. Why 200K? Because index growth is
>> > non-linear, with the first batch of documents taking up more space than
>> > the second. So index 100K, examine your indexes, and index 100K more.
>> > Now use the delta to extrapolate to 4M.
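The 100K-then-100K-more measurement can be turned into a rough linear estimate. This sketch assumes that growth after the first batch is roughly constant per batch; the sizes in the example are hypothetical, not measurements.

```python
def extrapolate_index_size(size_after_first_batch, size_after_second_batch,
                           batch_docs=100_000, target_docs=4_000_000):
    """Estimate index size at target_docs from two measurements.

    The first batch carries extra up-front cost, so the growth caused by
    the *second* batch (the delta) is used as the per-batch rate.
    """
    delta = size_after_second_batch - size_after_first_batch
    batches_remaining = (target_docs - 2 * batch_docs) / batch_docs
    return size_after_second_batch + delta * batches_remaining


# Hypothetical sizes in MB after 100K and 200K docs:
print(extrapolate_index_size(120, 200))  # 3240.0
```

Using the second delta rather than the first total keeps the one-off cost of the first batch from being multiplied 40 times.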
>> >
>> > You don't need to store the taxonomy in each doc for auto-complete; you
>> > can get your auto-completion from a different index. Or you can index
>> > your taxonomies in a "special" document in Solr and query the (unique)
>> > field in that document for autocomplete.
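A sketch of building the "special" meta-data document: every distinct label is collected once into a field that only this document has. The document ID taxonomy_meta and the field name taxon_label_ac are hypothetical, and each source document is assumed to carry a single-valued english_taxon_label.

```python
# Hypothetical document ID and field name: the field exists only in this
# one "special" document, so its terms are exactly the taxonomy labels.
META_DOC_ID = "taxonomy_meta"
META_FIELD = "taxon_label_ac"


def build_taxonomy_meta_doc(docs, label_field="english_taxon_label"):
    """Collect each distinct label (kept once) into a single document."""
    labels = {doc[label_field] for doc in docs if label_field in doc}
    return {"id": META_DOC_ID, META_FIELD: sorted(labels)}


docs = [
    {"id": "1", "english_taxon_label": "Berlin"},
    {"id": "2", "english_taxon_label": "Berlin"},
    {"id": "3", "english_taxon_label": "Paris"},
]
print(build_taxonomy_meta_doc(docs))
# {'id': 'taxonomy_meta', 'taxon_label_ac': ['Berlin', 'Paris']}
```

Since only this document has taxon_label_ac, a TermsComponent request with terms.fl=taxon_label_ac would return just these labels, assuming the field is indexed.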
>> >
>> > For faceting, you do need taxonomies. But remember that the nature of
>> > the inverted index is that unique terms are only stored once, and the ID
>> > of each document that the term appears in is recorded. So if you have
>> > 3/world/europe/germany/berlin stored in 1M documents, your index space
>> > is really <string length + overhead> + <space for 1M ids>.
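The point about unique terms being stored once can be illustrated with a toy inverted index. naive_term_cost mirrors the <string length + overhead> + <space for ids> estimate; the 16-byte overhead and 4 bytes per ID are illustrative assumptions, not Lucene's actual on-disk format, which compresses posting lists.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique term to the sorted IDs of documents containing it.

    The term string is a dict key, so it is kept once no matter how many
    documents contain it; only its posting list grows with the doc count.
    """
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def naive_term_cost(term, doc_count, overhead_bytes=16, bytes_per_id=4):
    """<string length + overhead> + <space for the IDs>, in bytes.

    overhead_bytes and bytes_per_id are assumptions for illustration only.
    """
    return len(term.encode("utf-8")) + overhead_bytes + doc_count * bytes_per_id


docs = {
    1: ["3/world/europe/germany/berlin", "1/world/europe"],
    2: ["3/world/europe/germany/berlin"],
    3: ["1/world/europe"],
}
index = build_inverted_index(docs)
print(index["3/world/europe/germany/berlin"])  # [1, 2]
print(naive_term_cost("3/world/europe/germany/berlin", 1_000_000))  # 4000045
```

Even under these pessimistic assumptions, the 29-byte term string is negligible next to the ID list, which is why the redundancy costs far less than it appears.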
>> >
>> > Best
>> > Erick
>> >
>> > On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine
>> > <dfontaine@rosebud.fr> wrote:
>> >
>> >> Yes, I am not obliged to store the taxonomies.
>> >>
>> >> My taxonomies look like this:
>> >>
>> >> english_taxon_label = Berlin
>> >> english_taxon_type = location
>> >> english_taxon_hierarchy = 0/world
>> >>                           1/world/europe
>> >>                           2/world/europe/germany
>> >>                           3/world/europe/germany/berlin
>> >>
>> >> I need *_taxon_hierarchy for faceting and the label for autocomplete.
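The english_taxon_hierarchy values above follow a depth-prefixed path scheme. Here is a minimal sketch of generating those tokens at index time, plus the prefix that selects a node's direct children. Assuming the field is faceted on, the same prefix string could be passed as Solr's facet.prefix parameter; the helper names are hypothetical.

```python
def _parts(path):
    """Split a slash-separated path, ignoring empty segments."""
    return [p for p in path.strip("/").split("/") if p]


def hierarchy_tokens(path):
    """One depth-prefixed token per level: '0/world', '1/world/europe', ..."""
    parts = _parts(path)
    return [f"{depth}/" + "/".join(parts[: depth + 1]) for depth in range(len(parts))]


def child_prefix(path):
    """Prefix matching only the direct children of the node at path."""
    parts = _parts(path)
    if not parts:
        return "0/"  # children of the root are the depth-0 tokens
    return f"{len(parts)}/" + "/".join(parts) + "/"


def children(tokens, path):
    """Client-side stand-in for faceting with child_prefix(path)."""
    prefix = child_prefix(path)
    return sorted({t for t in tokens if t.startswith(prefix)})


tokens = hierarchy_tokens("world/europe/germany/berlin") + hierarchy_tokens(
    "world/europe/france/paris"
)
print(children(tokens, "world/europe"))
# ['2/world/europe/france', '2/world/europe/germany']
```

The leading depth number is what makes drill-down cheap: a prefix such as "2/world/europe/" can only match tokens exactly one level below world/europe, never grandchildren.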
>> >>
>> >> With an RDBMS, I have at most 100 entries for one taxonomy, but with
>> >> Solr and 4 million documents the redundancy is huge, isn't it?
>> >>
>> >> And I have 10 different taxonomies per document...
>> >>
>> >> Damien
>> >>
>> >> On 24/01/2011 10:30, Em wrote:
>> >>
>> >>> Hi Damien,
>> >>>
>> >>> why are you storing the taxonomies?
>> >>> Faceting depends only on indexed values. If there is a meaningful
>> >>> difference between the indexed and the stored value, I would prefer to
>> >>> use an RDBMS or something like that to reduce redundancy.
>> >>>
>> >>> Does this help?
>> >>>
>> >>> Regards
>> >>>
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2320666.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2321340.html
Sent from the Solr - User mailing list archive at Nabble.com.
