lucene-solr-user mailing list archives

From Chris Hostetter <>
Subject Re: metadata about result sets?
Date Sat, 11 Mar 2006 06:13:35 GMT

: faceted metadata. The way that we build the metadata list is post-
: indexing of product, we would actually build a bit sequence that
: corresponds to all possible key/value combos for each product and
: associate each variation with the product. Then when someone refines

Hmmm... you can do something similar in solr by leveraging the Filter
caching, but you don't need to build a DocSet (bit sequence) for every
combination of values -- just one per value.  When you want to know what
the result is from combining multiple facets, you intersect the DocSets.
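
As a rough illustration of "one DocSet per value, intersect on demand", here
is a minimal Python sketch.  It is not Solr's actual DocSet or Filter cache
API: the `docsets` dict, the `combine` helper and the doc ids are all made
up, and plain Python sets stand in for the bit sequences.

```python
# Hypothetical data: one set of matching doc ids per (field, value) pair.
# Plain Python sets stand in for DocSets; this is not Solr's API.
docsets = {
    ("category", "books"): {1, 2, 3, 5, 8},
    ("author", "piper"): {2, 3, 9},
    ("format", "hardcover"): {3, 5, 9},
}


def combine(docsets, selected):
    """Intersect the per-value sets for every selected (field, value) pair."""
    sets = [docsets[key] for key in selected]
    if not sets:
        return set()
    result = set(sets[0])  # copy so the stored set is never mutated
    for other in sets[1:]:
        result &= other
    return result


# Docs that are books AND by this author: no precomputed combination needed.
matches = combine(docsets, [("category", "books"), ("author", "piper")])
```

With N values you keep N sets rather than one per combination, and any
combination a user picks is just a few intersections.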

: For the schema, I just meant the document format. the file is called
: schema.xml. I haven't tried it, but it looks like you can change that
: to affect the way solr works without actually affecting the way
: lucene handles it. Is that wrong? I guess it doesn't really matter,

well, the schema.xml lets you define the fields, and what options you want
those fields to have -- but the set of options is fixed (and tied to the
set of options available in lucene).  You could have a "suggestable" field
in every doc which contained a list of field names -- but there's really
no way to annotate fields directly.

: As for 'scanning the resultset', I can see how I was a little shy on
: the details. Sorry about that. I meant look through the results to
: see what facets apply to the resultset. So if my company sells books
: and power tools, when someone searches for 'the little engine who
: could', once we know there are no power tools in the result set, I
: don't show the refinement facets for power tool metadata (like

Ah... right.  Depending on how you go about it, scanning result sets to
find out which facets apply is probably just as expensive as just testing
the facet to see if the count is positive -- especially if you have the
RAM to allocate a big Filter cache -- then you can have a DocSet for every
Filter in memory all of the time, and testing a few thousand can be done
near instantly.
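
To make "testing the facet" concrete, here is a small Python sketch under
the same kind of assumptions: `filter_cache`, `applicable_facets` and the
doc ids are hypothetical, and sets stand in for cached DocSets.  Each cached
filter is intersected with the result set, and only facets with a positive
count are kept.

```python
def applicable_facets(result_docs, filter_cache):
    """Return {facet label: count} for facets matching at least one result."""
    counts = {}
    for label, docset in filter_cache.items():
        count = len(result_docs & docset)
        if count > 0:
            counts[label] = count
    return counts


# Hypothetical cache: one doc id set per facet filter, all held in memory.
filter_cache = {
    "type:book": {1, 2, 3, 4},
    "type:power_tool": {10, 11, 12},
    "voltage:18v": {10, 12},
    "binding:paperback": {2, 4},
}

# Hypothetical result set for a search that only matched books.
result_docs = {2, 3, 4}
facets = applicable_facets(result_docs, filter_cache)
```

In this made-up data the power tool facets drop out because their
intersections with the result set are empty, without looking at any
individual result document.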

:   In your example file, how does the name facet know to display only
: the names that start with whatever initial was selected?   Would that

Honestly, I haven't really thought about the mechanisms for dealing with
facets in that way ... it would be trivial just to let all of the names be
tested -- but that would obviously involve a lot of unnecessary
computation, so if you could configure subgroups that were only consulted
once another facet had been used, that would obviously make more sense.
Perhaps something like...

      <group id="initial" label="Author">
        <facet id="a" label="A" query="author:a*">
          <group id="name" label="Author">
            <facet use-prefix-terms-field="author">a</facet>
          </group>
        </facet>
        <facet id="b" label="B" query="author:b*">
          <group id="name" label="Author">
            <facet use-prefix-terms-field="author">b</facet>
          </group>
        </facet>
      </group>

...although obviously common group/subgroup relationships like prefix
expansion could be done as a special case to make it shorter to express...

      <!-- 'mfg' facet starts with initial,
            then full names that match initial -->
      <group id="mfg" label="Manufacturer">
        <prefix-facet id="initial" field="mfg" prefix-chars="1">
          <term-value-facet id="mfg" />
        </prefix-facet>
      </group>
      <!-- 'name' facet starts with initial, then 2 char prefix,
            then full names that match prefix -->
      <group id="name" label="Author">
        <prefix-facet id="initial" field="name" prefix-chars="1">
          <prefix-facet id="i_prefix" field="name" prefix-chars="2">
            <term-value-facet id="name" />
          </prefix-facet>
        </prefix-facet>
      </group>
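
The prefix expansion idea can be sketched in a few lines of Python.  This is
only an illustration of grouping a field's terms by their first N
characters: `prefix_facets` and the manufacturer names are made up, and
nothing here is an existing Solr feature or config option.

```python
from collections import defaultdict


def prefix_facets(terms, prefix_chars):
    """Group terms by their lowercased first `prefix_chars` characters."""
    groups = defaultdict(list)
    for term in sorted(terms):
        groups[term[:prefix_chars].lower()].append(term)
    return dict(groups)


# Hypothetical terms from an 'mfg' field.
mfg_terms = ["Bolt", "Acme", "Borealis", "Apex"]

# First level: one facet per initial.
by_initial = prefix_facets(mfg_terms, 1)

# Second level, consulted only once the "a" initial has been picked.
under_a = prefix_facets(by_initial["a"], 2)
```

The second level is computed only from the terms under the chosen initial,
which is the point of consulting subgroups lazily instead of testing every
name up front.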

: I think what I am starting to understand is that coming from what we
: have (a rdbms based metadata gathering system), I need to rethink my
: process. I've spent so much time training myself to think in terms of
: how to make things fast in mysql that I need to re-open my mind :)

yeah ... inverted indexes are a completely different beast from relational
databases.  I would definitely suggest reading up on lucene, and getting
to know the basics -- keeping in mind that anything that can be done in
lucene can be done in a solr plugin, solr just makes it easier and gives
you "wicked cool" caching :)

