lucene-solr-dev mailing list archives

From Erik Hatcher
Subject Re: solr-suggestion - terms that "start with"...
Date Sat, 20 May 2006 21:55:35 GMT

On May 19, 2006, at 4:53 PM, Chris Hostetter wrote:
> : it has is case sensitivity.  I could lowercase everything, but then
> : the terms the user sees will be in all lowercase and that simply
> : won't do for my scholarly audience :)
> picky, picky users.

Yeah, if it wasn't for them, I'd have it easy :)

> : It seems like what I really need is simply a separate index (or
> : rather a partition of the main Solr one) where a Document represents
> : an "agent", and do a PrefixQuery or TermEnum and get all unique
> : agents.
> i've let it roll around in my head for a few days, and i think that's
> exactly what i would do if it were me ... in fact, what you describe is
> pretty much exactly what i do for product categories, except that i
> think i store more metadata about each category than you would store
> about your "agents".  what you really need is a way to search for
> agents by term or term prefix, get a list of matching agents, and then
> use each agent as a facet for your "works" .. I do the same thing,
> except my term queries are on the unique id for the category, and my
> "prefix" queries are for the null prefix (ie: look at all categories)
> .. then once i have a category, i have other data that helps me with
> further facets (ie: for digital cameras, "resolution" is a good facet).

In fact, I've implemented this locally as a custom SolrCache that
holds a RAMDirectory.  On warm() I walk the agent terms in the main
index with a TermEnum and index each agent into the RAMDirectory.  It
is working well.
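
A minimal sketch of the idea, in plain Python rather than the Lucene
API: collect the unique agent names once (as would happen at warm
time), key each one by its lowercased form for matching, and keep the
original spelling for display.  The function names and the sample
agent names are made up for illustration; they are not taken from the
actual SolrCache implementation.

```python
import bisect


def build_agent_index(agent_terms):
    """Deduplicate agent names and sort them by a lowercased key.

    Each entry is (lowercased name, original name), so matching can be
    case-insensitive while the user still sees the original casing.
    """
    return sorted((name.lower(), name) for name in set(agent_terms))


def agents_starting_with(index, prefix):
    """Return original-case agent names whose lowercased form starts with prefix."""
    key = prefix.lower()
    # (key,) sorts before every (key..., original) entry, so this finds
    # the first entry whose lowercased name is >= the prefix.
    start = bisect.bisect_left(index, (key,))
    matches = []
    for lowered, original in index[start:]:
        if not lowered.startswith(key):
            break  # entries are sorted, so no later entry can match
        matches.append(original)
    return matches


# Hypothetical sample terms, including one duplicate.
index = build_agent_index([
    "Rossetti, Dante Gabriel",
    "Blake, William",
    "Rossetti, Christina",
    "Blake, William",
])
print(agents_starting_with(index, "ross"))
```

An empty prefix returns every agent, which corresponds to the "null
prefix" case described above.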

> i could imagine the same extension eventually unfolding for your
> agents ... i don't know much about literary works, but if we
> transition it to art in general, you might have information for one
> artist about different "labels" that apply to the art he produced in
> his life (sculpture, painting, cubist, impressionist, modern, "blue
> period", etc..) and once your user has selected a specific artist, you
> could use the list of labels from a stored field of the artist's
> metadata doc to decide which facets to offer the user in refining
> further.

We have metadata out the wazoo for this stuff.  We have "genres",
a categorization of the type of work, like "Painting", "Poetry",
etc.  We have agents classified into roles.  The same person could be
the author of one work, a figure in a painting in another, and the
editor of a third.  So even within a single agent, the user interface
will display a breakdown by role.  *whew*

> : Maybe I need to build some sort of term -> agent cache during
> : warming that makes this a no brainer?
> that's another way to go ... but if you make one doc per agent, then
> this is just a subset of the filter cache .... i personally love the
> filter cache :)

I opted for the RAMDirectory so I can leverage Lucene scoring for the  
ordering of agents, rather than only alphabetical and frequency options.
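
To make the difference between the orderings concrete, here is a small
sketch in plain Python.  The names, work counts, and scores are
invented sample values; the score column merely stands in for whatever
relevance value a Lucene search would return and is not computed the
way Lucene computes it.

```python
# Hypothetical sample rows: (agent name, number of works, stand-in relevance score).
matches = [
    ("Blake, William", 12, 0.41),
    ("Blair, Robert", 3, 0.87),
    ("Blackwood, John", 7, 0.15),
]


def alphabetical(rows):
    """Order agents by name, ignoring case."""
    return [name for name, _, _ in sorted(rows, key=lambda r: r[0].lower())]


def by_frequency(rows):
    """Order agents by how many works they appear in, most first."""
    return [name for name, _, _ in sorted(rows, key=lambda r: r[1], reverse=True)]


def by_score(rows):
    """Order agents by the stand-in relevance score, highest first."""
    return [name for name, _, _ in sorted(rows, key=lambda r: r[2], reverse=True)]


for ordering in (alphabetical, by_frequency, by_score):
    print(ordering.__name__, ordering(matches))
```

With this sample data the three orderings all differ, which is the
flexibility a scored search over the agent index adds.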

