incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: faceting
Date Fri, 28 Sep 2012 01:01:48 GMT
Here are the proposed changes to the Facet API to support this new feature.

Current API:

struct Facet {
  1:string queryStr,
  2:i64 minimumNumberOfBlurResults = 9223372036854775807
}

struct BlurQuery {
  ...
  3:list<Facet> facets,
  ...
}

struct BlurResults {
  ...
  4:list<i64> facetCounts,
  ...
}

Changed API:

enum FacetType {
  QUERY,
  TERM_ENUM
}

struct Facet {
  1:string queryStr,
  2:i64 minimumNumberOfBlurResults = 9223372036854775807,
  3:FacetType type = FacetType.QUERY   // Facet type
}

struct BlurQuery {
  ...
  3:map<string,Facet> facets,   //Named facets
  ...
}

struct FacetResult {
  1:i64 count,                   // For the standard query facet
  2:map<string,i64> termCounts   // For the term enum type
}

struct BlurResults {
  ...
  4:map<string,FacetResult> facetResults,  //Named results
  ...
}
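To make the proposal concrete, here is a minimal Java sketch of how a client might build named facets and read named results. The classes are hypothetical plain-Java stand-ins for what the Thrift compiler would generate from the IDL above, not Blur's real generated code. The field name `tweets.lang` and the idea that `queryStr` names the field to enumerate for a TERM_ENUM facet are assumptions; the proposal does not say what `queryStr` holds for that type.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the classes Thrift would generate from the
// proposed IDL; not Blur's real generated code.
public class NamedFacetSketch {

    enum FacetType { QUERY, TERM_ENUM }

    static final class Facet {
        final String queryStr;
        final long minimumNumberOfBlurResults;
        final FacetType type;

        Facet(String queryStr, long minimumNumberOfBlurResults, FacetType type) {
            this.queryStr = queryStr;
            this.minimumNumberOfBlurResults = minimumNumberOfBlurResults;
            this.type = type;
        }
    }

    static final class FacetResult {
        final long count;                   // used by QUERY facets
        final Map<String, Long> termCounts; // used by TERM_ENUM facets (may be null)

        FacetResult(long count, Map<String, Long> termCounts) {
            this.count = count;
            this.termCounts = termCounts;
        }
    }

    // Named facets a client would place in BlurQuery.facets.
    static Map<String, Facet> buildFacets() {
        Map<String, Facet> facets = new LinkedHashMap<>();
        facets.put("hadoop", new Facet("tweets.text:hadoop", Long.MAX_VALUE, FacetType.QUERY));
        // Assumption: for TERM_ENUM, queryStr names the field whose terms are enumerated.
        facets.put("languages", new Facet("tweets.lang", Long.MAX_VALUE, FacetType.TERM_ENUM));
        return facets;
    }

    // Looks up a QUERY facet's count by name; 0 if no result came back under that name.
    static long countFor(Map<String, FacetResult> facetResults, String name) {
        FacetResult result = facetResults.get(name);
        return result == null ? 0L : result.count;
    }

    // Looks up a TERM_ENUM facet's per-term counts by name; empty if absent.
    static Map<String, Long> termCountsFor(Map<String, FacetResult> facetResults, String name) {
        FacetResult result = facetResults.get(name);
        if (result == null || result.termCounts == null) {
            return Collections.emptyMap();
        }
        return result.termCounts;
    }
}
```

Keying both maps by name means a client no longer has to keep the result list aligned by index with the facet list, as the current list-based API requires.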

Thoughts?


On Thu, Sep 27, 2012 at 2:39 PM, Garrett Barton <garrett.barton@gmail.com> wrote:

> I too want similar functionality. The first thing I would like to see is a
> simple ordered list of all terms in a field, returned with their counts. I
> think this would probably be enabled through the analyzer definition at
> index creation time, so that someone has to consciously decide they want
> to take the calculation hit instead of putting the load on the shard
> servers. Also, isn't it currently faster to just execute additional queries
> and use their hit counts than to load up one query with the facets?
> The second thing is not faceting directly; I just happen to use it with
> facets all the time. I like to find the distinct values (and their counts)
> of a field for a given query, for filtering. Right now I plow through some
> multiple of the results I return to get a mostly complete list of terms,
> which is obviously not the complete list. Is there a way to get that list,
> or an API call that would let me send that query to the shards?
>
> Thanks for listening!
> Garrett
>
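A minimal Java sketch of the counting Garrett describes: given the values of one field across the documents that match a query, produce the distinct values with their counts, most frequent first. Ordering by count is an assumption ("ordered" could also mean term order), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: distinct values of one field with their counts.
public class DistinctValueCounts {

    // Counts each distinct value, then orders by descending count and, for ties,
    // by ascending value so the output order is deterministic.
    static LinkedHashMap<String, Long> countDistinct(List<String> fieldValues) {
        Map<String, Long> counts = new HashMap<>();
        for (String value : fieldValues) {
            counts.merge(value, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // higher counts first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        LinkedHashMap<String, Long> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries) {
            ordered.put(e.getKey(), e.getValue());
        }
        return ordered;
    }
}
```

For the values `en, fr, en, de, fr, en` this yields `{en=3, fr=2, de=1}`. Counting only the first three values yields `{en=2, fr=1}` and misses `de` entirely, which is why plowing through a page of results gives an incomplete list.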
> On Thursday, September 27, 2012, Aaron McCurry <amccurry@gmail.com> wrote:
> > Yep. We can build it, but I think there need to be some limits placed on
> > how many terms can be enumerated. I would hate to have someone pick a
> > primary key field to enumerate on and blow up the server. I think the
> > easiest way to do it would be to expand the terms in the field on the shard
> > server and run the current faceting query on those expanded terms. I think
> > that is the easy part. The hard part is going to be how we modify the
> > Facet API in Thrift to accept the new facet type and how to return the
> > facet results. How would you want the result API to look?
> >
> > Aaron
> >
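A minimal sketch of the shard-side approach described above, under stated assumptions: a field's terms and postings are modeled as an in-memory sorted map from term to the IDs of documents containing it (a real shard server would walk the Lucene term dictionary instead), and `maxTerms` is a hypothetical stand-in for whatever limit Blur would enforce. Each term is counted the way a query facet `field:term` would be, by intersecting its documents with the query's matching documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;

// Hypothetical shard-side sketch of a TERM_ENUM facet with a term limit.
public class TermEnumFacetSketch {

    // postings:     every term of one field -> ids of the documents containing it
    // matchingDocs: ids of the documents matched by the user's query on this shard
    // maxTerms:     refuse to enumerate fields with more distinct terms than this
    static Map<String, Long> termEnumFacet(SortedMap<String, Set<Integer>> postings,
                                           Set<Integer> matchingDocs,
                                           int maxTerms) {
        if (postings.size() > maxTerms) {
            // Guards against enumerating e.g. a primary key field and blowing up the server.
            throw new IllegalArgumentException(
                    "field has " + postings.size() + " terms; limit is " + maxTerms);
        }
        Map<String, Long> counts = new LinkedHashMap<>(); // keeps term order
        for (Map.Entry<String, Set<Integer>> term : postings.entrySet()) {
            // Same count a query facet "field:term" would produce:
            // the number of the query's matching documents that contain the term.
            long count = 0;
            for (Integer doc : term.getValue()) {
                if (matchingDocs.contains(doc)) {
                    count++;
                }
            }
            if (count > 0) { // omit terms that never occur in the result set
                counts.put(term.getKey(), count);
            }
        }
        return counts;
    }
}
```

Failing fast on the term count, before any counting starts, is one simple way to honor the "some limits" requirement; a real implementation might instead cap the number of terms returned.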
> > On Thu, Sep 27, 2012 at 1:27 PM, Tim Williams <williamstw@gmail.com> wrote:
> >
> >> On Tue, Sep 18, 2012 at 10:42 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> >> > In the BlurQuery object, add Facet objects to the facets list, where
> >> > each Facet object contains the query that you want to facet on. For
> >> > example:
> >> >
> >> > BlurQuery bq = new BlurQuery();
> >> > // The long is the minimum number of results to count for the facet.
> >> > // So if the value were set to 10, the facet would stop counting at 10.
> >> > // Note: it's very likely that you will get more than your minimum back.
> >> > bq.addFacet(new Facet("tweets.text:hadoop", Long.MAX_VALUE));
> >> >
> >> > BlurResults results = client.query("table", bq);
> >> > List<Long> counts = results.getFacetCounts();
> >> > // The index of each count matches the index of the Facet object in the
> >> > // query's facet list.
> >> > long hadoopCount = counts.get(0);
> >> >
> >> > Hope this helps; let me know if you have any more questions.
> >>
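The note that you will likely get more than your minimum back can be illustrated with a small Java sketch. It assumes, purely for illustration, that each shard counts independently and stops once it reaches `minimumNumberOfBlurResults`, and that the per-shard counts are then summed; the class and method names are hypothetical and this is not Blur's actual implementation.

```java
import java.util.List;

// Hypothetical illustration of early-terminating facet counts across shards.
public class MinimumCountSketch {

    // Scans one shard's documents (true = matches the facet query), but stops
    // as soon as the count reaches the requested minimum.
    static long countOnShard(List<Boolean> shardMatches, long minimum) {
        long count = 0;
        for (boolean matches : shardMatches) {
            if (count >= minimum) {
                break; // this shard has seen enough; stop scanning
            }
            if (matches) {
                count++;
            }
        }
        return count;
    }

    // Sums the per-shard counts. Each shard stops on its own, so the total
    // can exceed the minimum (up to number of shards * minimum).
    static long totalCount(List<List<Boolean>> shards, long minimum) {
        long total = 0;
        for (List<Boolean> shard : shards) {
            total += countOnShard(shard, minimum);
        }
        return total;
    }
}
```

With two shards that each hold five matches and a minimum of 3, each shard reports 3 and the total is 6; with `Long.MAX_VALUE` as the minimum, as in the example above, every match is counted.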
> >> Thanks, it does. I need the other kind of faceting, where a facet is
> >> essentially the distinct values of a field relative to a given query,
> >> something like Solr's Enum-Based Field Faceting[1]. Any pointers on how I
> >> could implement that inside Blur? The only approach I can come up with is
> >> outside Blur and seems inefficient: record the distinct values of the
> >> fields of interest at ingest time, then use those values in Blur's
> >> existing facet query to get the counts. I'm guessing there's a better
> >> approach?
> >>
> >> Thanks,
> >> --tim
> >>
> >> [1] - http://wiki.apache.org/solr/SolrFacetingOverview
> >>
> >
>
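Tim's ingest-time workaround can be sketched as follows: record the distinct values of a field while indexing, then issue one query facet per value, using the same `tweets.text:hadoop` form as the earlier example. The class and method names are hypothetical, and values containing spaces or query syntax characters would need escaping, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical client-side helper for the ingest-time workaround.
public class IngestTimeDistinctValues {

    // Distinct values of one field, recorded while documents are ingested.
    private final Set<String> distinct = new TreeSet<>();

    void recordAtIngest(String value) {
        distinct.add(value); // duplicates are ignored by the set
    }

    // One facet query string per recorded value, e.g. "tweets.text:hadoop".
    // Values are used verbatim; no query-syntax escaping is done here.
    List<String> facetQueries(String column) {
        List<String> queries = new ArrayList<>();
        for (String value : distinct) {
            queries.add(column + ":" + value);
        }
        return queries;
    }
}
```

Every query then carries one facet per distinct value, which grows with the field's cardinality; that cost is likely part of why the workaround seems inefficient compared with enumerating terms on the shard server.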
