incubator-blur-dev mailing list archives

From "Colton McInroy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BLUR-296) Facets are subqueries not facets
Date Fri, 29 Nov 2013 20:04:36 GMT

    [ https://issues.apache.org/jira/browse/BLUR-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835526#comment-13835526 ]

Colton McInroy commented on BLUR-296:
-------------------------------------

Ok, sorry for the delayed response on this, I have been rather busy.

I originally posted this code on the mailing list, and I think it would help explain a way
to access a proper facet implementation...

    // Requires java.util.Arrays and java.util.List plus the Blur Thrift client classes.
    // Note: Facet(String, int), getFacetResults() and the name/value/count fields are
    // the facet API proposed here, not the existing subquery-based one.
    public static void queryBlur(String queryString, String table) {
        Iface client = BlurClient.getClient(mainConfig.getString("controllers"));
        Query query = new Query();
        query.setQuery(queryString);

        Selector selector = new Selector();

        // Fetch all the columns in families "event" and "msg".
        selector.addToColumnFamiliesToFetch("event");
        selector.addToColumnFamiliesToFetch("msg");

        BlurQuery blurQuery = new BlurQuery();
        // Number of top values to return for each faceted field.
        int matches = 10;
        List<Facet> facets = Arrays.asList(new Facet("field1", matches), new Facet("field2", matches));
        blurQuery.setFacets(facets);
        blurQuery.setFetch(50);
        blurQuery.setQuery(query);
        blurQuery.setSelector(selector);

        try {
            BlurResults results = client.query(table, blurQuery);
            // One line per (field, value) pair with its match count.
            for (Facet facet : results.getFacetResults()) {
                System.out.println(facet.name + " " + facet.value + " " + facet.count);
            }
        } catch (BlurException e) {
            // Error reported by Blur itself.
            e.printStackTrace();
        } catch (TException e) {
            // Thrift transport/protocol error.
            e.printStackTrace();
        }
    }

To elaborate a bit, instead of each facet being a subquery that returns a single count, the
facet list contains a number of column fields and, for each one, the number of matches to
return for that field. Or it could be done this way also...

    List<Facet> facets = Arrays.asList(new Facet("field1"), new Facet("field2"));
    blurQuery.setFacets(facets, matches); 

Instead of specifying the number of matches for each field, this would specify the number of
matches for all facets. Perhaps a combination of the two would be ideal.
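
As a rough illustration of what such a combination might look like, here is a minimal
sketch in plain Java. The FacetRequest class and its effectiveLimit method are hypothetical
names made up for this example; they are not part of Blur. A field either carries its own
limit or falls back to the query-wide default.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

public class FacetLimits {

    /** Hypothetical facet request: a field name plus an optional per-field limit. */
    static final class FacetRequest {
        final String field;
        final OptionalInt limit;

        FacetRequest(String field) {
            this.field = field;
            this.limit = OptionalInt.empty();
        }

        FacetRequest(String field, int limit) {
            this.field = field;
            this.limit = OptionalInt.of(limit);
        }

        /** Uses the per-field limit when present, otherwise the query-wide default. */
        int effectiveLimit(int defaultLimit) {
            return limit.orElse(defaultLimit);
        }
    }

    public static void main(String[] args) {
        int defaultLimit = 10; // query-wide number of matches per facet
        List<FacetRequest> facets = Arrays.asList(
                new FacetRequest("field1"),     // falls back to the default
                new FacetRequest("field2", 3)); // overrides the default
        for (FacetRequest facet : facets) {
            System.out.println(facet.field + " " + facet.effectiveLimit(defaultLimit));
        }
    }
}
```

With a default of 10, field1 would be limited to 10 values and field2 to 3.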

On return, the query results should contain a list of facet results, which for the above
println statement would look something like the following...
field1 value1 4
field1 value2 2
field1 value3 1
field2 value1 4
field2 value2 2
field2 value3 1

This would be fairly similar to the facet implementation in Lucene, and consistent with how
facets are generally defined online.
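
To make the expected output concrete, here is a minimal in-memory sketch of the counting
step: for one field, count how often each value occurs among the matching documents and
keep the most frequent ones. It uses only plain Java collections, with documents modeled as
simple maps; the class and method names are made up for illustration and this is not how
Lucene or Blur would store or compute facet data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetCountSketch {

    // Counts the distinct values of one field across the matching documents
    // and returns the `limit` most frequent values with their counts.
    static Map<String, Integer> topValues(List<Map<String, String>> docs, String field, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken by value so the order is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> top = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(limit, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        // Seven matching documents, each with a single "field1" column.
        String[] values = {"value1", "value1", "value1", "value1", "value2", "value2", "value3"};
        List<Map<String, String>> docs = new ArrayList<>();
        for (String v : values) {
            Map<String, String> doc = new HashMap<>();
            doc.put("field1", v);
            docs.add(doc);
        }
        for (Map.Entry<String, Integer> e : topValues(docs, "field1", 10).entrySet()) {
            System.out.println("field1 " + e.getKey() + " " + e.getValue());
        }
    }
}
```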

Now, as for what needs to be done to accomplish this, I am not entirely sure. When I built
my implementation I created a subdirectory under each index that contained a facet index.
Facet data, in my experience with Lucene, is stored in separate indexes. So, from my understanding
of Blur, I believe another index would need to be created along with each shard. With my still
limited knowledge of Blur, I am guessing that the following would need to be implemented.

- Some kind of flag needs to be associated with each table indicating whether it does facet
indexing (perhaps something set in the create process)
- Code that handles column declarations needs to check whether facet indexing is enabled for a
shard and, when a column is declared, start collecting facet data for mutates.
- Controller/Shard servers need to support collecting facet data along with queries when the
query requests facets and the table supports them.
- Controller servers need to handle aggregating data from shard servers into the final query response.
- The API for executing queries needs to be able to support the new facet system.
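
The controller-side aggregation step could be sketched roughly as follows. This is plain
Java with made-up names, not Blur code; it assumes each shard returns a map from facet value
to count for one field, and the controller sums them. Note that if shards only return their
local top values, the merged totals can undercount, so a real implementation would need to
decide how many values each shard sends back.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FacetMergeSketch {

    // Sums the per-shard counts for one facet field into a single map.
    static Map<String, Long> merge(List<Map<String, Long>> shardCounts) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> shard : shardCounts) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical per-shard results for "field1".
        Map<String, Long> shard0 = new HashMap<>();
        shard0.put("value1", 3L);
        shard0.put("value2", 1L);

        Map<String, Long> shard1 = new HashMap<>();
        shard1.put("value1", 1L);
        shard1.put("value2", 1L);
        shard1.put("value3", 1L);

        Map<String, Long> merged = merge(Arrays.asList(shard0, shard1));
        for (String value : new String[] {"value1", "value2", "value3"}) {
            System.out.println("field1 " + value + " " + merged.get(value));
        }
    }
}
```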

> Facets are subqueries not facets
> --------------------------------
>
>                 Key: BLUR-296
>                 URL: https://issues.apache.org/jira/browse/BLUR-296
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur
>    Affects Versions: experimental-dev, 0.2.0, 0.3.0, 0.2.1
>         Environment: N/A
>            Reporter: Colton McInroy
>              Labels: features
>
> Based on the classification of what Facets are from Lucene and other search systems,
> the current implementation in Blur does not really support this functionality.
> http://en.wikipedia.org/wiki/Faceted_classification 
> http://en.wikipedia.org/wiki/Faceted_search
> It is entirely different than anything really described in the Lucene documentation.
> http://lucene.apache.org/core/4_3_0/facet/org/apache/lucene/facet/doc-files/userguide.html



--
This message was sent by Atlassian JIRA
(v6.1#6144)
