incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Facets
Date Tue, 22 Oct 2013 00:40:47 GMT
On Mon, Oct 21, 2013 at 4:45 PM, Colton McInroy <colton@dosarrest.com> wrote:

> Do you have any suggestions on how I should deal with needing this type of
> information in the meantime?
>
> Typically I used facet data to generate graph data. Instead of having to
> go through every match, group it by time, count them up manually, etc., I
> would get facet data for timestamps. For instance, I would create a query
> such as "field1:value", grab the facets for the Date field, and use the
> facet counts to plot a graph of timestamp versus matches.
>
> I was thinking of just going through all of the matches for now, which,
> although probably not nearly as efficient as using Lucene-type facets,
> would do the trick temporarily until proper facets are implemented.
>
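
The interim approach described in the quoted text (walk the matches and count them per time interval) could be sketched roughly as follows. This is only an illustration: it assumes each match has already been reduced to a timestamp in epoch milliseconds and that days in UTC are the desired interval. The class and method names are made up for the example and are not part of Blur.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketCounts {

    // Counts how many timestamps fall on each UTC day.
    // A TreeMap keeps the days sorted, which is convenient for plotting.
    static Map<LocalDate, Long> countPerDay(List<Long> epochMillis) {
        Map<LocalDate, Long> counts = new TreeMap<>();
        for (long ts : epochMillis) {
            LocalDate day = Instant.ofEpochMilli(ts).atZone(ZoneOffset.UTC).toLocalDate();
            counts.merge(day, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two matches on one day, one on the next.
        long day1 = LocalDate.of(2013, 10, 21).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long day2 = day1 + 24L * 60 * 60 * 1000;
        Map<LocalDate, Long> counts = countPerDay(Arrays.asList(day1, day1 + 1000L, day2));
        counts.forEach((day, n) -> System.out.println(day + " " + n));
    }
}
```

Each (day, count) pair is one point on the graph. The cost is that every matching row has to be pulled back to the client, which is the inefficiency mentioned above.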

Agreed.  Is the date field only a date?  Or does it contain timestamps as
well?  What is the range of the dates?  Days?  Weeks?  Months?  Years?  All
of the above?

The reason I ask is that if you are looking at, let's say, a month's worth
of data and you have a time scope on the date field of days, then that's
only 30-31 facets that you will have to add manually to the query.
Obviously, as the time scope and range grow, this will get too messy to
deal with on the client side.  Also, you can use the terms call to get the
current terms in a field, so if you want to traverse the indexed values,
that can give you that info.
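A rough sketch of the manual approach, building one facet query per day for a month. The field name "event.date" and the yyyyMMdd term format are assumptions for illustration only; use whatever your index actually stores. Each generated string would be passed as the query of a Facet, the way new Facet(queryString, Long.MAX_VALUE) is used elsewhere in this thread. Since the facet counts come back one for one with the facet list, the second helper pairs them up again.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DayFacetQueries {

    // Builds one term query per day between first and last (inclusive),
    // e.g. "event.date:20131001". Field name and date format are assumptions.
    static List<String> dayFacetQueries(String field, LocalDate first, LocalDate last) {
        DateTimeFormatter fmt = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd
        List<String> queries = new ArrayList<>();
        for (LocalDate d = first; !d.isAfter(last); d = d.plusDays(1)) {
            queries.add(field + ":" + d.format(fmt));
        }
        return queries;
    }

    // Pairs each facet query with its count. The counts must be in the
    // same order as the queries that produced them.
    static Map<String, Long> pairCounts(List<String> queries, List<Long> counts) {
        if (queries.size() != counts.size()) {
            throw new IllegalArgumentException("expected one count per facet query");
        }
        Map<String, Long> result = new LinkedHashMap<>();
        for (int i = 0; i < queries.size(); i++) {
            result.put(queries.get(i), counts.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> queries = dayFacetQueries("event.date",
                LocalDate.of(2013, 10, 1), LocalDate.of(2013, 10, 31));
        // prints "31 facet queries, first: event.date:20131001"
        System.out.println(queries.size() + " facet queries, first: " + queries.get(0));
    }
}
```

For October that is 31 facet queries, which is manageable. With hourly buckets over a year it would be several thousand, which is where this stops being practical on the client side.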

Just trying to help get you what you need right now.


>
> Currently the blur site lists facets as being something that works here...
>
> http://incubator.apache.org/blur/how_it_works.html
>
> But as this thread kind of pointed out, facets in the sense that faceted
> classification describes do not exist right now within Apache Blur. So
> someone may want to change that page to say that it is currently on the
> to-do list or something.
>
> http://en.wikipedia.org/wiki/Faceted_classification
>
> A great example I use to show people what facets are is the following
> site...
>
> http://www.fasttech.com/category/1499/consumer-electronics
>
> On the left side, it is easy to see a breakdown of all the different
> Fields/Values associated with the current search query. My intention is to
> display facet data for all (or the important ones anyway) of the fields
> associated with the current query along with a line graph showing the count
> of all matching rows for each time interval. Then the query can be refined
> more by querying a specific time range, or field.
>
> Is a proper facet implementation something that has a somewhat high
> priority and will hopefully be at least partially implemented within the
> next couple of weeks/months? Or should I just work on processing all the
> results myself for now? Also, I notice the default number of query matches
> is only 10, and I see no way to specify unlimited. Can I specify -1 for
> unlimited or something like that, or do I need to specify a really large
> number that will always be higher than the number of actual results I am
> expecting... like Long.MAX_VALUE or something?


I agree it is a priority, but my top priority is getting 0.2.1 out the door.
If we can decide on the changes that need to be made to the facet API, we
can begin on it in 0.3.0 at any point.  Once 0.2.1 is complete I will be
turning my focus to 0.3.0; I hope to call for a vote on 0.2.1 in the next
week.

Ok, so for queries you can page through the results.  However, the facet
counts reflect the entire answer.  You can't ask for all the results back at
once due to memory constraints within the system.  But you can set the start
and fetch (the number to fetch) in the BlurQuery object.

http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_BlurQuery
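
The paging loop itself can be sketched like this. To keep the example self-contained, the call to the server is represented by a function that takes (start, fetch) and returns one page; in real code that function would set start and fetch on the BlurQuery, run client.query, and return the results. The names PagedFetch, forEachPage and slice are made up for this sketch, and the in-memory list only stands in for the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public class PagedFetch {

    // Requests pages of pageSize rows until a short or empty page comes back.
    // Each page is handed to the handler and then dropped, so only one page
    // is held in memory at a time. Returns the total number of rows seen.
    static <T> long forEachPage(BiFunction<Long, Integer, List<T>> fetchPage,
                                int pageSize, Consumer<List<T>> handler) {
        long start = 0;
        long seen = 0;
        while (true) {
            List<T> page = fetchPage.apply(start, pageSize);
            if (page.isEmpty()) {
                break;
            }
            handler.accept(page);
            seen += page.size();
            if (page.size() < pageSize) {
                break; // last, partially filled page
            }
            start += pageSize;
        }
        return seen;
    }

    // Stand-in for the server: returns rows [start, start + fetch) of a list.
    static <T> List<T> slice(List<T> rows, long start, int fetch) {
        int from = (int) Math.min(start, rows.size());
        int to = (int) Math.min(start + fetch, rows.size());
        return rows.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 125; i++) {
            rows.add(i);
        }
        long seen = PagedFetch.<Integer>forEachPage(
                (start, fetch) -> slice(rows, start, fetch),
                50,
                page -> System.out.println("page of " + page.size()));
        System.out.println("total " + seen);
    }
}
```

With 125 rows and a fetch of 50 this makes three requests (50, 50, 25). Because the facet counts already cover the whole answer, paging is only needed when you want the rows themselves.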


>
>
> Thanks,
> Colton McInroy
>
>  * Director of Security Engineering
>
>
> Phone
> (Toll Free)
> _US_    (888)-818-1344 Press 2
> _UK_    0-800-635-0551 Press 2
>
> My Extension    101
> 24/7 Support    support@dosarrest.com <mailto:support@dosarrest.com>
> Email   colton@dosarrest.com <mailto:colton@dosarrest.com>
> Website         http://www.dosarrest.com
>
> On 10/18/2013 8:40 AM, Colton McInroy wrote:
>
>> Hello Aaron,
>>
>>     Yes, that's basically what I was thinking of for the facet results.
>> The current implementation doesn't really make sense if you're coming
>> from Lucene. For simplicity and uniformity, I think it should be somewhat
>> like it is with Lucene... with adaptation to the way Blur is built... I
>> could kind of see something like this...
>>
>>     public static void queryBlur(String queryString, String table) {
>>         Iface client = BlurClient.getClient(mainConfig.getString("controllers"));
>>         Query query = new Query();
>>         query.setQuery(queryString);
>>
>>         Selector selector = new Selector();
>>
>>         // This will fetch all the columns in families "event" and "msg".
>>         selector.addToColumnFamiliesToFetch("event");
>>         selector.addToColumnFamiliesToFetch("msg");
>>
>>         BlurQuery blurQuery = new BlurQuery();
>>         int matches = 10;
>>         // Proposed: one Facet per field, each with its own match limit.
>>         List<Facet> facets = Arrays.asList(new Facet("field1", matches),
>>                 new Facet("field2", matches));
>>         blurQuery.setFacets(facets);
>>         blurQuery.setFetch(50);
>>         blurQuery.setQuery(query);
>>         blurQuery.setSelector(selector);
>>
>>         try {
>>             BlurResults results = client.query(table, blurQuery);
>>             // Proposed: getFacetResults() does not exist yet.
>>             for (Facet facet : results.getFacetResults()) {
>>                 System.out.println(facet.name + " " + facet.value);
>>             }
>>         } catch (BlurException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         } catch (TException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         }
>>     }
>>
>>     Just a brief modification from what I am doing now. Basically I just
>> envision a method called getFacetResults which returns List<Facet> with
>> each Facet object containing a "name" and a "value" which would be the
>> column name and facet count respectively. I'm just throwing this out there
>> for now. This is a different way of implementing the facets than lucene in
>> terms of how the code is accessed, but it would provide the same results.
>>
>>     It could also be done something like this...
>>
>>     List<Facet> facets = Arrays.asList(new Facet("field1"), new
>> Facet("field2"));
>>     blurQuery.setFacets(facets, matches);
>>
>>     Depends if the number of matches should be per facet or per query,
>> although I see the merits in being able to specify the matches for each
>> field.
>>
>> Thanks,
>> Colton McInroy
>>
>> On 10/18/2013 5:20 AM, Aaron McCurry wrote:
>>
>>> I have an issue in Jira to document facets in 0.2.1. It has not been
>>> worked yet, but I hope I can get to it soon.  It looks like you figured
>>> out what is there.
>>>
>>> We will likely improve facets in 0.3.0, so the API will have to change a
>>> bit.  The biggest change we will need to make is for the scenario that
>>> you bring up.  Facets in the current implementation are simply other
>>> queries that can range from a single term to a complex query. I'm
>>> assuming that you would like to specify a field name and get something
>>> like a map of terms to counts for the given facet?
>>>
>>> The field facetCounts holds the counts for each of the facets in the
>>> input list from the query.  So the count list corresponds one for one to
>>> the facet list in the Query.  I realize this is less than ideal, and we
>>> are going to be improving it soon.
>>>
>>> If you have some suggestions on how you would want the facet API to
>>> operate, new features, or anything else for that matter, just write up
>>> your thoughts on this thread and we can incorporate them into the task.
>>>
>>> Thanks!
>>>
>>> Aaron
>>>
>>>
>>>
>>> On Fri, Oct 18, 2013 at 6:43 AM, Colton McInroy <colton@dosarrest.com> wrote:
>>>
>>>> Ok, so I created this method...
>>>>
>>>> public static BlurResults queryBlur(String queryString, String table) {
>>>>          Iface client = BlurClient.getClient(mainConfig.getString("controllers"));
>>>>          Query query = new Query();
>>>>          query.setQuery(queryString);
>>>>
>>>>          Selector selector = new Selector();
>>>>
>>>>          // This will fetch all the columns in families "event" and "msg".
>>>>          selector.addToColumnFamiliesToFetch("event");
>>>>          selector.addToColumnFamiliesToFetch("msg");
>>>>
>>>>          BlurQuery blurQuery = new BlurQuery();
>>>>          List<Facet> facets = Arrays.asList(new Facet(queryString,
>>>>                  Long.MAX_VALUE));
>>>>          blurQuery.setFacets(facets);
>>>>          blurQuery.setFetch(50);
>>>>          blurQuery.setQuery(query);
>>>>          blurQuery.setSelector(selector);
>>>>
>>>>          try {
>>>>              BlurResults results = client.query(table, blurQuery);
>>>>              return results;
>>>>          } catch (BlurException e) {
>>>>              // TODO Auto-generated catch block
>>>>              e.printStackTrace();
>>>>          } catch (TException e) {
>>>>              // TODO Auto-generated catch block
>>>>              e.printStackTrace();
>>>>          }
>>>>          return null;
>>>>      }
>>>>
>>>> From reading through the source code, I was able to find out that you
>>>> specify facets as a list, but this is fairly confusing to me coming from
>>>> Lucene.
>>>>
>>>> In Lucene, when getting facet data, I specify the facet fields I am
>>>> interested in, and the facet results show me a top X list of values
>>>> within that field. Whereas with Blur, it appears that a facet is another
>>>> query which gives only a number as a result. When I tried to obtain the
>>>> facet data I am used to with Lucene, the only thing I could find was...
>>>>
>>>> System.out.println("Facet Results: " + results.getFacetCountsSize());
>>>> System.out.println(JSONArray.toJSONString(results.getFacetCounts()));
>>>>
>>>>
>>>> Could you please elaborate on this.
>>>>
>>>>
>>>> Thanks,
>>>> Colton McInroy
>>>>
>>>> On 10/18/2013 3:07 AM, Colton McInroy wrote:
>>>>
>>>>> I think I wrote this too soon; I believe I just found out how to do it.
>>>>> I'll test it out and supply some example code if correct to help
>>>>> others.
>>>>>
>>>>> Thanks,
>>>>> Colton McInroy
>>>>>
>>>>> On 10/18/2013 2:58 AM, Colton McInroy wrote:
>>>>>
>>>>>> Hey Aaron,
>>>>>>
>>>>>>      You mentioned a while ago that Blur handles facets as well and
>>>>>> that you would provide an example. Unless I have missed that email, I
>>>>>> haven't seen an example yet; could you provide one? I just took a quick
>>>>>> look myself and could not figure it out. I see there is an example
>>>>>> FacetQueryTest.java in blur-query, but that appears to be basically
>>>>>> just a copy of the Lucene file.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>
>>
>
