incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Facets
Date Wed, 06 Nov 2013 20:41:56 GMT
I don't think that I have created it.  I will look in the issues list.
Feel free to create it if it's missing.

Aaron


On Wed, Nov 6, 2013 at 3:39 PM, Colton McInroy <colton@dosarrest.com> wrote:

> Did you create the jira issue for this? I didn't see a notification for it
> being sent into the mailing list.
>
> Now that 0.2.1 is out, is proper facet implementation going to get worked
> on now? This is extremely important for our implementation of Blur. I have
> been reading through the code, and I see that a LOT of changes are going to
> have to occur for it to be functional. I'm sure I could convince our
> company to donate monetarily if it will help speed things up, and I am also
> able to spend my own time helping work on the code changes.
>
>
> Thanks,
> Colton McInroy
>
>  * Director of Security Engineering
>
>
> Phone
> (Toll Free)
> _US_    (888)-818-1344 Press 2
> _UK_    0-800-635-0551 Press 2
>
> My Extension    101
> 24/7 Support    support@dosarrest.com <mailto:support@dosarrest.com>
> Email   colton@dosarrest.com <mailto:colton@dosarrest.com>
> Website         http://www.dosarrest.com
>
> On 10/25/2013 1:26 PM, Aaron McCurry wrote:
>
>> Colton,
>>
>> Yes I think that is exactly what you are describing.  I will create the
>> initial jira issue and either copy the content you have created or link to
>> it and we will continue discussing implementation there.  Thanks!
>>
>> Aaron
>>
>>
>> On Fri, Oct 25, 2013 at 3:39 PM, Colton McInroy <colton@dosarrest.com> wrote:
>>
>>> Umm... isn't that what I did? I mentioned it a few times, supplied a link
>>> to the lucene documentation, etc.
>>>
>>>
>>> Thanks,
>>> Colton McInroy
>>>
>>> On 10/25/2013 12:20 PM, Otis Gospodnetic wrote:
>>>
>>>> I only skimmed this thread. Nobody seems to have mentioned Lucene's own
>>>> faceting, which merits looking into.
>>>>
>>>> Otis
>>>> Solr & ElasticSearch Support
>>>> http://sematext.com/
>>>> On Oct 22, 2013 1:56 AM, "Colton McInroy" <colton@dosarrest.com> wrote:
>>>>
>>>>> Thanks,
>>>>> Colton McInroy
>>>>>
>>>>> On 10/21/2013 5:40 PM, Aaron McCurry wrote:
>>>>>
>>>>>> On Mon, Oct 21, 2013 at 4:45 PM, Colton McInroy <colton@dosarrest.com>
>>>>>> wrote:
>>>>>>
>>>>>>> You have any suggestions on how I should deal with needing this type of
>>>>>>> information in the meantime?...
>>>>>>>
>>>>>>> Typically what I used facet data for was to generate graph data. Instead
>>>>>>> of having to go through every match, group it by time, count them up
>>>>>>> manually, etc, I would get facet data for timestamps. For instance, I
>>>>>>> create a query which says "field1:value". I would then grab the facets
>>>>>>> for the Date field and use the facet counts to plot a graph of
>>>>>>> timestamp/matches.
>>>>>>>
>>>>>>> I was thinking of just going through all of the matches for now, which,
>>>>>>> although probably not nearly as efficient as using lucene type facets,
>>>>>>> would do the trick temporarily until proper facets are implemented.
>>>>>>>
>>>>>> Agreed, is the date field only a date?  Or does it contain timestamps as
>>>>>> well?  What is the range of the dates?  Days?  Weeks?  Months?  Years?
>>>>>> All of the above?
>>>>>
>>>>> To the second... YYYYMMDDHHmmss
>>>>>
>>>>>> The reason I ask is basically, if you are looking at let's say a month's
>>>>>> worth and you have a time scope on the date field of days, then that's
>>>>>> only 30-31 facets that you will have to add manually to the query.
>>>>>> Obviously as the time scope and range grow this will get a little too
>>>>>> messy to want to deal with on the client side.  Also you can use the
>>>>>> terms call to get the current terms in a field, so if you want to traverse
>>>>>> the indexed values that can give you that info.
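A minimal sketch of the per-day workaround described above, assuming the
date field is named "Date" (a placeholder) and holds fixed-width
YYYYMMDDHHmmss strings, so that a Lucene range query over the terms selects
one day. The class and method names are made up for illustration. Each
generated string would be passed as the query of one facet.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

final class DailyFacetQueries {
    // Formats a date as yyyyMMdd, matching the first 8 characters of the field.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    /** Builds one range query string per day of the given month. */
    static List<String> forMonth(String field, YearMonth month) {
        List<String> queries = new ArrayList<>();
        for (int d = 1; d <= month.lengthOfMonth(); d++) {
            String day = month.atDay(d).format(DAY);
            // Inclusive range from the first to the last second of the day.
            queries.add(field + ":[" + day + "000000 TO " + day + "235959]");
        }
        return queries;
    }

    public static void main(String[] args) {
        // "Date" is an assumed field name, not one taken from a real schema.
        List<String> queries = forMonth("Date", YearMonth.of(2013, 10));
        System.out.println(queries.size() + " facet queries, first: " + queries.get(0));
    }
}
```

For October 2013 this yields 31 query strings, which is the "30-31 facets"
case; finer time scopes multiply that number quickly.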
>>>>>>
>>>>> Depends upon the timescale being queried. If the timescale is the past
>>>>> hour, then it would be by minute; if it's over a month, then it would be
>>>>> by hour. For lucene, I just get the facets, and post process them by
>>>>> shrinking the timestamp value down to the level I want. For example, if I
>>>>> wanted to view hourly counts, I would loop through all of the facet
>>>>> results, condensing them down to hour values. Postprocessing the facet
>>>>> results from lucene facets was by far a LOT quicker than going through all
>>>>> of the actual results, which I am betting is probably the case with blur
>>>>> as well. With lucene, facets were what I used the most when trying to
>>>>> present information in GUI interfaces, because that is what makes the most
>>>>> sense to people viewing it.
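A minimal sketch of the post-processing step described above, assuming the
facet results arrive as a map from YYYYMMDDHHmmss strings to counts. The
class and method names are hypothetical. Shrinking a timestamp to a coarser
level is just taking a prefix of the key and summing the counts that
collide.

```java
import java.util.Map;
import java.util.TreeMap;

final class FacetRollup {
    /**
     * Sums per-second counts into coarser buckets by keeping only the first
     * prefixLength characters of each YYYYMMDDHHmmss key
     * (12 = minute, 10 = hour, 8 = day).
     */
    static Map<String, Long> rollUp(Map<String, Long> perSecond, int prefixLength) {
        Map<String, Long> buckets = new TreeMap<>(); // sorted, so buckets stay in time order
        for (Map.Entry<String, Long> e : perSecond.entrySet()) {
            String bucket = e.getKey().substring(0, prefixLength);
            buckets.merge(bucket, e.getValue(), Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, Long> perSecond = new TreeMap<>();
        perSecond.put("20131021164501", 3L);
        perSecond.put("20131021164559", 2L);
        perSecond.put("20131021170000", 7L);
        // Hourly buckets: prints {2013102116=5, 2013102117=7}
        System.out.println(rollUp(perSecond, 10));
    }
}
```

The loop runs over the distinct facet values, not over the matching rows,
which is why it stays cheap even when the query matches a very large number
of rows.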
>>>>>
>>>>>> Just trying to help get you what you need right now.
>>>>>>
>>>>>>> Currently the blur site lists facets as being something that works
>>>>>>> here...
>>>>>>>
>>>>>>> http://incubator.apache.org/blur/how_it_works.html
>>>>>>>
>>>>>>> But as this thread kinda pointed out, facets the way faceted
>>>>>>> classification describes them do not exist right now within apache blur.
>>>>>>> So someone may want to change that to say that it is currently on the
>>>>>>> todo list or something.
>>>>>>>
>>>>>>> http://en.wikipedia.org/wiki/Faceted_classification
>>>>>>>
>>>>>>> A great example I use to show people what facets are is the following
>>>>>>> site...
>>>>>>>
>>>>>>> http://www.fasttech.com/category/1499/consumer-electronics
>>>>>>>
>>>>>>> On the left side, it is easy to see a breakdown of all the different
>>>>>>> Fields/Values associated with the current search query. My intention is
>>>>>>> to display facet data for all (or the important ones anyway) of the
>>>>>>> fields associated with the current query along with a line graph showing
>>>>>>> the count of all matching rows for each time interval. Then the query
>>>>>>> can be refined more by querying a specific time range, or field.
>>>>>>>
>>>>>>> Is proper facet implementation something that has a somewhat high
>>>>>>> priority and will hopefully be at least partially implemented within the
>>>>>>> next couple of weeks/months? Or should I just work on processing all the
>>>>>>> results myself for now? Also, I notice the default number of query
>>>>>>> matches is only 10, and I see no way to specify unlimited. Can I specify
>>>>>>> -1 for unlimited or something like that, or do I need to specify a
>>>>>>> really large number that will always be higher than the number of actual
>>>>>>> results I am expecting... like Long.MAX_VALUE or something?
>>>>>>>
>>>>>> I agree it is a priority, but my top priority is getting 0.2.1 out the
>>>>>> door.  But if we can decide on the API changes that need to be made in
>>>>>> the facet API we can begin on it in 0.3.0 at any point.  And once 0.2.1
>>>>>> is complete I will be turning my focus to 0.3.0.  I hope to call for a
>>>>>> vote for 0.2.1 in the next week.
>>>>>>
>>>>>> Ok, so for queries you can page through the results.  However the facet
>>>>>> counts reflect the entire answer.  You can't ask for all the results back
>>>>>> at once due to memory constraints within the system.  But you can set
>>>>>> in the BlurQuery object the start and fetch (which is the number to
>>>>>> fetch).
>>>>>>
>>>>>> http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_BlurQuery
>>>>>>
>>>>> Hmm... yea, when going through say 100,000,000+ rows to generate a graph,
>>>>> it is no doubt going to take a long time re-querying in 1,000 result
>>>>> intervals 100,000+ times. If that's for only 5 minutes of data, it's a
>>>>> huge amount of processing to see general statistics of the data you have
>>>>> in front of you.
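The arithmetic behind that estimate, as a small sketch. The names are
hypothetical; `start` and `fetch` mirror the two BlurQuery settings
mentioned above, but nothing here calls Blur.

```java
final class PagingCost {
    /** Number of queries needed to page through total results, fetch at a time. */
    static long pageCount(long total, int fetch) {
        return (total + fetch - 1) / fetch; // ceiling division
    }

    /** Value to use as the start offset for the given zero-based page. */
    static long startOf(long page, int fetch) {
        return page * fetch;
    }

    public static void main(String[] args) {
        long pages = pageCount(100_000_000L, 1_000);
        System.out.println(pages + " queries, last one starts at " + startOf(pages - 1, 1_000));
    }
}
```

With 100,000,000 rows and a fetch of 1,000 this comes to 100,000 round
trips, compared with a single query when the counts come back as facets.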
>>>>>
>>>>> This is where facets became vital for me. I understand that right now
>>>>> "facets" in blur are not really facets; they are instead additional
>>>>> queries which get run. Not really sure why it was implemented that way,
>>>>> but the lucene documentation (http://lucene.apache.org/core/4_3_0/) links
>>>>> to wiki pages about faceted search as well as a user guide explaining what
>>>>> facets are, and the implementation in blur does not match what everything
>>>>> else defines facets as.
>>>>>
>>>>> I'm not sure who or how facets came to be implemented in the current
>>>>> manner, but it does not make sense at all or comply with any of the
>>>>> definitions of facets I have found. I find this to be a conflict, if blur
>>>>> advertises them but does not really have them. Since there is really no
>>>>> documentation about facets, other than the feature list saying they are
>>>>> there, it took me a while to discover this. For me in particular, this is
>>>>> vital. What use is indexing massive amounts of information if you do not
>>>>> have very good visibility of it?
>>>>>
>>>>> As I have mentioned, my use is for storing logged events. Let's say you
>>>>> have events for sshd being stored in a table along with the fields Date,
>>>>> LoginMethod, IP, User, Server, and Success, and you have a LOT of servers
>>>>> being monitored which have a lot of user login activity. In lucene I would
>>>>> do a single query against any of those fields, or perhaps just start with
>>>>> matching all records. Along with that query, I would get the facets for
>>>>> those fields, using Date to display a time graph of activity for the rows.
>>>>> I would then display the top 5-10 facets for each field along with a
>>>>> subquery that does just a facet query to display another time graph of the
>>>>> Date facets. With this you can instantly see 10 login failures within
>>>>> 100,000 successes, how many times each user has logged in and what methods
>>>>> were used, etc. This is a simple example, but expand that out to all kinds
>>>>> of other information and it's night and day visibility of data.
>>>>>
>>>>> When trying to view data of any kind in an effective manner, graphing
>>>>> always helps, but to process every matching row is obviously inefficient.
>>>>> I believe some of the other systems out there such as splunk do that, but
>>>>> when I did my own work, I found that too slow and inefficient. Sure, it
>>>>> works fine when viewing a small amount of data, but when we are talking
>>>>> about big data, which is what Blur is designed for, and what I am working
>>>>> with, it's just too much overhead. Using facets on date values to produce
>>>>> time graphs of entries is pretty much instant, no matter how many
>>>>> rows/records match.
>>>>>
>>>>> In splunk or other search systems, I would see events populated over time
>>>>> in a graph along with the first page of data. The time graph continues to
>>>>> fill over time showing a timeline of data. Depending on your data, this
>>>>> can take a seriously long time. This is no doubt doing what you're
>>>>> suggesting with the processing of data one page at a time, sending it to
>>>>> the browser to parse into data stores that display graphs.
>>>>>
>>>>> With facet results, I was able to display the historical timelines in the
>>>>> same amount of time it took to do a single query along with the facet
>>>>> data. There just is no match from what I have seen so far for Lucene
>>>>> indexes along with facet indexes, which is what got me so excited about
>>>>> blur. I myself literally was in the design phase of writing my own
>>>>> implementation of a distributed lucene index system when I decided to stop
>>>>> and check what was out there before re-inventing the wheel. When I came
>>>>> across the blur project, I found the feature list and looked at two things
>>>>> primarily which got me into starting to work with the project. Those two
>>>>> things were "Fast data ingestion" and "Facets". So far, data seems to be
>>>>> getting in pretty quickly in my virtual box tests, which is good. I am
>>>>> going to be scaling up soon once the new hardware requisition is finished.
>>>>> The lack of facets though is currently stopping me from moving forward on
>>>>> some of the code development which requires them, which is why I am so
>>>>> interested in their implementation. With looping through records, it could
>>>>> take minutes to get proper visibility of data, whereas with Facets only a
>>>>> couple of seconds, if that.
>>>>>
>>>>> While waiting, I am probably going to make that IP field type definition I
>>>>> mentioned earlier, and possibly some additional ones. Most of the code for
>>>>> that seems to make sense, but I'll need to load it up in something other
>>>>> than a text editor to really get an appreciation for it. If some of what
>>>>> needs to be done for facets can be explained, I'll perhaps see if I can
>>>>> dedicate some company time to it.
>>>>>
>>>>>>> Thanks,
>>>>>>> Colton McInroy
>>>>>>>
>>>>>>> On 10/18/2013 8:40 AM, Colton McInroy wrote:
>>>>>>>
>>>>>>>> Hello Aaron,
>>>>>>>>
>>>>>>>> Yes, that's basically what I was thinking of for the facet results.
>>>>>>>> The current implementation doesn't really make any sense if you're
>>>>>>>> coming from lucene. For simplicity and uniformity, I think it should be
>>>>>>>> somewhat like it is with lucene... with adaptation to the way blur is
>>>>>>>> built... I could kinda see something like this...
>>>>>>>>
>>>>>>>>     // Proposed API: getFacetResults(), facet.name and facet.value do
>>>>>>>>     // not exist in Blur yet.
>>>>>>>>     public static void queryBlur(String queryString, String table) {
>>>>>>>>         Iface client = BlurClient.getClient(mainConfig.getString("controllers"));
>>>>>>>>         Query query = new Query();
>>>>>>>>         query.setQuery(queryString);
>>>>>>>>
>>>>>>>>         Selector selector = new Selector();
>>>>>>>>
>>>>>>>>         // This will fetch all the columns in families "event" and "msg".
>>>>>>>>         selector.addToColumnFamiliesToFetch("event");
>>>>>>>>         selector.addToColumnFamiliesToFetch("msg");
>>>>>>>>
>>>>>>>>         BlurQuery blurQuery = new BlurQuery();
>>>>>>>>         int matches = 10;
>>>>>>>>         List<Facet> facets = Arrays.asList(new Facet("field1", matches),
>>>>>>>>                 new Facet("field2", matches));
>>>>>>>>         blurQuery.setFacets(facets);
>>>>>>>>         blurQuery.setFetch(50);
>>>>>>>>         blurQuery.setQuery(query);
>>>>>>>>         blurQuery.setSelector(selector);
>>>>>>>>
>>>>>>>>         try {
>>>>>>>>             BlurResults results = client.query(table, blurQuery);
>>>>>>>>             for (Facet facet : results.getFacetResults()) {
>>>>>>>>                 System.out.println(facet.name + " " + facet.value);
>>>>>>>>             }
>>>>>>>>         } catch (BlurException e) {
>>>>>>>>             e.printStackTrace();
>>>>>>>>         } catch (TException e) {
>>>>>>>>             e.printStackTrace();
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>
>>>>>>>> Just a brief modification from what I am doing now. Basically I just
>>>>>>>> envision a method called getFacetResults which returns List<Facet>, with
>>>>>>>> each Facet object containing a "name" and a "value" which would be the
>>>>>>>> column name and facet count respectively. I'm just throwing this out
>>>>>>>> there for now. This is a different way of implementing the facets than
>>>>>>>> lucene in terms of how the code is accessed, but it would provide the
>>>>>>>> same results.
>>>>>>>>
>>>>>>>> It could also be done something like this...
>>>>>>>>
>>>>>>>>     List<Facet> facets = Arrays.asList(new Facet("field1"), new Facet("field2"));
>>>>>>>>     blurQuery.setFacets(facets, matches);
>>>>>>>>
>>>>>>>> Depends if the number of matches should be per facet or per query,
>>>>>>>> although I see the merits in being able to specify the matches for each
>>>>>>>> field.
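To make the shape of the requested answer concrete, here is a minimal
in-memory sketch: given a field name, return the top terms in that field
with their counts. Everything in it is hypothetical (the `FieldFacets` and
`TermCount` names, rows modelled as plain maps), and it counts by brute
force over the rows, which is exactly what a real facet implementation is
meant to avoid. It only illustrates the result, not how Blur or Lucene
would compute it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class FieldFacets {
    /** Hypothetical result type: one term of the field and how many rows carry it. */
    static final class TermCount {
        final String term;
        final long count;

        TermCount(String term, long count) {
            this.term = term;
            this.count = count;
        }

        @Override
        public String toString() {
            return term + "=" + count;
        }
    }

    /** Returns up to limit terms of the field, highest count first, ties by term. */
    static List<TermCount> topTerms(List<Map<String, String>> rows, String field, int limit) {
        Map<String, Long> counts = new HashMap<>();
        for (Map<String, String> row : rows) {
            String term = row.get(field);
            if (term != null) {
                counts.merge(term, 1L, Long::sum);
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .map(e -> new TermCount(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String user : new String[] {"root", "alice", "root", "bob", "root", "alice"}) {
            Map<String, String> row = new HashMap<>();
            row.put("User", user);
            rows.add(row);
        }
        System.out.println(topTerms(rows, "User", 2));
    }
}
```

The `limit` argument plays the role of the per-facet "matches" value in the
first proposal; making it a single per-query setting instead would move it
out of the per-field call.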
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Colton McInroy
>>>>>>>>
>>>>>>>> On 10/18/2013 5:20 AM, Aaron McCurry wrote:
>>>>>>>>
>>>>>>>>> I have an issue in Jira to document facets in 0.2.1. It has not been
>>>>>>>>> worked yet but I hope I can get to it soon.  It looks like you figured
>>>>>>>>> out what is there.
>>>>>>>>>
>>>>>>>>> We will likely improve facets in 0.3.0 so the API will have to change a
>>>>>>>>> bit.  The biggest change we will need to make is the scenario that you
>>>>>>>>> bring up.  Facets in the current implementation are simply other
>>>>>>>>> queries that can range from a single term to a complex query. I'm
>>>>>>>>> assuming that you would like to specify a field name and get something
>>>>>>>>> like a map of terms to counts for the given facet?
>>>>>>>>>
>>>>>>>>> The field facetCounts holds the counts for each of the facets in the
>>>>>>>>> input list from the query.  So the count list corresponds one for one
>>>>>>>>> to the facet list in the query.  I realize this is less than ideal
>>>>>>>>> and we are going to be improving it soon.
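A minimal sketch of reading the counts under that one-for-one rule: the
count at index i belongs to the facet query at index i. Plain strings and
Longs stand in for the facet list sent with the query and the facetCounts
list that comes back; the class and method names are made up for
illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FacetCountPairing {
    /** Pairs each facet query with the count at the same index. */
    static Map<String, Long> pair(List<String> facetQueries, List<Long> facetCounts) {
        if (facetQueries.size() != facetCounts.size()) {
            throw new IllegalArgumentException("facet list and count list differ in length");
        }
        Map<String, Long> byQuery = new LinkedHashMap<>(); // keeps the request order
        for (int i = 0; i < facetQueries.size(); i++) {
            byQuery.put(facetQueries.get(i), facetCounts.get(i));
        }
        return byQuery;
    }

    public static void main(String[] args) {
        // Example facet queries and counts; the values are invented.
        List<String> queries = Arrays.asList("Success:true", "Success:false");
        List<Long> counts = Arrays.asList(100_000L, 10L);
        System.out.println(pair(queries, counts));
    }
}
```

Because the result carries no labels of its own, the caller has to keep the
facet list it sent in order to interpret the counts.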
>>>>>>>>>
>>>>>>>>> If you have some suggestions on how you would want the facet API to
>>>>>>>>> operate, new features, or anything else for that matter, just write up
>>>>>>>>> your thoughts on this thread and we can incorporate them into the task.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Aaron
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Oct 18, 2013 at 6:43 AM, Colton McInroy <colton@dosarrest.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Ok, so I created this method...
>>>>>>>>>>
>>>>>>>>>>     public static BlurResults queryBlur(String queryString, String table) {
>>>>>>>>>>         Iface client = BlurClient.getClient(mainConfig.getString("controllers"));
>>>>>>>>>>         Query query = new Query();
>>>>>>>>>>         query.setQuery(queryString);
>>>>>>>>>>
>>>>>>>>>>         Selector selector = new Selector();
>>>>>>>>>>
>>>>>>>>>>         // This will fetch all the columns in families "event" and "msg".
>>>>>>>>>>         selector.addToColumnFamiliesToFetch("event");
>>>>>>>>>>         selector.addToColumnFamiliesToFetch("msg");
>>>>>>>>>>
>>>>>>>>>>         BlurQuery blurQuery = new BlurQuery();
>>>>>>>>>>         // Here the facet is just the main query string again.
>>>>>>>>>>         List<Facet> facets = Arrays.asList(new Facet(queryString, Long.MAX_VALUE));
>>>>>>>>>>         blurQuery.setFacets(facets);
>>>>>>>>>>         blurQuery.setFetch(50);
>>>>>>>>>>         blurQuery.setQuery(query);
>>>>>>>>>>         blurQuery.setSelector(selector);
>>>>>>>>>>
>>>>>>>>>>         try {
>>>>>>>>>>             return client.query(table, blurQuery);
>>>>>>>>>>         } catch (BlurException e) {
>>>>>>>>>>             e.printStackTrace();
>>>>>>>>>>         } catch (TException e) {
>>>>>>>>>>             e.printStackTrace();
>>>>>>>>>>         }
>>>>>>>>>>         return null; // reached only if the query failed
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> From reading through source code, I was able to find out that you
>>>>>>>>>> specify facets as a list, but this is fairly confusing to me coming
>>>>>>>>>> from lucene.
>>>>>>>>>>
>>>>>>>>>> In lucene when getting facet data, I specify the facet fields I am
>>>>>>>>>> interested in, and the facet results show me a top X list of values
>>>>>>>>>> within that field. Whereas with blur, it appears that a facet is
>>>>>>>>>> another query which gives only a number as a result. When I tried to
>>>>>>>>>> obtain the facet data I am used to with Lucene, the only thing I could
>>>>>>>>>> find was...
>>>>>>>>>>
>>>>>>>>>>     System.out.println("Facet Results: " + results.getFacetCountsSize());
>>>>>>>>>>     System.out.println(JSONArray.toJSONString(results.getFacetCounts()));
>>>>>>>>>>
>>>>>>>>>> Could you please elaborate on this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Colton McInroy
>>>>>>>>>>
>>>>>>>>>> On 10/18/2013 3:07 AM, Colton McInroy wrote:
>>>>>>>>>>
>>>>>>>>>>> I think I wrote this too soon, I believe I just found out how to do
>>>>>>>>>>> it. I'll test it out and supply some example code if correct to help
>>>>>>>>>>> others.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Colton McInroy
>>>>>>>>>>>
>>>>>>>>>>> On 10/18/2013 2:58 AM, Colton McInroy wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Aaron,
>>>>>>>>>>>>
>>>>>>>>>>>> You mentioned a while ago that blur handles facets as well and that
>>>>>>>>>>>> you would provide an example. Unless I have missed that email, I
>>>>>>>>>>>> haven't seen an example yet, could you provide one? I just took a
>>>>>>>>>>>> quick look myself and could not figure it out. I see there is an
>>>>>>>>>>>> example FacetQueryTest.java in blur-query but that appears to be
>>>>>>>>>>>> basically just a copy of the lucene file.
>
