rya-dev mailing list archives

From "Liu, Eric" <Eric....@capitalone.com>
Subject Re: Timestamps and Cardinality in Queries
Date Wed, 01 Mar 2017 08:00:40 GMT
Hey Aaron,

I’m currently setting up Rya to test these queries with some of our data. I ran into an
error when running ‘mvn clean install’. I attached the logs, but it seems like I can’t
connect to the snapshots repo you’re using.

As for “deep/wide”, it would be something like starting at a dataset, then fanning out to
look for relations where it is either the subject or the object, such as the user who created
it, the job it came from, where it’s stored, etc. It would recurse on these neighboring
nodes until a total number of results is reached. However, if the cardinality of a node is
too high (for example, a user that owns a large number of datasets), the neighbors of that
node will not be found. Really, the goal is to find the most distant relevant relationships
possible, and this is our current naïve way of doing so.
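The traversal described above amounts to a breadth-first fan-out with a per-node cardinality cutoff. A minimal Python sketch, assuming an in-memory list of (subject, predicate, object) triples; the names `fan_out` and `incident` and the sample data are hypothetical and not part of Rya or OpenRDF:

```python
from collections import deque

# A triple is (subject, predicate, object). This in-memory list is a hypothetical
# stand-in for the store; nothing here is a Rya or OpenRDF API.
Triple = tuple[str, str, str]

EXAMPLE: list[Triple] = [
    ("ds1", "createdBy", "alice"),
    ("ds1", "producedBy", "job7"),
    ("ds1", "storedIn", "bucketA"),
    ("alice", "owns", "ds2"),
    ("alice", "owns", "ds3"),
    ("alice", "owns", "ds4"),
    ("job7", "runsOn", "cluster1"),
]


def incident(triples: list[Triple], node: str) -> list[tuple[Triple, str]]:
    """Return (triple, neighbor) pairs for every triple where node is the subject or object."""
    pairs = []
    for s, p, o in triples:
        if s == node:
            pairs.append(((s, p, o), o))
        elif o == node:
            pairs.append(((s, p, o), s))
    return pairs


def fan_out(triples: list[Triple], start: str, max_results: int, max_cardinality: int) -> list[Triple]:
    """Breadth-first fan-out from start, following edges in either direction.

    Collects at most max_results distinct triples. A node with more than
    max_cardinality incident triples is still reached, but its neighbors are
    not explored.
    """
    results: list[Triple] = []
    seen: set[Triple] = set()
    visited = {start}
    queue = deque([start])
    while queue and len(results) < max_results:
        node = queue.popleft()
        edges = incident(triples, node)
        if len(edges) > max_cardinality:
            continue  # cardinality too high: do not expand this node
        for triple, neighbor in edges:
            if triple not in seen:
                seen.add(triple)
                results.append(triple)
                if len(results) >= max_results:
                    break
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return results
```

For example, `fan_out(EXAMPLE, "ds1", max_results=10, max_cardinality=3)` returns the three `ds1` triples plus `("job7", "runsOn", "cluster1")`. The `alice owns` triples are skipped because `alice` has four incident triples, which exceeds the cutoff.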

Do you want to have a short call about this? I think it’d be easier to explain/answer questions
over the phone. I’m free pretty much any time 1pm-5pm PST tomorrow (3/1).

Thanks,
Eric

On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aaron.mihalik@gmail.com> wrote:

    deep vs wide: I played around with the SPARQL property paths operator and
    put up an example here [1].  This is a slightly different query from the
    one I sent out before.  It would be worth looking at how OpenRDF actually
    executes this.
    
    Eric: Could you clarify what you mean by "deep vs wide"?  I think I understand your
    queries, but I don't have a good intuition about those terms and how
    cardinality might figure into a query.  It would probably be a bit more
    helpful if you provided a model or general description that is (somewhat)
    representative of your data.
    
    --Aaron
    
    [1] https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
    
    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <adina@usna.edu> wrote:
    
    > Hi Eric,
    >
    > If you want to query by the Accumulo timestamp, something like
    > timeRange(?ts, 13141201490, 13249201490) should work in Rya. I have not
    > tried it lately, but timeRange() was in Rya originally. I'm not sure
    > whether it was removed in later iterations or whether it would be useful
    > for your use case. The first Rya paper,
    > https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf, discusses
    > time ranges in Section 5.3.
    >
    > Adina
    >
    > On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pujav65@gmail.com> wrote:
    >
    > > Hey John,
    > > I'm pretty sure your pull request was merged; it was pulled in through
    > > another pull request. If not, sorry. I thought it had been merged and
    > > just not closed. I'm planning to spend some time doing merges tomorrow,
    > > so I can get to it then.
    > >
    > > Sent from my iPhone
    > >
    > > > On Feb 23, 2017, at 8:13 PM, John Smith <johns0806@gmail.com> wrote:
    > > >
    > > > I have a pull request that fixes that problem, but it has been stuck in
    > > > limbo for months: https://github.com/apache/incubator-rya-site/pull/1
    > > > Can someone merge it into master?
    > > >
    > > >> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <Eric.Liu@capitalone.com> wrote:
    > > >>
    > > >> Cool, thanks for the help.
    > > >> By the way, the link to the Rya Manual on the rya.apache.org site is
    > > >> outdated. It should point to
    > > >> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
    > > >>
    > > >> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aaron.mihalik@gmail.com> wrote:
    > > >>
    > > >>    deep vs wide:
    > > >>
    > > >>    A property path query is probably your best bet. For example, given
    > > >>    the following data:
    > > >>
    > > >>    s:EventA p:causes s:EventB
    > > >>    s:EventB p:causes s:EventC
    > > >>    s:EventC p:causes s:EventD
    > > >>
    > > >>
    > > >>    This query would start at EventB and work its way up and down the
    > > >>    chain:
    > > >>
    > > >>    SELECT * WHERE {
    > > >>       <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
    > > >>    }
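To make the semantics of that query concrete, here is a small Python sketch that evaluates the same path pattern over an in-memory list. It only illustrates what the query matches under SPARQL 1.1 property path semantics, not how OpenRDF actually executes it; the helper names and the extra `s:EventX` triple are hypothetical:

```python
from collections import deque

# Aaron's example data, plus one hypothetical unconnected triple for contrast.
DATA = [
    ("s:EventA", "p:causes", "s:EventB"),
    ("s:EventB", "p:causes", "s:EventC"),
    ("s:EventC", "p:causes", "s:EventD"),
    ("s:EventX", "p:causes", "s:EventY"),
]


def reachable(triples, start, pred):
    """Nodes matched by `start (pred|^pred)* ?s`: every node connected to start
    through pred edges followed in either direction, including start itself
    (the zero-length match of the * operator)."""
    found = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if p != pred:
                continue
            if s == node and o not in found:  # forward step: pred
                found.add(o)
                queue.append(o)
            if o == node and s not in found:  # inverse step: ^pred
                found.add(s)
                queue.append(s)
    return found


def property_path_query(triples, start, pred):
    """Rows for `?s ?p ?o`, for every ?s produced by the path expression."""
    nodes = reachable(triples, start, pred)
    return [t for t in triples if t[0] in nodes]
```

Starting from `s:EventB`, the traversal reaches EventA, EventC, and EventD, so the query returns the three chained `p:causes` triples and ignores the unconnected `s:EventX` triple. `s:EventD` is matched but contributes no rows, since it never appears as a subject.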
    > > >>
    > > >>
    > > >>    On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <Caleb.Meier@parsons.com> wrote:
    > > >>
    > > >>> Yes, that's a good place to start. If you have external timestamps
    > > >>> that are built into your graph using the Time Ontology in OWL (e.g.,
    > > >>> you have triples of the form (event123, time:inDateTime, 2017-02-23T14:29)),
    > > >>> the temporal index is exactly what you want. If you are hoping to query
    > > >>> based on the internal timestamps that Accumulo assigns to your triples,
    > > >>> then some slight tweaks can be made to facilitate this, but it won't be
    > > >>> nearly as efficient (it will require some sort of client-side
    > > >>> filtering).
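The client-side filtering Caleb mentions can be sketched as follows, assuming the statements have already been scanned out of the store together with their Accumulo timestamps. The `TimestampedStatement` type and `in_time_range` function are hypothetical, and how Rya would actually expose those timestamps is not covered in this thread:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimestampedStatement:
    """A statement paired with its storage timestamp (hypothetical model, not a Rya class)."""
    subject: str
    predicate: str
    obj: str
    timestamp_ms: int  # assumed to be milliseconds since the epoch


def in_time_range(statements, start_ms, end_ms):
    """Client-side filter: keep statements with start_ms <= timestamp_ms <= end_ms.

    Every candidate statement must already have been fetched before this runs,
    which is why this approach scales worse than an index lookup.
    """
    if start_ms > end_ms:
        raise ValueError("start_ms must not be greater than end_ms")
    return [st for st in statements if start_ms <= st.timestamp_ms <= end_ms]
```

The inclusive bounds are a design choice for the sketch; the cost is a full pass over whatever was scanned, matching Caleb's caveat about efficiency.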
    > > >>>
    > > >>> Caleb A. Meier, Ph.D.
    > > >>> Software Engineer II ♦ Analyst
    > > >>> Parsons Corporation
    > > >>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
    > > >>> Office:  (703)797-3066
    > > >>> Caleb.Meier@Parsons.com ♦ www.parsons.com
    > > >>>
    > > >>> -----Original Message-----
    > > >>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
    > > >>> Sent: Thursday, February 23, 2017 2:27 PM
    > > >>> To: dev@rya.incubator.apache.org
    > > >>> Subject: Re: Timestamps and Cardinality in Queries
    > > >>>
    > > >>> We’d like to be able to query by timestamp; specifically, we want to
    > > >>> be able to find all statements that were made within a given time
    > > >>> range. Is this what I should be looking at?
    > > >>> https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
    > > >>>
    > > >>>
    > > >>>
    > > >>> On 2/22/17, 6:21 PM, "Meier, Caleb" <Caleb.Meier@parsons.com> wrote:
    > > >>>
    > > >>>    Hey Eric,
    > > >>>
    > > >>>    Currently timestamps can't be queried in Rya. Do you need to be able
    > > >>>    to query by timestamp, or simply discover the timestamp for a given
    > > >>>    node? Rya does have a temporal index, but that requires you to use a
    > > >>>    temporal ontology to model the temporal properties of your graph
    > > >>>    nodes.
    > > >>>
    > > >>>    ________________________________________
    > > >>>    From: Liu, Eric <Eric.Liu@capitalone.com>
    > > >>>    Sent: Wednesday, February 22, 2017 6:38 PM
    > > >>>    To: dev@rya.incubator.apache.org
    > > >>>    Subject: Timestamps and Cardinality in Queries
    > > >>>
    > > >>>    Hi,
    > > >>>
    > > >>>    Continuing from our talk earlier today, I was wondering if you could
    > > >>>    provide more information about how timestamps could be queried in Rya.
    > > >>>
    > > >>>    Also, we are trying to support a type of query that would essentially
    > > >>>    limit results by cardinality. This is different from the normal SPARQL
    > > >>>    LIMIT, because it applies to node cardinality rather than to the total
    > > >>>    number of results. I saw in one of Caleb’s talks that Rya’s query
    > > >>>    optimization involves checking cardinality first. Is there some way to
    > > >>>    tap into this feature for use in queries?
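The general idea behind cardinality-based query optimization can be illustrated with a short Python sketch: count how often each term appears, estimate how many matches each triple pattern will have, and evaluate the most selective pattern first. This is an illustration of the technique in general under simplifying assumptions; the function names and sample data are hypothetical and do not describe Rya's actual statistics or API:

```python
from collections import Counter

# Hypothetical sample data; the statistics and ordering below are a generic
# illustration, not Rya's optimizer or its statistics tables.
TRIPLES = [
    ("alice", "owns", "ds1"),
    ("alice", "owns", "ds2"),
    ("alice", "owns", "ds3"),
    ("ds1", "storedIn", "bucketA"),
]


def cardinalities(triples):
    """Count how often each term occurs in the subject, predicate, and object positions."""
    counts = {"s": Counter(), "p": Counter(), "o": Counter()}
    for s, p, o in triples:
        counts["s"][s] += 1
        counts["p"][p] += 1
        counts["o"][o] += 1
    return counts


def estimate(pattern, counts, total):
    """Upper bound on a triple pattern's matches; terms starting with '?' are variables."""
    est = total
    for pos, term in zip("spo", pattern):
        if not term.startswith("?"):
            est = min(est, counts[pos][term])
    return est


def order_patterns(patterns, triples):
    """Sort patterns so the most selective (lowest estimated cardinality) come first."""
    counts = cardinalities(triples)
    return sorted(patterns, key=lambda pat: estimate(pat, counts, len(triples)))
```

Taking the minimum count over the bound positions yields only an upper bound on the number of matches, which is enough to rank `("?d", "storedIn", "bucketA")` ahead of `("?u", "owns", "?d")` in this toy example.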
    > > >>>
    > > >>>
    > > >>>
    > > >>>    Thanks,
    > > >>>
    > > >>>    Eric Liu
    > > >>>
    > > >>>    ________________________________________________________
    > > >>>
    > > >>>
    > > >>>
    > > >>>    The information contained in this e-mail is confidential and/or
    > > >>>    proprietary to Capital One and/or its affiliates and may only be
    > > >>>    used solely in performance of work or services for Capital One. The
    > > >>>    information transmitted herewith is intended only for use by the
    > > >>>    individual or entity to which it is addressed. If the reader of this
    > > >>>    message is not the intended recipient, you are hereby notified that
    > > >>>    any review, retransmission, dissemination, distribution, copying or
    > > >>>    other use of, or taking of any action in reliance upon this
    > > >>>    information is strictly prohibited. If you have received this
    > > >>>    communication in error, please contact the sender and delete the
    > > >>>    material from your computer.
    > > >>
    > > >>
    > >
    >
    >
    >
    > --
    > Dr. Adina Crainiceanu
    > Associate Professor, Computer Science Department
    > United States Naval Academy
    > 410-293-6822
    > adina@usna.edu
    > http://www.usna.edu/Users/cs/adina/
    >
    
