accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Embedded Mutations: Is this kind of thing done?
Date Fri, 25 Apr 2014 15:02:54 GMT
I might be causing more confusion. Consider the following:

{"name":"Josh", "age":85}

If you stored the attribute name in the colf and the type (string or 
int) in the colq, it works fine for the above document.

Now consider the following document, say where there were multiple 
sources of my age and we didn't know which was reliable:

{"name":"Josh", "age":[40,85]}

In the aforementioned scheme, "rowid age:int -> 40" and "rowid age:int 
-> 85" would collapse on one another. These are the Map (as in your 
java.util.Map) semantics that Accumulo provides.
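
To make the collapse concrete, here is a minimal sketch that uses a plain 
java.util.TreeMap as a stand-in for a table. It does not touch the Accumulo 
API at all: the key() helper and the NUL separator are made up for 
illustration, and timestamps and visibilities are ignored.

```java
import java.util.Map;
import java.util.TreeMap;

final class KeyCollapseSketch {
    // Hypothetical helper: joins row, colf and colq into one sortable string key.
    static String key(String row, String colf, String colq) {
        return row + '\u0000' + colf + '\u0000' + colq;
    }

    // Stores every value of one attribute under the same (row, colf, colq).
    static Map<String, String> store(String row, String colf, String colq, String... values) {
        Map<String, String> table = new TreeMap<>();
        for (String value : values) {
            // Same key each time, so a later put replaces the earlier value.
            table.put(key(row, colf, colq), value);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = store("rowid", "age", "int", "40", "85");
        System.out.println(table.size() + " entry, value=" + table.get(key("rowid", "age", "int")));
    }
}
```

Both ages go in, but only one entry comes back out, holding the last value 
written.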

If you have very distinct data types (which it appears you do), this 
might not be of concern to you. Just be cognizant in your translation 
from EMF to Key that you aren't creating duplicate Keys unexpectedly.
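
One possible way to avoid the duplicates, sketched with a plain 
java.util.TreeMap standing in for a table: give each value of a multi-valued 
attribute its own colq by appending its position. The "int:0" style suffix 
and the key() helper are assumptions for illustration, not a convention from 
Accumulo or from your schema.

```java
import java.util.Map;
import java.util.TreeMap;

final class UniqueQualifierSketch {
    // Hypothetical helper: joins row, colf and colq into one sortable string key.
    static String key(String row, String colf, String colq) {
        return row + '\u0000' + colf + '\u0000' + colq;
    }

    // Stores each value under colq "<type>:<index>" so no two values share a key.
    static Map<String, String> store(String row, String colf, String type, String... values) {
        Map<String, String> table = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            table.put(key(row, colf, type + ":" + i), values[i]);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = store("rowid", "age", "int", "40", "85");
        System.out.println(table.size() + " entries: " + table.values());
    }
}
```

The cost is that the type is no longer the whole colq, so a reader has to 
strip the suffix to recover it.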

On 4/25/14, 10:53 AM, Geoffry Roberts wrote:
> Ok Josh, you have me worried.
>
> I am storing the object's name in the colfam, e.g. "patientId"; the
> object's data type goes in the colq, e.g. "org.hl7.v3.II"; then the value
> in the colval.  I think the largest graph I'm likely to have is < 5k and
> you say I could have memory problems.  This is a good topic.  How then can
> I estimate?
>
>
> On Fri, Apr 25, 2014 at 10:17 AM, Josh Elser <josh.elser@gmail.com> wrote:
>
>     Not necessarily. If you are storing just the type in the colq and
>     have one value and type per document/row, you won't have a problem.
>     If you have more than one value in a type per document/row, the last
>     one you inserted will be what sticks (which is likely undesirable).
>
>     Of course, this is also assuming there isn't some other uniquely
>     identifying attribute in the colfam.
>
>
>     On 4/25/14, 9:55 AM, Geoffry Roberts wrote:
>
>         Thanks for the comments.
>
>         I'm using the qualifier to tell me the type of the value.  Sounds
>         like I'm misusing it.
>
>         My EMF documents are running no more than 5k, so I gather a row
>         will fit into memory well enough.
>
>
>         On Fri, Apr 25, 2014 at 9:29 AM, Mike Drob <madrob@cloudera.com> wrote:
>
>              Large rows are only an issue if you are going to try to put the
>              entire row in memory at once. As long as you have small enough
>              entries in the row, and can treat them individually, you
>              should be fine.
>
>              The qualifier is anything that you want to use to determine
>              uniqueness across keys. So yes, this sounds fine, although
>              possibly not fine-grained enough.
>
>              Mike
>
>
>              On Fri, Apr 25, 2014 at 9:11 AM, Geoffry Roberts
>              <threadedblue@gmail.com> wrote:
>
>                  Interesting, multiple mutations that is.  Are we talking
>                  multiples on the same row id?
>
>                  Upon reflection, I realized the embedded thing is nothing
>                  special.  I think I'll keep adding columns to a single
>                  mutation.  This will make for a wide row, but I'm not
>                  seeing that as a problem.  Am I being naive?
>
>                  Another question if I may.  As I walk my graph, I must keep
>                  track of the type of the value being persisted.  I am
>                  using the qualifier for this, putting in it a URI that
>                  indicates the type.  Is this a proper use for the qualifier?
>
>                  Thanks for the discussion
>
>
>                  On Thu, Apr 24, 2014 at 11:23 PM, William Slacum
>                  <wilhelm.von.cloud@accumulo.net> wrote:
>
>                      Depending on your table schema, you'll probably want to
>                      translate an object graph into multiple mutations.
>
>
>                      On Thu, Apr 24, 2014 at 8:40 PM, David Medinets
>                      <david.medinets@gmail.com> wrote:
>
>                          If the sub-document changes, you'll need to
>                          search the values of every Accumulo entry?
>
>
>                          On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts
>                          <threadedblue@gmail.com> wrote:
>
>                              The use case is, I am walking a complex object
>                              graph and persisting what I find there.  Said
>                              object graph in my case is always EMF (Eclipse
>                              Modeling Framework) compliant.  An EMF graph
>                              can have in it references to--brace
>                              yourself--a non-cross document containment
>                              reference.  When using Mongo, these were
>                              persisted as a DBObject embedded into a
>                              containing DBObject.  I'm trying to decide
>                              whether I want to follow suit.
>
>                              Any thoughts?
>
>
>                              On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey
>                              <busbey@cloudera.com> wrote:
>
>                                  Can you describe the use case more? Do you
>                                  know what the purpose of the embedded
>                                  changes is?
>
>
>                                  On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts
>                                  <threadedblue@gmail.com> wrote:
>
>                                      All,
>
>                                      I am in the throes of converting
>                                      someone else's code from MongoDB to
>                                      Accumulo.  I am seeing a situation where
>                                      one DBObject is being embedded into
>                                      another DBObject.  I see that Mutation
>                                      supports a method called getRow() that
>                                      returns a byte array.  I gather I can use
>                                      this to achieve a similar result if I
>                                      were so inclined.
>
>                                      Am I so inclined?  i.e. Is this the way
>                                      we do things in Accumulo?
>
>                                      DBObject, roughly speaking, is Mongo's
>                                      counterpart to Mutation.
>
>                                      Thanks mucho
>
>                                      --
>                                      There are ways and there are ways,
>
>                                      Geoffry Roberts
>
>
>
>
>                                  --
>                                  Sean
