accumulo-user mailing list archives

From Geoffry Roberts <threadedb...@gmail.com>
Subject Re: Embedded Mutations: Is this kind of thing done?
Date Fri, 25 Apr 2014 15:28:41 GMT
I think you told me something.  I must watch the rowid, colfam, colq
sequence and be sure each combination is unique within the row.  Will do.
I believe I do have distinct datatypes for now (they're medical), but the
future may rear its ugly head.
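
A minimal sketch of the collision described in the quoted reply below, for
anyone following along.  It uses a plain java.util.TreeMap as a stand-in for
an Accumulo table (an assumption; no Accumulo API is called, and timestamps
and visibilities are ignored).  The class name, the helper methods, and the
"#index" suffix on the qualifier are all hypothetical, just one possible way
to keep keys distinct.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyCollisionSketch {

    // Hypothetical flattening of (row, colfam, colq) into one sortable string.
    static String key(String row, String colfam, String colq) {
        return row + " " + colfam + ":" + colq;
    }

    // Attribute name in the colfam, type alone in the colq.
    // Repeated values of the same type overwrite one another.
    static Map<String, String> storeTypeOnly(
            String row, String attr, String type, List<String> values) {
        Map<String, String> table = new TreeMap<>();
        for (String value : values) {
            table.put(key(row, attr, type), value);
        }
        return table;
    }

    // Same layout, but the colq also carries the value's position,
    // so each value gets its own key.
    static Map<String, String> storeTypeAndIndex(
            String row, String attr, String type, List<String> values) {
        Map<String, String> table = new TreeMap<>();
        for (int i = 0; i < values.size(); i++) {
            table.put(key(row, attr, type + "#" + i), values.get(i));
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> ages = Arrays.asList("40", "85");
        // Only the last value inserted survives.
        System.out.println(storeTypeOnly("rowid", "age", "int", ages));
        // Both values survive under distinct qualifiers.
        System.out.println(storeTypeAndIndex("rowid", "age", "int", ages));
    }
}
```

With the type alone in the colq, the map ends up holding one entry for
"age"; with the index appended it holds two.  Whether an index, a hash of
the value, or some other suffix is the right discriminator depends on the
data, so treat the suffix here as a placeholder.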


On Fri, Apr 25, 2014 at 11:02 AM, Josh Elser <josh.elser@gmail.com> wrote:

> I might be causing more confusion. Consider the following:
>
> {"name":"Josh", "age":85}
>
> If you stored the attribute name in the colf and the type (string or int)
> in the colq, it works fine for the above document.
>
> Now consider the following document, say where there were multiple sources
> of my age and we didn't know which was reliable:
>
> {"name":"Josh", "age":[40,85]}
>
> In the aforementioned scheme, "rowid age:int -> 40" and "rowid age:int ->
> 85" would collapse on one another. These are the Map (as in your
> java.util.Map) semantics that Accumulo provides.
>
> If you have very distinct data types (which it appears you do), this might
> not be of concern to you. Just be cognizant in your translation from EMF to
> Key that you aren't creating duplicate Keys unexpectedly.
>
>
> On 4/25/14, 10:53 AM, Geoffry Roberts wrote:
>
>> Ok Josh, you have me worried.
>>
>> I am storing the object's name in the colfam, e.g. "patientId"; the
>> object's data type goes in the colq, e.g. "org.hl7.v3.II"; then the value
>> in the colval.  I think the largest graph I'm likely to have is < 5k and
>> you say I could have memory problems.  This is a good topic.  How then
>> can I estimate?
>>
>>
>> On Fri, Apr 25, 2014 at 10:17 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>     Not necessarily. If you are storing just the type in the colq and
>>     have one value and type per document/row, you won't have a problem.
>>     If you have more than one value in a type per document/row, the last
>>     one you inserted will be what sticks (which is likely undesirable).
>>
>>     Of course, this is also assuming there isn't some other uniquely
>>     identifying attribute in the colfam.
>>
>>
>>     On 4/25/14, 9:55 AM, Geoffry Roberts wrote:
>>
>>         Thanks for the comments.
>>
>>         I'm using the qualifier to tell me the type of the value.
>>         Sounds like I'm misusing it.
>>
>>         My EMF documents are running no more than 5k, so I gather a row
>>         will fit into memory well enough.
>>
>>
>>         On Fri, Apr 25, 2014 at 9:29 AM, Mike Drob <madrob@cloudera.com>
>>         wrote:
>>
>>              Large rows are only an issue if you are going to try to put
>>              the entire row in memory at once. As long as you have small
>>              enough entries in the row, and can treat them individually,
>>              you should be fine.
>>
>>              The qualifier is anything that you want to use to determine
>>              uniqueness across keys. So yes, this sounds fine, although
>>              possibly not fine-grained enough.
>>
>>              Mike
>>
>>
>>              On Fri, Apr 25, 2014 at 9:11 AM, Geoffry Roberts
>>              <threadedblue@gmail.com> wrote:
>>
>>                  Interesting, multiple mutations that is.  Are we talking
>>                  multiples on the same row id?
>>
>>                  Upon reflection, I realized the embedded thing is nothing
>>                  special.  I think I'll keep adding columns to a single
>>                  mutation.  This will make for a wide row, but I'm not
>>                  seeing that as a problem.  Am I being naive?
>>
>>                  Another question if I may.  As I walk my graph, I must
>>                  keep track of the type of the value being persisted.  I am
>>                  using the qualifier for this, putting in it a URI that
>>                  indicates the type.  Is this a proper use for the
>>                  qualifier?
>>
>>                  Thanks for the discussion
>>
>>
>>                  On Thu, Apr 24, 2014 at 11:23 PM, William Slacum
>>                  <wilhelm.von.cloud@accumulo.net> wrote:
>>
>>                      Depending on your table schema, you'll probably want
>>                      to translate an object graph into multiple mutations.
>>
>>
>>                      On Thu, Apr 24, 2014 at 8:40 PM, David Medinets
>>                      <david.medinets@gmail.com> wrote:
>>
>>                          If the sub-document changes, you'll need to
>>                          search the values of every Accumulo entry?
>>
>>
>>                          On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts
>>                          <threadedblue@gmail.com> wrote:
>>
>>                              The use case is, I am walking a complex
>>                              object graph and persisting what I find there.
>>                              Said object graph in my case is always EMF
>>                              (Eclipse Modeling Framework) compliant.  An
>>                              EMF graph can have in it references
>>                              to--brace yourself--a non-cross document
>>                              containment reference.  When using Mongo,
>>                              these were persisted as a DBObject embedded
>>                              into a containing DBObject.  I'm trying to
>>                              decide whether I want to follow suit.
>>
>>                              Any thoughts?
>>
>>
>>                              On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey
>>                              <busbey@cloudera.com> wrote:
>>
>>                                  Can you describe the use case more? Do
>>                                  you know what the purpose of the embedded
>>                                  changes is?
>>
>>
>>                                  On Thu, Apr 24, 2014 at 2:59 PM,
>>                                  Geoffry Roberts <threadedblue@gmail.com>
>>                                  wrote:
>>
>>                                      All,
>>
>>                                      I am in the throes of converting
>>                                      someone else's code from MongoDB to
>>                                      Accumulo.  I am seeing a situation
>>                                      where one DBObject is being embedded
>>                                      into another DBObject.  I see that
>>                                      Mutation supports a method called
>>                                      getRow() that returns a byte array.
>>                                      I gather I can use this to achieve a
>>                                      similar result if I were so inclined.
>>
>>                                      Am I so inclined?  i.e. Is this the
>>                                      way we do things in Accumulo?
>>
>>                                      DBObject, roughly speaking, is
>>                                      Mongo's counterpart to Mutation.
>>
>>                                      Thanks mucho
>>
>>                                      --
>>                                      There are ways and there are ways,
>>
>>                                      Geoffry Roberts
>>
>>
>>
>>
>>                                  --
>>                                  Sean


-- 
There are ways and there are ways,

Geoffry Roberts
