accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4022) Create a concept of multi-homed tablets
Date Thu, 08 Oct 2015 19:29:26 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949221#comment-14949221 ]

Josh Elser commented on ACCUMULO-4022:
--------------------------------------

Writing down some more considerations: I think this would be a nice addition to Accumulo.
I don't think it would be too terrible to implement -- I think we have a good amount of the necessary
infrastructure already laid (well, as tongue in cheek as I can say that). Sacrificing strong
consistency to achieve high availability and lower query latency is a reasonable tradeoff.
It would also be relatively straightforward to add in API hooks (akin to the write Durability
Eric added recently) to allow clients to request strong consistency or stale reads.
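
As a rough illustration of the kind of API hook described above, here is a minimal sketch of a per-scan consistency setting. The names ReadConsistency, ScanOptions, and all of their members are hypothetical and invented for this example; nothing like them exists in Accumulo's client API, and the actual shape of such a hook would be a design decision.

```java
import java.util.Objects;

/** Hypothetical per-scan consistency level; NOT part of Accumulo's API. */
enum ReadConsistency {
    /** Read only from the single writable host of the tablet. */
    STRONG,
    /** Allow reads from any read-only host, which may lag behind the writer. */
    STALE_OK
}

/** Hypothetical immutable options object a client could attach to a scan. */
final class ScanOptions {
    private final ReadConsistency consistency;

    private ScanOptions(ReadConsistency consistency) {
        this.consistency = Objects.requireNonNull(consistency, "consistency");
    }

    /** Defaults to STRONG so that existing clients would keep today's semantics. */
    static ScanOptions defaults() {
        return new ScanOptions(ReadConsistency.STRONG);
    }

    /** Returns a copy with the requested consistency level; this object is unchanged. */
    ScanOptions withConsistency(ReadConsistency requested) {
        return new ScanOptions(requested);
    }

    ReadConsistency consistency() {
        return consistency;
    }

    /** True only when the client has explicitly opted in to stale reads. */
    boolean mayReadFromReplica() {
        return consistency == ReadConsistency.STALE_OK;
    }
}
```

Defaulting to STRONG in this sketch means stale reads are strictly opt-in, so a client would have to ask for them explicitly through something like withConsistency(ReadConsistency.STALE_OK).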

One point I do want to mention is that I believe HBase went this route because they got to
a point where they couldn't get a better 0.99 latency mark, dealing with log recovery and
the JVM's GC cycles. It would be prudent to actually measure this and see whether there is more
low-hanging fruit to be picked before a massive new feature such as this is attempted.
We've kind of ignored this aspect of the system (low-latency queries) for years now. I think
some due diligence is needed.

That said, if someone else is going forward with it, I'm not going to stop ya. Once we get
to the point where it would be appropriate, I'm sure Enis and Devaraj Das from HBase (avoiding
JIRA mentioning them now since we're not at a point to bring them into the fray, IMO) would
be more than happy to critique a design doc and give feedback on what did and didn't work
well for HBase's implementation. I will say that their implementation did make the read pipeline
a bit more difficult to follow (essentially, every call introduces another level of indirection
in choosing which server a client reads from). I believe lots of open questions remain about
how to choose intelligently when the same data can be read from multiple locations.
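
To make the extra level of indirection concrete, here is a minimal sketch of a per-call location choice. TabletLocations and chooseForRead are hypothetical names invented for this example, and the round-robin rotation is only a placeholder policy, not what HBase does and not a proposal for Accumulo; how to choose intelligently is exactly the open question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical view of every server hosting one tablet; not Accumulo or HBase code. */
final class TabletLocations {
    private final String writableHost;
    private final List<String> readOnlyHosts;

    TabletLocations(String writableHost, List<String> readOnlyHosts) {
        this.writableHost = writableHost;
        // Defensive copy so later changes to the caller's list do not leak in.
        this.readOnlyHosts = Collections.unmodifiableList(new ArrayList<>(readOnlyHosts));
    }

    /**
     * Picks the server for a single read call.
     *
     * @param requireStrong true forces the writable host (strongly consistent read)
     * @param callCounter   caller-supplied counter driving the placeholder round-robin
     */
    String chooseForRead(boolean requireStrong, long callCounter) {
        if (requireStrong || readOnlyHosts.isEmpty()) {
            return writableHost;
        }
        // Rotate over the writable host (slot 0) followed by the read-only hosts.
        int total = readOnlyHosts.size() + 1;
        int slot = (int) Math.floorMod(callCounter, (long) total);
        return slot == 0 ? writableHost : readOnlyHosts.get(slot - 1);
    }
}
```

Even in this toy form, every read has to consult the location set and a policy before it can pick a server, which is the added step in the read pipeline. A real policy would presumably weigh things like replica lag and server load, none of which is modeled here.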

> Create a concept of multi-homed tablets
> ---------------------------------------
>
>                 Key: ACCUMULO-4022
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4022
>             Project: Accumulo
>          Issue Type: Wish
>          Components: client, tserver
>            Reporter: marco polo
>              Labels: performance
>
> I'm an accumulo newbie, but I wish to see the concept of multi-homed tablets. This allows
us to have tablets hosted by multiple servers, with only one being writable against it. This
concept would allow n receiver servers for a tablet. An example might be a tablet that has
become a hot spot could be dynamically hosted elsewhere, and clients could pick this up as
a potential. Consistency must be kept between the hosts, as the initial read/write host may
compact or write to that tablet. 
> To me the larger problem may come from live ingest in which the write ahead log has not
been flushed. To avoid having to write to the read only servers in a pipeline, we would likely
need to create a model of enforcing reads only after a flush of that tablet or a thrift interface
to allow reading only the data in memory to ensure consistency is enforced. I haven't given
great thought to solving this yet. 
> Please comment with ideas and pitfalls as I would like to see this wish come to fruition
with actionable tickets after some community thought.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
