hbase-user mailing list archives

From "Marcelo Valle (BLOOMBERG/ LONDON)" <mvallemil...@bloomberg.net>
Subject pre-spliting or not, that's the question
Date Tue, 07 Apr 2015 10:56:35 GMT
Hello, 

I am still taking my first steps with HBase; I used Cassandra for a while before.

For several years, I thought that trying to store data in Cassandra ordered across nodes
was something evil, as its OrderedPartitioner is not supported and not recommended
in production.

In the HBase/Hadoop world, this is the default though. When trying to optimize for writes, I was
told people usually use pre-splitting in HBase, sometimes with salted keys. This seems to
make HBase behave like Cassandra's RandomPartitioner, losing data order across nodes (because
of the salting) but achieving better write throughput.
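
To make the salting idea concrete, here is a minimal sketch in plain Python (no HBase client involved). The bucket count, the `NN|` prefix format, and the helper names `salted_key` and `split_points` are assumptions chosen for illustration, not an HBase API. It only shows how a hash-derived prefix spreads sequential keys over pre-split regions, and why the original key order no longer holds across them.

```python
import hashlib
from typing import List

NUM_BUCKETS = 4  # hypothetical bucket count, one pre-split region per bucket


def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the row key with a stable bucket id derived from its hash."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}|{row_key}"


def split_points(buckets: int = NUM_BUCKETS) -> List[str]:
    """Region boundaries for pre-splitting: one boundary between adjacent prefixes."""
    return [f"{b:02d}|" for b in range(1, buckets)]


# Sequential keys (the classic write hotspot) end up scattered over the buckets.
keys = [f"event-{i:06d}" for i in range(1000)]
buckets_used = {salted_key(k).split("|", 1)[0] for k in keys}
```

The same key always maps to the same prefix, so point reads still work; what is lost is the global sort order of the original keys.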

Because of these differences, I started to question what the real advantage of having ordered
data across nodes is. For most applications, wouldn't pre-splitting be better? For a large number
of applications, designing the data model without relying on order across nodes seems better, because
(1) it might be possible, and (2) when it's not possible you can either use another table as an index
or index the data in Solr/ES/Lucene and read from there in more complex scenarios. Maybe ordering
has some advantage in specific cases where you want low latency between the time you write data
and the time you read it, and you read much more than you write, maybe...
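
One way to picture the trade-off being asked about is to compare a range read over ordered keys with the same read over salted keys. The sketch below is a plain-Python simulation under stated assumptions: tables are modelled as sorted lists, and `salt`, `scan`, and `salted_scan` are hypothetical helpers, not HBase calls. With ordered keys the range is one contiguous slice; with salted keys the reader has to issue one scan per bucket and merge the results.

```python
import bisect
import hashlib
import heapq

BUCKETS = 4  # hypothetical number of salt buckets


def salt(key):
    """Prefix the key with a hash-derived bucket id (illustrative format)."""
    bucket = hashlib.md5(key.encode("utf-8")).digest()[0] % BUCKETS
    return f"{bucket:02d}|{key}"


def scan(sorted_keys, start, stop):
    """Contiguous range read over a sorted key list, covering [start, stop)."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def salted_scan(sorted_salted_keys, start, stop):
    """Same range read on salted keys: one scan per bucket, then a merge."""
    per_bucket = []
    for b in range(BUCKETS):
        prefix = f"{b:02d}|"
        rows = scan(sorted_salted_keys, prefix + start, prefix + stop)
        per_bucket.append([r[len(prefix):] for r in rows])  # strip the salt
    return list(heapq.merge(*per_bucket))


keys = sorted(f"event-{i:04d}" for i in range(100))
salted = sorted(salt(k) for k in keys)

plain_result = scan(keys, "event-0010", "event-0020")           # 1 scan
salted_result = salted_scan(salted, "event-0010", "event-0020")  # BUCKETS scans
```

Both reads return the same rows, but the salted variant pays for it with a fan-out of `BUCKETS` scans plus a client-side merge, which is the cost that ordered keys avoid for range-heavy workloads.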

Since acting as a sorted map was a core design decision of HBase, I think there must be reasons
behind this decision, and it seems I am not able to figure them out... Could you please
point them out?

I am asking this to improve my architectural understanding of HBase, as I might
be getting the wrong impression that there is no advantage in using a post-splitting solution, when
maybe it's just my lack of knowledge of the technology.

Best regards,
Marcelo.