phoenix-commits mailing list archives

From jamestay...@apache.org
Subject svn commit: r1734609 - in /phoenix/site: publish/faq.html source/src/site/markdown/faq.md
Date Fri, 11 Mar 2016 18:17:52 GMT
Author: jamestaylor
Date: Fri Mar 11 18:17:52 2016
New Revision: 1734609

URL: http://svn.apache.org/viewvc?rev=1734609&view=rev
Log:
Add new Why empty KeyValue FAQ

Modified:
    phoenix/site/publish/faq.html
    phoenix/site/source/src/site/markdown/faq.md

Modified: phoenix/site/publish/faq.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/faq.html?rev=1734609&r1=1734608&r2=1734609&view=diff
==============================================================================
--- phoenix/site/publish/faq.html (original)
+++ phoenix/site/publish/faq.html Fri Mar 11 18:17:52 2016
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2016-03-10
+ Generated by Apache Maven Doxia at 2016-03-11
  Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -158,6 +158,7 @@
  <li><a href="#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API">Can
phoenix work on tables with arbitrary timestamp as flexible as HBase API?</a></li>

  <li><a href="#Why_isnt_my_query_doing_a_RANGE_SCAN">Why isn’t my query
doing a RANGE SCAN?</a></li> 
  <li><a href="#Should_I_pool_Phoenix_JDBC_Connections">Should I pool Phoenix
JDBC Connections?</a></li> 
+ <li><a href="#Why_empty_key_value">Why does Phoenix add an empty or dummy KeyValue
when doing an upsert?</a></li> 
 </ul> 
 <div class="section"> 
  <div class="section"> 
@@ -363,6 +364,13 @@ conn.commit();
   <p>Phoenix’s Connection objects are different from most other JDBC Connections
due to the underlying HBase connection. The Phoenix Connection object is designed to be a
thin object that is inexpensive to create. If Phoenix Connections are reused, it is possible
that the underlying HBase connection is not always left in a healthy state by the previous
user. It is better to create new Phoenix Connections to ensure that you avoid any potential
issues.</p> 
   <p>Implementing pooling for Phoenix could be done simply by creating a delegate Connection
that instantiates a new Phoenix connection when retrieved from the pool and then closes the
connection when returning it to the pool (see <a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-2388">PHOENIX-2388</a>).</p>

  </div> 
+ <div class="section"> 
+  <h3 id="Why_empty_key_value">Why does Phoenix add an empty/dummy KeyValue when doing
an upsert?<a name="Why_does_Phoenix_add_an_emptydummy_KeyValue_when_doing_an_upsert"></a></h3>

+  <p>The empty or dummy KeyValue (with a column qualifier of _0) is needed to ensure
that a given column is available for all rows.</p> 
+  <p>As you may know, data is stored in HBase as KeyValues, meaning that the full row
key is stored for each column value. This also implies that the row key is not stored at all
unless there is at least one column stored.</p> 
+  <p>Now consider a JDBC row that has an integer primary key and several columns that
are all null. For the primary key to be stored at all, at least one KeyValue must be written
to show that the row exists. The empty column that you’ve noticed is that KeyValue. It allows
a “SELECT * FROM TABLE” to return records
for all rows, even those whose non-pk columns are all null.</p> 
+  <p>The same issue comes up even if only one column is null for some (or all) records.
A scan over Phoenix will include the empty column to ensure that rows that only consist of
the primary key (and have null for all non-key columns) will be included in a scan result.</p>

+ </div> 
 </div>
 			</div>
 		</div>

Modified: phoenix/site/source/src/site/markdown/faq.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/faq.md?rev=1734609&r1=1734608&r2=1734609&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/faq.md (original)
+++ phoenix/site/source/src/site/markdown/faq.md Fri Mar 11 18:17:52 2016
@@ -12,7 +12,7 @@
 * [Can phoenix work on tables with arbitrary timestamp as flexible as HBase API?](#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API)
 * [Why isn't my query doing a RANGE SCAN?](#Why_isnt_my_query_doing_a_RANGE_SCAN)
 * [Should I pool Phoenix JDBC Connections?](#Should_I_pool_Phoenix_JDBC_Connections)
-
+* [Why does Phoenix add an empty or dummy KeyValue when doing an upsert?](#Why_empty_key_value)
 
 ### I want to get started. Is there a Phoenix _Hello World_?
 
@@ -285,3 +285,26 @@ No, it is not necessary to pool Phoenix
 Phoenix's Connection objects are different from most other JDBC Connections due to the underlying
HBase connection. The Phoenix Connection object is designed to be a thin object that is inexpensive
to create. If Phoenix Connections are reused, it is possible that the underlying HBase connection
is not always left in a healthy state by the previous user. It is better to create new Phoenix
Connections to ensure that you avoid any potential issues.
 
 Implementing pooling for Phoenix could be done simply by creating a delegate Connection that
instantiates a new Phoenix connection when retrieved from the pool and then closes the connection
when returning it to the pool (see [PHOENIX-2388](https://issues.apache.org/jira/browse/PHOENIX-2388)).
+
+
+### <a id="Why_empty_key_value"/>Why does Phoenix add an empty/dummy KeyValue when
doing an upsert?
+The empty or dummy KeyValue (with a column qualifier of _0) is needed to ensure that a given
column is available
+for all rows.
+
+As you may know, data is stored in HBase as KeyValues, meaning that
+the full row key is stored for each column value. This also implies
+that the row key is not stored at all unless there is at least one
+column stored.
+
+Now consider a JDBC row that has an integer primary key and several
+columns that are all null. For the primary key to be stored at all,
+at least one KeyValue must be written to show that the row exists.
+The empty column that you've noticed is that KeyValue. It allows a
+"SELECT * FROM TABLE" to return records for all rows, even those
+whose non-pk columns are all null.
+
+The same issue comes up even if only one column is null for some (or
+all) records. A scan over Phoenix will include the empty column to
+ensure that rows that only consist of the primary key (and have null
+for all non-key columns) will be included in a scan result.
+
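
The FAQ text added in this commit can be illustrated with a toy model. The sketch below is not HBase or Phoenix code: it assumes a plain dict keyed by (row key, column qualifier) as a stand-in for stored KeyValues, and the helper names upsert and scan_row_keys are hypothetical. Only the _0 qualifier comes from the FAQ text itself.

```python
EMPTY_QUALIFIER = "_0"  # qualifier named in the FAQ text


def upsert(store, row_key, columns, add_empty_key_value=True):
    """Write one cell per non-null column; null columns are not stored."""
    for qualifier, value in columns.items():
        if value is not None:
            store[(row_key, qualifier)] = value
    if add_empty_key_value:
        # Marker cell so the row key exists even when every column is null.
        store[(row_key, EMPTY_QUALIFIER)] = b""


def scan_row_keys(store):
    """Return the distinct row keys that have at least one stored cell."""
    return sorted({row_key for row_key, _ in store})


# Without the marker cell, a row whose columns are all null leaves no trace.
without_marker = {}
upsert(without_marker, 1, {"a": None, "b": None}, add_empty_key_value=False)
upsert(without_marker, 2, {"a": "x", "b": None}, add_empty_key_value=False)

# With the marker cell, every upserted row is visible to a scan.
with_marker = {}
upsert(with_marker, 1, {"a": None, "b": None})
upsert(with_marker, 2, {"a": "x", "b": None})

print(scan_row_keys(without_marker))  # [2]
print(scan_row_keys(with_marker))     # [1, 2]
```

In this model, row 1 disappears from the first scan because nothing was written for it, which is the situation the empty KeyValue is described as preventing.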


