hadoop-common-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2493) hbase will split on row when the start and end row is the same cuase data loss
Date Tue, 15 Jan 2008 20:49:34 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated HADOOP-2493:
----------------------------------

    Status: Patch Available  (was: Open)

Back to Hudson.

> hbase will split on row when the start and end row is the same cuase data loss
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-2493
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2493
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Assignee: Bryan Duxbury
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: 2493-v2.patch, 2493.patch, regions_shot.JPG
>
>
> While testing hbase splits with my code, I was loading a table to serve as an inverted index on some links.
> I was using the anchor text as the row key
> and the column parent:child as
> url:(siteurl), and the data is the count of the links pointing to the siteurl, with the anchor text as the row key.
> But a lot of sites have image links, and I use "image" as the anchor text in my testing code, so there are a lot of "image" links.
> I changed the max file size of hbase to 16 MB for testing and have been able to recreate the same error.
> When the table gets big, it splits on the row key "image" as the end key of one region and the start key of the next. Later it splits to the point where the start key and the end key were both "image" for one of the splits. After that it keeps splitting the region whose start key and end key are both "image". So I have multiple splits with start key and end key as "image". Unless the master keeps track of the row key and the parent:child data on the splits, I do not think all the data will be returned when querying it.
> I have attached a screenshot of my regions. I think there should be some logic so that if the start and end row keys are the same, the region does not split. Otherwise we need to start keeping track of the start key and column data on the master for each split, so we can know where each row is in the database.
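
The guard suggested in the description (do not split when the start and end row keys are the same) could look roughly like the sketch below. This is only an illustration and is not taken from the attached patches: the class name SplitGuard, the method canSplit, and the convention that an empty byte array means an unbounded key are all assumptions made for this example, not HBase API.

```java
import java.util.Arrays;

/**
 * Hypothetical helper illustrating the check proposed in the report.
 * Not part of HBase; names and conventions here are assumptions.
 */
final class SplitGuard {
    private SplitGuard() {}

    /**
     * Decides whether a region covering [startKey, endKey) may be split at midKey.
     * Assumption: an empty array means "unbounded" on that side.
     */
    static boolean canSplit(byte[] startKey, byte[] midKey, byte[] endKey) {
        boolean boundedStart = startKey.length > 0;
        boolean boundedEnd = endKey.length > 0;

        // The case from the report: start and end row key are identical
        // (for example both "image"), so another split cannot separate any rows.
        if (boundedStart && boundedEnd && Arrays.equals(startKey, endKey)) {
            return false;
        }
        // Splitting at the start key would leave the first daughter with an empty range.
        if (boundedStart && Arrays.equals(startKey, midKey)) {
            return false;
        }
        // Splitting at the end key would leave the second daughter with an empty range.
        if (boundedEnd && Arrays.equals(midKey, endKey)) {
            return false;
        }
        return true;
    }
}
```

With this sketch, a region whose start and end key are both "image" is never split again, while a region from "apple" to "zebra" may still be split at "image".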

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

