hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2493) hbase will split on row when the start and end row is the same cuase data loss
Date Mon, 31 Dec 2007 20:27:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555134

Billy Pearson commented on HADOOP-2493:

yes one of the two would work but something needed to keep it from spiting or the master needs
to start keeping track of the column parent:child in the Meta so it can track the splits correctly.

> hbase will split on row when the start and end row is the same cuase data loss
> ------------------------------------------------------------------------------
>                 Key: HADOOP-2493
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2493
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Priority: Critical
>         Attachments: regions_shot.JPG
> While testing hbase splits with my code I was loading a table to become a inverted index
on some links
> I was using the anchor text as the row key 
> and the column parent:child as
> url:(siteurl) and the data is the count of the links pointing to the siteurl with row
key anchor text.
> but a lot of sites have image links and I use "image" as the anchor text for my testing
code so there is a lot of image links. 
> I changed the max file size of hbase to 16mb for testing and have been able to recreate
the same error.
> When the table get big it splits on the column image as the end key for one table and
the start of the next table later it splits to where the start key and end key was image for
one of the splits. After that it keep spiting the region with start key as "image" and the
end key the same. So I have multi splits with start key and end key as "image" unless the
master keeps track of the row key and partend:child data on the splits I do not thank all
the data will get returned when querying it.
> I have attached a screen shot of my regions i thank there should be some logic to where
if the start and end row key is the same the region does not split or we need to start keeping
track of the start key, column data on the master of each split so we can know where each
row is in the database.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message