hbase-user mailing list archives

From Rajeshkumar J <rajeshkumarit8...@gmail.com>
Subject Re: Flat-wide table Hbase
Date Thu, 17 Dec 2015 05:35:46 GMT
Hi,

   I have an input file of about 1 GB in HDFS. I am inserting it into an
HBase table and generating the column qualifier names dynamically. I am
doing this in Java, and the process took about 20 hours to finish. Can
anyone help me optimize the Java code below?


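import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseinsertionScan {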

    // The column names are generated dynamically and must sort in a certain
    // order, so each column name is a combination of a numerical value and
    // one of the alphabetic suffixes below.
    public static String[] ColumnNames = {"a", "aa", "b", "bb", "c", "cc",
"d", "dd", "e", "ee", "f", "ff", "g", "gg", "h", "hh", "i", "j", "k", "l",
"m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"};


    public static void main(String[] args) throws IOException {

        // Initializing req classes
        BufferedReader br = null;
        Configuration conf = new Configuration();

        Configuration config = HBaseConfiguration.create();

        Scan scan = new Scan();

        // Create the table objects
        // table1 holds the actual data
        HTable table1 = new HTable(config, "table1");

        // table2 holds the numerical value for each row key
        HTable table2 = new HTable(config, "table2");


        // Read the files from HDFS and insert them into the HBase tables

        FileSystem fs = FileSystem.get(conf);
        FileStatus[] status = fs.listStatus(new Path("/filename"));
        for (int i = 0; i < status.length; i++) {
            br = new BufferedReader(
                    new InputStreamReader(fs.open(status[i].getPath())));
            String line = "";

            while ((line = br.readLine()) != null) {

                String[] colvalues = line.split(",");

                //Generating column names dynamically
                String colno = "199999";

                RowFilter filter = new RowFilter(
                        CompareFilter.CompareOp.EQUAL,
                        new BinaryComparator(Bytes.toBytes(colvalues[1])));
                scan.setFilter(filter);
                ResultScanner scanner = table2.getScanner(scan);

                for (Result result : scanner) {

                    for (KeyValue kv : result.raw()) {

                        colno = Bytes.toString(kv.getValue());
                        long newval = Long.valueOf(colno)-1;
                        System.out.println(newval);
                        colno = String.valueOf(newval);
                    }
                }
                for (int index = 0; index < ColumnNames.length; index++) {

                    // Adding rowkey
                    Put p = new Put(Bytes.toBytes(colvalues[1]));

                    // Adding column family name, qualifier name, value
                    p.add(Bytes.toBytes("test_family"),
                            Bytes.toBytes(colno + ColumnNames[index]),
                            Bytes.toBytes(colvalues[index]));

                    table1.put(p);

                    Put col = new Put(Bytes.toBytes(colvalues[1]));

                    // Adding column family name, qualifier name, value
                    col.add(Bytes.toBytes("test_family"),
                            Bytes.toBytes("randomno"),
                            Bytes.toBytes(colno));

                    table2.put(col);

                    //System.out.println("row inserted");
                }

            }

        }
        // Close the tables declared above (table1, table2)
        table1.close();
        table2.close();
        if (br != null) {
            br.close();
        }

    }


}
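
One likely contributor to the run time of the code above is that every column of every input line becomes its own put() call on each table, and every line triggers a filtered scan of table2. Below is a minimal sketch of an alternative shape, written with only the Java standard library so it stays runnable without a cluster. It collects all qualifier/value pairs of a line into one pending row and keeps the per-key numeric prefix in memory. The names WideRowBatcher, PendingRow, buildRow and batches are hypothetical, the suffix list is shortened, and the in-memory counter assumes a single loader process starting against an empty table2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WideRowBatcher {
    // Shortened stand-in for the ColumnNames array in the posted code.
    static final String[] SUFFIXES = {"a", "aa", "b", "bb"};
    // Same starting value as colno in the posted code.
    static final long START = 199999L;

    // Assumption: replaces the per-line scan of table2 by remembering the
    // last numeric prefix handed out for each row key.
    private final Map<String, Long> lastPrefix = new HashMap<>();

    /** One pending mutation: a row key plus all qualifier/value pairs for it. */
    static final class PendingRow {
        final String rowKey;
        final Map<String, String> cells = new LinkedHashMap<>();

        PendingRow(String rowKey) {
            this.rowKey = rowKey;
        }
    }

    /** Builds a single pending row for one CSV line instead of one per column. */
    PendingRow buildRow(String line) {
        String[] values = line.split(",");
        String rowKey = values[1];
        // The first line for a key gets START; each later line gets one less.
        long prefix = lastPrefix.merge(rowKey, START, (old, unused) -> old - 1);
        PendingRow row = new PendingRow(rowKey);
        // Guard against lines with fewer values than suffixes.
        int n = Math.min(values.length, SUFFIXES.length);
        for (int i = 0; i < n; i++) {
            row.cells.put(prefix + SUFFIXES[i], values[i]);
        }
        return row;
    }

    /** Splits the pending rows into batches of at most batchSize rows. */
    static List<List<PendingRow>> batches(List<PendingRow> rows, int batchSize) {
        List<List<PendingRow>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            out.add(new ArrayList<>(rows.subList(i, end)));
        }
        return out;
    }
}
```

With the HBase client, each PendingRow would presumably map to one Put carrying several columns, and each batch to one HTable.put(List<Put>) call, so the number of round trips per input line drops from dozens to a fraction of one.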

On Mon, Dec 14, 2015 at 9:35 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Rajesh,
>
> For the column qualifiers, there is no need to "create" them in advance.
> Just set up whatever you want when you build your Put and HBase will take
> it...
>
> JMS
>
> 2015-12-14 6:05 GMT-05:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
>
> > Hi
> >
> >    Thanks. This is what I need, and I am considering this the flat-wide
> > table approach.
> >
> >    I have some doubts, and the first of them is how to create dynamic
> > column qualifiers. Do you know the command, or any sites that are useful
> > for this approach?
> >
> > Thanks
> >
> > On Mon, Dec 14, 2015 at 4:28 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > That is correct, as long as the column qualifier is different. But they
> > > will still go to the same region and, after compactions, will end up in
> > > the same file.
> > >
> > > JMS
> > >
> > > On 2015-12-14 6:55 AM, "Rajeshkumar J" <rajeshkumarit8292@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > >    So, as per your reply, inserting the second row will not update the
> > > > existing row key; it will be added as new column qualifiers on the
> > > > existing row key.
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Dec 14, 2015 at 4:13 PM, Jean-Marc Spaggiari <
> > > > jean-marc@spaggiari.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > When you insert the 2nd row, HBase will just add it after the first
> > > > > one. On the storage side it will be another key/value entry AFTER
> > > > > the first one. In the conceptual view, it will be seen as another
> > > > > column for the same row (wide approach). HBase will not update the
> > > > > previously existing entry. It will create a new one for the new
> > > > > key/value. The 1002-xxx | url.com that you inserted before will not
> > > > > be touched.
> > > > >
> > > > > You have to see all those key/values as totally independent. If
> > > > > they have a different column name, what you do with one will not
> > > > > have any impact on the others.
> > > > >
> > > > > JMS
> > > > >
> > > > > 2015-12-14 5:37 GMT-05:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > >   Thanks for your response, but in your previous answer you
> > > > > > mentioned the following:
> > > > > >
> > > > > >
> > > > > > --------------------------------------------------------------
> > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > --------------------------------------------------------------
> > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > --------------------------------------------------------------
> > > > > >
> > > > > > "This is if 1002-xxx is your key and "url.com" is your column
> > > > > > qualifier"
> > > > > >
> > > > > > I have input rows as follows
> > > > > >
> > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > >
> > > > > > When I insert the first row into my HBase table, 1002-xxx will be
> > > > > > inserted as the row key and url.com will be one of my column
> > > > > > qualifiers.
> > > > > >
> > > > > > What happens when I try to insert the next row, i.e.
> > > > > > 1002 | xxx | urrl2.com | zz:zz:zz? For this row the row key will
> > > > > > also be 1002-xxx. As far as I know, when we try to insert the same
> > > > > > row key, the row will be updated.
> > > > > >
> > > > > > What should I do in such cases?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Mon, Dec 14, 2015 at 3:49 PM, Jean-Marc Spaggiari <
> > > > > > jean-marc@spaggiari.org> wrote:
> > > > > >
> > > > > > > Hi Rajesh,
> > > > > > >
> > > > > > > This is not a tall table. Tall would be something where you put
> > > > > > > your domain name in the key, not in the column qualifier. Putting
> > > > > > > the domain in the columns means you will have many, many columns
> > > > > > > for the same key. In the end, HBase always stores the key for
> > > > > > > each and every column, whether it is tall or wide.
> > > > > > >
> > > > > > > Reading 1000 rows or reading 1000 columns is exactly the same
> > > > > > > thing for HBase. The only difference is that with 1000 rows HBase
> > > > > > > might split the rows into 2 regions. If you have 1000 columns,
> > > > > > > HBase will not split them.
> > > > > > >
> > > > > > > HBase can return a row in a few milliseconds. 2 seconds for one
> > > > > > > Cell is a lot...
> > > > > > >
> > > > > > > HTH
> > > > > > >
> > > > > > > JMS
> > > > > > >
> > > > > > > 2015-12-14 5:14 GMT-05:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > >    Thanks for your response, but you are suggesting a tall and
> > > > > > > > narrow table, which is not working for me right now. As my use
> > > > > > > > case involves a real-time solution, I need to retrieve data
> > > > > > > > from the HBase table within one or two seconds. I have tried
> > > > > > > > what you suggested, which may lead to 1000 rows for a given id,
> > > > > > > > and the retrieval takes more than a minute.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Rajeshkumar
> > > > > > > >
> > > > > > > > On Mon, Dec 14, 2015 at 3:29 PM, Jean-Marc Spaggiari <
> > > > > > > > jean-marc@spaggiari.org> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > HBase is a key-value store. So what you are pushing here will
> > > > > > > > > be stored as:
> > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > >
> > > > > > > > > HOWEVER... HBase will never split a region within a key, and
> > > > > > > > > keys are always ordered. So in the end, what you will have is
> > > > > > > > > exactly:
> > > > > > > > >
> > > > > > > > > --------------------------------------------------------------
> > > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > --------------------------------------------------------------
> > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > --------------------------------------------------------------
> > > > > > > > >
> > > > > > > > > The only places where HBase will split are marked with "-----".
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > This is if 1002-xxx is your key and "url.com" is your column
> > > > > > > > > qualifier.
> > > > > > > > >
> > > > > > > > > HTH
> > > > > > > > >
> > > > > > > > > JMS
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2015-12-14 3:39 GMT-05:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > >    I am going to use flat-wide tables in HBase for my use
> > > > > > > > > > case, and I have some doubts regarding this.
> > > > > > > > > >
> > > > > > > > > >    1. As far as I know, flat-wide stores one column value as
> > > > > > > > > > the key and the others as its values in a key-value pair
> > > > > > > > > > relationship (correct me if I am wrong).
> > > > > > > > > >
> > > > > > > > > > I have rows as follows:
> > > > > > > > > >
> > > > > > > > > > id  | name | url | time
> > > > > > > > > >
> > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I need to store it in a flat-wide table as follows:
> > > > > > > > > >
> > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx | 1002 | xxx | url.com | yy:yy:yy | 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > >
> > > > > > > > > > How can I store it like this?
> > > > > > > > > > Can anyone help me with this?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
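
The behaviour described in the quoted replies (a second put on the same row key with a different column qualifier adds a cell instead of replacing the row, and cells stay sorted) can be pictured with a small stand-in model. This is only a sketch using the Java standard library, not HBase code: WideRowModel and its methods are hypothetical names, and column families, timestamps and cell versions are left out.

```java
import java.util.Map;
import java.util.TreeMap;

class WideRowModel {
    // row key -> (column qualifier -> value). Both maps are sorted, loosely
    // mirroring the way cells are ordered by row key and then by qualifier.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Adds a cell; an existing row gains a column, it is not replaced. */
    void put(String rowKey, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    /** Returns all cells of one row, ordered by qualifier (empty if absent). */
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }
}
```

Putting the three 1002 lines from the thread into this model with the URL as the qualifier leaves a single row with three columns, while the 1003 line forms its own row.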
