hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Flat-wide table Hbase
Date Thu, 17 Dec 2015 13:23:47 GMT
You will need to batch your puts or use bulk loading...

http://hbase.apache.org/book.html#arch.bulk.load

JMS
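
Editor's sketch of the "batch your puts" advice above. This is only an illustration under stated assumptions, not code from the thread: the class name BatchBuffer and the batch size are hypothetical, and the sink is a plain callback so the sketch runs without an HBase cluster. In a real client the sink would typically hand the batch to the table's put(List<Put>) method and deal with its IOException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects items and hands them to a sink in fixed-size batches. */
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // assumption: wraps something like table.put(batch)
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Queues one item and flushes automatically once the batch is full. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is queued; call it once more after the last add. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy so the sink owns its own list
        pending.clear();
    }
}
```

With a buffer like this, the loop in the quoted code below could build one Put per input line (adding all columns to it) and call add() instead of issuing a separate table put for every column, followed by a final flush() before closing the tables.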

2015-12-17 0:35 GMT-05:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:

> Hi,
>
>    I have a 1 GB input file in HDFS. I am inserting it into an HBase
> table and generating the column qualifier names dynamically. I am
> doing this in Java, and the process took about 20 hours to finish. Can
> anyone help me optimize the Java code below?
>
>
> public class HbaseinsertionScan {
>
>     // As I am generating column names dynamically and also need the column
>     // names to be in a certain order, I decided to use a combination of a
>     // numerical value and the letters below as the column name
>     public static String[] ColumnNames = {"a", "aa", "b", "bb", "c", "cc",
> "d", "dd", "e", "ee", "f", "ff", "g", "gg", "h", "hh", "i", "j", "k", "l",
> "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"};
>
>
>     public static void main(String[] args) throws IOException {
>
>         // Initializing req classes
>         BufferedReader br = null;
>         Configuration conf = new Configuration();
>
>         Configuration config = HBaseConfiguration.create();
>
>         Scan scan = new Scan();
>
>         // creating objects for the tables
>         // table 1, which holds the actual data
>         HTable table1 = new HTable(config, "table1");
>
>         // table 2 holds the numerical value for each rowkey
>         HTable table2 = new HTable(config, "table2");
>
>
>         // Reading the files from HDFS and inserting them into the HBase tables
>
>         FileSystem fs = FileSystem.get(conf);
>         FileStatus[] status = fs.listStatus(new Path("/filename"));
>         for (int i = 0; i < status.length; i++) {
>             br = new BufferedReader(
>                     new InputStreamReader(fs.open(status[i].getPath())));
>             String line = "";
>
>             while ((line = br.readLine()) != null) {
>
>                 String[] colvalues = line.split(",");
>
>                 //Generating column names dynamically
>                 String colno = "199999";
>
>                 RowFilter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
>                         new BinaryComparator(Bytes.toBytes(colvalues[1])));
>                 scan.setFilter(filter);
>                 ResultScanner scanner = table2.getScanner(scan);
>
>                 for (Result result : scanner) {
>
>                     for (KeyValue kv : result.raw()) {
>
>                         colno = Bytes.toString(kv.getValue());
>                         long newval = Long.valueOf(colno)-1;
>                         System.out.println(newval);
>                         colno = String.valueOf(newval);
>                     }
>                 }
>                 for (int index = 0; index < ColumnNames.length; index++) {
>
>                     // Adding rowkey
>                     Put p = new Put(Bytes.toBytes(colvalues[1]));
>
>                     // Adding column family name, qualifier name ,value
>                     p.add(Bytes.toBytes("test_family"),
>                             Bytes.toBytes(colno + ColumnNames[index]),
>                             Bytes.toBytes(colvalues[index]));
>
>                     table1.put(p);
>
>                     Put col = new Put(Bytes.toBytes(colvalues[1]));
>
>                     // Adding column family name, qualifier name ,value
>                     col.add(Bytes.toBytes("test_family"),
>                             Bytes.toBytes("randomno"),
>                             Bytes.toBytes(colno));
>
>                     table2.put(col);
>
>                     //System.out.println("row inserted");
>                 }
>
>             }
>
>         }
>         table1.close();
>         table2.close();
>         br.close();
>
>     }
>
>
> }
>
> On Mon, Dec 14, 2015 at 9:35 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Hi Rajesh,
> >
> > For column qualifiers, there is no need to "create" them in advance.
> > Just set whatever you want when you build your Put and HBase will take
> > it...
> >
> > JMS
> >
> > 2015-12-14 6:05 GMT-05:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> >
> > > Hi
> > >
> > >    Thanks. This is what I need, and I am considering this the flat-wide
> > > table approach.
> > >
> > >    I have some questions, and the first is how to create dynamic column
> > > qualifiers. Do you know a command or any site that is useful for
> > > this approach?
> > >
> > > Thanks
> > >
> > > On Mon, Dec 14, 2015 at 4:28 PM, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org> wrote:
> > >
> > > > That is correct, as long as the column qualifier is different. But
> > > > they will still go to the same region and, after compactions, will end
> > > > up in the same file.
> > > >
> > > > JMS
> > > > On 2015-12-14 6:55 AM, "Rajeshkumar J" <rajeshkumarit8292@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > >    So, as per your reply, inserting the second row will not update the
> > > > > existing row key; it will be added as a new column qualifier on the
> > > > > existing row key.
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Mon, Dec 14, 2015 at 4:13 PM, Jean-Marc Spaggiari <
> > > > > jean-marc@spaggiari.org> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > When you insert the 2nd row, HBase will just add it after the first
> > > > > > one. On the storage side, it will be another key/value entry AFTER
> > > > > > the first one. In the conceptual view, it will be seen as another
> > > > > > column for the same row (wide approach). HBase will not update the
> > > > > > previously existing entry. It will create a new one for the new
> > > > > > key/value. The 1002-xxx | url.com that you inserted before will not
> > > > > > be touched.
> > > > > >
> > > > > > You have to see all those key/values as totally independent. If they
> > > > > > have a different column name, what you do with one will not have any
> > > > > > impact on the others.
> > > > > >
> > > > > > JMS
> > > > > >
> > > > > > 2015-12-14 5:37 GMT-05:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > >   Thanks for your response, but in your previous answer you
> > > > > > > mentioned the following:
> > > > > > >
> > > > > > >
> > > > > > > --------------------------------------------------------------
> > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > --------------------------------------------------------------
> > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > --------------------------------------------------------------
> > > > > > >
> > > > > > > "This is if 1002-xxx is your key and "url.com" is your column
> > > > > > > qualifier"
> > > > > > >
> > > > > > > I have input rows as follows
> > > > > > >
> > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > >
> > > > > > > When I insert the first row into my HBase table, 1002-xxx will be
> > > > > > > inserted as the row key and url.com will be one of my column
> > > > > > > qualifiers.
> > > > > > >
> > > > > > > What happens when I try to insert the next row, i.e., 1002 | xxx |
> > > > > > > urrl2.com | zz:zz:zz? For this row the row key will also be
> > > > > > > 1002-xxx. As far as I know, when we insert the same row key, the row
> > > > > > > will be updated.
> > > > > > >
> > > > > > > What should I do in such cases?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Mon, Dec 14, 2015 at 3:49 PM, Jean-Marc Spaggiari <
> > > > > > > jean-marc@spaggiari.org> wrote:
> > > > > > >
> > > > > > > > Hi Rajesh,
> > > > > > > >
> > > > > > > > This is not a tall table. Tall would be something where you put
> > > > > > > > your domain name in the key, not in the column qualifier. Putting
> > > > > > > > the domain in the columns means you will have many, many columns
> > > > > > > > for the same key. In the end, HBase always stores the key for each
> > > > > > > > and every column, whether it is tall or wide.
> > > > > > > >
> > > > > > > > Reading 1000 rows or reading 1000 columns is exactly the same
> > > > > > > > thing for HBase. The only difference is that with 1000 rows, HBase
> > > > > > > > might split the rows into 2 regions. If you have 1000 columns,
> > > > > > > > HBase will not split them.
> > > > > > > >
> > > > > > > > HBase can return a row in a few milliseconds. 2 seconds for one
> > > > > > > > Cell is a lot...
> > > > > > > >
> > > > > > > > HTH
> > > > > > > >
> > > > > > > > JMS
> > > > > > > >
> > > > > > > > 2015-12-14 5:14 GMT-05:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > >    Thanks for your response, but you are suggesting a tall and
> > > > > > > > > narrow table, which is not working for me right now. As my use
> > > > > > > > > case involves a real-time solution, I need to retrieve data from
> > > > > > > > > the HBase table within one or two seconds. I have tried what you
> > > > > > > > > suggested, which may lead to 1000 rows for a given id, and
> > > > > > > > > retrieval takes more than a minute.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Rajeshkumar
> > > > > > > > >
> > > > > > > > > On Mon, Dec 14, 2015 at 3:29 PM, Jean-Marc Spaggiari <
> > > > > > > > > jean-marc@spaggiari.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > HBase is a key-value store. So what you are pushing here will
> > > > > > > > > > be stored as:
> > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > >
> > > > > > > > > > HOWEVER.... HBase will never split a region within a key, and
> > > > > > > > > > keys are always ordered. So in the end, what you will have
> > > > > > > > > > exactly is:
> > > > > > > > > >
> > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > >
> > > > > > > > > > The only places where HBase will split are marked with "-----".
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This is if 1002-xxx is your key and "url.com" is your column
> > > > > > > > > > qualifier.
> > > > > > > > > >
> > > > > > > > > > HTH
> > > > > > > > > >
> > > > > > > > > > JMS
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2015-12-14 3:39 GMT-05:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > >    I am going to use flat-wide tables in HBase for my use
> > > > > > > > > > > case, and I have some questions about this.
> > > > > > > > > > >
> > > > > > > > > > >    1. As far as I know, flat-wide stores one column value as
> > > > > > > > > > > the key and the others as its values in a key-value pair
> > > > > > > > > > > relationship (correct me if I am wrong).
> > > > > > > > > > >
> > > > > > > > > > > I have rows as follows
> > > > > > > > > > >
> > > > > > > > > > > id  | name | url | time
> > > > > > > > > > >
> > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I need to store it in a flat-wide table as follows
> > > > > > > > > > >
> > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx | 1002 | xxx | url.com | yy:yy:yy |
> > > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > >
> > > > > > > > > > > How do I store it like this?
> > > > > > > > > > > Can anyone help me with this?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
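
Editor's sketch of the point made earlier in the thread: a put with the same row key but a different column qualifier adds a cell instead of overwriting the row, and cells stay ordered by key. This is a toy in-memory model under stated assumptions, not HBase code: the class name WideRowModel is hypothetical, there is a single column family, and timestamps/versions are ignored (in real HBase a second write to the same row and qualifier is stored as a newer version).

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy model of one column family: cells sorted by row key, then qualifier. */
class WideRowModel {
    // Outer map: row key -> (qualifier -> value); both levels are kept sorted.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Writes one cell; only an identical (row key, qualifier) pair is replaced. */
    void put(String rowKey, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    /** Returns the columns of a row in qualifier order (empty if the row is absent). */
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    /** Number of distinct row keys stored. */
    int rowCount() {
        return rows.size();
    }
}
```

Feeding the four sample lines from the thread into this model (row key "1002-xxx" or "1003-yyy", the URL as qualifier, the time as value) yields two rows, with the three URLs of 1002-xxx kept side by side in sorted order, which mirrors the layout drawn between the "-----" markers above.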
