kudu-user mailing list archives

From 李书明 <18769721...@163.com>
Subject Re:Re: kudu insert select
Date Thu, 28 Apr 2016 01:02:33 GMT
Hi,
I am running paged batch queries with ORDER BY. With the same amount of data (75 billion rows),
Kudu is not as fast as Impala + HDFS (15000 partitions). On Kudu, with 75 billion rows and
110 tablets, a batch query returns within 5 seconds.




Here is my asynchronous insert code. Can you see a problem with it?

 AsyncKuduClient client=new AsyncKuduClient.AsyncKuduClientBuilder("node0:7051").build();

        try {
            long startTime = System.currentTimeMillis();
            KuduTable kuduTable=client.openTable("pw_low_e_mp_vol_curve_raw_kudu").join(5000);
            List<ColumnSchema> columnSchemas=new ArrayList<ColumnSchema>(7);
            columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder("cons_id", Type.STRING).key(true).build());
            columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder("data_type", Type.STRING).key(true).build());
            columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder("data_date", Type.STRING).key(true).build());
            columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder("org_no", Type.STRING).key(true).build());
            columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder("meter_id", Type.STRING).key(true).build());
            columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder("data_flag", Type.STRING).key(true).build());
            columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder("valuess", Type.STRING).key(true).build());
            AsyncKuduSession session=client.newSession();
            session.setFlushMode(AsyncKuduSession.FlushMode.AUTO_FLUSH_SYNC);
            session.setFlushInterval(10000);
            session.setMutationBufferSpace(10000);
            session.flush().join(50000);
            for (int i=1;i<1000000;i++){
                Insert insert=kuduTable.newInsert();
                PartialRow row=insert.getRow();
                row.addString(0,i+"");
                row.addString(1,"0");
                row.addString(2,"2015-04-30");
                row.addString(3,"37402");
                row.addString(4,"1234567889");
                row.addString(5,"111111111111111111111111111");
                row.addString(6,"10");
                session.apply(insert);
            }
            long endTime = System.currentTimeMillis();
            System.out.println("haoshi======= "+(endTime-startTime));
        } catch (Exception e) {
            e.printStackTrace();  //To change body of catch statement use File | Settings | File Templates.
        }finally {
            try {
                client.shutdown();
            } catch (Exception e) {
                e.printStackTrace();  //To change body of catch statement use File | Settings | File Templates.
            }
        }





thanks!

On Mon, Apr 25, 2016 at 5:44 PM, 李书明 <18769721812@163.com> wrote:

>
> Hi,
>
> You suggested 112 hash buckets, but the data grows by about 100 million
> rows per day, and with about two years of data the total data volume is
> about 7 100 million, so queries are too slow.
>

Why is the query speed slow? What tests did you perform? Having more than
10-20 tablets per server should not show a big speed improvement.
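
To put that guideline in numbers, here is a rough sketch of the tablets-per-server arithmetic for the figures mentioned in this thread (14 nodes; 110 or 112 tablets; 5000 hash buckets). The replication factor of 3 is an assumption, since the thread does not state it, and `TabletMath` and `replicasPerServer` are illustrative names, not part of any Kudu API.

```java
// Rough tablets-per-server arithmetic for the figures mentioned in this thread.
final class TabletMath {
    /** Average number of tablet replicas hosted by each tablet server, rounded up. */
    static long replicasPerServer(long tablets, int replicationFactor, int servers) {
        long replicas = tablets * replicationFactor;
        return (replicas + servers - 1) / servers; // ceiling division
    }

    public static void main(String[] args) {
        int servers = 14;          // node count from the original question
        int replicationFactor = 3; // assumption: not stated in the thread
        for (long tablets : new long[] {110, 112, 5000}) {
            System.out.println(tablets + " tablets -> about "
                + replicasPerServer(tablets, replicationFactor, servers)
                + " replicas per server");
        }
    }
}
```

With a replication factor of 1 instead, 112 tablets on 14 servers come to 8 per server, while 5000 hash buckets come to 358 per server, far above the 10-20 range.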


> Also, asynchronous inserts always lose data, while synchronous inserts
> work fine, and the logs do not report any errors.
>

Can you share the code you are using for async insert? Maybe you are not
properly checking errors?

-Todd
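
The error-checking point can be illustrated without a cluster. In the code posted at the top of this message, the value returned by `session.apply(insert)` is discarded, `flush()` is called before the loop rather than after it, and `client.shutdown()` is not waited on, so a failed or still-buffered write would likely go unnoticed. The sketch below shows the general pattern of keeping every asynchronous result and inspecting it before shutting down. It uses `java.util.concurrent.CompletableFuture` as a stand-in for the client's deferred results; `AsyncWriteCheck`, `WriteResult`, and `applyAsync` are hypothetical names and not Kudu APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Stand-in for an async write API; none of these names come from the Kudu client.
final class AsyncWriteCheck {
    /** Hypothetical result of one write: an error message, or null on success. */
    static final class WriteResult {
        final int rowId;
        final String error;
        WriteResult(int rowId, String error) { this.rowId = rowId; this.error = error; }
        boolean failed() { return error != null; }
    }

    /** Hypothetical async write that rejects every 4th row to simulate a row error. */
    static CompletableFuture<WriteResult> applyAsync(int rowId) {
        return CompletableFuture.supplyAsync(() ->
            new WriteResult(rowId, rowId % 4 == 0 ? "simulated row error" : null));
    }

    /** Issues all writes, waits for every one, and returns the ids of failed rows. */
    static List<Integer> writeAndCollectFailures(int rows) {
        List<CompletableFuture<WriteResult>> pending = new ArrayList<>();
        for (int i = 1; i <= rows; i++) {
            pending.add(applyAsync(i)); // keep the handle instead of discarding it
        }
        List<Integer> failed = new ArrayList<>();
        for (CompletableFuture<WriteResult> f : pending) {
            WriteResult r = f.join(); // wait for completion before shutting down
            if (r.failed()) {
                failed.add(r.rowId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println("failed rows: " + writeAndCollectFailures(10));
    }
}
```

The same idea applies to the real client: keep what `apply` returns, flush after the last insert, check each response for errors, and wait for the flush and shutdown to complete before exiting.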


> Thanks.
>
> Hi, in particular you may have too many hash buckets.  Try creating the
> table with more like 112 hash buckets, and see if insert performance
> improves.
>
> - Dan
>
> On Mon, Apr 25, 2016 at 10:19 AM, Dan Burkert <dan@cloudera.com> wrote:
>
> > Hi,
> >
> > On Mon, Apr 25, 2016 at 10:14 AM, Misty Stanley-Jones <
> > mstanleyjones@cloudera.com> wrote:
> >
> >> This is one of our most frequently asked questions. Make sure that your
> >> table is created with a schema that will spread the data evenly among
> >> tablets and make sure that you have a number of tablets that is a multiple
> >> of the number of tablet servers. See
> >> http://getkudu.io/docs/schema_design.html and
> >> http://getkudu.io/docs/kudu_impala_integration.html#kudu_impala_create_table
> >> .
> >>
> >> Thanks,
> >> Misty
> >>
> >> On Sun, Apr 24, 2016 at 11:38 PM, 李书明 <18769721812@163.com> wrote:
> >>
> >>> Hi,
> >>> With 14 nodes, the insert rate through the Java API is only 3000 inserts
> >>> per second. How can I improve the insert rate?
> >>>
> >>> Creating a table with DISTRIBUTE BY HASH (id) INTO 5000 BUCKETS fails with
> >>> the error “kuduRpc method=IsCreateTableDone timeout=10000”. How can I
> >>> solve this?
> >>>
> >>>
> >>> thanks!









 






 