rya-dev mailing list archives

From Puja Valiyil <puja...@gmail.com>
Subject Re: Regarding Slow ingest speed while using rya api
Date Mon, 25 Jul 2016 13:00:36 GMT
Hi Pranav,
There are three optimizations you can make to speed up ingest:
1.  Bypass the SAIL layer and ingest data through MapReduce
2.  Turn off flushing on the AccumuloRyaDAO so that it does not flush after
each triple.  You do this by setting flush to false on the
AccumuloRdfConfiguration (
https://github.com/apache/incubator-rya/blob/develop/dao/accumulo.rya/src/main/java/mvm/rya/accumulo/AccumuloRdfConfiguration.java#L107).

3.  Turn on prefix hashing to help reduce hot spotting.
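
To illustrate why flushing after each triple (#2) is so costly, here is a
minimal, self-contained sketch.  It does not use the Rya or Accumulo APIs:
BufferingWriter and flushesFor are hypothetical names made up for this
example, and each flush stands in for one round trip to the tablet servers.
The real switch is the flush setting on AccumuloRdfConfiguration at the link
above.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushSketch {

    /** Hypothetical stand-in for a buffered writer; NOT a Rya or Accumulo class. */
    static final class BufferingWriter {
        private final boolean flushEachUpdate;
        private final List<String> buffer = new ArrayList<>();
        private int flushCount = 0;

        BufferingWriter(boolean flushEachUpdate) {
            this.flushEachUpdate = flushEachUpdate;
        }

        /** Buffers one triple and, if configured, flushes immediately. */
        void add(String triple) {
            buffer.add(triple);
            if (flushEachUpdate) {
                flush();
            }
        }

        /** Simulates one round trip; an empty buffer costs nothing. */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            buffer.clear();
            flushCount++;
        }

        int flushCount() {
            return flushCount;
        }
    }

    /** Returns how many simulated round trips it takes to write the given number of triples. */
    static int flushesFor(int triples, boolean flushEachUpdate) {
        BufferingWriter writer = new BufferingWriter(flushEachUpdate);
        for (int i = 0; i < triples; i++) {
            writer.add("s" + i + " p o");
        }
        writer.flush(); // final flush when the load is done
        return writer.flushCount();
    }

    public static void main(String[] args) {
        System.out.println("flush each update: " + flushesFor(1000, true));
        System.out.println("flush at end:      " + flushesFor(1000, false));
    }
}
```

With flushing on, every triple pays for its own round trip; with it off, the
writer can send triples in batches.  A real batch writer also flushes when its
buffer fills up, which this sketch leaves out.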
You can also look at pre-splitting your tables to improve ingest
performance.  I think #2 is your best bet.  Let us know if you need any
more help.
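
For prefix hashing and pre-splitting, the general idea can be sketched as
below.  This is an illustration only, not Rya's actual key layout: the
one-byte MD5 prefix, the ":" separator, and the names prefixed and
splitPoints are all assumptions made for the example.  The split points would
be handed to Accumulo when pre-splitting the table (for example through
TableOperations.addSplits).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class PrefixHashSketch {

    /**
     * Prepends a two-hex-digit hash prefix to a row key so that keys which
     * sort next to each other end up spread across the key space.
     * The layout is hypothetical, not the one Rya uses.
     */
    static String prefixed(String rowKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x", digest[0] & 0xff) + ":" + rowKey;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns evenly spaced split points over the prefix range 00..ff,
     * giving the requested number of tablets (tablets - 1 split points).
     */
    static List<String> splitPoints(int tablets) {
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < tablets; i++) {
            splits.add(String.format("%02x", i * 256 / tablets));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(splitPoints(4));
        System.out.println(prefixed("http://example.org/subject/1"));
        System.out.println(prefixed("http://example.org/subject/2"));
    }
}
```

Without the prefix, sequentially generated keys all land on the same tablet;
with it, writes are spread over the pre-split tablets.  The trade-off is that
range scans over the original key order are no longer contiguous.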

On Mon, Jul 25, 2016 at 7:02 AM, pranav.puri <pranav.puri@orkash.com> wrote:

> Dear All
>
> I am getting very slow ingest speed (around 150 entries/second per
> table) while ingesting using the code mentioned in the documentation for
> the Direct OpenRDF API.
> I am currently using a three-node Accumulo cluster which consistently
> delivers an ingest speed of 100,000/second.
>
> Please suggest some ways to improve the ingestion speed as I am working
> with very large dataset.
>
> Regards
> Pranav
>
