polygene-dev mailing list archives

From "zhuangmz08" <zhuangm...@qq.com>
Subject Re: Re: Large Scale Entity Store Database?
Date Tue, 14 Jun 2016 07:47:23 GMT
Yes, a comparison graph would attract more people to Zest!




------------------ Original Message ------------------
From: "Niclas Hedhman" <hedhman@gmail.com>
Sent: Tuesday, June 14, 2016, 3:27 PM
To: "dev" <dev@zest.apache.org>

Subject: Re: Re: Large Scale Entity Store Database?



I am sorry, I don't have any numbers.
But you will probably find that high-performance databases, such as
Cassandra, HBase, MongoDB and maybe even platinum-hardware-hosted SQL
servers, are capable of outperforming the Zest runtime's ability to
construct the entities in memory, i.e. the serialization is slower.
Some years ago, I vaguely recall, the then Qi4j runtime maxed out
somewhere around 3,000-5,000 reads/second for relatively simple entities.

For querying, the result set is a collection of EntityReferences, in
principle the identities of the matching entities, each of which is then
read from the ES.
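
To illustrate the two-step read described above, here is a minimal sketch. It is not the Zest API: the class name, the in-memory maps standing in for the index and the ES, and the "city" property are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Zest API.
class TwoStepQuerySketch {
    // Stand-in for the entity store (ES): identity -> serialized state.
    private final Map<String, String> entityStore = new HashMap<>();
    // Stand-in for the index: property value -> identities of matching entities.
    private final Map<String, List<String>> cityIndex = new HashMap<>();

    void store(String identity, String city, String state) {
        entityStore.put(identity, state);
        cityIndex.computeIfAbsent(city, k -> new ArrayList<>()).add(identity);
    }

    // Step 1: the index answers with references (identities) only.
    List<String> findReferences(String city) {
        return cityIndex.getOrDefault(city, Collections.emptyList());
    }

    // Step 2: every reference costs one additional read from the entity store.
    List<String> query(String city) {
        List<String> states = new ArrayList<>();
        for (String reference : findReferences(city)) {
            states.add(entityStore.get(reference));
        }
        return states;
    }

    public static void main(String[] args) {
        TwoStepQuerySketch sketch = new TwoStepQuerySketch();
        sketch.store("person-1", "Oslo", "{\"name\":\"Ann\"}");
        sketch.store("person-2", "Lima", "{\"name\":\"Bo\"}");
        sketch.store("person-3", "Oslo", "{\"name\":\"Cy\"}");
        System.out.println(sketch.findReferences("Oslo"));
        System.out.println(sketch.query("Oslo"));
    }
}
```

In this sketch a query that matches N entities costs one index lookup plus N store reads, so the read path of the ES is exercised by every query as well.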

Querying in Zest is done via a Fluent API (DSL if you like), which in a
typesafe manner describes the query. The query subsystem translates that
into the underlying query system's native language and executes the query.
Of course, the query is translated according to the same subsystem's
indexing algorithm and there might be room for clever work in making this
faster.
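
As a rough illustration of a typesafe query description being translated into a native query language, here is a small sketch. It is not the Zest Query API: the builder, the Property handle, the CITY and AGE constants and the SQL-like target are all hypothetical and only show the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Zest Query API.
class FluentQuerySketch {
    // A typed property handle: the type parameter ties a value to its property,
    // so eq(AGE, "abc") would not compile.
    static final class Property<T> {
        final String name;
        Property(String name) { this.name = name; }
    }

    static final Property<String> CITY = new Property<>("city");
    static final Property<Integer> AGE = new Property<>("age");

    private final String entityType;
    private final List<String> clauses = new ArrayList<>();

    private FluentQuerySketch(String entityType) { this.entityType = entityType; }

    static FluentQuerySketch from(String entityType) {
        return new FluentQuerySketch(entityType);
    }

    <T> FluentQuerySketch eq(Property<T> property, T value) {
        clauses.add(property.name + " = " + literal(value));
        return this;
    }

    <T extends Comparable<T>> FluentQuerySketch ge(Property<T> property, T value) {
        clauses.add(property.name + " >= " + literal(value));
        return this;
    }

    // Translation step for an assumed SQL-backed index: only identities are selected.
    String toSql() {
        StringBuilder sql = new StringBuilder("SELECT identity FROM " + entityType);
        if (!clauses.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", clauses));
        }
        return sql.toString();
    }

    private static String literal(Object value) {
        if (value instanceof Number) return value.toString();
        return "'" + value.toString().replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(from("Person").eq(CITY, "Oslo").ge(AGE, 18).toSql());
    }
}
```

A different index extension would keep the same builder and swap only the translation step, which is where any "clever work" on speed would go.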

Again, I don't have the numbers, but my gut feeling is that it is an order
of magnitude (or more) slower than a direct lookup in a fast ES.

There are (or used to be) some performance tests available in the source
code somewhere, for ES testing.
I would be delighted if that could be automated so a comparison (table or
graph) can be auto-published by the CI build system.
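
A very naive sketch of what such an automated measurement could look like is below. It is not one of the existing performance tests: the class, the Function standing in for an entity store read, and the table row format are assumptions, and a real comparison would want a proper harness such as JMH rather than a single timed loop.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: times reads against any "identity -> state" lookup.
class ReadThroughputSketch {
    // Performs `reads` lookups of entity-0 .. entity-(reads-1) and returns reads/second.
    static double readsPerSecond(Function<String, String> reader, int reads) {
        long start = System.nanoTime();
        int found = 0;
        for (int i = 0; i < reads; i++) {
            if (reader.apply("entity-" + i) != null) found++;
        }
        long elapsed = Math.max(1L, System.nanoTime() - start);
        if (found != reads) throw new IllegalStateException("missing entities");
        return reads * 1_000_000_000.0 / elapsed;
    }

    // One row of a comparison table that a CI job could publish.
    static String tableRow(String storeName, double readsPerSecond) {
        return String.format(Locale.ROOT, "| %s | %.0f |", storeName, readsPerSecond);
    }

    public static void main(String[] args) {
        // An in-memory map stands in for a real entity store here.
        Map<String, String> memory = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            memory.put("entity-" + i, "{\"n\":" + i + "}");
        }
        System.out.println("| store | reads/second |");
        System.out.println(tableRow("in-memory map", readsPerSecond(memory::get, 100_000)));
    }
}
```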

Maybe not as much help as you hoped for.

Niclas
On Jun 14, 2016 12:24, "zhuangmz08" <zhuangmz08@qq.com> wrote:

Hi,


OK, so writing entities and reading entities are separated, both in theory
and in the physical implementation.


1. It's acceptable to occupy large storage space (Disk is cheap).
All entities are stored in a SINGLE table of the SQL database or in a
SINGLE collection of the SINGLE database in Mongo.
What are the key factors for write performance? Which MapEntityStore is
faster at writing entities? I mean, which is better for production use?


2. Is reading speed related to the indexer? I know something about search
engines (Apache Solr). Could you explain more about querying? When the
query string matches some index entry, how does the index interact with the
entity database? Do we need to query the entity database internally? I would
like to know the factors affecting read speed.
Which is better for production use, OpenRDF or ElasticSearch?


Thanks a lot.


------------------ Original Message ------------------
From: "Niclas Hedhman" <hedhman@gmail.com>
Sent: Tuesday, June 14, 2016, 11:02 AM
To: "dev" <dev@zest.apache.org>

Subject: Re: Large Scale Entity Store Database?



In Zest, storage/retrieval and indexing/query are separate concerns (disk
is cheap), just like they are on the world-wide web.

Now, the relatively simple Entity Stores that are based on the
MapEntityStore might be particularly wasteful with storage space, depending
on the underlying engine. However, nothing stops you from creating a
"native" ES for your favorite storage engine.

The Indexing/Query systems are much more complex (compare a website's
store/retrieve with Google's Search) and it is not trivial to make an
indexing extension that is complete (native queries are available as a
compromise).

In Zest 2.x and earlier, the default is to index all properties, and you
can turn some of them off. In 3.x we intend to change the default to off,
and you indicate what needs indexing.

Final note, the requirements on the entity stores are that any "unknown"
state is preserved so that an update will not modify such state. This is
due to the fact that entities of the same identity can have more than one
(possibly incompatible) type. This complicates traditional ORM techniques
quite a bit.
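
The "preserve unknown state" requirement above can be sketched as a merge over the stored state. This is only an illustration, not how any Zest entity store is implemented: the class and method names are hypothetical, and the stored state is assumed to be a flat map of property name to value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not a Zest entity store.
class PreserveUnknownStateSketch {
    // `stored` is the full state found in the store, possibly written through
    // another entity type. `changes` holds only the properties known to the
    // type doing the update. Keys absent from `changes` are carried over untouched.
    static Map<String, Object> update(Map<String, Object> stored, Map<String, Object> changes) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(changes);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "Ann");
        stored.put("salary", 100); // unknown to a type that only declares "name"

        Map<String, Object> changes = new LinkedHashMap<>();
        changes.put("name", "Bo");

        System.out.println(update(stored, changes));
    }
}
```

A mapping that wrote back only the columns of one type and dropped everything else would violate the requirement, which is one way to see why traditional ORM techniques get complicated here.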

Cheers
Niclas
On Jun 14, 2016 09:06, "zhuangmz08" <zhuangmz08@qq.com> wrote:

> Hi, I dug into the Postgres table and found that entities are actually
> stored as JSON-format strings, which seems to use the SQL database as a
> document database. I'm wondering how efficient queries are achieved. I'm
> going to insert and query millions of entities. Have you ever tested the
> performance? Should I use a MongoDB-backed Entity Store instead? Thanks a
> lot.