MIME-Version: 1.0
References: <tencent_E07F8742C6B1A5195AC2E1D90AE323C9D505@qq.com>
In-Reply-To: <tencent_E07F8742C6B1A5195AC2E1D90AE323C9D505@qq.com>
From: Willem Jiang <willem.jiang@gmail.com>
Date: Mon, 29 Oct 2018 13:51:03 +0800
Message-ID: <CA+QaCWK9nXt2=B5srJS4knYhc84CXynujVZVvdS3_ecSYumE8Q@mail.gmail.com>
Subject: Re: [DISCUSS] IoTDB Incubation Proposal
To: general@incubator.apache.org
Cc: csliuyb@qq.com, wang_chen@tsinghua.edu.cn, kmcgrail@apache.org, 
	sainthxd@gmail.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

It looks like a very interesting project. I'd like to be your mentor :)
Please ping me if you have any questions about the incubation process; I'd
be glad to share my journey with you.

Willem Jiang

Twitter: willemjiang
Weibo: =E5=A7=9C=E5=AE=81willem

On Mon, Oct 29, 2018 at 8:35 AM Xiangdong Huang <hxdreg@qq.com> wrote:
>
> Dear Apache Incubator Community,
>
>
> I would like to open up a discussion about incubating IoTDB at Apache. IoTDB is a database for managing large amounts of time series data from IoT sensors in industrial applications.
>
>
> The proposal is available as a draft at https://wiki.apache.org/incubator/IoTDBProposal . I have also included the text of the proposal below.
>
>
>
>
> = IoTDB Proposal =
> v0.1
>
>
> == Abstract ==
> IoTDB is a database for managing large amounts of time series data, such as timestamped data from IoT sensors in industrial applications.
>
>
> == Proposal ==
> IoTDB is a database for managing large amounts of time series data using columnar storage, data encoding, pre-computation, and index techniques. It has a SQL-like interface, can write millions of data points per second per node, and is optimized to return query results within a few seconds over trillions of data points. It can also be easily integrated with Apache Hadoop MapReduce and Apache Spark for analytics.
>
>
> == Background ==
>
>
> A new class of data management system requirements is becoming increasingly important with the rise of the Internet of Things. Some database systems and technologies already target time series data management. For example, Gorilla and InfluxDB are mainly built for data centers and for monitoring application metrics. Other systems, such as OpenTSDB and KairosDB, are built on Apache HBase and Apache Cassandra, respectively.
>
>
> However, many time series data management applications, especially industrial ones, have additional requirements:
>
>
>  * Supporting time series data with a high data frequency. For example, a turbine engine may generate 1000 points per second (i.e., 1000 Hz), while each CPU reports only 1 data point per 5 seconds in a data center monitoring application.
>
>
>  * Supporting scans of the data at multiple resolutions. For example, aggregation operations are important for time series data.
>
>
>  * Supporting special queries for time series, such as pattern matching, time series segmentation, time-frequency transformation and frequency queries.
>
>
>  * Supporting a large number of monitoring targets (i.e., time series). An excavator may report more than 1000 time series (for example, the rotation speed of the motor-engine, the speed of the excavator, its acceleration, the temperature of the water tank and so on), while a CPU or an application monitor has far fewer time series.
>
>
>  * Optimization for out-of-order data points. In the industrial sector, it is common for equipment to send data over UDP rather than TCP. Sometimes the network connection is unstable, and part of the data is buffered for later sending.
>
>
>  * Supporting long-term storage. Historical data is precious for equipment manufacturers. Therefore, removing or unloading historical data is highly undesirable for most industrial applications. The database system must not only support fast retrieval of historical data, but should also guarantee that the historical data does not slow down the processing of “hot” (current) data.
>
>
>  * Supporting online transaction processing (OLTP) as well as complex analytics. Analyzing the data files directly with Apache Spark or Apache Hadoop MapReduce is clearly better than first transforming them into another file format for Big Data analytics.
>
>
>  * Flexible deployment either on premises or in the cloud. IoTDB is simple enough to be deployed on a Raspberry Pi handling hundreds of time series. Meanwhile, the system can also be deployed in the cloud, where it supports tens of millions of ingestions per second, OLTP queries in milliseconds, and analytics using Apache Spark/Apache Hadoop MapReduce.
>
>
>  * * (1) If users deploy IoTDB on a device, such as a Raspberry Pi, a wind turbine, or a meteorological station, the deployment of the database is designed to be simple. A device may have hundreds of time series (but fewer than a thousand), and the database needs to handle them.
>  * * (2) When IoTDB is deployed in a data center, computational resources (i.e., the hardware configuration of the servers) are not a constraint compared to a Raspberry Pi. In this deployment, IoTDB can use more computational resources and can handle more time series (e.g., millions of time series).
>
>
> Based on these requirements, we developed IoTDB, a new data store system for managing time series data.
>
>
> IoTDB started as a Tsinghua University research project. IoTDB's developer community has since grown to include additional institutions, for example, universities (e.g., Fudan University), research labs (e.g., NEL-BDS lab), and corporations (e.g., K2Data, Tencent). Funding has been provided by various institutions, including the National Natural Science Foundation of China, and by industry sponsors such as Lenovo and K2Data.
>
>
> == Rationale ==
> Because no existing open source time series database covers all the above requirements, we developed IoTDB. As the system matures, we are seeking a long-term home for the project. We believe the Apache Software Foundation would be an ideal fit. Joining Apache will also help coordinate and improve the development effort of the growing number of organizations that contribute to IoTDB, improving the diversity of our community.
>
>
> IoTDB contains multiple modules, which are classified into categories:
>
>
>  * '''TsFile Format''': TsFile is a new columnar file format.
>  * '''Adaptor for Analytics and Visualization''': Integrates TsFile with Apache Hadoop HDFS, Apache Hadoop MapReduce and Apache Spark. Examples of integrating IoTDB with Apache Kafka, Apache Storm and Grafana are also provided.
>  * '''IoTDB Engine''': An engine which consists of a SQL parser, query plan generator, memtable, authentication and authorization, write-ahead log (WAL), crash recovery, out-of-order data handler, and indexes for aggregation and pattern matching. The engine stores system data in TsFile format.
>  * '''IoTDB JDBC''': An implementation of Java Database Connectivity (JDBC) for clients to connect to IoTDB using Java.
>
>
> === TsFile Format ===
>
>
> TsFile is a columnar storage format similar to Apache Parquet and Apache CarbonData. It has the concepts of Chunk Group, Column Chunk, Page and Footer. Compared with Apache Parquet and Apache CarbonData, it is designed and optimized for time series:
>
>
> ==== Time Series Friendly Encoding ====
> IoTDB currently supports run-length encoding (RLE), delta-of-delta encoding, and Facebook's Gorilla encoding.
>
>
> Lossy encoding methods (e.g., Piecewise Linear Approximation (PLA) and time-frequency transformation) are works in progress.
>
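To illustrate the delta-of-delta idea named above, here is a minimal Python sketch, assuming integer millisecond timestamps. The function names are hypothetical, and this is not IoTDB's actual encoder, which would also pack the small residuals into a compact bit stream.

```python
def delta_of_delta_encode(timestamps):
    """Encode timestamps as [first, dod_1, dod_2, ...].

    Each dod is the change in the delta between consecutive timestamps.
    The first delta is measured against an assumed previous delta of 0.
    """
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)
        prev_delta = delta
    return encoded


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# A sensor sampling roughly once per second: after the first delta,
# the residuals stay close to zero and are cheap to store.
samples = [1521622662000, 1521622663000, 1521622664000, 1521622665001]
print(delta_of_delta_encode(samples))
```

Regularly sampled series yield long runs of zeros after this step, which is where a run-length or bit-packing stage would pay off.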
>
>
>
> ==== Chunk Group ====
> The data part of a TsFile consists of many Chunk Groups. Each Chunk Group stores the data of one device over a time interval. A Chunk Group is similar to a row group in Apache Parquet, but with some constraints on the time dimension: for each device, the time intervals of different Chunk Groups do not overlap, and a later Chunk Group always has larger timestamps.
>
>
> Given a TsFile and a query with a time range filter, the query process can stop scanning once it reads a data point whose timestamp reaches the upper time limit of the filter. We call this feature ''fast-return'', and it makes time range queries on a TsFile very efficient.
>
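The ''fast-return'' behaviour can be sketched as follows. This is a simplified Python model, assuming each Chunk Group is a non-empty, time-ordered list of (timestamp, value) pairs and that Chunk Groups are ordered and non-overlapping as described above. The function name and the in-memory layout are illustrative assumptions, not the TsFile reader API.

```python
def scan_time_range(chunk_groups, t_start, t_end):
    """Return (points within [t_start, t_end], number of groups scanned)."""
    result = []
    groups_scanned = 0
    for group in chunk_groups:
        if group[-1][0] < t_start:
            continue  # whole group ends before the range starts
        if group[0][0] > t_end:
            break  # fast-return: every later group is later still
        groups_scanned += 1
        for ts, value in group:
            if ts > t_end:
                # fast-return inside a group: timestamps only grow from here
                return result, groups_scanned
            if ts >= t_start:
                result.append((ts, value))
    return result, groups_scanned


groups = [
    [(1, "a"), (2, "b")],
    [(3, "c"), (4, "d")],
    [(5, "e"), (6, "f")],
]
points, scanned = scan_time_range(groups, 2, 3)
print(points, scanned)
```

In this example the third group is never touched, because the scan stops as soon as a timestamp beyond the filter's upper bound is seen.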
>
>
>
>
>
> ==== Different Column Chunk Format (No Repetition (R) and Definition (D) Fields Needed) ====
>
>
> While Apache Parquet and Apache CarbonData support complex data types, e.g., nested data and sparse columns, TsFile is designed exclusively for time series whose data model is <device_id, series_id, timestamp, value>.
>
>
> In a `Chunk Group`, each time series is a `Column Chunk`. Even though these time series belong to the same device, the data points in different time series are not necessarily aligned in the time dimension.
>
>
> For example, if you have a device with 2 sensors that collect data at the same frequency, sensor 1 may collect data at time 1521622662000 while the other collects data at time 1521622662001 (delta=1ms). Therefore, each Column Chunk has its own timestamps and values, which is quite different from Apache Parquet and Apache CarbonData. Because different time series may have different data frequencies, we store a time column along with each value column instead of making different chunks share the same time column; as a result, we do not store any null values on disk to align data across time series. Besides, we do not need to attach `repetition` (R) and `definition` (D) fields to each value. Therefore, disk space is saved and query latency is reduced (because we do not align data by calculating R and D fields).
>
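A small Python sketch of why a shared time column forces null padding, using the two sensor timestamps from the example above. The helper name and the dictionary layout are hypothetical and only model the idea, not the on-disk format.

```python
def align_on_shared_time_column(series):
    """Build rows (timestamp, v1, v2, ...) over the union of all timestamps.

    `series` maps a series name to its own list of (timestamp, value) pairs,
    which is the per-column layout described above. Missing cells become None.
    """
    names = sorted(series)
    lookup = {name: dict(points) for name, points in series.items()}
    all_times = sorted({ts for points in series.values() for ts, _ in points})
    return [(ts, *[lookup[name].get(ts) for name in names]) for ts in all_times]


per_series = {
    "sensor1": [(1521622662000, 1.0)],
    "sensor2": [(1521622662001, 2.0)],
}
rows = align_on_shared_time_column(per_series)
null_cells = sum(cell is None for row in rows for cell in row[1:])
print(rows, null_cells)
```

With a 1 ms offset between the sensors, the aligned table has as many null cells as real values, while the per-series layout stores only the two points that exist.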
>
>
>
> ==== Domain Specific Information in Each Page ====
> Similar to Apache Parquet and Apache CarbonData, a `Column Chunk` consists of several `Pages`, and each `Page` has a `Page header`. The `Page header` is a summary of the data in the page.
>
>
> Because TsFile is optimized for time series, the page header contains more domain-specific information, such as the minimum and maximum values, the minimum and maximum timestamps, the frequency and so on. TsFile can even store a histogram of the values in the page header.
>
>
> This header information helps IoTDB speed up queries by skipping unnecessary pages.
>
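Page skipping based on header statistics might look like the sketch below. The `Page` class, its field names and the query function are assumptions made for illustration; they do not mirror the real TsFile page header layout.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Summary statistics that a page header could carry (illustrative subset).
    min_time: int
    max_time: int
    min_value: float
    max_value: float
    points: list  # list of (timestamp, value) pairs


def make_page(points):
    """Build a page and compute its header statistics from the data."""
    times = [ts for ts, _ in points]
    values = [v for _, v in points]
    return Page(min(times), max(times), min(values), max(values), points)


def query_values_above(pages, threshold):
    """Return (matching points, number of pages actually read)."""
    hits = []
    pages_read = 0
    for page in pages:
        if page.max_value <= threshold:
            continue  # header alone proves no point in this page can match
        pages_read += 1
        hits.extend((ts, v) for ts, v in page.points if v > threshold)
    return hits, pages_read


pages = [make_page([(1, 1.0), (2, 2.0)]), make_page([(3, 5.0), (4, 0.5)])]
print(query_values_above(pages, 3.0))
```

The same check works for time filters by comparing the query range with `min_time` and `max_time`.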
>
>
>
> === Adaptor for Analytics ===
> The TsFile provides:
>
>
>  * InputFormat/OutputFormat interfaces for reading/writing data.
>  * Deep integration with Apache Spark/Hadoop MapReduce, including predicate push-down, column pruning, aggregation push-down, etc., so users can use Apache Spark SQL/HiveQL to connect to and query TsFiles.
>
>
>
>
> === IoTDB Engine ===
> The IoTDB engine is a database engine that uses TsFile as its storage file format. It supports SQL-like queries plus many useful features:
>
>
>  * Tree-based time series schema
>  * Log-Structured Merge (LSM)-based storage
>  * Overflow file for out-of-order data
>  * Scalable index framework
>  * Special queries for time series
>
>
> ==== Tree-based Time Series Schema ====
> IoTDB manages all the time series definitions using a tree structure. A path from the root of the tree to a leaf node represents a time series. Therefore, the unique id of a time series is a path, e.g., `root.China.beijing.windFarm1.windTurbine1.speed`.
>
>
> This kind of schema can express `group by` naturally. For example, `root.China.beijing.windFarm1.*.speed` represents the speed of all the wind turbines in wind farm 1 in Beijing, China.
>
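A minimal Python sketch of resolving such a path pattern, assuming `*` matches exactly one level of the tree. That wildcard semantics and the function name are assumptions for illustration; the proposal does not specify them.

```python
def match_paths(series_paths, pattern):
    """Return the series paths that match a dotted pattern.

    Assumption: '*' stands for exactly one node of the path.
    """
    pattern_nodes = pattern.split(".")
    matched = []
    for path in series_paths:
        nodes = path.split(".")
        if len(nodes) != len(pattern_nodes):
            continue
        if all(p == "*" or p == n for p, n in zip(pattern_nodes, nodes)):
            matched.append(path)
    return matched


all_series = [
    "root.China.beijing.windFarm1.windTurbine1.speed",
    "root.China.beijing.windFarm1.windTurbine2.speed",
    "root.China.beijing.windFarm1.windTurbine1.temperature",
    "root.China.beijing.windFarm2.windTurbine1.speed",
]
print(match_paths(all_series, "root.China.beijing.windFarm1.*.speed"))
```

A real engine would walk the schema tree rather than scan a flat list, but the grouping effect is the same: one pattern selects every turbine's speed series in wind farm 1.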
>
> ==== Log-Structured Merge (LSM)-based Storage ====
> In a time series, the data points should be ordered by their timestamps. IoTDB uses a Log-Structured Merge (LSM) based mechanism: a part of the data is first stored in memory, in what is called a `memtable`. If data points arrive out of order at this stage, we re-sort them in memory. When this part of the data exceeds the configured memory limit, we flush it to disk as a `Chunk Group` in an unclosed TsFile. As a result, a TsFile may contain several Chunk Groups, which reduces the number of small data files; this helps reduce the I/O load on the storage system and the execution time of a file merge in LSM. Note that the data is time-ordered within one Chunk Group on disk, and this layout helps fast filtering within a Chunk Group for a query.
>
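The memtable behaviour described above can be modelled in a few lines of Python. This is a toy sketch: the class name is hypothetical, the memory limit is expressed as a point count rather than bytes, and the "file" is just a list of flushed Chunk Groups.

```python
class MemTable:
    """Buffers points in memory and flushes them as time-ordered Chunk Groups."""

    def __init__(self, max_points):
        self.max_points = max_points  # stand-in for the configured memory limit
        self.points = {}  # timestamp -> value
        self.flushed_chunk_groups = []  # stand-in for the unclosed TsFile

    def insert(self, timestamp, value):
        self.points[timestamp] = value
        if len(self.points) >= self.max_points:
            self.flush()

    def flush(self):
        if not self.points:
            return
        # Sorting here absorbs any out-of-order arrivals within the memtable.
        self.flushed_chunk_groups.append(sorted(self.points.items()))
        self.points = {}


table = MemTable(max_points=3)
for ts, value in [(3, "a"), (1, "b"), (2, "c"), (4, "d")]:
    table.insert(ts, value)
print(table.flushed_chunk_groups, table.points)
```

The first three points arrive out of order but are written as one sorted Chunk Group; the fourth stays in memory until the next flush.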
>
> Rule 1: In a TsFile, the Chunk Groups of one device are ordered by timestamp, which helps fast filtering among Chunk Groups for a query.
>
>
> Rule 2: When the size of the unclosed TsFile reaches the threshold defined in the configuration file, we close the file and generate a new one to store newly arriving data spanning the entire data set. Like many systems that use LSM-based storage, we never modify a closed TsFile except during the file-merge process.
>
>
> Rule 3: To reduce the number of TsFiles involved in a query process, we guarantee that the data points in different TsFiles do not overlap in the time dimension after a file merge.
>
>
> ==== Overflow File for Out-of-order Data ====
> After a part of the data has been flushed to disk (forming a `Chunk Group` in a TsFile), newly arriving data points whose timestamps are smaller than the largest timestamp in the TsFile are `out-of-order`.
>
>
> To store the out-of-order data, we organize all the troublesome `out-of-order` data point insertions into a special TsFile, named `UnSequenceTsFile`. In an UnSequenceTsFile, the Chunk Groups of one device may overlap in the time dimension, which violates Rule 1 and costs additional query filtering time compared to a normal TsFile.
>
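The routing rule for out-of-order points can be sketched as below. The class and attribute names are hypothetical, and the model tracks a single series with plain lists standing in for the TsFile and the UnSequenceTsFile.

```python
class SeriesWriter:
    """Routes points to the sequence path or the unsequence path."""

    def __init__(self):
        self.last_flushed_time = None  # largest timestamp already on disk
        self.memtable = []
        self.sequence_chunk_groups = []  # stand-in for the normal TsFile
        self.unsequence_points = []  # stand-in for the UnSequenceTsFile

    def write(self, timestamp, value):
        flushed = self.last_flushed_time
        if flushed is not None and timestamp < flushed:
            # Smaller than the largest flushed timestamp: out-of-order.
            self.unsequence_points.append((timestamp, value))
        else:
            self.memtable.append((timestamp, value))

    def flush(self):
        if not self.memtable:
            return
        self.memtable.sort()
        self.sequence_chunk_groups.append(self.memtable)
        self.last_flushed_time = self.memtable[-1][0]
        self.memtable = []


writer = SeriesWriter()
writer.write(10, "a")
writer.write(12, "b")
writer.flush()
writer.write(11, "late")  # arrives after timestamp 12 was flushed
writer.write(13, "c")
print(writer.unsequence_points, writer.memtable)
```

Only the late point is diverted; the sequence path keeps satisfying Rule 1 without rewriting the Chunk Group that is already on disk.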
> There is another special operation: updating all the data points in a time range, e.g., `update all the speed values of device1 as 0 where the data time is in [1521622000000, 1521622662000]`. The operation is used when (1) a sensor malfunctions and the database receives wrong data for a period, or (2) we want to reset all the records. Many NoSQL time series databases do not support such an operation. To support it in IoTDB, we use a tree-based structure, a Treap, to hold these operations and store them as `Overflow` files.
>
>
> Therefore, there are 3 kinds of data files: TsFiles, UnSequenceTsFiles and Overflow files. TsFiles should store most of the data. The volume of UnSequenceTsFiles depends on the workload: if there are many out-of-order data points and their time span is large, the volume will be large. Overflow files hold the fewest data operations, but their volume depends on how often the special operations are used.
>
>
> ==== LSM-tree ====
> Normally, LSM-based storage engines merge data files level by level, so that the files form a tree structure. In this way, data is well organized. The disadvantage is that data is read and written several times. If the tree has 4 levels, each data point will be rewritten at least 4 times.
>
>
> Currently, we do not merge all the TsFiles into one because (1) the number of TsFiles is kept lower than in many LSM storage engines, since a memtable is mapped to several Chunk Groups rather than to a file; and (2) different TsFiles do not overlap with each other in the time dimension (because of Rule 3).
>
>
> As mentioned before, TsFile supports ''fast-return'' to accelerate queries. However, UnSequenceTsFiles and Overflow files do not support this feature. The time spans of UnSequenceTsFiles, Overflow files and TsFiles may overlap, which leads to more files being involved in the query process. To accelerate these queries, a merging process reorganizes files in the background. All three kinds of files (TsFiles, UnSequenceTsFiles and Overflow files) are involved in the merging process. The merging process is multi-threaded, with each thread responsible for a series family.
> After merging, only TsFiles are left. These files have non-overlapping time spans and support the ''fast-return'' feature.
>
>
> ==== Scalable Index Framework ====
> We allow users to implement indexes for faster queries. We currently support an index for pattern matching queries (KV-Match index, ICDE 2019). Another index for fast aggregation (PISA index, CIKM 2016) is a work in progress.
>
>
> ==== Special Queries ====
> We currently support `group by time interval` aggregation queries and `Fill by` operations, which are similar to those of InfluxDB. Time series segmentation operations and frequency queries are works in progress.
>
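What a `group by time interval` aggregation computes can be shown with a short Python sketch. The function name, the choice of the average as the aggregate and the bucket alignment to `start` are assumptions for illustration, not IoTDB's query syntax or semantics.

```python
from collections import defaultdict


def group_by_time_avg(points, start, interval):
    """Average values per fixed-width time bucket.

    Buckets are [start + k*interval, start + (k+1)*interval) and are keyed
    by their left edge. Points before `start` are ignored.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        if ts < start:
            continue
        bucket_start = start + (ts - start) // interval * interval
        buckets[bucket_start].append(value)
    return {b: sum(vs) / len(vs) for b, vs in sorted(buckets.items())}


readings = [(0, 1.0), (500, 3.0), (1000, 5.0), (1999, 7.0)]
print(group_by_time_avg(readings, start=0, interval=1000))
```

Empty buckets simply do not appear in the result here; filling such gaps is the kind of task a `Fill by` style operation would address.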
>
> == Initial Goals ==
> The initial goals are to be open sourced and to integrate with the Apache development process. Furthermore, we plan for incremental development and releases in line with the Apache guidelines.
>
>
> == Current Status ==
> We have been developing the system for more than 2 years. There are currently 13k lines of code, some of which are generated by Antlr3 and Thrift. 230 issues have been resolved, and there are more than 1500 commits.
>
>
> The system has been deployed in the staging environment of the State Grid Corporation of China to handle ~3 million time series (i.e., ~30,000 power generation assemblies * ~100 sensors) and at an equipment service company in China managing ~2 million time series (i.e., ~20k devices * 100 sensors). The insertion speed reaches ~2 million points/second/node, which is faster than InfluxDB, OpenTSDB and Apache Cassandra in our environment.
>
>
> There are many new features in the works, including those mentioned herein. We will add more analytics functions, improve the data file merge process, and finish the first release of IoTDB.
>
>
> == Meritocracy ==
> The IoTDB project operates on meritocratic principles. Developers who submit more code of higher quality earn more merit. We have used the `Issues` and `Pull Requests` modules on GitHub to collect users' suggestions and patches. Users who submit issues, pull requests and documents, and who help with community management, are welcomed and encouraged to become committers.
>
>
> == Community ==
>
>
> IoTDB users communicate on GitHub (https://github.com/thulab/tsfile). Developers communicate on a website similar to JIRA (currently, only registered users can apply for access to the project; URL: https://tower.im/projects/36de8571a0ff4833ae9d7f1c5c400c22/). We have also introduced IoTDB at many technical conferences. Next, we will set up a mailing list for more convenient, broader communication and archived discussions.
>
>
> If IoTDB is accepted for incubation at the Apache Software Foundation, the primary goal is to build a larger community. We believe that IoTDB will become a key project for time series data management, and so we will rely on a large community of users and developers.
>
>
> TODO: IoTDB is currently in a private GitHub repository (https://github.com/thulab/iotdb), while its subproject TsFile (a file format for storing time series data) is open sourced on GitHub (https://github.com/thulab/tsfile).
>
>
> == Core Developers ==
> IoTDB was initially developed by about two dozen students and teachers at Tsinghua University. Since then, more developers have joined from other universities: Fudan University, Northwestern Polytechnical University and Harbin Institute of Technology in China. Other developers come from companies such as Lenovo and Microsoft. We will work to bring more developers into the project to contribute to IoTDB.
>
>
> == Relationships with Other Apache Products ==
> IoTDB requires some Apache products (Apache Thrift, commons, collections, httpclient).
>
>
> IoTDB-Spark-connector and IoTDB-Hadoop-connector have been developed to support analyzing time series data with Apache Spark and MapReduce.
>
>
> Overall, IoTDB is designed as an open architecture, and it can be integrated with many other systems in the future.
>
>
> As mentioned before, in the IoTDB project, we designed a new columnar file format, called TsFile, which is similar to Apache Parquet. However, the new file format is optimized for time series data.
>
>
>
>
>
>
> == Known Risks ==
>
>
> === Orphaned Products ===
> Given the current level of investment in IoTDB, the risk of the project being abandoned is minimal. Time series data is becoming more and more important, and there are several constituents who are highly motivated to continue development. Tsinghua and NEL-BDS Lab rely on IoTDB as a platform for a large number of long-term research projects. We have deployed IoTDB in some companies' staging environments for future applications.
>
>
> === Inexperience with Open Source ===
> Students and researchers at Tsinghua University have been developing and using open source software for a long time. It is wonderful for students to be guided into a formal open source process. Some of our committers have experience contributing to open source, for example:
>
>
>  * druid: https://github.com/druid-io/druid/commit/f18cc5df97e5826c2dd8ffafba9fcb69d10a4d44
>  * druid: https://github.com/druid-io/druid/commit/aa7aee53ce524b7887b218333166941654788794
>  * YCSB: https://github.com/brianfrankcooper/YCSB/pull/776
>
>
> Additionally, several ASF veterans and industry veterans have agreed to mentor the project and are listed in this proposal. The project will rely on their guidance and collective wisdom to quickly transition the entire team of initial committers towards practicing the Apache Way.
>
>
>
>
> === Reliance on Salaried Developers ===
> Most of the current developers are students and researchers/professors at universities, and their research focuses on big data management and analytics. It is unlikely that they will change their research focus away from big data management. We will work to ensure that the project can continue to be stewarded and move forward independently of salaried developers.
>
>
> === An Excessive Fascination with the Apache Brand ===
> Most of the initial developers come from Tsinghua University and have no intent to use the Apache brand for profit. We have no plans to make use of the Apache brand in press releases or to post billboards advertising the acceptance of IoTDB into the Apache Incubator.
>
>
>
>
> == Initial Source ==
> IoTDB's GitHub addresses and some required dependencies:
>
>
>  * The storage file format: https://github.com/thulab/tsfile
>  * Adaptor for Apache Hadoop MapReduce: https://github.com/thulab/tsfile-hadoop-connector
>  * Adaptor for Apache Spark: https://github.com/thulab/tsfile-spark-connector
>  * Adaptor for Grafana: https://github.com/thulab/iotdb-grafana
>  * The database engine: https://github.com/thulab/iotdb (private project up to now)
>  * The client driver: https://github.com/thulab/iotdb-jdbc
>
>
>
>
> === External Dependencies ===
> To the best of our knowledge, all dependencies of IoTDB are distributed under Apache-compatible licenses. Upon acceptance to the incubator, we would begin a thorough analysis of all transitive dependencies to verify this and introduce license checking into the build and release process.
>
>
> == Documentation ==
>  * Documentation for TsFile: https://github.com/thulab/tsfile/wiki
>  * Documentation for IoTDB and its JDBC: http://tsfile.org/document (Chinese only. An English version is in progress.)
>
>
> == Required Resources ==
> === Mailing Lists ===
>  * private@iotdb.incubator.apache.org
>  * dev@iotdb.incubator.apache.org
>  * commits@iotdb.incubator.apache.org
>
>
> === Git Repositories ===
>  * https://git-wip-us.apache.org/repos/asf/incubator-iotdb.git
>
>
> === Issue Tracking ===
>  * JIRA IoTDB (We currently use the issue management provided by GitHub to track issues.)
>
>
>
>
> == Initial Committers ==
> Tsinghua University, K2Data Company, Lenovo, Fudan University, Microsoft
>
>
> Jianmin Wang ( jimwang at tsinghua dot edu dot cn )
>
>
> Jun Yuan (richard_yuan16 at 163 dot com  )
>
>
> Chen Wang ( wang_chen at tsinghua dot edu dot cn)
>
>
> Xiangdong Huang (sainthxd at gmail dot com)
>
>
> Jialin Qiao (qjl16 at mails dot tsinghua dot edu dot cn)
>
>
> Jinrui Zhang (jinrzhan at microsoft dot com)
>
>
> Rong Kang (kr11 at mails dot tsinghua dot edu dot cn)
>
>
> Tian Jiang (jiangtia18 at mails dot tsinghua dot edu dot cn)
>
>
> Lei Rui (rl18 at mails dot tsinghua dot edu dot cn)
>
>
> Rui Liu (liur17 at mails dot tsinghua dot edu dot cn)
>
>
> Kun Liu (liukun16 at mails dot tsinghua dot edu dot cn)
>
>
> Gaofei Cao (cgf16 at mails dot tsinghua dot edu dot cn)
>
>
> Yi Xu(x-y16 at mails dot tsinghua dot edu dot cn)
>
>
> Xinyi Zhao (xyzhao16 at mails dot tsinghua dot edu dot cn)
>
>
> Dongfang Mao (maodf17 at mails dot tsinghua dot edu dot cn)
>
>
> Tianan Li(lta18 at mails dot tsinghua dot edu dot cn)
>
>
> Yue Su (suy18 at mails dot tsinghua dot edu dot cn)
>
>
> Wangminhao Gou(gwmh18 at mails dot tsinghua dot edu dot cn)
>
>
>
>
> == Sponsors ==
> === Champion ===
> Kevin A. McGrail (kmcgrail@apache.org)
>
>
> === Nominated Mentors ===
> TODO

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org