incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kevin A. McGrail" <kmcgr...@apache.org>
Subject Re: [Result][Vote] vote for IoTDB incubation proposal
Date Thu, 15 Nov 2018 17:24:23 GMT
I will defer the intake of code to the secretary.

On Thu, Nov 15, 2018, 12:20 黄向东 <sainthxd@gmail.com wrote:

> > - When you say "open source" repo, do you mean private repo vs public
> > repo?
>
> Yes.
>
> >
> > - I believe Craig as Secretary will say an SGA never hurts but isn't
> > everything already licensed ASLv2?  It's been a few weeks and a few
> > proposals reviewed so it could be my memory.
>
> Currently, the licenses of the dependency libs of IoTDB includes:
> Apache2.0, BSD (antlr3), EPL1.0 (logback) and EPL2.0 (junit).
> We are working on checking all the licenses once again for avoiding
> mistakes.
>
> Regards,
> Xiangdong Huang
>
>
> > 在 2018年11月15日,下午10:43,Kevin A. McGrail <kmcgrail@apache.org>
写道:
> >
> > Well, first, let's ask some questions:
> >
> > - When you say "open source" repo, do you mean private repo vs public
> > repo?
> >
> > - I believe Craig as Secretary will say an SGA never hurts but isn't
> > everything already licensed ASLv2?  It's been a few weeks and a few
> > proposals reviewed so it could be my memory.
> >
> > Regards,
> > KAM
> >
> > --
> > Kevin A. McGrail
> > VP Fundraising, Apache Software Foundation
> > Chair Emeritus Apache SpamAssassin Project
> > https://www.linkedin.com/in/kmcgrail - 703.798.0171
> >
> >
> > On Thu, Nov 15, 2018 at 7:27 AM hxd <hxdreg@qq.com> wrote:
> >
> >> Currently, there are 6 repositories (IoTDB, IoTDB-JDBC, TsFile,
> >> Spark-Connector, Hive-Connector, and Grafana-Connector) totally and we
> will
> >> merge them all in one repositories.
> >>
> >> Only the first one is private.
> >>
> >> Actually we are lack of experiences about how to open source.
> >>
> >> Should we open all the source now or after all the Apache legal
> documents
> >> are done?
> >>
> >> Best,
> >>
> >> Xiangdong Huang
> >>
> >>> 在 2018年11月15日,下午5:06,Willem Jiang <willem.jiang@gmail.com>
写道:
> >>>
> >>> Here is a question for the source code repository
> >>>
> >>> The main source git repo[1] is still a private repo.  I think we need
> >>> to open source the repo before sending the SGA?
> >>>
> >>>
> >>> [1]https://github.com/thulab/iotdb
> >>>
> >>> Willem Jiang
> >>>
> >>> Twitter: willemjiang
> >>> Weibo: 姜宁willem
> >>> On Thu, Nov 15, 2018 at 4:08 PM hxd <hxdreg@qq.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> In the proposal discussion process, we got 3 mentors,  Justin Mclean,
> >> Christofer Dutz, and Willem Ning Jiang.
> >>>>
> >>>> In the vote process, we got a new mentor, Joe Witt.
> >>>>
> >>>> Totally, there are one Champion and four mentors, they are:
> >>>>
> >>>> Kevin A. McGrail (the Champion),
> >>>> Justin Mclean,
> >>>> Christofer Dutz,
> >>>> Willem Ning Jiang, and
> >>>> Joe Witt
> >>>>
> >>>> I have checked their name on
> >> http://people.apache.org/committer-index.html, and they are accurate
> now.
> >>>> The name list on the proposal list (
> >> https://wiki.apache.org/incubator/IoTDBProposal) is also correct.
> >>>>
> >>>> Regards,
> >>>> Xiangdong Huang
> >>>>
> >>>>
> >>>>
> >>>> 在 2018年11月15日,上午12:51,Kevin A. McGrail <kmcgrail@apache.org>
写道:
> >>>>
> >>>> Congratulations!  As champion, I think the next steps are:
> >>>>
> >>>> 1 - Xiangdong, Can you confirm the list of mentors on the proposal is
> >> accurate?
> >>>>
> >>>> 2 - Also Xiangdong, Is there anyone else that stepped forward as a
> >> mentor during the voting process that the project wants the IPMC to
> approve?
> >>>>
> >>>> 3 - Justin, I think you have to request the creation of the podling
> and
> >> then I as champion work on things like the meta data file from this
> page,
> >>>> https://incubator.apache.org/policy/incubation.html, correct?
> >>>>
> >>>> Regards,
> >>>> KAM
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Kevin A. McGrail
> >>>> VP Fundraising, Apache Software Foundation
> >>>> Chair Emeritus Apache SpamAssassin Project
> >>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
> >>>>
> >>>>
> >>>> On Wed, Nov 14, 2018 at 6:29 AM hxd <hxdreg@qq.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> With 8 +1 binding votes,  2 +1 non-binding votes and No +/-0 or
-1
> >> votes, this VOTE passes.
> >>>>>
> >>>>> Thanks to everyone who voted!
> >>>>>
> >>>>> Bellow is a voting tally:
> >>>>>
> >>>>> Binding
> >>>>> Von Gosling
> >>>>> Christofer Dutz
> >>>>> Kevin A. McGrail
> >>>>> Felix Cheung
> >>>>> Matt Sticker
> >>>>> Joe Witt
> >>>>> Justin Mclean
> >>>>> Willem Jiang
> >>>>>
> >>>>>
> >>>>> Non-binding
> >>>>> Sheng Wu
> >>>>> Yang Bo
> >>>>>
> >>>>> The vote thread:
> >>
> https://lists.apache.org/thread.html/077f029ab2b52a2b19fc8d41c07438f660a8e93dd87b3895d262263c@%3Cgeneral.incubator.apache.org%3E
> >> <
> >>
> https://lists.apache.org/thread.html/077f029ab2b52a2b19fc8d41c07438f660a8e93dd87b3895d262263c@%3Cgeneral.incubator.apache.org%3E
> >>>
> >>>>> The proposal: https://wiki.apache.org/incubator/IoTDBProposal <
> >> https://wiki.apache.org/incubator/IoTDBProposal>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Xiangdong Huang
> >>>>>
> >>>>>
> >>>>>> 在 2018年11月7日,下午3:46,hxd <hxdreg@qq.com>
写道:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Sorry for the previous mail with bad format.
> >>>>>> I'd like to call a VOTE to accept IoTDB project, a database
for
> >> managing large amounts of time series data  from IoT sensors in
> industrial
> >> applications, into the Apache Incubator.
> >>>>>> The full proposal is available on the wiki:
> >> https://wiki.apache.org/incubator/IoTDBProposal
> >>>>>> and it is also attached below for your convenience.
> >>>>>>
> >>>>>> Please cast your vote:
> >>>>>>
> >>>>>> [ ] +1, bring IoTDB into Incubator
> >>>>>> [ ] +0, I don't care either way,
> >>>>>> [ ] -1, do not bring IoTDB into Incubator, because...
> >>>>>>
> >>>>>> The vote will open at least for 72 hours.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Xiangdong Huang.
> >>>>>>
> >>>>>>
> >>>>>> = IoTDB Proposal  =
> >>>>>> v0.1.1
> >>>>>>
> >>>>>>
> >>>>>> == Abstract ==
> >>>>>> IoTDB is a data store for managing large amounts of time series
data
> >> such as timestamped data from IoT sensors in industrial applications.
> >>>>>>
> >>>>>> == Proposal ==
> >>>>>> IoTDB is a database for managing large amount of time series
data
> >> with columnar storage, data encoding, pre-computation, and index
> >> techniques. It has SQL-like interface to write millions of data points
> per
> >> second per node and is optimized to get query results in few seconds
> over
> >> trillions of data points. It can also be easily integrated with Apache
> >> Hadoop MapReduce and Apache Spark for analytics.
> >>>>>>
> >>>>>> == Background ==
> >>>>>>
> >>>>>> A new class of data management system requirements is becoming
> >> increasingly important with the rise of the Internet of Things. There
> are
> >> some database systems and technologies aimed at time series data
> >> management.  For example, Gorilla and InfluxDB which are mainly built
> for
> >> data centers and monitoring application metrics. Other systems, for
> >> example, OpenTSDB and KairosDB, are built on Apache HBase and Apache
> >> Cassandra, respectively.
> >>>>>>
> >>>>>> However, many applications for time series data management have
more
> >> requirements especially in industrial applications as follows:
> >>>>>>
> >>>>>> * Supporting time series data which has high data frequency.
For
> >> example, a turbine engine may generate 1000 points per second (i.e.,
> >> 1000Hz), while each CPU only reports 1 data points per 5 seconds in a
> data
> >> center monitoring application.
> >>>>>>
> >>>>>> * Supporting scanning data multi-resolutionally. For example,
> >> aggregation operation is important for time series data.
> >>>>>>
> >>>>>> * Supporting special queries for time series, such as pattern
> >> matching, time series segmentation, time-frequency transformation and
> >> frequency query.
> >>>>>>
> >>>>>> * Supporting a large number of monitoring targets (i.e. time
> series).
> >> An excavator may report more than 1000 time series, for example,
> revolving
> >> speed of the motor-engine, the speed of the excavator, the accelerated
> >> speed, the temperature of the water tank and so on, while a CPU or an
> >> application monitor has much fewer time series.
> >>>>>>
> >>>>>> * Optimization for out-of-order data points. In the industrial
> >> sector, it is common that equipment sends data using the UDP protocol
> >> rather than the TCP protocol. Sometimes, the network connect is unstable
> >> and parts of the data will be buffered for later sending.
> >>>>>>
> >>>>>> * Supporting long-term storage. Historical data is precious
for
> >> equipment manufacturers. Therefore, removing or unloading historical
> data
> >> is highly desired for most industrial applications. The database system
> >> must not only support fast retrieval of historical data, but also should
> >> guarantee that the historical data does not impact the processing speed
> for
> >> “hot” or current data.
> >>>>>>
> >>>>>> * Supporting online transaction processing (OLTP) as well as
complex
> >> analytics. It is obvious that supporting analyzing from the data files
> >> using Apache Spark/Apache Hadoop MapReduce directly is better than
> >> transforming data files to another file format for Big Data analytics.
> >>>>>>
> >>>>>> * Flexible deployment either on premise or in the cloud.  IoTDB
is
> as
> >> simple and can be deployed on a Raspberry Pi handling hundreds of time
> >> series. Meanwhile, the system can be also deployed in the cloud so that
> it
> >> supports tens of millions ingestions per second, OLTP queries in
> >> milliseconds, and analytics using Apache Spark/Apache Hadoop MapReduce.
> >>>>>>
> >>>>>> * * (1) If users deploy IoTDB on a device, such as a Raspberry
Pi, a
> >> wind turbine, or a meteorological station, the deployment of the chosen
> >> database is designed to be simple. A device may have hundreds of time
> >> series (but less than a thousand time series) and the database needs to
> >> handle them.
> >>>>>> * * (2) When deploying IoTDB in a data center, the computational
> >> resources (i.e., the hardware configuration of servers) is not a problem
> >> when compared to a Raspberry Pi. In this deployment, IoTDB can use more
> >> computation resources, and has the ability to handle more time seires
> >> (e.g., millions of time series).
> >>>>>>
> >>>>>> Based on these requirements, we developed IoTDB, a new data
store
> >> system for managing time series data.
> >>>>>>
> >>>>>> IoTDB started as a Tsinghua University research project. IoTDB's
> >> developer community has also grown to include additional institutions,
> for
> >> example, universities (e.g., Fudan University), research labs (e.g,
> NEL-BDS
> >> lab), and corporations (e.g., K2Data, Tencent). Funding has been
> provided
> >> by various institutions including the National Natural Science
> Foundation
> >> of China, and industry sponsors, such as Lenovo and K2Data.
> >>>>>>
> >>>>>> == Rationale ==
> >>>>>> Because there is no existed open-sourced time series databases
> >> covering all the above requirements, we developed IoTDB. As the system
> >> matures, we are seeking a long-term home for the project. We believe the
> >> Apache Software Foundation would be an ideal fit. Also joining Apache
> will
> >> help coordinate and improve the development effort of the growing
> number of
> >> organizations which contribute to IoTDB improving the diversity of our
> >> community.
> >>>>>>
> >>>>>> IoTDB contains multiple modules, which are classified into
> categories:
> >>>>>>
> >>>>>> * '''TsFile Format''': TsFile is a new columnar file format.
> >>>>>> * '''Adaptor for Analytics and Visualization''': Integrating
TsFile
> >> with Apache Hadoop HDFS, Apache Hadoop MapReduce and Apache Spark.
> Examples
> >> of integrating IoTDB with Apache Kafka, Apache Storm and Grafana are
> also
> >> provided.
> >>>>>> * '''IoTDB Engine''': An engine which consists of SQL parser,
query
> >> plan generator, memtable, authentication and authorization,write ahead
> log
> >> (WAL), crash recovery, out-of-order data handler, and index for
> aggregation
> >> and pattern matching. The engine stores system data in TsFile format.
> >>>>>> * '''IoTDB JDBC''': An implementation of Java Database Connectivity
> >> (JDBC) for clients to connect to IoTDB using Java.
> >>>>>>
> >>>>>> === TsFile Format ===
> >>>>>>
> >>>>>> TsFile format is a columnar store, which is similar with Apache
> >> Parquet and Apache CarbonData. It has the concepts of Chunk Group,
> Column
> >> Chunk, Page and Footer. Comparing with Apache Parquet and Apache
> >> CarbonData, it is designed and optimized for time series:
> >>>>>>
> >>>>>> ==== Time Series Friendly Encoding ====
> >>>>>> IoTDB currently supports run length encoding (RLE), delta-of-delta
> >> encoding, and Facebook's Gorilla encoding.
> >>>>>>
> >>>>>> Lossy encoding methods (e.g., Piecewise Linear Approximation
(PLA)
> >> and time-frequency transformation are works-in-progress.
> >>>>>>
> >>>>>>
> >>>>>> ==== Chunk Group ====
> >>>>>> The data part of a TsFile consists of many Chunk Groups. Each
Chunk
> >> Group stores the data of a device at a time interval.  A Chunk Group is
> >> similar to the row group in Apache Parquet, while there are some
> >> constraints of the time dimension:  For each device, the time intervals
> of
> >> different Chunk Groups are not overlapped and the latter Chunk Group
> always
> >> has a larger timestamp.
> >>>>>>
> >>>>>> Given a TsFile and a query with a time range filter, the query
> >> process can terminate scanning data once it reads data points whose
> >> timestamp reaches the time limit of the filter. We call the feature
> >> ''fast-return'' and it makes the time range query in a TsFile very
> >> efficient.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> ==== Different Column Chunk Format (Unnecessary the Repetition
(R)
> >> and Definition (D) Fields) ====
> >>>>>>
> >>>>>> While Apache Parquet and Apache CarbonData support complex data
> >> types, e.g., nested data and sparse columns, TsFile is exclusively
> designed
> >> for time series whose data model is \<device_id, series_id, timestamp,
> >> value\>.
> >>>>>>
> >>>>>> In a `Chunk Group`, each time series is a `Column Chunk`. Even
> though
> >> these time series belong to the same device, the data points in
> different
> >> time series are not aligned in the time dimension originally.
> >>>>>>
> >>>>>> For example, if you have a device with 2 sensors on the same
data
> >> collection frequencies, sensor 1 may collect data at time 1521622662000
> >> while the other one collects data at time 1521622662001 (delta=1ms).
> >> Therefore, each Column Chunk has its timestamps and values, which is
> quite
> >> different from Apache Parquet and Apache CarbonData.  Because we store
> the
> >> time column along with each value column instead of making different
> chunks
> >> share the same time column for the sake of diverse data frequency for
> >> different time series, we do not store any null value on disk to align
> >> across time series. Besides, we do not need to attach  `repetition` (R)
> and
> >> `definition` (D) fields on each value. Therefore, the disk space is
> saved
> >> and the query latency is reduced (because we do not align data by
> >> calculating R and D fields).
> >>>>>>
> >>>>>>
> >>>>>> ==== Domain Specific Information in Each Page ====
> >>>>>> Similar to Apache Parquet and Apache CarbonData, a `Column Chunk`
> >> consists of several `Pages`, and each `Page` has a `Page header`. The
> `Page
> >> header` is a summary of the data in the page.
> >>>>>>
> >>>>>> Because TsFile is optimized for time series, the page header
> contains
> >> more domain specific information, such as the minimal and maximal value,
> >> the minimal and the maximal timestamp, the frequency and so on. TsFile
> can
> >> even store the histogram of values in the page header.
> >>>>>>
> >>>>>> This header information helps IoTDB in speeding up queries by
> >> skipping unnecessary pages.
> >>>>>>
> >>>>>>
> >>>>>> === Adaptor for Analytics ===
> >>>>>> The TsFile provides:
> >>>>>>
> >>>>>> * InputFormat/OutputFormat interfaces for Reading/Writing data.
> >>>>>> * Deep integration with Apache Spark/Hadoop MapReduce including
> >> predicate push-down, column pruning, aggregation push down, etc. So
> users
> >> can use Apache Spark SQL/HiveQL to connect and query TsFiles.
> >>>>>>
> >>>>>>
> >>>>>> === IoTDB Engine ===
> >>>>>> The IoTDB engine is a database engine, which uses TsFile as
its
> >> storage file format. The IoTDB Engine supports SQL-like query plus many
> >> useful functions:
> >>>>>>
> >>>>>> * Tree-based time series schema
> >>>>>> * Log-Structured Merge (LSM)-based storage
> >>>>>> * Overflow file for out-of-order data
> >>>>>> * Scalable index framework
> >>>>>> * Special queries for time series
> >>>>>>
> >>>>>> ==== Tree-based Time Series Schema ====
> >>>>>> IoTDB manages all the time series definitions using a tree
> structure.
> >> A path from the root of the tree to a leaf node represents a time
> series.
> >> Therefore, the unique id of a time series is a path, e.g.,
> >> `root.China.beijing.windFarm1.windTurbine1.speed`.
> >>>>>>
> >>>>>> This kind of schema can express `group by` naturally. For example,
> >> `root.China.beijing.windFarm1.*.speed` represents the speed of all the
> wind
> >> turbines in wind farm 1 in Beijing, China.
> >>>>>>
> >>>>>> ==== Log-Structured Merge (LSM)-based Storage ====
> >>>>>> In a time series, the data points should be ordered by their
> >> timestamps. In IoTDB, we use Log-Structured Merge (LSM) based mechanism.
> >> Therefore, a part of the data is stored in memory first and can be
> called
> >> as `memtable`. At this time, if data points come out-of-order, we resort
> >> them in memory. When this part of data exceeds the configured memory
> limit,
> >> we flush it on disk as a `Chunk Group` into an unclosed TsFile.
> Finally, a
> >> TsFile may contain several Chunk Groups, for reducing the number of
> small
> >> data files, which is helpful to reduce the I/O load of the storage
> system
> >> and reduces the execution time of a file-merge in LSM. Notice that the
> data
> >> is time-ordered in one Chunk Group on disk, and this layout is helpful
> for
> >> fast filtering in one Chunk Group for a query.
> >>>>>>
> >>>>>> Rule 1: In a TsFile, the Chunk Groups of one device are ordered
by
> >> timestamp (Rule 1), and it is helpful for fast filtering among Chunk
> Groups
> >> for a query.
> >>>>>>
> >>>>>> Rule 2: When the size of the unclosed TsFile reaches the threshold
> >> defined in the configuration file, we close the file and generate a new
> one
> >> to store new arriving data spanning the entire data set. Like many
> systems
> >> which use LSM-based storage, we never modify a TsFile which has been
> closed
> >> except for the file-merge process (Rule 2).
> >>>>>>
> >>>>>> Rule 3: To reduce the number of TsFiles involved in a query
process,
> >> we guarantee that the data points in different TsFiles are not
> overlapping
> >> on the time dimension after file mergence (Rule 3).
> >>>>>>
> >>>>>> ==== Overflow File for Out-of-order Data ====
> >>>>>> When a part of data is flushed on disk (and will form a `Chunk
> Group`
> >> in a TsFile), the newly arriving data points whose timestamps are
> smaller
> >> than the largest timestamp in the Tsfile are `out-of-order`.
> >>>>>>
> >>>>>> To store the out-of-order data, we organize all the troublesome
> >> `out-of-order` data point insertions into a special TsFile, named
> >> `UnSequenceTsFile`. In an UnSequenceTsFile, the Chunk Groups of one
> device
> >> may be overlapping in the time dimension, which violates the Rule 1 and
> >> costs additional time compared to a normal TsFile for query filtering.
> >>>>>>
> >>>>>> There is another special operation: updating all the data points
in
> a
> >> time range, e.g., `update all the speed values of device1 as 0 where the
> >> data time is in [1521622000000, 1521622662000]`. The operation is called
> >> when: (1) a sensor malfunctions and the database receives wrong data
> for a
> >> period; (2) we may want to reset all the records. Many NoSQL time series
> >> databases do not support such an operation. To support the operation in
> >> IoTDB, we use a tree-based structure, Treap, to store this part of
> >> operations and store them as `Overflow` files.
> >>>>>>
> >>>>>> Therefore, there are 3 kinds of data files: TsFiles,
> >> UnSequenceTsFiles and Overflow files.  TsFiles should store most of the
> >> data. The volume of UnSequenceTsFiles depends on the workload: if there
> are
> >> too many out-of-order and the time span of out-of-order is huge, the
> volume
> >> will be large. Overflow files handle fewest data operations but will
> depend
> >> on the use of the special operations.
> >>>>>>
> >>>>>> ==== LSM-tree ====
> >>>>>> Normally, LSM-based storage engines merge data files level by
level
> >> so that it looks like a tree structure. In this way, data is well
> >> organized. The disadvantage is that data will be read and written
> several
> >> times. If the tree has 4 levels, each data point will be rewritten at
> least
> >> 4 times.
> >>>>>>
> >>>>>> Currently, we do not merge all the TsFiles into one because
(1) the
> >> number of TsFiles is kept lower than many LSM storage engines because a
> >> memtable is mapped to several Chunk Groups rather than a file; (2)
> >> different TsFiles are not overlapping with each other in the time
> dimension
> >> (because of Rule 3).
> >>>>>>
> >>>>>> As mentioned before,  TsFile supports ''fast-return'' to accelerate
> >> queries. However, UnSequenceTsFile and Overflow files do not allow this
> >> feature. The time spans of UnSequenceTsFile, Overflow file andTsFile
> may be
> >> overlapped, which leads to more files involved in the query process. To
> >> accelerate these queries, there is a merging process to reorganize
> files in
> >> the background. All the three kinds of files: TsFiles, UnSequenceTsFiles
> >> and Overflow files, are involved in the merging process. The merging
> >> process is implemented using multi-threading, while each thread is
> >> responsible for a series family.
> >>>>>> After merging, only TsFiles are left. These files have
> >> non-overlapping time spans and support the ''fast-return'' feature.
> >>>>>>
> >>>>>> ==== Scalable Index Framework ====
> >>>>>> We allow users to implement indexes for faster queries. We currently
> >> support an index for pattern matching query (KV-Match index, ICDE 2019).
> >> Another index for fast aggregation (PISA index, CIKM 2016) is a
> >> work-in-progress.
> >>>>>>
> >>>>>> ==== Special Queries ====
> >>>>>> We currently support `group by time interval` aggregation queries
> and
> >> `Fill by` operations, which are similar to those of InfluxDB. Time
> series
> >> segmentation operations and frequency queries are work-in-progress.
> >>>>>>
> >>>>>> == Initial Goals ==
> >>>>>> The initial goals are to be open sourced and to integrate with
the
> >> Apache development process. Furthermore, we plan for incremental
> >> development, and releases along with the Apache guidelines.
> >>>>>>
> >>>>>> == Current Status ==
> >>>>>> We have developed the system for more than 2 years. There are
> >> currently 13k lines of code, some of which are generated by Antlr3 and
> >> Thrift.  There are 230 issues which have been solved and more than 1500
> >> commits.
> >>>>>>
> >>>>>> The system has been deployed in the staging environment of the
State
> >> Grid Corporation of China to handle ~3 million time series (i.e, ~30,000
> >> power generation assembly * ~100 sensors) and an equipment service
> company
> >> in China managing ~2 million time series (i.e, ~20k devices * 100
> sensors).
> >> The insertion speed reaches ~2 million points/second/node, which is
> faster
> >> than InfluxDB, OpenTSDB and Apache Cassandra in our environment.
> >>>>>>
> >>>>>> There are many new features in the works including those mentioned
> >> herein. We will add more analytics functions, improve the data file
> merge
> >> process, and finish the first released version of IoTDB.
> >>>>>>
> >>>>>> == Meritocracy ==
> >>>>>> The IoTDB project operates on meritocratic principles. Developers
> who
> >> submit more code with higher quality earn more merit. We have used
> `Issues`
> >> and `Pull Requests` modules on Github for collecting users' suggestions
> and
> >> patches. Users who submit issues, pull requests, documents and help the
> >> community management are welcomed and encouraged to become committers.
> >>>>>>
> >>>>>> == Community ==
> >>>>>>
> >>>>>> The IoTDB project users communicate on Github (
> >>>>>> https://github.com/thulab/tsfile) . Developers make the
> >> communication on a website which is similar with JIRA (Currently, only
> >> registered users can apply to access the project for communication, url:
> >> https://tower.im/projects/36de8571a0ff4833ae9d7f1c5c400c22/
> >>>>>> ). We have also introduced IoTDB at many technical conferences.
> Next,
> >> we will build the mailing list for more convenience, broader
> communication
> >> and archived discussions.
> >>>>>>
> >>>>>> If IoTDB is accepted for incubation at the Apache Software
> >> Foundation, the primary goal is to build a larger community. We believe
> >> that IoTDB will become a key project for time series data management,
> and
> >> so, we will rely on a large community of users and developers.
> >>>>>>
> >>>>>> TODO: IoTDB is currently on a private Github repository (
> >>>>>> https://github.com/thulab/iotdb), while its subproject TsFile
(a
> >> file format for storing time series data) is open sourced on Github (
> >> https://github.com/thulab/tsfile
> >>>>>> ).
> >>>>>>
> >>>>>> == Core Developers ==
> >>>>>> IoTDB was initially developed by 2 dozen of students and teachers
at
> >> Tsinghua University. Now, more and more developers have joined coming
> from
> >> other universities: Fudan University, Northwestern Polytechnical
> University
> >> and Harbin Institute of Technology in China.  Other developers come from
> >> business companies such as Lenovo and Microsoft. We will be working to
> >> bring more and more developers into the project making contributions to
> >> IoTDB.
> >>>>>>
> >>>>>> == Relationships with Other Apache Products ==
> >>>>>> IoTDB requires some Apache products (Apache Thrift, commons,
> >> collections, httpclient).
> >>>>>>
> >>>>>> IoTDB-Spark-connector and IoTDB-Hadoop-connector have been developed
> >> for supporting analysing time series data by using Apache Spark and
> >> MapReduce.
> >>>>>>
> >>>>>> Overall, IoTDB is designed as an open architecture, and it can
be
> >> integrated with many other systems in the future.
> >>>>>>
> >>>>>> As mentioned before, in the IoTDB project, we designed a new
> columnar
> >> file format, called TsFile, which is similar to Apache Parquet. However,
> >> the new file format is optimized for time series data.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> == Known Risks ==
> >>>>>>
> >>>>>> === Orphaned Products ===
> >>>>>> Given the current level of investment in IoTDB, the risk of
the
> >> project being abandoned is minimal. Time series data is more and more
> >> important and there are several constituents who are highly inspired to
> >> continue development. Tsinghua and NEL-BDS Lab relies on IoTDB as a
> >> platform for a large number of long-term research projects. We have
> >> deployed IoTDB in some company's staging environments for future
> >> applications.
> >>>>>>
> >>>>>> === Inexperience with Open Source ===
> >>>>>> Students and researchers in Tsinghua University have been developing
> >> and using open source software for a long time. It is wonderful to be
> >> guided to join a formal open-source process for students. Some of our
> >> committers
> >>>>>> have  experiences contributing to open source, for example:
> >>>>>>
> >>>>>> * druid:
> >>>>>>
> >>
> https://github.com/druid-io/druid/commit/f18cc5df97e5826c2dd8ffafba9fcb69d10a4d44
> >>>>>>
> >>>>>> * druid:
> >>>>>>
> >>
> https://github.com/druid-io/druid/commit/aa7aee53ce524b7887b218333166941654788794
> >>>>>>
> >>>>>> * YCSB:
> >>>>>> https://github.com/brianfrankcooper/YCSB/pull/776
> >>>>>>
> >>>>>>
> >>>>>> Additionally, several ASF veterans and industry veterans have
agreed
> >> to mentor the project and are listed in this proposal. The project will
> >> rely on their guidance and collective wisdom to quickly transition the
> >> entire team of initial committers towards practicing the Apache Way.
> >>>>>>
> >>>>>>
> >>>>>> === Reliance on Salaried Developers ===
> >>>>>> Most of current developers are students and researchers/professors
> in
> >> universities, and their researches focus on big data management and
> >> analytics. It is unlikely that they will change their research focus
> away
> >> from big data management.  We will work to ensure that the ability for
> the
> >> project to continuously be stewarded and to proceed forward independent
> of
> >> salaried developers is continued.
> >>>>>>
> >>>>>> === An Excessive Fascination with the Apache Brand ===
> >>>>>> Most of the initial developers come from Tsinghua University
with no
> >> intent to use the Apache brand for profit. We have no plans for making
> use
> >> of Apache brand in press releases nor posting billboards advertising
> >> acceptance of IoTDB into Apache Incubator.
> >>>>>>
> >>>>>>
> >>>>>> == Initial Source ==
> >>>>>> IoTDB's github address and some required dependencies:
> >>>>>>
> >>>>>> * The storage file format:
> >>>>>> https://github.com/thulab/tsfile
> >>>>>>
> >>>>>> * Adaptor for Apache Hadoop MapReduce:
> >>>>>> https://github.com/thulab/tsfile-hadoop-connector
> >>>>>>
> >>>>>> * Adaptor for Apache Spark:
> >>>>>> https://github.com/thulab/tsfile-spark-connector
> >>>>>>
> >>>>>> * Adaptor for Grafana:
> >>>>>> https://github.com/thulab/iotdb-grafana
> >>>>>>
> >>>>>> * The database engine:
> >>>>>> https://github.com/thulab/iotdb
> >>>>>> (private project up to now)
> >>>>>> * The client driver:
> >>>>>> https://github.com/thulab/iotdb-jdbc
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> === External Dependencies ===
> >>>>>> To the best of our knowledge, all dependencies of IoTDB are
> >> distributed under Apache compatible licenses. Upon acceptance to the
> >> incubator, we would begin a thorough analysis of all transitive
> >> dependencies to verify this fact and introduce license checking into the
> >> build and release process.
> >>>>>>
> >>>>>> == Documentation ==
> >>>>>> * Documentation for TsFile:
> >>>>>> https://github.com/thulab/tsfile/wiki
> >>>>>>
> >>>>>> * Documentation for IoTDB and its JDBC:
> >>>>>> http://tsfile.org/document
> >>>>>> (Chinese only. An English version is in progress.)
> >>>>>>
> >>>>>> == Required Resources ==
> >>>>>> === Mailing Lists ===
> >>>>>> *
> >>>>>> private@iotdb.incubator.apache.org
> >>>>>>
> >>>>>> *
> >>>>>> dev@iotdb.incubator.apache.org
> >>>>>>
> >>>>>> *
> >>>>>> commits@iotdb.incubator.apache.org
> >>>>>>
> >>>>>>
> >>>>>> === Git Repositories ===
> >>>>>> *
> >>>>>> https://git-wip-us.apache.org/repos/asf/incubator-iotdb.git
> >>>>>>
> >>>>>>
> >>>>>> === Issue Tracking ===
> >>>>>> *  JIRA IoTDB (We currently use the issue management provided
by
> >> Github to track issues.)
> >>>>>>
> >>>>>>
> >>>>>> == Initial Committers ==
> >>>>>> Tsinghua University, K2Data Company, Lenovo, Microsoft
> >>>>>>
> >>>>>> Jianmin Wang (jimwang at tsinghua dot edu dot cn )
> >>>>>>
> >>>>>> Xiangdong Huang (sainthxd at gmail dot com)
> >>>>>>
> >>>>>> Jun Yuan (richard_yuan16 at 163 dot com)
> >>>>>>
> >>>>>> Chen Wang ( wang_chen at tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Jialin Qiao (qjl16 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Jinrui Zhang (jinrzhan at microsoft dot com)
> >>>>>>
> >>>>>> Rong Kang (kr11 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Tian Jiang(jiangtia18 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Shuo Zhang (zhangshuo at k2data dot com dot cn)
> >>>>>>
> >>>>>> Lei Rui (rl18 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Rui Liu (liur17 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Kun Liu (liukun16 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Gaofei Cao (cgf16 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Xinyi Zhao (xyzhao16 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Dongfang Mao (maodf17 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Tianan Li(lta18 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Yue Su (suy18 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Hui Dai (daihui_iot at lenovo dot com, yuct_iot at lenovo dot
com )
> >>>>>>
> >>>>>> == Sponsors ==
> >>>>>> === Champion ===
> >>>>>> Kevin A. McGrail (
> >>>>>> kmcgrail@apache.org
> >>>>>> )
> >>>>>>
> >>>>>> === Nominated Mentors ===
> >>>>>> Justin Mclean (justin at classsoftware dot com)
> >>>>>>
> >>>>>> Christofer Dutz (christofer.dutz at c-ware dot de)
> >>>>>>
> >>>>>> Willem Jiang (willem.jiang at gmail dot com)
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >>> For additional commands, e-mail: general-help@incubator.apache.org
> >>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: general-help@incubator.apache.org
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message