SeaTunnel was formerly named Waterdrop and was renamed SeaTunnel on October 12, 2021.
SeaTunnel is a next-generation, high-performance, distributed data integration tool for massive data volumes. It can stably and efficiently synchronize tens of billions of records every day and is used in production at many companies.
SeaTunnel focuses on data integration and data synchronization, and is mainly designed to solve common problems in the field of data integration:
In addition, SeaTunnel provides a Connector API that does not depend on a specific execution engine. Connectors (Source, Transform, Sink) developed against this API can run on multiple engines; SeaTunnel Zeta Engine, Flink, and Spark are currently supported.
The runtime process of SeaTunnel is shown in the figure above.
The user configures the job information and selects the execution engine to submit the job.
The Source Connector reads the data in parallel and sends it to the downstream Transform or directly to the Sink, and the Sink writes the data to the destination. Source, Transform, and Sink connectors can all be developed and extended by users.
The default engine used by SeaTunnel is SeaTunnel Zeta Engine. If you choose the Flink or Spark engine instead, SeaTunnel packages the Connector into a Flink or Spark program and submits it to Flink or Spark to run.
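The runtime steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not SeaTunnel's Connector API (which is written in Java): `RangeSource`, `double_id`, `ListSink`, and the two toy "engines" `run_sequential` and `run_threaded` are hypothetical names invented for this example. It shows a source that exposes independent splits, a transform applied to each row, a sink that receives the result, and the same connectors running unchanged on two different execution strategies.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]


class RangeSource:
    """Hypothetical source that exposes its data as independent splits."""

    def __init__(self, total: int, split_size: int) -> None:
        self.total = total
        self.split_size = split_size

    def splits(self) -> List[range]:
        # Each split can be read independently, which is what allows parallelism.
        return [
            range(start, min(start + self.split_size, self.total))
            for start in range(0, self.total, self.split_size)
        ]

    def read(self, split: range) -> List[Row]:
        return [{"id": i} for i in split]


def double_id(row: Row) -> Row:
    """Hypothetical transform: add a derived column to each row."""
    return {**row, "doubled": row["id"] * 2}


class ListSink:
    """Hypothetical sink that collects rows in memory."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, rows: Iterable[Row]) -> None:
        self.rows.extend(rows)


def run_sequential(source: RangeSource, transform: Callable[[Row], Row], sink: ListSink) -> None:
    """Toy engine 1: process one split after another."""
    for split in source.splits():
        sink.write(transform(row) for row in source.read(split))


def run_threaded(source: RangeSource, transform: Callable[[Row], Row], sink: ListSink) -> None:
    """Toy engine 2: read and transform splits in parallel threads."""

    def process(split: range) -> List[Row]:
        return [transform(row) for row in source.read(split)]

    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map yields results in split order; the sink is written
        # from the calling thread only.
        for rows in pool.map(process, source.splits()):
            sink.write(rows)


def run_job(engine: Callable[[RangeSource, Callable[[Row], Row], ListSink], None]) -> List[Row]:
    """Run the same Source -> Transform -> Sink job on the given engine."""
    sink = ListSink()
    engine(RangeSource(total=10, split_size=3), double_id, sink)
    return sink.rows
```

Calling `run_job(run_sequential)` and `run_job(run_threaded)` yields the same rows, which is the point of keeping the connector code independent of the engine that schedules it.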
Supported Source Connectors: see the SeaTunnel documentation.
Supported Sink Connectors: see the SeaTunnel documentation.
Supported Transforms: see the SeaTunnel documentation.
Download the ready-to-run binary package: https://seatunnel.apache.org/download
SeaTunnel uses SeaTunnel Zeta Engine as the default runtime execution engine for data synchronization. We highly recommend Zeta as the runtime engine because it offers superior functionality and performance. SeaTunnel also supports Flink and Spark as execution engines.
SeaTunnel Zeta Engine https://seatunnel.apache.org/docs/start-v2/locally/quick-start-seatunnel-engine/
Spark https://seatunnel.apache.org/docs/start-v2/locally/quick-start-spark
Flink https://seatunnel.apache.org/docs/start-v2/locally/quick-start-flink
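Whichever engine is chosen, a job is described by a config file with `env`, `source`, and `sink` blocks. The Python sketch below renders a minimal batch job that reads generated rows and prints them. The connector names `FakeSource` and `Console` and the options `parallelism`, `job.mode`, `row.num`, and `schema` are taken from memory of the quick-start templates and should be treated as assumptions; check them against the guide for your SeaTunnel version. The helper names `render_job_config` and `write_job_config` and the output file name are hypothetical.

```python
from pathlib import Path
from string import Template

# Minimal batch job: generated rows in, console out.
# Option names are assumptions; verify them against the quick-start guide.
JOB_TEMPLATE = Template("""\
env {
  parallelism = $parallelism
  job.mode = "BATCH"
}

source {
  FakeSource {
    row.num = $row_num
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {}
}
""")


def render_job_config(parallelism: int = 1, row_num: int = 16) -> str:
    """Return the job config text with the given settings filled in."""
    return JOB_TEMPLATE.substitute(parallelism=parallelism, row_num=row_num)


def write_job_config(path: Path, parallelism: int = 1, row_num: int = 16) -> Path:
    """Write the rendered config to `path` and return the path."""
    path.write_text(render_job_config(parallelism, row_num), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(render_job_config(parallelism=2, row_num=5))
```

The resulting file would then be passed to the launcher script described in the quick-start guide for the selected engine.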
Weibo uses an internally customized version of SeaTunnel, together with its sub-project Guardian for monitoring SeaTunnel on Yarn tasks, to run hundreds of real-time streaming computing tasks.
Various logs from business services are collected into Apache Kafka; part of the data in Apache Kafka is then consumed and extracted through SeaTunnel and stored in ClickHouse.
Sina Data Operation Analysis Platform uses SeaTunnel to perform real-time and offline analysis of data operation and maintenance for Sina News, CDN, and other services, and writes the results to ClickHouse.
Sogou Qiqian System uses SeaTunnel as an ETL tool to help build a real-time data warehouse system.
SeaTunnel provides real-time streaming and offline SQL computing of e-commerce user behavior data for Yonghui Life, a new retail brand of Yonghui Yunchuang Technology.
For more use cases, please refer to: https://seatunnel.apache.org/blog
This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please follow the REPORTING GUIDELINES to report unacceptable behavior.
Thanks to all developers!
Please follow this document.
Mail to dev-subscribe@seatunnel.apache.org, then follow the reply to subscribe to the mailing list.
SeaTunnel enriches the CNCF CLOUD NATIVE Landscape.
Various companies and organizations use SeaTunnel for research, production and commercial products. Visit our website to find the user page.