Time is a central concept in stream processing: data is processed as it arrives, and results must be produced quickly to avoid delays. Because every event carries a notion of time, a streaming pipeline has to be designed around that time factor. Time-based windowing is a common technique for grouping an unbounded stream into finite buckets so that the data can be aggregated and processed in a timely manner.
In the previous article, we covered some aspects of time windows and time attributes that you should consider when planning your data collection strategy. This article will provide a more in-depth look at how to create a time window.
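Before going deeper, here is a minimal sketch of what such a window looks like in Flink SQL, wrapped in the Java Table API; the `orders` table, its columns, and the datagen source are made-up placeholders for illustration:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TumblingWindowSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical source table with an event-time attribute and a watermark.
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id STRING," +
                "  amount   DOUBLE," +
                "  ts       TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH ('connector' = 'datagen')");

        // One-minute tumbling window over the event-time column ts.
        tEnv.executeSql(
                "SELECT window_start, window_end, SUM(amount) AS total " +
                "FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' MINUTES)) " +
                "GROUP BY window_start, window_end")
            .print();
    }
}
```

The TUMBLE table-valued function slices the unbounded stream into non-overlapping one-minute buckets; the results for a bucket are emitted once the watermark passes its end.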
Flink SQL has emerged as the de facto standard for low-code data analytics. It unifies batch and stream processing while staying true to the SQL standard, and it provides a rich set of advanced features for real-time use cases. In a nutshell, Flink SQL offers the best of both worlds: you can process streaming data with plain SQL, and the very same queries can also run as batch jobs.
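As a rough illustration of that unification (the table, its fields, and the datagen source are placeholders), the same query can run in streaming or batch mode simply by switching the environment settings:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BatchOrStreamingSketch {
    public static void main(String[] args) {
        // Swap inBatchMode() for inStreamingMode(); the SQL stays identical.
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Bounded placeholder source so the query also terminates in batch mode.
        tEnv.executeSql(
                "CREATE TABLE clicks (user_id STRING, url STRING) " +
                "WITH ('connector' = 'datagen', 'number-of-rows' = '10')");

        tEnv.executeSql(
                "SELECT user_id, COUNT(*) AS cnt FROM clicks GROUP BY user_id")
            .print();
    }
}
```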
Testing your Apache Flink SQL code is a critical step in ensuring that your application runs smoothly and produces the expected results. Flink SQL applications are used for a wide range of data processing tasks, from complex analytics to simple SQL jobs. A comprehensive testing process helps identify potential issues early in the development process and ensures that your application works as expected. This post will go through several testing possibilities for your Flink SQL applications.
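One lightweight possibility, sketched here with made-up table and column names, is to replace the real sources with inline VALUES in a batch TableEnvironment and assert on the collected rows:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.types.Row;
import org.apache.flink.util.CloseableIterator;

import java.util.ArrayList;
import java.util.List;

public class SqlLogicTestSketch {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Replace the real source with a small, deterministic VALUES table.
        tEnv.executeSql(
                "CREATE TEMPORARY VIEW orders AS " +
                "SELECT * FROM (VALUES ('alice', 10.0), ('bob', 20.0), ('alice', 5.0)) " +
                "AS t(user_name, amount)");

        // Run the query under test and collect its result rows.
        List<Row> results = new ArrayList<>();
        try (CloseableIterator<Row> it = tEnv.executeSql(
                "SELECT user_name, SUM(amount) FROM orders GROUP BY user_name").collect()) {
            it.forEachRemaining(results::add);
        }

        // In a real test this would be a JUnit assertion instead of a print.
        System.out.println(results);
    }
}
```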
This blog post will guide you through the Kafka connectors that are available in the Flink Table API. By the end of it, you will have a better understanding of which connector is most suitable for a given application. The Flink DataStream API provides a Kafka connector, which works in append mode and can be used by Flink programs written in the Scala/Java API. Besides that, the Table API offers two Kafka connectors: kafka (append mode) and upsert-kafka (upsert mode).
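For orientation, here is a hypothetical table definition using the regular, append-mode kafka connector; the topic, broker address, and schema are placeholders:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class KafkaSourceTableSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Append-only Kafka source table (placeholder topic, broker, and schema).
        tEnv.executeSql(
                "CREATE TABLE page_views (" +
                "  user_id STRING," +
                "  url     STRING," +
                "  ts      TIMESTAMP(3) METADATA FROM 'timestamp'" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'page_views'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'scan.startup.mode' = 'earliest-offset'," +
                "  'format' = 'json'" +
                ")");
    }
}
```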
Generic Log-based Incremental Checkpoint (GIC for short in this article) became a production-ready feature with the Flink 1.16 release. We previously discussed the fundamental concept and underlying mechanism of GIC in our blog post "Generic Log-based Incremental Checkpoints I" [1]. In this post, we aim to provide a comprehensive analysis of GIC's advantages and disadvantages, backed by thorough experiments.
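For context, GIC is enabled purely through configuration. The sketch below shows one way to do it programmatically; the option keys are standard Flink settings, while the storage path is a placeholder you would point at your own durable storage:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnableChangelogSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Turn on the changelog state backend (generic log-based incremental checkpoints).
        conf.setString("state.backend.changelog.enabled", "true");
        conf.setString("state.backend.changelog.storage", "filesystem");
        // Placeholder path for the changelog files.
        conf.setString("dstl.dfs.base-path", "s3://my-bucket/changelog");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... build and execute the job as usual.
    }
}
```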
This tutorial will show you how to use Flink CDC to build a real-time data lake for the scenario presented above. All examples in this article are based on Docker and use Flink SQL only; there is no need to write a single line of Java/Scala code or to install an IDE. Everything needed to follow along is defined in a single docker-compose file.
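In the tutorial itself everything is typed into the Flink SQL client; purely as a flavor of what that looks like, here is roughly the kind of CDC source table it declares, shown here wrapped in the Java Table API with placeholder connection details:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MySqlCdcSourceSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // CDC source table; host, credentials, and names are placeholders.
        tEnv.executeSql(
                "CREATE TABLE products (" +
                "  id INT," +
                "  name STRING," +
                "  description STRING," +
                "  PRIMARY KEY (id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'mysql-cdc'," +
                "  'hostname' = 'mysql'," +
                "  'port' = '3306'," +
                "  'username' = 'flinkuser'," +
                "  'password' = 'flinkpw'," +
                "  'database-name' = 'mydb'," +
                "  'table-name' = 'products'" +
                ")");
    }
}
```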
Alice is a data engineer taking care of real-time data processing in her company. She found that Flink SQL can sometimes produce update events (with regard to keys). With early versions of Flink, however, those events could not be written to Kafka directly, because Kafka is essentially an append-only messaging system. Fortunately, the Flink community later released the upsert-kafka connector, which supports writing update events. Later, she found that the Flink SQL jobs
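A hypothetical sink definition using that connector might look as follows; the topic, brokers, and schema are made up, and the declared primary key determines which earlier record an update event overwrites:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertKafkaSinkSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Upsert sink: updates for the same user_id overwrite earlier records;
        // a record with a null value (tombstone) marks a deletion.
        tEnv.executeSql(
                "CREATE TABLE user_totals (" +
                "  user_id STRING," +
                "  total   DOUBLE," +
                "  PRIMARY KEY (user_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'upsert-kafka'," +
                "  'topic' = 'user_totals'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'key.format' = 'json'," +
                "  'value.format' = 'json'" +
                ")");
    }
}
```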
When working with Apache Flink, developers often face challenges when testing user-defined functions (UDFs) that use state and timers. In this article, we will answer the question: how can you test such user-defined functions using Flink's test harnesses?
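A minimal sketch of that approach is shown below: a toy stateful KeyedProcessFunction is driven by a KeyedOneInputStreamOperatorTestHarness without starting a cluster. The harness classes ship in Flink's test artifacts; the function itself is invented for illustration:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.operators.KeyedProcessOperator;
import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness;
import org.apache.flink.util.Collector;

public class ProcessFunctionHarnessSketch {

    // Toy stateful function: emits the running count of events per key.
    static class CountPerKey extends KeyedProcessFunction<String, String, Long> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Types.LONG));
        }

        @Override
        public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
            Long current = count.value();
            long next = (current == null ? 0L : current) + 1L;
            count.update(next);
            out.collect(next);
        }
    }

    public static void main(String[] args) throws Exception {
        // The harness drives the function directly, without a cluster.
        KeyedOneInputStreamOperatorTestHarness<String, String, Long> harness =
                new KeyedOneInputStreamOperatorTestHarness<>(
                        new KeyedProcessOperator<>(new CountPerKey()),
                        value -> value,   // key selector: key by the element itself
                        Types.STRING);

        harness.open();
        harness.processElement("a", 1L);
        harness.processElement("a", 2L);
        harness.processElement("b", 3L);

        // Expected output: [1, 2, 1] (running counts per key).
        System.out.println(harness.extractOutputValues());
        harness.close();
    }
}
```

In a real test the final print would be a JUnit assertion, and event-time or processing-time timers can be triggered with the harness's processWatermark and setProcessingTime methods.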
Flink SQL is a powerful tool that unifies batch and stream processing. It provides low-code data analytics while complying with the SQL standard. In production systems, our customers found that as the workload scales, SQL jobs that used to work well may slow down significantly, or even fail, and data skew is a common and important reason. Data skew refers to the asymmetry of the probability distribution of a variable about its mean. In other words, some key values occur far more often than others, so a few parallel tasks end up processing most of the data.
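One common mitigation, sketched below with a made-up table, is to enable mini-batching together with two-phase (local-global) aggregation so that hot keys are pre-aggregated locally before the shuffle; the option keys are standard Flink Table settings:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SkewedAggregationSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Local-global (two-phase) aggregation requires mini-batching to be enabled.
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.mini-batch.enabled", "true");
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.mini-batch.allow-latency", "2 s");
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.mini-batch.size", "5000");
        tEnv.getConfig().getConfiguration()
                .setString("table.optimizer.agg-phase-strategy", "TWO_PHASE");

        // A GROUP BY on a potentially hot key; partial aggregates are computed locally first.
        tEnv.executeSql(
                "CREATE TABLE clicks (user_id STRING, url STRING) WITH ('connector' = 'datagen')");
        tEnv.executeSql(
                "SELECT user_id, COUNT(*) AS cnt FROM clicks GROUP BY user_id")
            .print();
    }
}
```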