47 search results

In the previous article, we covered some aspects of time windows and time attributes that you should consider when planning your data collection strategy. This article will provide a more in-depth look at how to create a time window.

47 Technology lddgo Shared on 2024-01-23

Flink SQL has emerged as the de facto standard for low-code data analytics. It has managed to unify batch and stream processing while simultaneously staying true to the SQL standard. In addition, it provides a rich set of advanced features for real-time use cases. In a nutshell, Flink SQL is the best of both worlds: it gives you the ability to process streaming data using SQL, but it also supports batch processing.

53 Technology lddgo Shared on 2024-01-23

Testing your Apache Flink SQL code is a critical step in ensuring that your application is running smoothly and provides the expected results. Flink SQL applications are used for a wide range of data processing tasks, from complex analytics to simple SQL jobs. A comprehensive testing process can help identify potential issues early in the development process and ensure that your application works as expected. This post will go through several testing possibilities for your Flink SQL

47 Technology lddgo Shared on 2024-01-23

This blog post will guide you through the Kafka connectors that are available in the Flink Table API. By the end of this blog post, you will have a better understanding of which connector is more suitable for a specific application. The Flink DataStream API provides a Kafka connector, which works in append mode and can be used by Flink programs written with the Scala/Java API. Besides that, Flink has the Table API, which offers two Kafka connectors:

50 Technology lddgo Shared on 2024-01-23

Generic Log-based Incremental Checkpoint (GIC for short in this article) has been a production-ready feature since the Flink 1.16 release. We previously discussed the fundamental concept and underlying mechanism of GIC in our blog post titled "Generic Log-based Incremental Checkpoints I" [1]. In this blog post, we aim to provide a comprehensive evaluation of GIC’s advantages and disadvantages through thorough experiments and analysis.

46 Technology lddgo Shared on 2024-01-23

This tutorial will show you how to use Flink CDC to build a real-time data lake for the above-presented scenario. The examples in this article are all based on Docker and use Flink SQL. There is no need to write a line of Java/Scala code or to install an IDE. This guide includes the complete docker-compose file.

52 Technology lddgo Shared on 2024-01-23

Alice is a data engineer taking care of real-time data processing in her company. She found that Flink SQL can sometimes produce update events (updates with regard to keys). With early versions of Flink, however, those events cannot be written to Kafka directly, because Kafka is essentially an append-only messaging system. Fortunately, the Flink community released the upsert-kafka connector in a later version, which supports writing update events. Later, she found that the Flink SQL jobs

47 Technology lddgo Shared on 2024-01-23
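
The idea behind writing update events to an append-only log, as described in the entry above, can be sketched in a few lines of Python. This is a conceptual illustration only and does not use Flink or Kafka: the function name `materialize`, the sample events, and the convention that a `None` value marks a deletion are assumptions made for the example.

```python
def materialize(changelog):
    """Replay an append-only log of (key, value) records into the latest state.

    Assumption for this sketch: a record whose value is None marks the
    deletion of its key; any other value overwrites the previous one.
    """
    state = {}
    for key, value in changelog:
        if value is None:
            # Deletion marker: drop the key if it is present.
            state.pop(key, None)
        else:
            # Insert or update: the newest record for a key wins.
            state[key] = value
    return state


# Hypothetical sample log: every record is appended, nothing is rewritten.
events = [("alice", 1), ("bob", 5), ("alice", 3), ("bob", None)]
latest = materialize(events)
```

The log itself only ever grows; a reader recovers the current view by keeping the last record per key.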

When working with Apache Flink, developers often face challenges when testing user-defined functions (UDFs) that use state and timers. In this article, we answer the question: how do you test user-defined functions (UDFs) using Flink's test harnesses?

54 Technology lddgo Shared on 2024-01-23

Flink SQL is a powerful tool that unifies batch and stream processing. It provides low-code data analytics while complying with the SQL standard. In production systems, our customers found that as the workload scales, SQL jobs that used to work well may slow down significantly, or even fail. Data skew is a common and important cause. Data skew refers to the asymmetry of the probability distribution of a variable about its mean. In other words

50 Technology lddgo Shared on 2024-01-23
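
The definition of data skew quoted in the entry above can be made concrete with a small Python sketch. It is not taken from the article: the helper names `skewness` and `load_imbalance` are hypothetical, and the max-to-mean ratio is just one simple way to express how unevenly records are spread across keys.

```python
from collections import Counter
from statistics import fmean


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form).

    Zero for a distribution that is symmetric about its mean,
    positive when the long tail lies to the right of the mean.
    """
    values = list(values)
    if not values:
        raise ValueError("skewness requires at least one value")
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry
    return m3 / m2 ** 1.5


def load_imbalance(keys):
    """Ratio of the busiest key's record count to the mean count per key."""
    counts = list(Counter(keys).values())
    return max(counts) / fmean(counts)


# Hypothetical workload: one hot key receives most of the records.
hot_key_ratio = load_imbalance(["a"] * 8 + ["b", "c"])
```

A ratio close to 1 means records are spread evenly across keys; a large ratio means a few keys carry most of the load.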

The Apache Flink community introduced the Hybrid Shuffle Mode[1] in 1.16, which combines traditional Batch Shuffle with Pipelined Shuffle from stream processing to give Flink batch processing more powerful capabilities. The core idea of Hybrid Shuffle is to break scheduling constraints and decide whether downstream tasks need to be scheduled based on the availability of resources, while supporting in-memory data exchange without spilling to disk when conditions permit.

48 Technology lddgo Shared on 2024-01-23