Library

Online Tools Collection

4857 results

Generic Log-based Incremental Checkpoint --- Performance Evaluation & Analytics

Generic Log-based Incremental Checkpoint (GIC for short in this article) has been a production-ready feature since the Flink 1.16 release. We previously discussed the fundamental concept and underlying mechanism of GIC in our blog post titled "Generic Log-based Incremental Checkpoints I" [1]. In this blog post, we aim to provide a comprehensive analysis of GIC’s advantages and disadvantages based on thorough experiments.

flink

52 Technology lddgo, shared on 2024-01-23

How-to guide: Synchronize MySQL sub-database and sub-table using Flink CDC

This tutorial will show you how to use Flink CDC to build a real-time data lake for the above-presented scenario. The examples in this article are all based on Docker and use Flink SQL. There is no need to write a line of Java/Scala code or to install an IDE. This guide includes the docker-compose file.

flink

56 Technology lddgo, shared on 2024-01-23

Flink SQL Secrets: Mastering the Art of Changelog Event Out-of-Orderness

Alice is a data engineer responsible for real-time data processing at her company. She found that Flink SQL can sometimes produce update events (with regard to keys). With early versions of Flink, those events could not be written to Kafka directly, because Kafka is essentially an append-only messaging system. Fortunately, the Flink community released the upsert-kafka connector in a later version, which supports writing update events. Later, she found that the Flink SQL jobs

flink

53 Technology lddgo, shared on 2024-01-23

Flink's Test Harnesses Uncovered

When working with Apache Flink, developers often face challenges when testing user-defined functions (UDFs) that use state and timers. In this article we answer the question "How to test user-defined functions (UDFs) using Flink's test harnesses".

flink

62 Technology lddgo, shared on 2024-01-23

Joining Highly Skewed Streams in Flink SQL

Flink SQL is a powerful tool that unifies batch and stream processing. It provides low-code data analytics while complying with the SQL standard. In production systems, our customers found that as the workload scales, SQL jobs that used to work well may slow down significantly, or even fail. Data skew is a common and important cause. Data skew refers to the asymmetry of the probability distribution of a variable about its mean. In other words

flink

55 Technology lddgo, shared on 2024-01-23

Performance Analysis and Tuning Guides for Hybrid Shuffle Mode

The Apache Flink community introduced the Hybrid Shuffle Mode[1] in 1.16, which combines traditional Batch Shuffle with Pipelined Shuffle from stream processing to give Flink batch processing more powerful capabilities. The core idea of Hybrid Shuffle is to break scheduling constraints and decide whether downstream tasks need to be scheduled based on the availability of resources, while supporting in-memory data exchange without spilling to disk when conditions permit.

flink

52 Technology lddgo, shared on 2024-01-23

Stream Processing Scalability: Challenges and Solutions

Stream processing is a programming paradigm which views data streams, or sequences of events in time, as the central input and output objects of computation. This enables organizations to harness the value of data immediately, making it a valuable tool for time-sensitive applications and scenarios requiring up-to-the-minute insights. Stream processing systems excel at handling high-velocity, unbounded data streams, such as click streams, log streams, live sensor data, social media feeds

flink

52 Technology lddgo, shared on 2024-01-23

All You Need to Know About PyFlink

PyFlink serves as a Python API for Apache Flink, providing users with a medium to develop Flink programs in Python and deploy them on a Flink cluster. In this post, we will introduce PyFlink from the following aspects: the structure of a fundamental PyFlink job and some basic knowledge surrounding it; the operational mechanisms of PyFlink jobs, the high-level architecture, and its internal workings; essential performance optimization strategies for PyFlink; and future projections for PyFlink.

flink

57 Technology lddgo, shared on 2024-01-23

Stream Enrichment in Flink

Imagine a photo without its vibrant colors; intriguing but lacking depth. Stream enrichment works similarly for data. It infuses raw data streams with added context, transforming them from grayscale to full color. Going beyond the simple transmission of information, stream enrichment breathes life into data, augmenting it with additional context and details. By embedding supplementary data into an existing data stream, businesses and organizations can paint a clearer picture, driving enhanced

flink

198 Technology lddgo, shared on 2024-01-23

Batch Processing vs Stream Processing

Batch processing and stream processing are two very different models for processing data. Both have their strengths but suit different use cases. In this post we cover the differences, provide examples of use cases, and look at the ways the two models can work together.

flink

56 Technology lddgo, shared on 2024-01-23
