
Generic Log-based Incremental Checkpoint (GIC for short in this article) has been a production-ready feature since the Flink 1.16 release. We previously discussed the fundamental concepts and underlying mechanism of GIC in our blog post "Generic Log-based Incremental Checkpoints I" [1]. In this blog post, we provide a comprehensive analysis of GIC's advantages and disadvantages based on extensive experiments.

52 · Tech · Shared by lddgo on 2024-01-23

This tutorial shows how to use Flink CDC to build a real-time data lake for the scenario presented above. All examples in this article are based on Docker and use Flink SQL, so no Java/Scala code or IDE installation is required. The guide includes the complete docker-compose file.

56 · Tech · Shared by lddgo on 2024-01-23

Alice is a data engineer responsible for real-time data processing at her company. She noticed that Flink SQL can sometimes produce update events (with respect to keys). However, in early versions of Flink, these events could not be written directly to Kafka, because Kafka is essentially an append-only messaging system. Fortunately, a later version of Flink added the upsert-kafka connector, which supports writing update events. Later, she found that the Flink SQL jobs

53 · Tech · Shared by lddgo on 2024-01-23
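To illustrate why keyed update events do not fit a purely append-only log, here is a minimal Python sketch (not Flink or Kafka code) that replays the same keyed changelog two ways. With upsert semantics the latest value per key wins and a `None` value acts as a delete, loosely mirroring how upsert-kafka treats a record with a null value as a tombstone for its key. The event list and the function names `materialize_upserts` and `materialize_append_only` are hypothetical.

```python
from typing import Optional

# Hypothetical keyed changelog entry: (key, value); value None marks a delete (tombstone).
Event = tuple[str, Optional[int]]


def materialize_upserts(events: list[Event]) -> dict[str, int]:
    """Replay events with upsert semantics: the last write per key wins."""
    table: dict[str, int] = {}
    for key, value in events:
        if value is None:
            table.pop(key, None)  # tombstone removes the key if present
        else:
            table[key] = value  # insert a new row or overwrite the existing one
    return table


def materialize_append_only(events: list[Event]) -> list[Event]:
    """Append-only view: every event is kept, so updates show up as extra rows."""
    return list(events)


if __name__ == "__main__":
    changelog = [("alice", 1), ("bob", 1), ("alice", 2), ("bob", None)]
    print(materialize_append_only(changelog))  # four rows, including stale values
    print(materialize_upserts(changelog))      # {'alice': 2}
```

The append-only view forces every consumer to work out which row is current, while the upsert view collapses the log into one row per key.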

Developers working with Apache Flink often face challenges when testing user-defined functions (UDFs) that use state and timers. In this article, we answer the question: "How can user-defined functions (UDFs) be tested using Flink's test harnesses?"

62 · Tech · Shared by lddgo on 2024-01-23

Flink SQL is a powerful tool that unifies batch and stream processing. It provides low-code data analytics while complying with the SQL standard. In production systems, our customers found that as workloads scale, SQL jobs that used to work well may slow down significantly or even fail. Data skew is a common and important cause. Data skew refers to the asymmetry of the probability distribution of a variable about its mean. In other words

55 · Tech · Shared by lddgo on 2024-01-23
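In a distributed job, data skew typically shows up as a few parallel subtasks receiving far more records than the others, often because a handful of "hot" keys dominate the input. The minimal Python sketch below makes this concrete under stated assumptions: keys are assigned to subtasks with `zlib.crc32` modulo the parallelism as a stand-in (Flink's actual key-group assignment works differently), and skew is summarized as the ratio of maximum to mean load. The workload and the helper names are hypothetical.

```python
import zlib
from collections import Counter


def subtask_for(key: str, parallelism: int) -> int:
    """Assign a key to a subtask (crc32 stand-in, not Flink's key-group hashing)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def load_per_subtask(keys: list[str], parallelism: int) -> list[int]:
    """Count how many records each subtask would receive."""
    counts = Counter(subtask_for(k, parallelism) for k in keys)
    return [counts.get(i, 0) for i in range(parallelism)]


def skew_ratio(loads: list[int]) -> float:
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


if __name__ == "__main__":
    # Hypothetical workload: one hot key accounts for 90% of the records.
    records = ["hot"] * 900 + [f"user{i}" for i in range(100)]
    loads = load_per_subtask(records, parallelism=4)
    print(loads, round(skew_ratio(loads), 2))
```

Because every record with the same key must go to the same subtask, adding parallelism does not relieve the subtask that owns the hot key. This is why skew can slow a job down even when the cluster as a whole has spare capacity.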

The Apache Flink community introduced Hybrid Shuffle Mode [1] in Flink 1.16. It combines traditional batch shuffle with the pipelined shuffle used in stream processing, giving Flink's batch processing more powerful capabilities. The core idea of Hybrid Shuffle is to break scheduling constraints: whether downstream tasks are scheduled is decided by the availability of resources, and data can be exchanged in memory without spilling to disk when conditions permit.

52 · Tech · Shared by lddgo on 2024-01-23

Stream processing is a programming paradigm which views data streams, or sequences of events in time, as the central input and output objects of computation. This enables organizations to harness the value of data immediately, making it a valuable tool for time-sensitive applications and scenarios requiring up-to-the-minute insights. Stream processing systems excel at handling high-velocity, unbounded data streams, such as click streams, log streams, live sensor data, social media feeds

52 · Tech · Shared by lddgo on 2024-01-23

PyFlink is a Python API for Apache Flink that lets users develop Flink programs in Python and deploy them on a Flink cluster. In this post, we introduce PyFlink from the following aspects: the structure of a basic PyFlink job and the background knowledge surrounding it; how PyFlink jobs operate, including the high-level architecture and internal workings; essential performance optimization strategies for PyFlink; and the future outlook for PyFlink.

57 · Tech · Shared by lddgo on 2024-01-23

Imagine a photo without its vibrant colors: intriguing, but lacking depth. Stream enrichment works similarly for data. It infuses raw data streams with added context, transforming them from grayscale to full color. Going beyond the simple transmission of information, stream enrichment breathes life into data, augmenting it with additional context and details. By embedding supplementary data into an existing data stream, businesses and organizations can paint a clearer picture, driving enhanced

198 · Tech · Shared by lddgo on 2024-01-23
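At its simplest, the stream enrichment described above is a lookup join: each incoming event is matched by key against reference data, and the matching fields are merged into the event. The minimal Python sketch below shows only that idea. It assumes a hypothetical in-memory `USER_PROFILES` table and a hypothetical `enrich` function, and it ignores time, state size, and updates to the reference data. In Flink, this role is typically played by constructs such as lookup joins or broadcast state.

```python
from typing import Iterable, Iterator

# Hypothetical reference data keyed by user_id (in practice, e.g., a database table).
USER_PROFILES: dict[str, dict] = {
    "u1": {"country": "DE", "tier": "gold"},
    "u2": {"country": "US", "tier": "free"},
}

# Context used when an event's key has no matching reference row.
MISSING_CONTEXT = {"country": None, "tier": None}


def enrich(events: Iterable[dict], profiles: dict[str, dict]) -> Iterator[dict]:
    """Attach profile context to each raw event (a simple lookup join)."""
    for event in events:
        context = profiles.get(event["user_id"], MISSING_CONTEXT)
        # Keep the original event fields and add the looked-up context.
        yield {**event, **context}


if __name__ == "__main__":
    raw = [{"user_id": "u1", "action": "click"}, {"user_id": "u3", "action": "view"}]
    for row in enrich(raw, USER_PROFILES):
        print(row)
```

Emitting unmatched events with empty context, rather than dropping them, is one design choice (similar to a left join). A real pipeline might instead buffer or route such events elsewhere.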

Batch processing and stream processing are two very different models for processing data. Both have their strengths, but they suit different use cases. In this post, we cover the differences, provide examples of use cases, and look at the ways the two models can work together.

56 · Tech · Shared by lddgo on 2024-01-23
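One way to see the difference between the two models is to compute the same aggregate, a count per key, both ways. The minimal Python sketch below is illustrative only and is not tied to any Flink API: the batch version waits for the complete, bounded input and produces one final answer, while the streaming version updates its state per event and emits a running result immediately. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def batch_count(events: list[str]) -> dict[str, int]:
    """Batch model: the whole bounded input is available; compute the result once."""
    counts: dict[str, int] = defaultdict(int)
    for key in events:
        counts[key] += 1
    return dict(counts)


def streaming_count(events: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Streaming model: update state per event and emit the running result."""
    counts: dict[str, int] = defaultdict(int)
    for key in events:
        counts[key] += 1
        yield key, counts[key]  # an early result that later events may refine


if __name__ == "__main__":
    clicks = ["a", "b", "a"]
    print(batch_count(clicks))            # {'a': 2, 'b': 1}
    print(list(streaming_count(clicks)))  # [('a', 1), ('b', 1), ('a', 2)]
```

Once the input is exhausted, the last value emitted per key by the streaming version matches the batch result. The streaming version simply makes intermediate answers available earlier, which is where the two models can complement each other.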