Mastering Flink Streaming: A Practical Guide to Word Counting for Beginners in Big Data - 拾光赋-拾光赋

Mastering Flink Streaming: A Practical Guide to Word Counting for Beginners in Big Data

1年前发布

0268

Learn Streaming with Flink (2 Part Series)

1 Mastering Stream Processing: Explore Apache Flink’s Wonders in Real-Time Data Magic!
2 Mastering Flink Streaming: A Practical Guide to Word Counting for Beginners in Big Data

Hey, fellow programmers! After delving into the world of streaming and getting a glimpse of Flink in our previous articles, your curiosity has likely piqued, and you’re itching to dive into Flink coding. Well, buckle up because we’re about to embark on our first Flink coding adventure: a basic word-counting program. Fire up your favorite IDE and Docker to effortlessly set up and plunge into the realm of streaming with just a few keystrokes.

Word Count Streaming Program:

In this program, we’ll tally up repeated words in a sentence. For instance, in the sentence “Hello, I am Akshit, and I will perform streaming,” the word “I” is repeated, and our program will highlight that.

Prerequisites:

Java 17
Flink 1.18
IntelliJ
Socket terminal (For Mac, the command is: nc -lk 9999)

To start, create a Maven Java project with Flink dependencies installed:

mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.18.0

Enter fullscreen mode Exit fullscreen mode

We’ll use Flink’s DataStream API, and the following line marks the beginning of our streaming journey:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Enter fullscreen mode Exit fullscreen mode

Read data from a socket like this:

DataStream<String> text = env.socketTextStream("localhost", 9999);

Enter fullscreen mode Exit fullscreen mode

Now, let’s utilize the powerful flatMap function in Flink to transform our data streams. This function allows us to unnest elements, filter elements based on custom logic, and modify elements individually.

Create a custom flatMap function to split the words:

public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {

    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
        String[] tokens = value.toLowerCase().split("\\W+");
        for (String token : tokens) {
            if (token.length() > 0) {
                out.collect(new Tuple2<>(token, 1));
            }
        }
    }
}

Enter fullscreen mode Exit fullscreen mode

Now, let’s sum the repeated words and create a DataStream object:

DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new Tokenizer())
    .keyBy(0)
    .sum(1);

Enter fullscreen mode Exit fullscreen mode

Finally, print/sink the output in the terminal:

counts.print();

Enter fullscreen mode Exit fullscreen mode

Execute the Flink streaming job with the following line:

env.execute("Word count streaming");

Enter fullscreen mode Exit fullscreen mode

Find the complete program on GitHub. Explore the results of the output and witness the streaming magic unfold.

Input of the data:

Output from the streaming engine:

Conclusion:

In this hands-on session, we implemented the word count program to grasp the basics of streaming with Flink. The journey doesn’t end here; stay tuned as we delve into more advanced Flink concepts in the upcoming articles. Keep up with me, show some love to the blog, and let the data streaming continue!

Learn Streaming with Flink (2 Part Series)

1 Mastering Stream Processing: Explore Apache Flink’s Wonders in Real-Time Data Magic!
2 Mastering Flink Streaming: A Practical Guide to Word Counting for Beginners in Big Data

原文链接：Mastering Flink Streaming: A Practical Guide to Word Counting for Beginners in Big Data

© 版权声明

文章版权声明 1、本网站名称：拾光赋
2、本站永久网址：https://www.blogs.ink
3、本网站的文章部分内容可能来源于网络，仅供大家学习与参考，如有侵权，请联系站长QQ：805375623进行删除处理。
4、本站一切资源不代表本站立场，并不代表本站赞同其观点和对其真实性负责。
5、本站一律禁止以任何方式发布或转载任何违法的相关信息，访客发现请向站长举报
6、本站资源大多存储在云盘，如发现链接失效，请联系我们我们会第一时间更新。

THE END

Java（EN）
# java # bigdata # streaming # apacheflink

喜欢就支持一下吧

相关推荐

评论抢沙发

请登录后发表评论

暂无评论内容