Learn Streaming with Flink (2 Part Series)
1 Mastering Stream Processing: Explore Apache Flink’s Wonders in Real-Time Data Magic!
2 Mastering Flink Streaming: A Practical Guide to Word Counting for Beginners in Big Data
Hey, fellow programmers! After delving into the world of streaming and getting a glimpse of Flink in our previous articles, your curiosity has likely piqued, and you’re itching to dive into Flink coding. Well, buckle up because we’re about to embark on our first Flink coding adventure: a basic word-counting program. Fire up your favorite IDE and Docker to effortlessly set up and plunge into the realm of streaming with just a few keystrokes.
Word Count Streaming Program:
In this program, we’ll tally up repeated words in a sentence. For instance, in the sentence “Hello, I am Akshit, and I will perform streaming,” the word “I” is repeated, and our program will highlight that.
Prerequisites:
- Java 17
- Flink 1.18
- IntelliJ
- Socket terminal (For Mac, the command is: nc -lk 9999)
To start, create a Maven Java project with Flink dependencies installed:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.18.0
Enter fullscreen mode Exit fullscreen mode
We’ll use Flink’s DataStream API, and the following line marks the beginning of our streaming journey:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Enter fullscreen mode Exit fullscreen mode
Read data from a socket like this:
DataStream<String> text = env.socketTextStream("localhost", 9999);
Enter fullscreen mode Exit fullscreen mode
Now, let’s utilize the powerful flatMap function in Flink to transform our data streams. This function allows us to unnest elements, filter elements based on custom logic, and modify elements individually.
Create a custom flatMap function to split the words:
public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
String[] tokens = value.toLowerCase().split("\\W+");
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<>(token, 1));
}
}
}
}
Enter fullscreen mode Exit fullscreen mode
Now, let’s sum the repeated words and create a DataStream object:
DataStream<Tuple2<String, Integer>> counts = text
.flatMap(new Tokenizer())
.keyBy(0)
.sum(1);
Enter fullscreen mode Exit fullscreen mode
Finally, print/sink the output in the terminal:
counts.print();
Enter fullscreen mode Exit fullscreen mode
Execute the Flink streaming job with the following line:
env.execute("Word count streaming");
Enter fullscreen mode Exit fullscreen mode
Find the complete program on GitHub. Explore the results of the output and witness the streaming magic unfold.
Input of the data:
Output from the streaming engine:
Conclusion:
In this hands-on session, we implemented the word count program to grasp the basics of streaming with Flink. The journey doesn’t end here; stay tuned as we delve into more advanced Flink concepts in the upcoming articles. Keep up with me, show some love to the blog, and let the data streaming continue!
Learn Streaming with Flink (2 Part Series)
1 Mastering Stream Processing: Explore Apache Flink’s Wonders in Real-Time Data Magic!
2 Mastering Flink Streaming: A Practical Guide to Word Counting for Beginners in Big Data
原文链接:Mastering Flink Streaming: A Practical Guide to Word Counting for Beginners in Big Data
暂无评论内容