How to be Test Driven with Spark: Chapter 3 – First Spark test
The goal of this tutorial is to provide a way to easily be test driven with Spark on your local setup without using ...
Automating Data Quality with DQX: Performance and Practicality
Introduction to DQX: In the current landscape, where data is frequently compared to the 'new oil', ...
How to be Test Driven with Spark: Chapter 0 and 1 – Modern Python Setup
Chapter 0: Why this tutorial. The goal of this tutorial is to provide a way to easily be test driven with s...
Run PySpark Local Python Windows Notebook
Introduction: PySpark is the Python API for Apache Spark, an open-source distributed computing system that enables fast, scalable data pro...
Why Is Spark Slow??
Starting with an eye-catching title, 'Why is Spark slow??', it's important to note that calling Spark 'slow' can mean various things. Is it...
Understanding and Applying Apache Spark Tuning Strategies
Reasons to read this article: first-hand experience gained in moments of chaos and moments of calm analysis...
Real-Time Air Traffic Data Analysis with Spark Structured Streaming and Apache Kafka
Today, we live in a world where petabytes of data are generated every second...
Leveraging PySpark.Pandas for Efficient Data Pipelines
In the world of big data, Spark has become a pivotal tool for handling and processing large datasets efficiently. However, if...
Learning Spark 2.0 Knowledge Dump
This post will serve as a continuous knowledge dump regarding the 'Learning Spark 2.0' book, where I'll dump certain quotes that I find relevant (...
Spark functions
We learned about Spark DataFrames and how to write Spark functions in the Data Engineering Zoomcamp. This is one of the advantages of Spark. Spark can write SQL com...
Embarking on the Data Odyssey: A Deep Dive into Data Engineering for Tech Enthusiasts
In the ever-expanding digital landscape, the role of data engineering stands out as the unsung...
A new Kedro dataset for Spark Structured Streaming
This article guides data practitioners on how to set up a Kedro project to use the new SparkStreaming Kedro dataset, with example...