dataengineering (16 articles in total)

Working with Parquet files in Java using Carpet

After some time working with Parquet files in Java using the Parquet Avro library, and studying how it worked, I concluded that desp...
kity, 7 months ago

Learning Spark 2.0 Knowledge Dump

This post will serve as a continuous knowledge dump regarding the 'Learning Spark 2.0' book, where I'll dump certain quotes that I find relevant (...
kity, 9 months ago

Top 10 Common Data Engineers and Scientists Pain Points in 2024

As we navigate through 2024, the landscape of data engineering and science continues to evolve at a breakneck pace. ...
kity, 9 months ago

PySpark & Apache Spark – Overview

PySpark is the Python API for Apache Spark. It enables us to perform real-time, large-scale data processing in a distributed environment using Python. ...
kity, 11 months ago

Working with Parquet files in Java

Parquet is a widely used format in the Data Engineering realm and holds significant potential for traditional Backend applications. This article ...
kity, 2 years ago

From Class to Abstract Classes

Part of the series From Bootstrap to Airflow DAG (11 Part Series): 1. Web Scraping Sprott U Fund with BS4 in 10 Lines of Code; 2. The Web Scraping Continuum ... 7 more par...
kity, 2 years ago

How I Decreased ETL Cost by Leveraging the Apache Arrow Ecosystem

In the field of Data Engineering, the Apache Spark framework is one of the best-known and most powerful ways to extrac...
kity, 2 years ago

Integrating a Web API with the Datastore Emulator (Integrando uma Web API com Datastore Emulator)

The high billing cost associated with Google Cloud Platform (GCP) projects is something we should always keep in mind during...
kity, 2 years ago

Introduction to Python for Data Engineering

Yes hello! With increasing interest in data engineering expertise among organizations, we have seen a rise in the demand for data engin...
kity, 3 years ago

Data Engineering 102: Introduction to Python for Data Engineering.

Greetings to my dear readers. Today we will be covering Python for Data Engineering. If you read my article...
kity, 3 years ago

ETL Process – The ABC of the DATA Engineer

ETL, the process of extracting, transforming and loading data, also called the streaming data process, is the foundation of data engineering...
kity, 3 years ago

You may not need Airflow…. yet

TL;DR: Airflow is robust and flexible, but complicated. If you are just starting to schedule data tasks, you may want to try more tailored solutions...
kity, 5 years ago