bigdata: 87 articles
The two versions of Parquet

A few days ago, the creators of DuckDB wrote the article "Query Engines: Gatekeepers of the Parquet File Format", which explained how the engines that pr...
kity, 36 days ago
Run PySpark Local Python Windows Notebook

Introduction: PySpark is the Python API for Apache Spark, an open-source distributed computing system that enables fast, scalable data pro...
kity, 1 month ago
Compression algorithms in Parquet Java

Apache Parquet is a columnar storage format optimized for analytical workloads, though it can also be used to store any type of structured da...
kity, 1 month ago
Efficient Scraping of JavaScript Websites

Ways to use JavaScript for web crawling. Static websites: Axios and Cheerio. Let's walk through crawling a stat...
kity, 4 months ago
Five Steps to Scraping Multiple Images with Python

Whether in market research, e-commerce product listings, or building datasets for machine learning, the...
kity, 4 months ago
Why Scala is the Best Choice for Big Data Applications: Advantages Over Java and Python

In today’s data-driven world, businesses rely on efficient data processing frameworks to g...
kity, 5 months ago
Scala vs. Java: The Superior Choice for Big Data and Machine Learning

In the rapidly evolving landscapes of big data and machine learning, selecting the right programming language...
kity, 5 months ago
Data Visualisation Basics

Why use data vis: When you need to work with a new data source with a huge amount of data, it can be important to use data visualization to understand th...
kity, 6 months ago
Working with Parquet files in Java using Carpet

After some time working with Parquet files in Java using the Parquet Avro library, and studying how it worked, I concluded that desp...
kity, 9 months ago
Metadata for win — Apache Parquet

You read the title right! Apache Parquet provisions best of the data properties to optimize your data processing engine capabilities. Some of th...
kity, 10 months ago
What to use parquet or CSV?

History of Parquet File: A Big Data Storage Revolution. The Parquet file format has emerged as a dominant force in the realm of big data storage and ana...
kity, 10 months ago
Create a Custom Formatted CSV from Python Data

Introduction: In 2023, a friend of mine asked me to write a program to collect the data from an NFT collection on the NFTrade website...
kity, 11 months ago