The two versions of Parquet
A few days ago, the creators of DuckDB wrote the article: Query Engines: Gatekeepers of the Parquet File Format, which explained how the engines that pr...
Run PySpark Local Python Windows Notebook
Introduction: PySpark is the Python API for Apache Spark, an open-source distributed computing system that enables fast, scalable data pro...
Compression algorithms in Parquet Java
Apache Parquet is a columnar storage format optimized for analytical workloads, though it can also be used to store any type of structured da...
Efficient Scraping of JavaScript Websites
Ways to use JavaScript for web crawling. Static websites: Axios and Cheerio. Let's walk through crawling a stat...
Five Steps to Scraping Multiple Images with Python
Whether for market research, e-commerce product listings, or building datasets for machine learning, the...
Why Scala is the Best Choice for Big Data Applications: Advantages Over Java and Python
In today’s data-driven world, businesses rely on efficient data processing frameworks to g...
Scala vs. Java: The Superior Choice for Big Data and Machine Learning
In the rapidly evolving landscapes of big data and machine learning, selecting the right programming language...
Data Visualisation Basics
Why use data vis: When you need to work with a new data source, with a huge amount of data, it can be important to use data visualization to understand th...
Working with Parquet files in Java using Carpet
After some time working with Parquet files in Java using the Parquet Avro library, and studying how it worked, I concluded that desp...
Metadata for win — Apache Parquet
You read the title right! Apache Parquet provisions the best of the data properties to optimize your data processing engine capabilities. Some of th...
What to use parquet or CSV?
History of Parquet File: A Big Data Storage Revolution. The Parquet file format has emerged as a dominant force in the realm of big data storage and ana...
Create a Custom Formatted CSV from Python Data
Introduction: In 2023, a friend of mine asked me to write a program to collect the data from an NFT collection on the NFTrade website...