Situation
Polars is a fast, efficient DataFrame library for Python, designed for processing large datasets with low memory usage and high performance. While Polars is more commonly used with CSV, Parquet, and JSON files, it can also work with Excel files, though this requires some additional setup because Polars relies on a separate engine package to parse Excel files.
Solution
Steps to follow
pip install polars openpyxl
1. Read Excel Files Directly with polars.read_excel
The main reason to prefer polars.read_excel over other data manipulation libraries such as Pandas is performance: those libraries tend to slow down as the dataset scales up. This method is efficient because it reads the contents of an Excel file directly into a Polars DataFrame, with no need to write extraction code by hand using another library such as openpyxl. Note that read_excel still needs an Excel engine package installed (fastexcel, xlsx2csv, or openpyxl, depending on the Polars version and the engine argument).
import polars as pl
# Read Excel file directly into a Polars DataFrame
df = pl.read_excel("On the Rise Bakery Business Challenge.xlsx")
# Display the DataFrame
print(df)
2. Load Excel Files Using Openpyxl
If polars.read_excel is not available in your setup, or you need finer control over the workbook, you can use the openpyxl library instead. It loads the Excel file and exposes its contents (sheets, grid dimensions, formulas, and so on), which are then converted into a Polars DataFrame.
import openpyxl
# Load the Excel file
wb = openpyxl.load_workbook("On the Rise Bakery Business Challenge.xlsx")
# Select the active sheet
ws = wb.active
# Extract data from the Excel sheet
data = []
for row in ws.iter_rows(values_only=True):
    data.append(list(row))
In the snippet above:
- We used openpyxl.load_workbook() to load the Excel file.
- ws.iter_rows(values_only=True) iterates over every row and returns plain cell values instead of cell objects, leaving out cell metadata such as styles. Formula cells yield the formula text unless the workbook is loaded with data_only=True.
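Rows extracted this way are not always ready for a DataFrame: a header cell can be empty, and a sheet can contain fully blank rows. Below is a minimal sketch of a clean-up step, assuming the first row holds the column names. The helper name split_header and the sample rows are hypothetical, not part of openpyxl or Polars.

```python
def split_header(rows):
    """Split extracted rows into (column names, data rows).

    Assumes the first row is the header. Empty header cells get a
    placeholder name, and rows that are entirely empty are dropped.
    """
    if not rows:
        return [], []
    header = [
        str(name) if name is not None else f"column_{i}"
        for i, name in enumerate(rows[0])
    ]
    body = [
        list(row)
        for row in rows[1:]
        if any(cell is not None for cell in row)
    ]
    return header, body


# Hypothetical rows shaped like the output of ws.iter_rows(values_only=True)
sample = [
    ("Item", None, "Price"),
    ("Bagel", 12, 1.5),
    (None, None, None),
    ("Muffin", 7, 2.25),
]
columns, records = split_header(sample)
print(columns)
print(records)
```

The returned pair can be used in place of data[0] and data[1:] when building the DataFrame.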
3. Convert the Extracted Data into a Polars DataFrame
Once the data is extracted, we convert the list of lists into a Polars DataFrame for fast and efficient manipulation. The first row supplies the column names and the remaining rows supply the data.
import polars as pl
# Convert to Polars DataFrame; orient="row" marks each inner list as one row
df = pl.DataFrame(data[1:], schema=data[0], orient="row")
print(df)