Use of chatGPT and YouTube to become a Data Analyst – Χρήση του chatGPT και του YouTube για να κάνει κάποιος ανάλυση δεδομένων
Code file: app.py
Folder: C:\PythonPrograms\data_analysis_with_chatGPT
1) Merge multiple excel files – Συγχώνευση πολλών αρχείων excel
2) Clean data – Καθαρισμός δεδομένων
3) Create two excel reports (Total Revenue, Expenses and Profit by Category) and (Total Revenue, Expenses and Profit by Country) – Δημιουργία δύο αρχείων excel (category_report.xlsx and country_report.xlsx)
4) Add a chart to category_report.xlsx – Προσθήκη γραφήματος στο αρχείο excel category_report.xlsx
5) Add a chart to country_report.xlsx – Προσθήκη γραφήματος στο αρχείο excel country_report.xlsx
6) Build two interactive plots (category_chart.html and country_chart.html)
7) Create a streamlit dashboard with 2 graphs – Δημιουργία ενός πίνακα προβολής και διαχείρισης streamlit με 2 γραφήματα
με τη βοήθεια της γλώσσας Python – with Python.
Changed code by Trifonas Papadopoulos:
#Code starts here—————————
import xlsxwriter
import pandas as pd
import os
import plotly.express as px
import streamlit as st
# Define the path to the data folder
data_folder = ‘data’
# Get a list of all the Excel files in the data folder
excel_files = [f for f in os.listdir(data_folder) if f.endswith(‘.xlsx’)]
# Create an empty dataframe to hold the combined data
combined_data = pd.DataFrame()
# Loop through each Excel file and append its data to the combined_data dataframe
for file in excel_files:
# Load the Excel file into a dataframe
df = pd.read_excel(os.path.join(data_folder, file))
# Add a column to the dataframe to indicate the source file
df[‘Source’] = file
# Append the data to the combined_data dataframe
combined_data = pd.concat([combined_data, df], ignore_index=True)
# Drop duplicate rows
combined_data.drop_duplicates(inplace=True)
# Handle missing values
combined_data.dropna(inplace=True)
# Print the combined data
print(combined_data)
# Category Statistics
# Calculate total revenue, expenses, and profit for each category
category_data = combined_data.groupby(‘Category’).agg({‘Revenue’: ‘sum’, ‘Expenses’: ‘sum’})
category_data[‘Profit’] = category_data[‘Revenue’] – category_data[‘Expenses’]
# Print the category data
print(category_data)
# Create an interactive chart using Plotly
fig = px.bar(category_data, x=category_data.index, y=[‘Revenue’, ‘Expenses’, ‘Profit’], barmode=’group’, title=’Total Revenue, Expenses and Profit by Category’)
# Save the chart as an HTML file
chart_name = ‘category_chart.html’
fig.write_html(chart_name)
print(f’Chart saved to {chart_name}.’)
# Create an Excel report
report_name = ‘category_report.xlsx’
workbook = xlsxwriter.Workbook(report_name)
worksheet = workbook.add_worksheet()
# Write the category data to the report
worksheet.write(0, 0, ‘Category’)
worksheet.write(0, 1, ‘Revenue’)
worksheet.write(0, 2, ‘Expenses’)
worksheet.write(0, 3, ‘Profit’)
row = 1
for index, row_data in category_data.iterrows():
worksheet.write(row, 0, index)
worksheet.write(row, 1, row_data[‘Revenue’])
worksheet.write(row, 2, row_data[‘Expenses’])
worksheet.write(row, 3, row_data[‘Profit’])
row += 1
# Create a chart representing the data
chart = workbook.add_chart({‘type’: ‘column’})
chart.add_series({
‘name’: ‘Revenue’,
‘categories’: [‘Sheet1’, 1, 0, row – 1, 0],
‘values’: [‘Sheet1’, 1, 1, row – 1, 1],
})
chart.add_series({
‘name’: ‘Expenses’,
‘categories’: [‘Sheet1’, 1, 0, row – 1, 0],
‘values’: [‘Sheet1’, 1, 2, row – 1, 2],
})
chart.add_series({
‘name’: ‘Profit’,
‘categories’: [‘Sheet1’, 1, 0, row – 1, 0],
‘values’: [‘Sheet1’, 1, 3, row – 1, 3],
})
chart.set_title({‘name’: ‘Total Revenue, Expenses and Profit by Category’})
chart.set_x_axis({‘name’: ‘Category’})
chart.set_y_axis({‘name’: ‘Amount’})
chart.set_legend({‘position’: ‘bottom’})
worksheet.insert_chart(‘F1′, chart)
# Close the workbook
workbook.close()
print(f’Report saved to {report_name}.’)
# Country Statistics
# Calculate total revenue, expenses, and profit for each category
country_data = combined_data.groupby(‘Country’).agg({‘Revenue’: ‘sum’, ‘Expenses’: ‘sum’})
country_data[‘Profit’] = country_data[‘Revenue’] – country_data[‘Expenses’]
# Print the category data
print(country_data)
# Create an interactive chart using Plotly
fig = px.bar(country_data, x=country_data.index, y=[‘Revenue’, ‘Expenses’, ‘Profit’], barmode=’group’, title=’Total Revenue, Expenses and Profit by Country’)
# Save the chart as an HTML file
chart_name = ‘country_chart.html’
fig.write_html(chart_name)
print(f’Chart saved to {chart_name}.’)
# Create an Excel report
report_name = ‘country_report.xlsx’
workbook = xlsxwriter.Workbook(report_name)
worksheet = workbook.add_worksheet()
# Write the category data to the report
worksheet.write(0, 0, ‘Country’)
worksheet.write(0, 1, ‘Revenue’)
worksheet.write(0, 2, ‘Expenses’)
worksheet.write(0, 3, ‘Profit’)
row = 1
for index, row_data in country_data.iterrows():
worksheet.write(row, 0, index)
worksheet.write(row, 1, row_data[‘Revenue’])
worksheet.write(row, 2, row_data[‘Expenses’])
worksheet.write(row, 3, row_data[‘Profit’])
row += 1
# Create a chart representing the data
chart = workbook.add_chart({‘type’: ‘column’})
chart.add_series({
‘name’: ‘Revenue’,
‘categories’: [‘Sheet1’, 1, 0, row – 1, 0],
‘values’: [‘Sheet1’, 1, 1, row – 1, 1],
})
chart.add_series({
‘name’: ‘Expenses’,
‘categories’: [‘Sheet1’, 1, 0, row – 1, 0],
‘values’: [‘Sheet1’, 1, 2, row – 1, 2],
})
chart.add_series({
‘name’: ‘Profit’,
‘categories’: [‘Sheet1’, 1, 0, row – 1, 0],
‘values’: [‘Sheet1’, 1, 3, row – 1, 3],
})
chart.set_title({‘name’: ‘Total Revenue, Expenses and Profit by Country’})
chart.set_x_axis({‘name’: ‘Country’})
chart.set_y_axis({‘name’: ‘Amount’})
chart.set_legend({‘position’: ‘bottom’})
worksheet.insert_chart(‘F1′, chart)
# Close the workbook
workbook.close()
print(f’Report saved to {report_name}.’)
# Create a Streamlit dashboard
st.set_page_config(page_title=’Total Revenue, Expenses and Profit’, page_icon=’:chart_with_upwards_trend:’)
st.title(‘Total Revenue, Expenses and Profit’)
st.subheader(‘Grouped Data by Category’)
# Create an interactive chart using Plotly and display it in the dashboard
fig = px.bar(category_data, x=category_data.index, y=[‘Revenue’, ‘Expenses’, ‘Profit’], barmode=’group’)
st.plotly_chart(fig)
# Add a second graph
st.subheader(‘Grouped Data by Country’)
# Create an interactive chart using Plotly and display it in the dashboard for second graph
fig2 = px.bar(country_data, x=country_data.index, y=[‘Revenue’, ‘Expenses’, ‘Profit’], barmode=’group’)
st.plotly_chart(fig2)
#Code ends here—————————
This code has been changed. But it has been based heavily on code and instructions that you can find in this video on YouTube : I Quit Coding – How I use ChatGPT instead as Data Analyst https://www.youtube.com/watch?v=IAWMMi_VFOI
YouTube Video credit : Coding is Fun https://www.youtube.com/@CodingIsFun
Music track: A Positive Direction by Aylex
Source: https://lnkd.in/d4gBFZEU
Royalty Free Music for Video (Safe)
#mindstormGR #chatGPT #coding #python #data_analysis / www.mindstorm.gr
Comments are closed.