In summary, this Python script extracts product information from a Word document, processes it to create entries suitable for WordPress/WooCommerce (including variations), and saves the data as a CSV file you can import immediately to your eshop.
In more detail, the script performs the following tasks:
- Imports necessary libraries: It imports
pandas
for data manipulation,Document
fromdocx
for handling Word documents, anduuid
for generating unique identifiers. - Defines a function to generate unique SKUs: The
generate_unique_sku
function creates a unique SKU by combining a base name with a short UUID (a unique identifier) and an optional suffix. - Extracts product information from a Word document: The
extract_product_info_from_word
function reads a Word document and extracts product details such as title, description, type, SKU, colors, sizes, brand, and prices. It organizes this data into a list of dictionaries, where each dictionary represents a product. - Processes product information: The script iterates over the extracted product information. For each product:
- If the product is not variable, it generates a base SKU and prepares the product data.
- If the product is variable (i.e., it has variations like different colors or sizes), it creates separate entries for each variation, generating unique SKUs for each one.
- Creates a DataFrame: All the processed product data (including variations) is stored in a list of dictionaries, which is then converted into a
pandas
DataFrame. - Saves the data to a CSV file: Finally, the DataFrame is saved as a CSV file, ready for use with WooCommerce, an e-commerce platform. The file is saved at a specified path on the user’s system.
Download Word document : product_details_for_woocommerce.docx
Product details for 5 products extracted from the word document :
- Product Title (product_title)
- Product Type (product_type)
- Product SKU (product_sku)
- Product Brand (product_brand)
- Product Composition (product_composition)
- Product Description (product_description)
- Product Short Description (product_short_description)
- Product Colors (product colors) – multiple colors are separated with a comma and a space i.e. Blue, Black
- Product Sizes (product sizes) – multiple sizes are separated with a comma and a space i.e. S, M
- Product Category (product_category) – multiple categories are separated with a comma and a space i.e. Product Category 1, Product Category 2
- Product Regular Price (product_regular_price)
- Product Sale Price (product_sale_price)
Download CSV file : woocommerce_products_with_variations.csv
See Product 1 (variable), Product 2 (variable) and Product 3 (simple) after importing them temporarily with the generated CSV file to privatefashionfactory.com
Python script
# This Python script extracts product information from a Word document, processes it to create entries suitable
# for WordPress/WooCommerce (including variations), and saves the data as a CSV file you can import immediately to your eshop.
import pandas as pd
from docx import Document
import uuid # Importing uuid for unique SKU generation
def generate_unique_sku(base_name, suffix=“”):
“””Generate a unique SKU using a base name and an optional suffix.”””
unique_part = str(uuid.uuid4())[:8] # Generate a short unique identifier
return f“{base_name}–{unique_part}{suffix}“.upper()
def extract_product_info_from_word(doc_path):
doc = Document(doc_path)
products = []
product_info = {}
description_lines = []
short_description_lines = []
for para in doc.paragraphs:
line = para.text.strip()
if not line:
continue
if line.lower().startswith(“product_title”):
if product_info: # If there’s already product info, save it before starting a new one
# Join collected lines for description and short description with <br> tag
if description_lines:
product_info[“Product Description”] = “<br>”.join(description_lines)
if short_description_lines:
product_info[“Product Short Description”] = “<br>”.join(short_description_lines)
products.append(product_info) # Store the completed product
product_info = {}
description_lines = []
short_description_lines = []
current_key = “Product Title”
product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”
elif line.lower().startswith(“product_description”):
current_key = “Product Description”
elif line.lower().startswith(“product_type”):
current_key = “Product Type”
elif line.lower().startswith(“product_sku”):
current_key = “Product SKU”
description_lines = [] # Start collecting lines for description
elif line.lower().startswith(“product_short_description”):
current_key = “Product Short Description”
short_description_lines = [] # Start collecting lines for short description
elif line.lower().startswith(“product colors”):
current_key = “Product Colors”
product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”
elif line.lower().startswith(“product sizes”):
current_key = “Product Sizes”
elif line.lower().startswith(“product_brand”):
current_key = “Product Brand”
elif line.lower().startswith(“product_composition”):
current_key = “Product Composition”
product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”
elif line.lower().startswith(“product_category”):
current_key = “Product Category”
product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”
elif line.lower().startswith(“product_regular_price”):
current_key = “Product Regular Price”
product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”
elif line.lower().startswith(“product_sale_price”):
current_key = “Product Sale Price”
product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”
else:
if current_key == “Product Description”:
description_lines.append(line)
elif current_key == “Product Short Description”:
short_description_lines.append(line)
else:
product_info[current_key] = line
# Handle the last product in the document
if product_info:
if description_lines:
product_info[“Product Description”] = “<br>”.join(description_lines)
if short_description_lines:
product_info[“Product Short Description”] = “<br>”.join(short_description_lines)
products.append(product_info)
return products
# Corrected file paths
word_file_path = r‘c:\PythonPrograms\create_woocommerce_csv_from_word\product_details_for_woocommerce.docx’
# Extracting the product info from the Word file
products = extract_product_info_from_word(word_file_path)
all_data = []
for product_info in products:
# Generate or retrieve SKU for parent product
base_sku = product_info.get(“Product SKU”) or generate_unique_sku(product_info.get(“Product Title”, “PROD”))
# Base parent product data
parent_data = {
“Name”: product_info.get(“Product Title”, “”),
“Description”: product_info.get(“Product Description”, “”),
“Short Description”: product_info.get(“Product Short Description”, “”),
“Categories”: product_info.get(“Product Category”, “”),
“Type”: product_info.get(“Product Type”, “”),
“Published”: “1”,
“Visibility in catalog”: “visible”,
“In stock?”: “1”,
“Allow customer reviews?”: “1”,
“Tax status”: “taxable”,
“Images”: “”,
“SKU”: base_sku,
“Parent”: “”,
“Regular price”: product_info.get(“Product Regular Price”, “”),
“Sale price”: product_info.get(“Product Sale Price”, “”),
“Weight (kg)”: “”,
“Length (cm)”: “”,
“Width (cm)”: “”,
“Height (cm)”: “”,
“Stock”: “”,
“Backorders allowed?”: “0”,
“Sold individually?”: “0”
}
# Update attribute columns only if Attribute 1 value(s) is not null
attribute_1_values = product_info.get(“Product Colors”, “”).split(“, “)
if attribute_1_values and attribute_1_values[0]:
parent_data.update({
“Attribute 1 name”: “ΧΡΩΜΑ / COLOR”,
#”Attribute 1 value(s)”: ” | “.join(attribute_1_values),
“Attribute 1 value(s)”: “, “.join(attribute_1_values),
“Attribute 1 visible”: “1”,
“Attribute 1 global”: “1”
})
# Update attribute columns only if Attribute 2 value(s) is not null
attribute_2_values = product_info.get(“Product Sizes”, “”).split(“, “)
if attribute_2_values and attribute_2_values[0]:
parent_data.update({
“Attribute 2 name”: “ΜΕΓΕΘΟΣ / SIZE”,
#”Attribute 2 value(s)”: ” | “.join(attribute_2_values),
“Attribute 2 value(s)”: “, “.join(attribute_2_values),
“Attribute 2 visible”: “1”,
“Attribute 2 global”: “1”
})
# Update attribute columns only if Attribute 3 value(s) is not null
attribute_3_values = product_info.get(“Product Brand”, “”).split(“, “)
if attribute_3_values and attribute_3_values[0]:
parent_data.update({
“Attribute 3 name”: “ΜΑΡΚΑ / BRAND”,
#”Attribute 3 value(s)”: ” | “.join(attribute_3_values),
“Attribute 3 value(s)”: “, “.join(attribute_3_values),
“Attribute 3 visible”: “1”,
“Attribute 3 global”: “1”
})
# Update attribute columns only if Attribute 4 value(s) is not null
attribute_4_values = product_info.get(“Product Composition”, “”).split(“, “)
if attribute_4_values and attribute_4_values[0]:
parent_data.update({
“Attribute 4 name”: “ΣΥΝΘΕΣΗ / COMPOSITION”,
#”Attribute 4 value(s)”: ” | “.join(attribute_4_values),
“Attribute 4 value(s)”: “, “.join(attribute_4_values),
“Attribute 4 visible”: “1”,
“Attribute 4 global”: “1”
})
# Add the parent data to the list
all_data.append(parent_data)
# Only add variations if the product type is “variable”
if product_info.get(“Product Type”, “”).lower() == “variable”:
for color in attribute_1_values:
for size in attribute_2_values:
variation_sku = generate_unique_sku(base_sku, f“-{color}–{size}“)
variation = {
“Name”: “”, # Leave blank for variations
“Description”: “”, # Leave blank for variations
“Short Description”: “”, # Leave blank for variations
“Categories”: “”, # Leave blank for variations
“Attribute 1 name”: “ΧΡΩΜΑ / COLOR”,
“Attribute 1 value(s)”: color,
“Attribute 1 visible”: “1”,
“Attribute 1 global”: “1”,
“Attribute 2 name”: “ΜΕΓΕΘΟΣ / SIZE”,
“Attribute 2 value(s)”: size,
“Attribute 2 visible”: “1”,
“Attribute 2 global”: “1”,
“Type”: “variation”,
“Published”: “1”,
“Visibility in catalog”: “visible”,
“In stock?”: “1”,
“Allow customer reviews?”: “1”,
“Tax status”: “taxable”,
“Images”: “”,
“SKU”: variation_sku,
“Parent”: base_sku,
“Regular price”: product_info.get(“Product Regular Price”),
“Sale price”: product_info.get(“Product Sale Price”),
“Weight (kg)”: “”,
“Length (cm)”: “”,
“Width (cm)”: “”,
“Height (cm)”: “”,
“Stock”: “”,
“Backorders allowed?”: “0”,
“Sold individually?”: “0”
}
all_data.append(variation)
# Create DataFrame directly from list of dictionaries
df = pd.DataFrame(all_data)
# Path for saving the CSV file
csv_file_path_with_variations = r‘c:\PythonPrograms\create_woocommerce_csv_from_word\woocommerce_products_with_variations.csv’
df.to_csv(csv_file_path_with_variations, index=False)
csv_file_path_with_variations
Comments are closed.