ALL AUTOMATE WORKFLOW EVERYWHERE BUSINESS ENGLISH ARTICLES PYTHON

A Python script that extracts product information from a Word document, processes it to create entries suitable for WordPress/WooCommerce (including variations), and saves the data as a CSV file

In summary, this Python script extracts product information from a Word document, processes it to create entries suitable for WordPress/WooCommerce (including variations), and saves the data as a CSV file you can import immediately to your eshop.

In more detail, the script performs the following tasks:

  1. Imports necessary libraries: It imports pandas for data manipulation, Document from docx for handling Word documents, and uuid for generating unique identifiers.
  2. Defines a function to generate unique SKUs: The generate_unique_sku function creates a unique SKU by combining a base name with a short UUID (a unique identifier) and an optional suffix.
  3. Extracts product information from a Word document: The extract_product_info_from_word function reads a Word document and extracts product details such as title, description, type, SKU, colors, sizes, brand, and prices. It organizes this data into a list of dictionaries, where each dictionary represents a product.
  4. Processes product information: The script iterates over the extracted product information. For each product:
    • If the product is not variable, it generates a base SKU and prepares the product data.
    • If the product is variable (i.e., it has variations like different colors or sizes), it creates separate entries for each variation, generating unique SKUs for each one.
  5. Creates a DataFrame: All the processed product data (including variations) is stored in a list of dictionaries, which is then converted into a pandas DataFrame.
  6. Saves the data to a CSV file: Finally, the DataFrame is saved as a CSV file, ready for use with WooCommerce, an e-commerce platform. The file is saved at a specified path on the user’s system.

Download Word document : product_details_for_woocommerce.docx

word_file_screenshot

Product details for 5 products extracted from the word document :

  • Product Title (product_title)
  • Product Type (product_type)
  • Product SKU (product_sku)
  • Product Brand (product_brand)
  • Product Composition (product_composition)
  • Product Description (product_description)
  • Product Short Description (product_short_description)
  • Product Colors (product colors) – multiple colors are separated with a comma and a space i.e. Blue, Black
  • Product Sizes (product sizes) – multiple sizes are separated with a comma and a space i.e. S, M
  • Product Category (product_category) – multiple categories are separated with a comma and a space i.e. Product Category 1, Product Category 2
  • Product Regular Price (product_regular_price)
  • Product Sale Price (product_sale_price)

Follow on LinkedIn

Download CSV file : woocommerce_products_with_variations.csv

csv_file_screeshotSee Product 1 (variable), Product 2 (variable) and Product 3 (simple) after importing them temporarily with the generated CSV file to privatefashionfactory.com

Product 1

 

Product 2

Product 3

Python script

# This Python script extracts product information from a Word document, processes it to create entries suitable 

# for WordPress/WooCommerce (including variations), and saves the data as a CSV file you can import immediately to your eshop. 

import pandas as pd

from docx import Document

import uuid  # Importing uuid for unique SKU generation

 

def generate_unique_sku(base_namesuffix=“”):

    “””Generate a unique SKU using a base name and an optional suffix.”””

    unique_part = str(uuid.uuid4())[:8]  # Generate a short unique identifier

    return f{base_name}{unique_part}{suffix}.upper()    

 

def extract_product_info_from_word(doc_path):

    doc = Document(doc_path)

    products = []

    product_info = {}

    description_lines = []

    short_description_lines = []

 

    for para in doc.paragraphs:

        line = para.text.strip()

        if not line:

            continue

        

        if line.lower().startswith(“product_title”):

            if product_info:  # If there’s already product info, save it before starting a new one

                # Join collected lines for description and short description with <br> tag

                if description_lines:

                    product_info[“Product Description”] = “<br>”.join(description_lines)

                if short_description_lines:

                    product_info[“Product Short Description”] = “<br>”.join(short_description_lines)

                products.append(product_info)  # Store the completed product

                product_info = {}

                description_lines = []

                short_description_lines = []

 

            current_key = “Product Title”

            product_info[current_key] = line.split(“:”1)[1].strip() if “:” in line else “”

        elif line.lower().startswith(“product_description”):

            current_key = “Product Description”

        elif line.lower().startswith(“product_type”):

            current_key = “Product Type”

        elif line.lower().startswith(“product_sku”):

            current_key = “Product SKU”    

            description_lines = []  # Start collecting lines for description

        elif line.lower().startswith(“product_short_description”):

            current_key = “Product Short Description”

            short_description_lines = []  # Start collecting lines for short description

        elif line.lower().startswith(“product colors”):

            current_key = “Product Colors”

            product_info[current_key] = line.split(“:”1)[1].strip() if “:” in line else “”

        elif line.lower().startswith(“product sizes”):

            current_key = “Product Sizes”

        elif line.lower().startswith(“product_brand”):

            current_key = “Product Brand”

        elif line.lower().startswith(“product_composition”):

            current_key = “Product Composition”        

            product_info[current_key] = line.split(“:”1)[1].strip() if “:” in line else “”

        elif line.lower().startswith(“product_category”):

            current_key = “Product Category”

            product_info[current_key] = line.split(“:”1)[1].strip() if “:” in line else “”

        elif line.lower().startswith(“product_regular_price”):

            current_key = “Product Regular Price”

            product_info[current_key] = line.split(“:”1)[1].strip() if “:” in line else “”

        elif line.lower().startswith(“product_sale_price”):

            current_key = “Product Sale Price”

            product_info[current_key] = line.split(“:”1)[1].strip() if “:” in line else “”

        else:

            if current_key == “Product Description”:

                description_lines.append(line)

            elif current_key == “Product Short Description”:

                short_description_lines.append(line)

            else:

                product_info[current_key] = line

 

    # Handle the last product in the document

    if product_info:

        if description_lines:

            product_info[“Product Description”] = “<br>”.join(description_lines)

        if short_description_lines:

            product_info[“Product Short Description”] = “<br>”.join(short_description_lines)

        products.append(product_info)

 

    return products

 

# Corrected file paths

word_file_path = r‘c:\PythonPrograms\create_woocommerce_csv_from_word\product_details_for_woocommerce.docx’

 

# Extracting the product info from the Word file

products = extract_product_info_from_word(word_file_path)

 

all_data = []

 

for product_info in products:

    # Generate or retrieve SKU for parent product

    base_sku = product_info.get(“Product SKU”or generate_unique_sku(product_info.get(“Product Title”“PROD”))

    

    # Base parent product data

    parent_data = {

        “Name”product_info.get(“Product Title”“”),

        “Description”product_info.get(“Product Description”“”),

        “Short Description”product_info.get(“Product Short Description”“”),

        “Categories”product_info.get(“Product Category”“”),

        “Type”product_info.get(“Product Type”“”),

        “Published”“1”,

        “Visibility in catalog”“visible”,

        “In stock?”“1”,

        “Allow customer reviews?”“1”,

        “Tax status”“taxable”,

        “Images”“”,

        “SKU”base_sku,

        “Parent”“”,

        “Regular price”product_info.get(“Product Regular Price”“”),

        “Sale price”product_info.get(“Product Sale Price”“”),

        “Weight (kg)”“”,

        “Length (cm)”“”,

        “Width (cm)”“”,

        “Height (cm)”“”,

        “Stock”“”,

        “Backorders allowed?”“0”,

        “Sold individually?”“0”

    }

 

    # Update attribute columns only if Attribute 1 value(s) is not null

    attribute_1_values = product_info.get(“Product Colors”“”).split(“, “)

    if attribute_1_values and attribute_1_values[0]:

        parent_data.update({

            “Attribute 1 name”ΧΡΩΜΑ / COLOR”,

            #”Attribute 1 value(s)”: ” | “.join(attribute_1_values),

            “Attribute 1 value(s)”“, “.join(attribute_1_values),

            “Attribute 1 visible”“1”,

            “Attribute 1 global”“1”

        })

 

    # Update attribute columns only if Attribute 2 value(s) is not null

    attribute_2_values = product_info.get(“Product Sizes”“”).split(“, “)

    if attribute_2_values and attribute_2_values[0]:

        parent_data.update({

            “Attribute 2 name”ΜΕΓΕΘΟΣ / SIZE”,

            #”Attribute 2 value(s)”: ” | “.join(attribute_2_values),

            “Attribute 2 value(s)”“, “.join(attribute_2_values),

            “Attribute 2 visible”“1”,

            “Attribute 2 global”“1”

        })

 

    # Update attribute columns only if Attribute 3 value(s) is not null

    attribute_3_values = product_info.get(“Product Brand”“”).split(“, “)

    if attribute_3_values and attribute_3_values[0]:

        parent_data.update({

            “Attribute 3 name”ΜΑΡΚΑ / BRAND”,

            #”Attribute 3 value(s)”: ” | “.join(attribute_3_values),

            “Attribute 3 value(s)”“, “.join(attribute_3_values),

            “Attribute 3 visible”“1”,

            “Attribute 3 global”“1”

        })

    

    # Update attribute columns only if Attribute 4 value(s) is not null

    attribute_4_values = product_info.get(“Product Composition”“”).split(“, “)

    if attribute_4_values and attribute_4_values[0]:

        parent_data.update({

            “Attribute 4 name”ΣΥΝΘΕΣΗ / COMPOSITION”,

            #”Attribute 4 value(s)”: ” | “.join(attribute_4_values),

            “Attribute 4 value(s)”“, “.join(attribute_4_values),

            “Attribute 4 visible”“1”,

            “Attribute 4 global”“1”

        })

 

    # Add the parent data to the list

    all_data.append(parent_data)

 

    # Only add variations if the product type is “variable”

    if product_info.get(“Product Type”“”).lower() == “variable”:

        for color in attribute_1_values:

            for size in attribute_2_values:

                variation_sku = generate_unique_sku(base_skuf“-{color}{size})

                variation = {

                    “Name”“”,  # Leave blank for variations

                    “Description”“”,  # Leave blank for variations

                    “Short Description”“”,  # Leave blank for variations

                    “Categories”“”,  # Leave blank for variations         

                    “Attribute 1 name”ΧΡΩΜΑ / COLOR”,

                    “Attribute 1 value(s)”color,

                    “Attribute 1 visible”“1”,

                    “Attribute 1 global”“1”,

                    “Attribute 2 name”ΜΕΓΕΘΟΣ / SIZE”,

                    “Attribute 2 value(s)”size,

                    “Attribute 2 visible”“1”,

                    “Attribute 2 global”“1”,

                    “Type”“variation”,

                    “Published”“1”,

                    “Visibility in catalog”“visible”,

                    “In stock?”“1”,

                    “Allow customer reviews?”“1”,

                    “Tax status”“taxable”,

                    “Images”“”,              

                    “SKU”variation_sku,

                    “Parent”base_sku,

                    “Regular price”product_info.get(“Product Regular Price”),

                    “Sale price”product_info.get(“Product Sale Price”),

                    “Weight (kg)”“”,

                    “Length (cm)”“”,

                    “Width (cm)”“”,

                    “Height (cm)”“”,

                    “Stock”“”,

                    “Backorders allowed?”“0”,

                    “Sold individually?”“0”

                }

                all_data.append(variation)

 

# Create DataFrame directly from list of dictionaries

df = pd.DataFrame(all_data)

 

# Path for saving the CSV file

csv_file_path_with_variations = r‘c:\PythonPrograms\create_woocommerce_csv_from_word\woocommerce_products_with_variations.csv’

df.to_csv(csv_file_path_with_variationsindex=False)

 

csv_file_path_with_variations

Views: 8

Comments are closed.

Pin It