Python script to extract Kindle highlights from My Clippings.txt

Status: #seedling

#python #tutorial #kindle

For the book resources in this garden, specifically highlights and notes, I use my Kindle and the My Clippings.txt file it generates. When I wanted to extract the quotes from Love Letters - Virginia Woolf and Vita Sackville-West, for example, I wrote a script that looks for the book's title and exports each quote with its page number to a Markdown (.md) file. It's flexible: you can point it at another book or change the output format as needed.
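
To make the parsing easier to follow, here is a minimal sketch of how a single clipping is split apart. The sample entry is illustrative: the title, page, location and timestamp are made up. It follows the layout the script below expects, which is a title line, a metadata line starting with `- Your Highlight on page`, and then the highlighted text. Entries are separated by a line of ten `=` signs. `parse_clipping` is a hypothetical helper name, not part of the original script.

```python
import re

# Illustrative entry (hypothetical values) in the layout the script expects:
# title line, metadata line, blank line, highlighted text.
SAMPLE_ENTRY = """Example Book (Example Author)
- Your Highlight on page 12 | Location 180-182 | Added on Monday, 1 January 2024 10:00:00

This is the highlighted passage."""


def parse_clipping(entry):
    """Split one clipping into (title, page, text); return None if it has no text."""
    # Drop blank lines and surrounding whitespace, as the full script does.
    lines = [line.strip() for line in entry.strip().split("\n") if line.strip()]
    if len(lines) < 3:
        return None  # title and metadata only, nothing highlighted
    title, info = lines[0], lines[1]
    page_match = re.search(r"page (\d+)", info)
    page = page_match.group(1) if page_match else "N/A"
    return title, page, "\n".join(lines[2:])


print(parse_clipping(SAMPLE_ENTRY))
# → ('Example Book (Example Author)', '12', 'This is the highlighted passage.')
```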

This is the code I used:

```python
import re


def extract_quotes(clippings_file="My Clippings.txt",
                   output_file="virginia-vita.md",
                   target_book_identifier="Virginia Woolf and Vita Sackville-West Love Letters ( etc.) (Z-Library)"):
    """
    Extract the highlights for one book from a Kindle clippings file
    and save them to a Markdown file.

    Args:
        clippings_file (str): Path to the Kindle clippings file.
        output_file (str): Path to the output Markdown file.
        target_book_identifier (str): Text to look for in the book's title
            line (case-insensitive substring match).
    """
    quotes_found = []

    try:
        # "utf-8-sig" drops a byte-order mark at the start of the file, if present.
        with open(clippings_file, 'r', encoding='utf-8-sig') as f:
            content = f.read()
    except FileNotFoundError:
        print(f"Error: The file {clippings_file} was not found.")
        return
    except (OSError, UnicodeDecodeError) as e:
        print(f"Error reading {clippings_file}: {e}")
        return

    # Normalise line endings, then split on the separator between entries.
    content = content.replace('\r\n', '\n').replace('\r', '\n')
    clippings = content.split('==========\n')

    for clip in clippings:
        # Drop blank lines: what remains is title, metadata line, highlighted text.
        lines = [line.strip()
                 for line in clip.strip().split('\n') if line.strip()]
        if len(lines) < 3:
            continue  # empty chunk or an entry without any text

        book_title_line, info_line = lines[0], lines[1]
        if target_book_identifier.lower() not in book_title_line.lower():
            continue
        # Keep only highlights that carry a page number.
        if "- Your Highlight on page" not in info_line:
            continue

        page_match = re.search(r"page (\d+)", info_line)
        page_number = page_match.group(1) if page_match else "N/A"

        quote_text = "\n".join(lines[2:]).strip()
        if quote_text:
            quotes_found.append({"quote": quote_text, "page": page_number})

    if not quotes_found:
        print(f"No quotes found for '{target_book_identifier}'.")
        return

    try:
        with open(output_file, 'w', encoding='utf-8') as f:
            for item in quotes_found:
                # Each quote becomes a Markdown blockquote followed by its page.
                f.write(f"> {item['quote']}\n")
                f.write(f"(page {item['page']})\n\n")
        print(f"Successfully extracted {len(quotes_found)} quotes to {output_file}")
    except OSError as e:
        print(f"Error writing to {output_file}: {e}")


if __name__ == "__main__":
    extract_quotes(clippings_file="My Clippings.txt",
                   output_file="virginia-vita.md",
                   target_book_identifier="Virginia Woolf and Vita Sackville-West Love Letters ( etc.) (Z-Library)")
```
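
The script only keeps highlights that have a page number. Because of that, notes and highlights from books that show only Kindle locations are skipped. The sketch below is a more permissive variant. It assumes the metadata line contains `Your Highlight` or `Your Note` plus `page N` and/or `Location N-M`. The exact wording varies with firmware and device language, so treat the regexes, and the sample entries with made-up values, as assumptions. The sketch also separates parsing from formatting. `parse_entries`, `format_blockquote` and `format_list_item` are hypothetical names, and you change the output format by swapping the formatter.

```python
import re

# Assumed metadata layouts: "- Your Highlight on page 12 | Location 180-182 | ..."
# or "- Your Note at location 182 | ...".
KIND_RE = re.compile(r"Your (Highlight|Note)", re.IGNORECASE)
PAGE_RE = re.compile(r"page (\d+)", re.IGNORECASE)
LOCATION_RE = re.compile(r"location (\d+(?:-\d+)?)", re.IGNORECASE)


def parse_entries(content, title_fragment):
    """Yield a dict per highlight or note whose title contains title_fragment."""
    content = content.replace("\r\n", "\n").replace("\r", "\n")
    for clip in content.split("==========\n"):
        lines = [line.strip() for line in clip.strip().split("\n") if line.strip()]
        if len(lines) < 3 or title_fragment.lower() not in lines[0].lower():
            continue
        kind = KIND_RE.search(lines[1])
        if not kind:
            continue  # not a highlight or a note
        page = PAGE_RE.search(lines[1])
        location = LOCATION_RE.search(lines[1])
        yield {
            "kind": kind.group(1).lower(),
            "page": page.group(1) if page else None,
            "location": location.group(1) if location else None,
            "text": "\n".join(lines[2:]),
        }


def format_blockquote(item):
    """Markdown blockquote citing the page, or the location if there is no page."""
    ref = f"page {item['page']}" if item["page"] else f"location {item['location'] or 'N/A'}"
    label = "Note: " if item["kind"] == "note" else ""
    quoted = "\n".join(f"> {line}" for line in f"{label}{item['text']}".split("\n"))
    return f"{quoted}\n({ref})\n"


def format_list_item(item):
    """Alternative output: one bullet per clipping."""
    ref = item["page"] or item["location"] or "N/A"
    return f"- {item['text'].replace(chr(10), ' ')} ({ref})"


# Illustrative input with hypothetical values.
SAMPLE = (
    "Example Book (Example Author)\n"
    "- Your Highlight on page 12 | Location 180-182 | Added on Monday, 1 January 2024 10:00:00\n"
    "\n"
    "A highlighted passage.\n"
    "==========\n"
    "Example Book (Example Author)\n"
    "- Your Note at location 182 | Added on Monday, 1 January 2024 10:01:00\n"
    "\n"
    "My own comment.\n"
    "==========\n"
)

if __name__ == "__main__":
    items = list(parse_entries(SAMPLE, "Example Book"))
    print("\n".join(format_blockquote(item) for item in items))
```

To write a file instead of printing, pass the joined string to `f.write()` inside a `with open(output_file, 'w', encoding='utf-8') as f:` block, as the original script does.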

If I end up using this more extensively, I'm considering building a web app that handles exactly this, with various formatting and output options.