Python script to extract Kindle highlights from My Clippings.txt
Status: #seedling
For all the book resources in this garden, specifically highlights and notes, I use my Kindle and the My Clippings.txt file it generates. When I wanted to extract the quotes from Love Letters - Virginia Woolf and Vita Sackville-West, for example, I decided to write a script that looks for the book title and exports each quote with its page number to a Markdown (.md) file. It's flexible: the output format is easy to change as needed.
This is the code I used:
import re


def extract_quotes(clippings_file="My Clippings.txt",
                   output_file="virginia-vita.md",
                   target_book_identifier="Virginia Woolf and Vita Sackville-West Love Letters ( etc.) (Z-Library)"):
    """
    Extracts quotes from Virginia Woolf and Vita Sackville-West Love Letters
    from a Kindle clippings file and saves them to a Markdown file.

    Args:
        clippings_file (str): The path to the Kindle clippings file.
        output_file (str): The path to the output Markdown file.
        target_book_identifier (str): Text to look for (case-insensitive)
            in the title line of each clipping.
    """
    quotes_found = []
    try:
        # utf-8-sig reads plain UTF-8 too and drops a leading BOM if there is one.
        with open(clippings_file, 'r', encoding='utf-8-sig') as f:
            content = f.read()
    except FileNotFoundError:
        print(f"Error: The file {clippings_file} was not found.")
        return
    except (OSError, UnicodeDecodeError) as e:
        print(f"Error reading {clippings_file}: {e}")
        return

    # Normalize line endings, then split on the separator between clippings.
    content = content.replace('\r\n', '\n').replace('\r', '\n')
    clippings = content.split('==========')

    for clip in clippings:
        lines = [line.strip() for line in clip.strip().split('\n') if line.strip()]
        # Expect at least: title line, info line, one line of quote text.
        if len(lines) < 3:
            continue
        book_title_line, info_line = lines[0], lines[1]
        if target_book_identifier.lower() not in book_title_line.lower():
            continue
        # Only highlights with a page number are kept; notes and bookmarks are skipped.
        if "- Your Highlight on page" not in info_line:
            continue
        page_match = re.search(r"page (\d+)", info_line)
        page_number = page_match.group(1) if page_match else "N/A"
        quotes_found.append({"quote": "\n".join(lines[2:]), "page": page_number})

    if not quotes_found:
        print(f"No quotes found for '{target_book_identifier}'.")
        return

    try:
        with open(output_file, 'w', encoding='utf-8') as f:
            for item in quotes_found:
                # Prefix every line so multi-line highlights stay inside the blockquote.
                quoted = "\n".join(f"> {line}" for line in item['quote'].split('\n'))
                f.write(f"{quoted}\n")
                f.write(f"(page {item['page']})\n\n")
        print(f"Successfully extracted {len(quotes_found)} quotes to {output_file}")
    except OSError as e:
        print(f"Error writing to {output_file}: {e}")


if __name__ == "__main__":
    extract_quotes(clippings_file="My Clippings.txt",
                   output_file="virginia-vita.md",
                   target_book_identifier="Virginia Woolf and Vita Sackville-West Love Letters ( etc.) (Z-Library)")
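To make the parsing step concrete, here's a minimal sketch of how a single entry gets picked apart. The sample entry is made up for illustration (title, page, and text are placeholders), and `parse_clip` is a hypothetical helper, not part of the script above. It assumes the same layout the script relies on: a title line, then an info line, then the highlight text, with entries separated by a line of equals signs.

```python
import re

# Made-up sample in the layout the script assumes: title line, info line,
# blank line, highlight text, then a "==========" separator.
SAMPLE = (
    "Example Book (Example Author)\n"
    "- Your Highlight on page 12 | Location 180-182 | Added on Monday, January 1, 2024 10:00:00 AM\n"
    "\n"
    "A short highlighted sentence.\n"
    "==========\n"
)


def parse_clip(clip):
    """Hypothetical helper: return (title, page, text) for a highlight, else None."""
    lines = [line.strip() for line in clip.strip().split("\n") if line.strip()]
    if len(lines) < 3 or "- Your Highlight on page" not in lines[1]:
        return None
    page_match = re.search(r"page (\d+)", lines[1])
    page = page_match.group(1) if page_match else "N/A"
    return lines[0], page, "\n".join(lines[2:])


for clip in SAMPLE.split("=========="):
    parsed = parse_clip(clip)
    if parsed:
        print(parsed)
```

Running a real entry from your own My Clippings.txt through a helper like this is a quick way to check whether your Kindle writes the info line the way the script expects.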
If I end up using this more extensively, I'm considering building a web app that would handle this exact thing, with various formatting and output options.
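As a rough sketch of what those formatting options could look like, the output step could be pulled out into small formatter functions picked by name. Everything here is hypothetical (the `FORMATTERS` table, the function names, the style keys); it only assumes quotes are stored as dicts with `quote` and `page` keys, as in the script above.

```python
def as_blockquote(item):
    # Prefix every line so multi-line highlights stay inside the blockquote.
    quoted = "\n".join(f"> {line}" for line in item["quote"].split("\n"))
    return f"{quoted}\n(page {item['page']})\n"


def as_bullet(item):
    # Collapse the highlight onto one line for a compact list entry.
    text = " ".join(item["quote"].split())
    return f'- "{text}" (p. {item["page"]})\n'


# Hypothetical registry: style name -> formatter function.
FORMATTERS = {"blockquote": as_blockquote, "bullet": as_bullet}


def render(quotes, style="blockquote"):
    """Join all quotes using the chosen formatter; raises KeyError on an unknown style."""
    formatter = FORMATTERS[style]
    return "\n".join(formatter(item) for item in quotes)


quotes = [{"quote": "First line\nsecond line", "page": "12"}]
print(render(quotes, "blockquote"))
print(render(quotes, "bullet"))
```

Adding another output style would then just mean writing one more function and registering it in the table.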