Extracting Vocab from Books with BookToAnki (That Actually Fits Your Level)
Why I built a tool to stop memorizing generic lists and start pulling vocabulary from real reading material.
I read a lot of English books and the workflow around vocabulary has basically been broken forever. You find a word, you look it up, you forget it three days later. The obvious fix is Anki, but getting words into Anki from a novel is an incredibly tedious process.
Most tools solving this just throw you a generic "top 1000 words" deck. If you're lucky they let you paste some text and give you back an isolated list of definitions. Lately all these new apps are just thin UI wrappers around OpenAI APIs, taking your PDFs and epub files and sending them to who knows where.
I wanted something different so I just built it.
Stop memorizing lists
Memorizing frequency lists is brutal. It's so much easier when you have context. If you encounter a weird adjective in a sci-fi book, your brain attaches it to the scene, the character, the whole vibe of the chapter. A generic flashcard strips all that away.
BookToAnki pulls directly from your epub or PDF. It grabs the exact sentence you were reading. When the card pops up later, you don't just see the word. You see the exact moment you encountered it. Your brain actually has something to hook onto.
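To make the "grab the exact sentence" idea concrete, here is a minimal Python sketch. It is not BookToAnki's actual code: the function name find_context_sentence and the naive punctuation-based sentence splitter are assumptions for illustration, and a real book would need something smarter for abbreviations and dialogue.

```python
import re
from typing import Optional


def find_context_sentence(text: str, word: str) -> Optional[str]:
    """Return the first sentence in `text` containing `word` as a whole word.

    Hypothetical helper for illustration only.
    """
    # Naive splitter: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Match the word on word boundaries, ignoring case.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for sentence in sentences:
        if pattern.search(sentence):
            return sentence
    return None


passage = (
    "The ship drifted past the last beacon. "
    "A crepuscular glow hung over the dead colony. "
    "Nobody spoke."
)
print(find_context_sentence(passage, "crepuscular"))
# prints: A crepuscular glow hung over the dead colony.
```

The card front would be the word and the back would carry that sentence, so the scene comes along with the definition.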
Local processing
I'm getting really tired of every productivity app just being an OpenAI pipeline. I read a mix of DRM-free ebooks, study blogs I bought, and personal notes. I don't want to upload my entire library to a consumer LLM.
BookToAnki runs the processing locally. It's not a generic AI assistant that can write poetry or summarize emails. It just extracts vocab. That's it. Your files stay private.
The real problem is filtering
My first version of this script was a total disaster. It just extracted every single word I didn't know well. I ran a 300-page book through it, got an 800-card Anki deck, looked at it once and never opened it again.
You can't learn everything. You have to filter.
I added targeting based on standard levels and exam lists (CEFR, GRE, TOEFL, etc.). If I'm reading a dense history book, I might only grab GRE-level words. If it's a lighter novel, maybe CEFR C1. I just mess with the dial until the resulting list looks manageable. Usually under 100 cards. Anything more and I know I'm kidding myself.
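Here is a rough Python sketch of that dial, using CEFR as the example scale. Everything in it is an assumption for illustration: the function names, the 100-card cap as a parameter, and especially the tiny word_levels table, whose levels are made up and not taken from any real graded word list.

```python
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]


def filter_by_level(words, word_levels, min_level):
    """Keep words whose CEFR level is at or above `min_level`.

    Words missing from the lookup table are skipped.
    """
    threshold = CEFR_ORDER.index(min_level)
    kept = []
    for word in words:
        level = word_levels.get(word)
        if level is not None and CEFR_ORDER.index(level) >= threshold:
            kept.append(word)
    return kept


def pick_level(words, word_levels, max_cards=100):
    """Raise the minimum level step by step until the list fits the cap."""
    for level in CEFR_ORDER:
        kept = filter_by_level(words, word_levels, level)
        if len(kept) <= max_cards:
            return level, kept
    # Even the hardest level is too big: truncate as a last resort.
    top = CEFR_ORDER[-1]
    return top, filter_by_level(words, word_levels, top)[:max_cards]


# Toy data: these level assignments are invented for the example.
word_levels = {
    "house": "A1",
    "reluctant": "B2",
    "ubiquitous": "C1",
    "obsequious": "C2",
}
unknown = ["house", "reluctant", "ubiquitous", "obsequious"]

print(filter_by_level(unknown, word_levels, "C1"))
print(pick_level(unknown, word_levels, max_cards=2))
```

With the toy table, asking for C1 and above keeps only the last two words, and a cap of two cards lands on the same C1 setting automatically.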
Edit before you export
One thing I've noticed watching people use this: you have to actually curate the list before hitting the export button.
If a word is incredibly rare or overly specific to one random paragraph, just delete it. Let it stay in the book. Sometimes the context sentence is too long so I'll just trim it right in the UI. A lot of people feel this weird anxiety that they need to memorize every unknown word. You don't.
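The curation step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how BookToAnki does it: trim_context and export_tsv are hypothetical names, the 120-character window is arbitrary, and the output is a plain tab-separated file of the kind Anki can import as text.

```python
import csv
import io


def trim_context(sentence, word, max_chars=120):
    """Shorten a long sentence to a window of about `max_chars` around `word`.

    The cut is character-based, so it may land mid-word.
    """
    if len(sentence) <= max_chars:
        return sentence
    idx = sentence.lower().find(word.lower())
    if idx == -1:
        # Word not found: fall back to keeping the start of the sentence.
        return sentence[:max_chars].rstrip() + "..."
    half = max(0, (max_chars - len(word)) // 2)
    start = max(0, idx - half)
    end = min(len(sentence), idx + len(word) + half)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(sentence) else ""
    return prefix + sentence[start:end].strip() + suffix


def export_tsv(cards, dropped):
    """Write (word, sentence) pairs as tab-separated lines, skipping dropped words."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for word, sentence in cards:
        if word in dropped:
            continue  # let it stay in the book
        writer.writerow([word, trim_context(sentence, word)])
    return buf.getvalue()


cards = [
    ("crepuscular", "A crepuscular glow hung over the dead colony."),
    ("flange", "He tightened the flange."),
]
print(export_tsv(cards, dropped={"flange"}), end="")
```

Deleting a word is just adding it to the dropped set, and the trimming keeps the part of the sentence nearest the word so the card still shows where you met it.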
Anyway, I built this because the existing workflow sucked for how I actually read. If you're also tired of generic vocab apps and want something that actually ties into your reading habit, link is below. Let me know if it breaks.