Motivation

I started learning to read Hindi earlier this summer. I wasn’t really sure how to start, but my mom always told me that learning the alphabet is easy, since Hindi is a phonetic language. I realized that, one day, I’d like my kid(s) to be able to speak, read, and write Hindi, and rather than force them into some weekend classes, I’d like to be a part of that learning/teaching process. Of course, there’s also the benefit of being able to communicate with my family better. If you have family in different countries that speak different languages, I strongly urge you to consider this: How will you keep the familial ties strong when your parents aren’t around to bridge that gap?

I landed on using Anki as the main tool to learn the alphabet. It’s a spaced repetition tool (basically flashcards on steroids) that helps you memorize things. It’s popular among med students, and my first introduction to it was through my girlfriend Kelly. It worked really well for her, so I decided to try it out.

Anki ended up working out really well for me. In a month or two, I was able to slowly but surely read through Hindi writing. I started looking into why it works so well, and the studies I read through mostly highlighted the benefits of spaced learning versus massed learning. Spaced learning is the process of, well, spacing out your learning over time. Massed learning is the equivalent of cramming everything into your brain at the last minute before an exam.

I thought about what other ways I could utilize this. Right now, I’m 25 and I have a bunch of time. I may not have that in the future, so I should probably get a head start on certain things sooner rather than later. I want to be a better engineer, so I had this idea of asking ChatGPT a System Design question every day. However, after giving it more thought, I realized I could just automate this and put it on my blog so that I could go back and see my previous answers, answer new questions, and have hints ready for myself on demand.

Keeping It Free

As with other things I’ve worked on, I’d not only like to generalize this idea so others can use it, but also keep it free. That means keeping it open source and running it myself at no cost. I found a service called Arli AI, which has a free tier for accessing LLMs via API; most of the larger providers only offer similar access on a pay-as-you-go basis. I wanted to leverage that free tier, but I ran into reliability problems with the service, so I switched over to groq. As with the rest of this site, I use GitHub Actions for most of the generation infrastructure.

Decisions / General Overview

This one was pretty straightforward to make, since I had a general idea going in and I’ve already been working on this blog. The original plan was to have some code that calls groq, passes in relevant information, and generates a question and hint in markdown. I started having trouble getting the LLMs to produce properly formatted output, though. They would often mess up the markdown, and that wasn’t reliable enough, which is a problem when the thing needs to generate correctly every single day.

This is something I only realized once I was really working with these LLM APIs. It makes sense looking back on it: when you’re calling an LLM, aim for something very similar to “predicate pushdown.” Ask the model to generate only the bare minimum of data you need, i.e. only the nondeterministic parts. In my (simplified) case, that’s one sentence for the question and one sentence for the hint. The formatting is deterministic, so at that point it’s just a matter of string substitution.
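Here’s a rough sketch of what that looks like, assuming the groq Python SDK, a placeholder model name, and an environment variable I made up for the API key; the template and prompt wording are illustrative, not the project’s actual code:

```python
import datetime
import os

from groq import Groq  # assumes the groq Python SDK; any chat-completions client would work

# Deterministic part: the markdown layout is fixed, so it lives in code, not in the prompt.
NOTE_TEMPLATE = """# System Design Question - {date}

## Question
{question}

## Hint
{hint}
"""

# Nondeterministic part: ask the model for ONLY the two sentences we can't write ourselves.
PROMPT = (
    "Give me one system design interview question and one short hint for it. "
    "Respond with exactly two lines: the first line is the question, "
    "the second line is the hint. No markdown, no numbering."
)


def generate_note() -> str:
    client = Groq(api_key=os.environ["GROQ_API_KEY"])  # hypothetical env var name
    resp = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT}],
    )
    # Pull out the two lines we asked for; everything else is plain string substitution.
    lines = [ln.strip() for ln in resp.choices[0].message.content.splitlines() if ln.strip()]
    question, hint = lines[0], lines[1]
    return NOTE_TEMPLATE.format(
        date=datetime.date.today().isoformat(),
        question=question,
        hint=hint,
    )


if __name__ == "__main__":
    print(generate_note())
```

Because the markdown lives in the template rather than in the model’s output, a bad response can at worst give you an odd sentence, never a broken page.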

Overall, it’s a pretty simple project, but it gives me a lot of joy. Sometimes I forget it’s even generating, and I’ll look at my notes app and find that it’s just been going. I try to catch up and answer questions when I get a chance. One nice-to-have I thought about was making sure the same questions don’t keep getting generated; I actually completed that one, and it works by injecting the last 5 questions as part of the prompt (a sketch of that is below). Another would be a way for the LLM to either answer yesterday’s question if I didn’t get to it, or grade my answer if I did.
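As a rough illustration of the dedup idea, building the prompt might look something like this (the helper name and prompt wording are mine, not necessarily what the project uses):

```python
def build_prompt(recent_questions: list[str]) -> str:
    """Fold the most recent questions into the prompt so the model avoids repeats."""
    # Only the last five questions get injected, matching the approach described above.
    avoid = "\n".join(f"- {q}" for q in recent_questions[-5:])
    return (
        "Give me one system design interview question and one short hint for it.\n"
        "Do NOT repeat any of these recently asked questions:\n"
        f"{avoid}\n"
        "Respond with exactly two lines: the question, then the hint."
    )
```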

You can find the notes that have been generated by this on my blog, over here!