Unsurprisingly, very hot today as well. Ended up getting to the hub and staying there basically the whole day until like 8 PM.
- Started my day by watching the Karpathy GPT video. The concept of attention makes decent sense to me, but I still get a bit caught up in the Q, K, and V matrix definitions (first sketch at the bottom of this entry).
- Grabbed lunch at Brooklyn DOP FastLife yet again, still a very solid spot.
- Got back and had a coffee chat with a fellow Recurser who built out a recipe generation / tracking app. He introduced me to a lot of cool LLM / agentic tooling, like Hatchet. I hope that I can eventually contribute to the project.
- Attended the flash attention paper discussion group, and the idea that hit home was that you don’t need to improve algorithmic complexity to see substantial performance gains: the same amount of work can often be reorganized to better fit your specific hardware. Flash attention does this really well, building on the “online softmax” trick, which I’m happy to add to my collection of interesting streaming algorithms (reservoir sampling, top-k, etc.) — second sketch at the bottom of this entry.
- Had another coffee chat with a cool Recurser who’s getting their PhD and has been all around the US and even a bit abroad. We chatted a bunch about the state of the technological world.
- Got back and saw that some people from the flash attention discussion group were still in the room, and we ended up chatting about life and various things for ~3 hours. This will be one of my favorite memories from Recurse.
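
Since I keep tripping over Q, K, and V: here’s a minimal single-head attention sketch I put together to keep them straight. Every dimension and name here is made up for illustration; it’s just the general shape of the idea, not code from the video.

```python
import torch
import torch.nn.functional as F

# Toy single-head self-attention. All sizes here are made up for illustration.
B, T, C = 1, 8, 32        # batch, sequence length, embedding dim
head_size = 16

x = torch.randn(B, T, C)  # stand-in for token embeddings

# Q, K, V are just three different learned linear projections of the same input:
# "what am I looking for" (Q), "what do I contain" (K), "what do I pass along" (V).
query = torch.nn.Linear(C, head_size, bias=False)
key   = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

q, k, v = query(x), key(x), value(x)                  # each (B, T, head_size)

# Scores: how strongly each position attends to every other position.
scores = q @ k.transpose(-2, -1) / head_size ** 0.5   # (B, T, T)

# Causal mask so a token only sees itself and earlier tokens.
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
scores = scores.masked_fill(~mask, float("-inf"))

weights = F.softmax(scores, dim=-1)                    # each row sums to 1
out = weights @ v                                      # (B, T, head_size)
print(out.shape)                                       # torch.Size([1, 8, 16])
```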
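And the online softmax trick from the discussion, as I understood it: a plain softmax needs two passes over the scores (one for the max, one for the normalizing sum), but you can fold those into a single streaming pass by keeping a running max and rescaling the running sum whenever the max changes. This is my own toy version in plain Python, not the paper’s kernel, and the function name is mine.

```python
import math

def online_softmax(xs):
    """Streaming softmax statistics: one pass over xs computes both the
    max and the sum of exponentials, rescaling the sum whenever a new
    max shows up (the same rescaling flash attention does per tile)."""
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        if x > running_max:
            # A new max: shrink the accumulated sum into the new scale.
            running_sum *= math.exp(running_max - x)
            running_max = x
        running_sum += math.exp(x - running_max)
    # One cheap final pass just to emit the normalized probabilities.
    return [math.exp(x - running_max) / running_sum for x in xs]

probs = online_softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))  # matches the usual softmax; sums to ~1.0
```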