I’ve been using Claude Code a bit on and off for a couple projects, mainly for fixing research data workflows (repairing my makefile, writing better bash utilities), but I hadn’t really fully vibecoded something before. For research, this seems obviously reckless; the goal is always to get a good answer, ideally something others can reproduce. At a minimum that means something you understand yourself and can stand behind. Mistakes are always an issue in any work, but in my opinion they should always be your own.

Some Background on Grab Cafe

Over at the discord server we have an annual tradition of helping graduate student applicants transcend the lonely and obsessive refreshing of emails, webpages, and gradcafe by instead scraping sources of updates on graduate admissions and automatically posting them to the server. That way, to the extent that they are having depressive meltdowns or anxiety attacks, at least they are doing so among friends.

This all originated with user smh, who has gone on to great things in a PhD that I won’t mention for doxxing reasons. Long before I ran this server, she suggested that I could just host this on Google Cloud and get it up and running. Kevin (the owner at the time) was kind enough to let me do this. This is a screenshot of me, aero, Kevin, and smh at the time:

Graduate admissions is lonely, and often the people around you who are familiar with how it works are themselves applying. People are hesitant to rely on them for support because they usually applied to the exact same places with the same hopes of getting in! They might be in the awkward (though happy) position of being admitted somewhere their close friend was not, and the whole thing is awful.

In any case, I’m of the firm opinion that having a place to talk and share for the nervous nellies is good, and many of the best contributors to the server joined during admissions season for exactly this purpose. As a side note, it’s pretty cool to see that people who were admitted to PhDs a couple of years ago are now in the midst of research, and often doing cool work! What’s a little weird is that now when I go visit schools or see people at visit days, they often know me because they had anonymous lurker accounts on the Discord server. Swear to god, at least one a year.

Ok, with that aside, I want to talk through how this used to work, what I wanted to set up, and then what actually happened. The last few years I had a system that was largely duct tape. I am primarily an R programmer, owing to a strange history of learning to program at my first job while not being allowed to use Python (that was a real programming language only special departments got access to, but R was a business tool, so I could use it). I’m no Hadley Wickham, but I’m a very good R programmer, so I wrote some code in rvest that scraped gradcafe and saved new posts to a csv file after checking for uniqueness against the last 30 or so saved posts. This ran every minute thanks to cron. After it ran, a Discord bot hacked together in Python would look for posts it hadn’t shared yet and post them to the Discord.

This is obviously bad because sometimes the two halves would get out of sync and double-post and whatever, but I am a very mediocre Python programmer, and the payoff to learning the Discord bot SDK and reimplementing my webscraper, or otherwise fixing this with some kind of better database engineering, was very low for me. The last few years I would get it working, kick it off, get the formatting right, get annoyed at everything, and tell myself I’d re-implement the webscraping in native Python next year after I learned beautifulsoup. But life is busy, and as someone who is in graduate school and not anxiously awaiting results, I really just didn’t want to bother, so I never did.

I named this bot “Grab Cafe” because it just grabbed the grad cafe posts and put them on the Discord server, and since 2020 it has remained largely the same.

The Goals

I had two goals for the project.

First, I wanted to take the existing repo and move it to SQLite instead of a csv file. Text fields are really annoying in csvs and different parsers handle them differently, so moving between R and Python sometimes got a little weird. SQLite is the obvious choice for this: we can have something like a flag for whether the bot has posted a record to Discord. The bot can write all records to the db, and when it posts one it just updates the flag.
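To make that concrete, here’s a minimal sketch of the posted-flag pattern using sqlite3 from the standard library; the table and column names here are stand-ins rather than the repo’s actual schema:

    import sqlite3

    conn = sqlite3.connect("gradcafe.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS postings (
            id INTEGER PRIMARY KEY,
            program TEXT,
            decision TEXT,
            date_added_iso TEXT,                  -- ISO-8601 text sorts correctly
            posted_to_discord INTEGER DEFAULT 0   -- the flag the bot flips
        )""")

    # Scraper side: new rows arrive with the flag unset.
    conn.execute(
        "INSERT INTO postings (program, decision, date_added_iso) VALUES (?, ?, ?)",
        ("Economics PhD", "Accepted", "2025-01-15"),
    )

    # Bot side: grab unposted rows, post them, flip the flag.
    for row_id, program, decision in conn.execute(
        "SELECT id, program, decision FROM postings WHERE posted_to_discord = 0"
    ).fetchall():
        # ... post to Discord here ...
        conn.execute("UPDATE postings SET posted_to_discord = 1 WHERE id = ?", (row_id,))
    conn.commit()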

Second, I wanted to play around with letting users ping the bot, ask “@grab-cafe is this true?”, and get some kind of meaningful result about admissions. I didn’t want to do this with calls to a powerful LLM because that seems like a waste of compute, but I did want users to have the experience of not needing to write code to get a simple answer: just ask the bot, have it interpret the context, and get a meaningful response.

The Design Approach

I will say that I didn’t just let Claude go ham; I had a very specific architecture in mind for the project. First, I wanted this to all reference actual data, so I ran the scraper on the full history of gradcafe. We would have two LLMs, one that handles the data and one that summarizes. The first, the “SQL Generator,” has one job: take user input, translate it into SQL, and return the results. The second, the “Summarizer,” takes those results, has context about PhD admissions and the database schema in its prompt, and summarizes them in natural language.

I gave the SQL Generator the identity of Gary, and the Summarizer the identity of Beatriz Viterbo, named for Borges’s lost love in his story El Aleph. Gary is informed he is an expert SQL engineer, then given the schema of the database, the user query, and a few examples of how to answer questions in the prompt, while Beatriz is told that she is a helpful librarian in charge of a set of data on graduate admissions.
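As a sketch, the whole flow is just two chat calls with a database query in between. The chat() helper and the prompt texts here are my own stand-ins, not the repo’s:

    import sqlite3

    GARY_SYSTEM = "You are an expert SQL engineer. Schema: ... Examples: ..."
    BEATRIZ_SYSTEM = ("You are a helpful librarian in charge of a set of data "
                      "on graduate admissions. Schema: ...")

    def answer(question: str, conn: sqlite3.Connection, chat) -> str:
        # 1. Gary: user question -> SQL
        sql = chat(system=GARY_SYSTEM, user=question)
        # 2. Run the generated query against the real data
        rows = conn.execute(sql).fetchall()
        # 3. Beatriz: question + rows -> natural-language summary
        return chat(system=BEATRIZ_SYSTEM,
                    user=f"Question: {question}\nQuery results: {rows}")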

The initial goal was to have this entirely self-hosted, running a very small language model on a CPU server. This was a mistake, for reasons I’ll talk about a bit later.

Claude

I am using Claude Code with no attached skills, mostly because I just haven’t bothered to learn what those are. Instead of giving it the whole list of what to do, I broke it into steps. Here’s what I did in order:

  1. Tell Claude to familiarize itself with the original repo. Because we are going full vibe-code on this, I’m letting it just make the edits that it wants. The original is committed to git, and let’s be honest, this doesn’t really matter very much, so no harm no foul.
  2. First, we just worked on fixing the scraper. I had Claude look at all of the original code, and told it that we wanted to port this scraper to Python and include it in the code for the existing Discord bot.
  3. Then I had Claude write a series of tests for the scraper to make sure it was working, and then had it run over a few pages of gradcafe.com to check the output.
  4. Finally, I had Claude create a sqlite database containing the information that it was scraping, giving it a little guidance on how I wanted the database to look.
  5. Once that all looked good, I had Claude run the scraper over the whole history of gradcafe and populate the database.

Here I’ll pause to talk about my experiences with Claude. First, man it’s impressive. A few years ago I would not have guessed that any automatic code generation would be this good. The barrier to entry to custom software has dropped MASSIVELY over the last year. To be honest, that seems super exciting. HOWEVER, Claude is not a fantastic architect. It’s more like a super fast (near instant) junior dev. To be clear, that’s pretty amazing, but it does need a lot of guidance.

To give an example, it wasn’t able to tell that its first pass at webscraping code simply did not grab most of the relevant fields: gradcafe lists GRE and GPA optionally in kind of a weird part of the table, and Claude couldn’t find that. It’s something that’s honestly a little tricky to figure out how to parse properly, and once I pointed it out it got it just fine. Second, it had some wacky ideas about how best to store info in a database. It needed guidance on what fields should be numbers vs. letters, how to properly store dates, etc. But all in all, over the course of an hour, I had a working scraper in Python, a populated database, and the scraper was integrated into the Discord bot and working mostly properly. I could have done this myself, and would have made fewer mistakes, but it would have taken a lot longer.
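On the dates point, the standard SQLite trick (and what the date_added_iso column that shows up in the generated SQL later suggests we landed on) is to store dates as ISO-8601 text, which sorts correctly and works with strftime() in queries. A sketch, with the input format being my assumption about gradcafe’s display dates:

    from datetime import datetime

    def to_iso(raw: str) -> str | None:
        # Normalize display dates like "March 14, 2024" to "2024-03-14".
        # SQLite has no native date type; ISO-8601 text is the usual answer.
        try:
            return datetime.strptime(raw.strip(), "%B %d, %Y").date().isoformat()
        except ValueError:
            return None  # a queryable NULL beats storing junk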

Claude and the Blank Page Problem

Anyone who has done creative work knows the feeling of having a nearly fully formed idea but being faced with a blank page and finding it nearly impossible to sit down and work as a consequence. Part of this is the terror of formalizing what you have convinced yourself is a good idea; as soon as it hits the page it will be real and significantly worse than what’s in your head, no matter how good you think it is. Part of this is also just facing the sheer amount of work it will take to get done what you need to.

Barriers to entry make this so much worse, whether they are rust that has accumulated on your (metaphorical) tools, or knowing that you should really be using some type of software you are unfamiliar with, or whatever. I used to play music in a very serious way, and I have felt this every time I sit down to write or record, which I’ve done some of over the last few years. My facility was just not what it was: I hadn’t practiced in the way I used to, and I hadn’t written at all for 7 or 8 years. I never really learned the core aspects of things like mic placement or how to work well with DAWs. Once you learn how these things work, they become part of your artistic process instead of just a barrier to getting done what you need to. But BEFORE you know how they work, they can generate barriers which feel insurmountable, especially when doing something so vulnerable.

I think what these coding agents do for programming projects is lower that barrier to entry. The low activation energy needed to prototype the project made it way more fun to work on; I was immediately facing the feature layer instead of struggling to remember syntax or learning an SDK I would never use again. Facing a bunch of bugs and unfamiliar syntax takes me out of the flow state of programming; to get back up to really fluent speed in Python I would probably need a few months programming in the language. In R, I write as fast or faster than natural language, which is what makes it so fun! Doing it in Python just seems a bit less fun, and so having a tool I can work with to get this done was so helpful.

If I had to make one forecast for the new year, it’s that we are going to see a lot of fun new software written as hobbyist programmers get into working with these things. That has me pretty excited.

Language Models / SQL Generation

The process for integrating language models into the project was not really any different in terms of interfacing with Claude, so I’ll spare you those boring interactions. Here’s what I wanted to do, what I learned, and what I found interesting.

Getting the SQL Generator to Work

I wanted to self-host because it just seemed kind of fun. That was a giant mistake; CPU compute is not that cheap, it’s really slow relative to GPU compute, and even with a small model (I was using Qwen3 1.7B) the quality is really not that great. It is both more expensive and slower than API calls.

That said, a small Qwen was able to generate the SQL I needed pretty well with a bit of prompting! This was pretty impressive. If I were to fine-tune, I bet even 0.6B could have handled the problem, and to be honest that’s really exciting. I can see this being super useful as a SQL translation engine (kind of like RAG) for just querying raw data, one that can run on a super small machine if you fine-tune the model appropriately.

Here’s an example of baby Qwen trying:

Question: Ok what month do most interviews come out?

Generated SQL:

    SELECT strftime('%m', date_added_iso) as month, COUNT(*) as interview_count
    FROM postings
    WHERE decision LIKE 'Interview%' AND date_added_iso IS NOT NULL
    GROUP BY month
    ORDER BY interview_count DESC
    LIMIT 10;

I also found that you want to make the job as easy as possible for the robot. Since this was just a single table within a SQL database, I figured this wouldn’t be too hard a problem. I’m sure that with appropriate tuning, as I said, the difficulty would go away, but right now it hasn’t and I can’t be bothered to finetune a Qwen model for this. Maybe some other time!

In the first experiments, even though there was one table, it had multiple columns representing dates: date of posting, date of receipt, date of scraping. It also had lots of columns that weren’t immediately relevant to the job at hand, which was providing information about acceptances and rejections and the like. When these were in there, it really was not as good at generating appropriate SQL.

So instead of immediately reaching for a bigger LLM like Gemini 3 or whatever, which seemed like overkill, I just changed the table structure. Instead of one big table of posts, I pre-filtered to PhD-only results since 2018, and put in the prompt to look there and not query the big table.
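The pre-filtering itself is one statement. A sketch, with the table and column names being my guesses at the schema rather than the real thing:

    import sqlite3

    conn = sqlite3.connect("gradcafe.db")
    # Materialize a small, focused table: fewer columns, exactly one date,
    # so the model has less room to pick the wrong thing.
    conn.execute("DROP TABLE IF EXISTS phd_postings")
    conn.execute("""
        CREATE TABLE phd_postings AS
        SELECT institution, program, decision, date_added_iso
        FROM postings
        WHERE degree = 'PhD' AND date_added_iso >= '2018-01-01'
    """)
    conn.commit()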

I also decided to give the model the ability to generate Seaborn plots based on the data (it could basically just choose titles and axes), with occasionally hilarious results.

Here’s an example of one of the more chaotic graphs it generated, and one of the more zen:

For reasons that will shock no one I just took this functionality out.

Some of the users tried to break this by inserting SQL, but to no avail:

I’m sure in a real environment you’d want harder checks, but I just have a regex checking for DROP, as well as prompt instructions against doing it. Maybe snowsson will break it eventually; I hope he manages.
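For the curious, a slightly fuller version of the same naive idea (whitelist a single SELECT, blacklist mutating keywords) might look like this sketch:

    import re

    FORBIDDEN = re.compile(
        r"\b(DROP|DELETE|INSERT|UPDATE|ALTER|ATTACH|PRAGMA)\b", re.IGNORECASE
    )

    def is_safe(sql: str) -> bool:
        # Allow exactly one SELECT statement; reject anything that mutates.
        # Not bulletproof, but fine for a toy bot.
        stripped = sql.strip().rstrip(";")
        return (stripped.upper().startswith("SELECT")
                and ";" not in stripped
                and not FORBIDDEN.search(stripped))

The sturdier fix is opening the database read-only in the first place, e.g. sqlite3.connect("file:gradcafe.db?mode=ro", uri=True), so even a hostile query can’t change anything.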

Getting a Bigger Brain

Based on my experiments, really simple select queries were working fine, and if I were willing to fine-tune I bet even really complicated ones would work well. That said, I have limited time and energy, and my friend Greg Leo pointed out that on openrouter GPT OSS 120B would realistically cost less than a few dollars a month for this, even with a lot of use.

So instead of hosting locally, I switched to generating API calls to openrouter, and this immediately worked significantly better. Some of this was because of improved SQL generation, and some of it was just because openrouter charges almost nothing for a query and hosts on a GPU. Per Greg there are more performant options, but this was fine for me.
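Since openrouter speaks the OpenAI-compatible API, the switch is basically a base URL and a model string. A sketch of the call (prompt abbreviated, error handling omitted):

    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your openrouter key
    )

    def generate_sql(question: str, schema: str) -> str:
        # Gary's job as a single chat completion against GPT OSS 120B.
        resp = client.chat.completions.create(
            model="openai/gpt-oss-120b",
            messages=[
                {"role": "system",
                 "content": f"You are an expert SQL engineer. Schema:\n{schema}"},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content.strip()

A nice side effect of this shape is that pointing base_url at a local llama.cpp or Ollama server gets you back to the self-hosted version with no other code changes.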

I also added the ability for the user to ask the bot for the SQL; this just returns the last SQL query the bot ran. This gives a sense of what GPT OSS 120B was doing:

Pretty good!

Prompt Examples

What I realized is that you have to think of this robot as someone who is competent in a general sense, but has absolutely no context for what they are doing. They have not been trained on the task at hand; they are a contractor who shows up to work, is given a schema and some examples, writes a single query, and goes home. They do this once a day, for a new company and database each time. When you are designing what to tell the robot (and what job to give them), keep this in mind!

To save time I had Claude generate a series of examples, and cleaned them up myself where needed. A lot of the examples in there I haven’t really checked, since the output was working fine and the stakes were really low.
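The examples live directly in Gary’s prompt as question-then-SQL pairs. A trimmed sketch of the shape (the wording and pairs here are mine, not the generated ones):

    GARY_EXAMPLES = """
    Schema: phd_postings(institution, program, decision, date_added_iso)

    Q: How many acceptances did MIT post in 2023?
    SQL: SELECT COUNT(*) FROM phd_postings
         WHERE institution LIKE '%MIT%' AND decision LIKE 'Accepted%'
           AND strftime('%Y', date_added_iso) = '2023';

    Q: What month do most interviews come out?
    SQL: SELECT strftime('%m', date_added_iso) AS month, COUNT(*) AS n
         FROM phd_postings WHERE decision LIKE 'Interview%'
         GROUP BY month ORDER BY n DESC;
    """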

Summarizer

At first Beatriz was only summarizing the output, and honestly did a fine job. The output was very clinical, though, and she didn’t have a ton of knowledge about PhD admissions, so I added some information about how to interpret GPAs and GREs, some information from the US News ranking of PhD programs (which is generally awful, but better than other sources), and the like.

Here’s what the early version of her summarizing things looked like:

The only real tweak to the workflow I found to be useful was to give Beatriz information about the overall schema and to define her relationship to Gary in the workflow. I tried to give Beatriz a bit more personality; ideally I wanted her as a librarian character facing an infinite sequence of hexagons, as in the Library of Babel, who controls a magical Index she uses to receive the information. This would have been really fun, but it made the summarization significantly worse.

My takeaway from that experience is that to get personality you really need fine-tuning! In my opinion this means we are going to need to place high value on really good character writing in order to generate interesting AI characters. These foundation models have their own very banal personalities, having smeared some kind of average motivation over everything. Part of why Claude stands out is that it’s kind of cute and fun to work with; the UX of it is clearly a huge part of the draw. In my opinion the personality of the robot is now part of that user experience, and I expect to see more of it going forward.

Aside: The Ascent of the Humanities

As another aside, I love the History of Philosophy podcast, and listening through the episodes on the Renaissance, the amazing thing was that true humanities skills (textual interpretation, rhetoric, philology, etc.) became highly valued as a reaction to a time when formal logic was king. I didn’t realize how much the western Medieval tradition was interested in and rooted in formal logic (for a great example, see the episode discussing the traditional game of Obligations).

I don’t think quite the same thing is likely to happen now, but it does seem to rhyme: it’s becoming cheaper to generate and verify computer programs, but expensive to get really high quality writing and the like. I think the returns to traditional humanities skills (which have been undervalued by universities for far too long) are going to be high. Maybe we will have a similar reorientation of prestige as we did several hundred years ago.

In some ways this is the opposite of the reaction to AI that I think most people in creative fields have, largely because their work has been used without consulting them whatsoever. That is an obvious injury, and it’s not shocking that writers are so angry about it. However, the whole episode also demonstrates the value of writing as information, and I think there’s some chance it actually creates more writing and humanities jobs than existed before: this proof of concept in foundation models trained on stolen goods may lead to writing skill being more highly valued in the market relative to programming, which is becoming cheaper and cheaper.

Who is to say? Predicting the future is really hard and I am often wrong.

Routing & Blandness

Ultimately I just stuck with a relatively bland personality, apart from the name, because this led to better summarization. When I added a bunch of personality to the Beatriz prompts (cosmic horror, a magical Index, etc.) it got a lot worse. Here’s an example where it just started returning raw SQLite output:

    I found 191 results. Here are the first few: ALL | 7, All | 61, American University | 18, Arizona State University | 59, Barcelona School of Economics | 29, Berlin School Of Economics | 5, Berlin School of Economics | 13, Binghamton University | 12, Bocconi University | 35, Bonn Graduate School Of Economics | 14.

I mean come on.

When she has blander prompting she does a lot better, which kinda sucks. Again, I’m very confident fine-tuning is the solution here.

I made two other tweaks. First, I gave Beatriz the ability to respond to user questions directly but decide to consult Gary if needed. Second, if the results ran to more than 5 rows, I just had her explain what the columns mean and what the query was, and return the data as a paginated embed in Discord.
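A sketch of that routing decision, with the sentinel token being my own invention for illustration (and reusing the hypothetical chat() helper and BEATRIZ_SYSTEM prompt from earlier):

    def respond(question: str, chat, run_gary) -> str:
        # Beatriz answers directly when she can, or asks for data.
        system = (BEATRIZ_SYSTEM +
                  "\nIf you need data to answer, reply with exactly CONSULT_GARY.")
        reply = chat(system=system, user=question)
        if reply.strip() == "CONSULT_GARY":
            rows = run_gary(question)  # SQL generation + execution
            reply = chat(system=BEATRIZ_SYSTEM,
                         user=f"Question: {question}\nQuery results: {rows}")
        return reply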

Shoutout to snowsson for suggesting this:

Finally, shoutout to Beatriz for not being immersed in the muck of the field:

    Yes, I’m familiar with EJMR—it’s the Economic Journal of Management and Research, a scholarly outlet that publishes work on economic theory, applied economics, and quantitative methods. If you’d like any data‑driven insights related to admissions or research trends tied to that journal, just let me know!

Concluding?

Anyway I guess mostly this was just really fun. If you want to check it out or try to break my little vibecoded app, join us on Discord. If you have suggestions feel free to send me a note!