Some people say AI and LLMs will change the world and redefine how the software industry works. Others say it’s wildly overhyped and either refuse to try it or have tried it and gotten nothing useful out of it. This will be my longest blog post ever, but I’ve been asked about this topic a few times, so I thought I’d write things down.

Disclaimer: This post isn’t intended to opine on the harms or benefits of AI in general. My thoughts on that topic would fill an equally long (if not longer) blog post, and I prefer to have a nuanced in-person discussion about it. I suggest reading this post from Nicholas Carlini, particularly the intro, nuance, and background sections - the author conveys things better than I can. If you haven’t read that post, read it in full to understand how you can improve your productivity with AI.

Additional disclaimer: As with anything on this blog, views are purely my own, not those of my employer or anyone else I reference. Any mistakes here are also my own, and I’ll strive to correct them as I become aware of them.

TLDR: There are a lot of valid concerns around things like hallucinations. Because of those, until recently, I was staunchly in the “AI is just overhyped” camp. But a skilled operator who is aware of AI’s limitations and works around them can easily multiply their developer productivity by a factor of 3 or more. In this post, I’ll cover how I’ve used AI personally (including some use cases I haven’t seen elsewhere), what to watch out for, and what I’ve learned. I hope it’s useful!

About Me

For the briefest sketch of my background: I’ve been writing code professionally for over a decade (and personally for over a decade before that). I was an absolute Luddite and refused to use AI for a couple of years, but over the last year, I’ve started to use it a lot. A nontrivial part of my day job involves doing migrations and cleaning up tech debt. I like to call myself a code janitor, so anything that helps me write code more effectively is a win in my head.

A few relevant things (the reasoning will become apparent later):

  • Even before the advent of LLMs, I cared a lot about refining my craft and reducing overhead. My favorite example is how, at $past_job, during a large meeting, someone said they wished my team’s tool did X. During the meeting itself, I was able to build and submit a PR for that (admittedly minor) feature.
  • I’ve spent a lot of time mentoring and onboarding interns and junior engineers and have some intuition (through trial and error) about what works well and what doesn’t.

Recently, I’ve been quite involved in efforts at $work to improve developer productivity using AI where possible, and that’s part of my motivation for writing this post - to encourage others to at least try it.

How I use AI

Here is a smattering of things I’ve used AI for. This is primarily a descriptive section explaining problems I faced and how I solved them using AI/LLMs. To reuse a quote from Nicholas Carlini:

You may not like my use cases. You may think they're silly. It also may be the case that none of these relate to things that would help you. I accept this may be true. But I can only speak for myself.

My use cases break down roughly into these categories (in rough order of usefulness to me):

  • Boutique/one-off developer tools for specific problems (I don’t see people talking about this often)
  • The ‘grunt work’ part of the job
  • Agent-based systems for generating/playing around with test data
  • Accelerating my productivity on side projects
  • Real examples from my chat history, e.g., understanding code, learning systems, etc.

I’m not going to spend too much time covering use cases that people commonly talk about (e.g., auto-complete in IDEs, etc). That’s been written to death. Suffice it to say I do use Copilot autocomplete, Copilot Edits (as of last week), Cursor for side projects (recently), Replit Agent (tried it a bit!), and chat with Claude and ChatGPT to brainstorm ideas.

Boutique developer tools

PS: I don’t see people writing about this enough. If I’ve missed examples (I probably have), please let me know.

One of my biggest revelations has been that I can just use AI to write the tools I’ve always wished I had to get my job done faster.

Example 1: “online migration”

Earlier this year, at $dayjob, I got involved in an effort to clean up some tech debt. Without going into much detail, it can be considered an online migration/dual write strategy (e.g., described here). We had data structure A, added data structure B, and wanted to ensure that A == B across many services.

So we did the standard thing: added instrumentation everywhere A was used, compared it to B, and logged a stack trace plus the values of A and B wherever they differed. Then we updated the code until they matched. Using standard logging tools (e.g., ELK) to view these got extremely tedious.
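
The instrumentation itself is conceptually simple. Here’s a minimal sketch of the dual-read comparison pattern (in Python for illustration - the real code lived in our services, and all names here are made up):

```python
import logging
import traceback

logger = logging.getLogger("dual_read_diff")

def compare_and_log(a, b, context=""):
    """Read both data structures; on mismatch, log a diff plus a stack trace.

    Always returns the old value (A), so behavior is unchanged while we
    gather data on where B diverges.
    """
    if a != b:
        logger.warning(
            "dual-read mismatch at %s: A=%r B=%r\n%s",
            context, a, b, "".join(traceback.format_stack()),
        )
    return a
```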

Knowing I’d be working on this for a while, I decided to try to optimize my workflow. So, in one day, I built a tool that queried our logging system for a given service, looked up all the relevant logs, categorized them by type of difference, and pretty-printed the output. That let me see, e.g., that service X has N issues, the majority of which come from one stack trace. Then I could drill down, see the stack trace, have it filter out irrelevant frames, color code relevant frames as red/green/blue, and point me to the appropriate code to fix.
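
The core triage logic is just grouping and counting. A rough sketch, assuming a query_logs(service) helper that returns parsed log entries (all names here are hypothetical):

```python
from collections import defaultdict

def triage(service, query_logs):
    """Group mismatch logs by a normalized stack trace and count each bucket."""
    by_trace = defaultdict(list)
    for entry in query_logs(service):
        # Key on the trace with line numbers stripped, so minor code motion
        # doesn't split one root cause into many buckets.
        key = tuple(frame.split(":")[0] for frame in entry["stack_trace"])
        by_trace[key].append(entry)

    total = sum(len(v) for v in by_trace.values())
    print(f"{service}: {total} issues across {len(by_trace)} distinct traces")
    for trace, entries in sorted(by_trace.items(), key=lambda kv: -len(kv[1])):
        print(f"  {len(entries):>6}x  {trace[0] if trace else '<no trace>'}")
```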

I spent about six months working on this problem personally (not discounting others who worked on this effort). This tool saved me at least a few hours per investigation, and I would easily say it’s saved me a few hundred hours of work in aggregate. Would I have built this without AI? No. Was it worth spending one day doing this? Absolutely.

While I can’t show that magical tool, I can show an auxiliary tool I built. It would take one specific request, trace it through various services, and let me browse without leaving a terminal. Here is a demo where I’ve replaced service names and stack traces with fake data.

Demo of the above-mentioned tool

NB: Yes, some presentation bugs are showing up. This is a pitfall of AI-generated code, and I broke a few things when stubbing out the data.

Again, never in my life would I have imagined writing a curses-based UI to process text. And here I am.

Example 2: Scaffolding for migrations

I often work on migrations and code cleanups. While each migration is unique (I could write a long post on migrations), there’s usually some standard boilerplate. Picking just one example – I did a migration where we switched from one cache type to an instrumented cache type, which did some logging.

The changes were primarily mechanical, e.g., replace new Cache[K,V]() { /* lambda for loadKey */ } with makeInstrumentedCache[K, V]() { /* different lambda for loadKey */ } (well, a little more complex than that, so I couldn’t just find-and-replace). I did the following:

  • Used AI to help me write a scalafix codemod
  • Then wrote a script to orchestrate it (a rough sketch follows this list), which:
    • Searched for all instances of the old pattern
    • Grouped by folder/top-level service, and then for each service:
      • Created a new git branch
      • Ran the migration
      • Ensured the code compiled
      • Committed the changes
      • Submitted a PR for review
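
Here’s a minimal sketch of that orchestration loop. Everything in it is illustrative: find_usages, run_codemod, and compiles stand in for the real search, scalafix, and build steps, and PR creation depends on your tooling:

```python
import subprocess
from collections import defaultdict

def sh(*cmd):
    subprocess.run(cmd, check=True)

def migrate_all(find_usages, run_codemod, compiles):
    # Group hits by top-level service so each PR stays small and reviewable.
    by_service = defaultdict(list)
    for path in find_usages():
        by_service[path.split("/")[0]].append(path)

    for service, files in by_service.items():
        branch = f"instrumented-cache/{service}"
        sh("git", "checkout", "main")
        sh("git", "checkout", "-B", branch)
        run_codemod(files)
        if not compiles(service):
            print(f"skipping {service}: doesn't compile after the codemod")
            continue
        sh("git", "commit", "-am", f"Migrate {service} to instrumented cache")
        sh("git", "push", "origin", branch)
        # Open a PR via your platform's CLI, e.g., `gh pr create` on GitHub.
```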

On its own, it’s unclear if that adds value - I could have done it all by hand. How much time did I save? Maybe a few minutes per PR?

But this adds up. For that migration alone, I did twenty-something PRs, and I’ve used this approach for hundreds of PRs since. What’s more important is that it runs autonomously: I can run it on my machine, cranking out PRs while I’m in a meeting or at lunch.

The grunt work part of the job

We’ve all run into this. There’s some boilerplate task you want to do that no one cares about - sometimes I’d honestly feel bad even handing that to an intern - but AI steps up and gets the job done.

Here are a few examples where it added up for me.

In one project, I had to take table schemas from a design doc and translate them into CREATE TABLE statements (nobody’s got time to copy over 100 columns, their types, and comments).

In another project, I had to orchestrate calling a vendor tool. The vendor tool was configured via a .yaml file, while I wanted to expose a protobuf API for callers of my service to invoke it. I copy-pasted the vendor's documentation, asked AI to write me a protobuf file mirroring the semantics, and then asked it to translate from the protobuf to the yaml so I could call the vendor tool.
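
If the protobuf messages mirror the vendor’s yaml schema, the final translation step can be mechanical. One way it can look, as a sketch under that assumption (the one-to-one field mapping is assumed; MessageToDict comes from the official protobuf Python library):

```python
import yaml
from google.protobuf.json_format import MessageToDict

def request_to_vendor_yaml(request, out_path):
    """Translate a protobuf request into the vendor tool's .yaml config.

    Assumes `request` is a generated protobuf message whose fields mirror
    the vendor's documented yaml schema one-to-one.
    """
    config = MessageToDict(request, preserving_proto_field_name=True)
    with open(out_path, "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)
```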

For one project, I had to collect and label a lot of project-specific data. I ended up using AI in a couple of ways to help out:

  • We had a few hundred entries to classify/rank in terms of importance, plus some documents/spreadsheets covering various subsets of the data that I was given as reference/prior art. I tossed it all at Claude (the documents contained the rankings plus the reasoning behind them), asked it to give me a JSON mapping from each entry to its classification, and looked at all the results. They mostly made sense.
  • I had to review hundreds of source code files to understand whether a tool’s definition matched ours, and reading through them all to see where they did and didn’t match was super tedious. I worked with an expert to nail down the definitions, tossed each file plus that prompt at an LLM, asked it to categorize the file and explain its reasoning, and put the results into a sheet. Then I could eyeball each result and quickly check the source file to verify. (A sketch of this loop follows the list.)
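
The loop itself is tiny. A rough sketch, where ask_llm is a hypothetical stand-in for whichever chat API you use, and the glob and prompt are illustrative:

```python
import csv
import pathlib

PROMPT = """You are reviewing source files against this definition: {definition}
Classify the file below as MATCHES / DOES_NOT_MATCH / UNCLEAR, then explain why.

{source}"""

def classify_files(root, definition, ask_llm, out_csv="classifications.csv"):
    """Ask the LLM about each file and dump the answers into a reviewable CSV."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "llm_answer"])
        for path in sorted(pathlib.Path(root).rglob("*.scala")):  # adjust glob
            prompt = PROMPT.format(definition=definition, source=path.read_text())
            writer.writerow([str(path), ask_llm(prompt)])
```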

I literally procrastinated on this for a few weeks - having something in the correct format to start from made it much more manageable. The cost of getting it wrong was also low: I looked at all the results (and fixed a few entries), I had someone else review them (and fixed a few more), and *if* we did get something wrong (we would look at the data from this system manually anyway), it would take about a minute to just re-classify all entries.

These are examples where the work is not hard. It’s just mind-numbingly tedious or annoying to start, and if you’re like me, when you reach this state, you’ll procrastinate and put things off forever. This kind of work also tends to come near the start of a project, so you often just never get started. Getting over this hump has been an immense productivity boost for me.

Agent-based systems for generating test data

On the side, I help with UpliftAI (it’s a great initiative; you should look it up).

A few months ago, I helped with model training and cost optimization. At the time, we were working async; I needed to build code/pipelines to train the model but got blocked because we were waiting for the team to provide a good ground-truth dataset.

Rather than wait, I did what seemed obvious and generated fake data I could use to ensure my code worked end to end, so that I could just run it when the real data showed up.

I brainstormed with Hammad and built an agentic system to generate the needed data. This was my first time building one, and within just an hour and a half, I had plenty of sample data for under 10 cents (enough to unblock my coding).

I asked Claude to generate a system of AI agents to generate this data. I’ll leave the code out, but here’s the description of the approach it gave me in its first response - which worked quite well after I iterated on it a few times (I’ve added a minimal sketch of the chaining pattern after the list):

This approach uses a chain of specialized LLM agents to generate a diverse and relevant dataset for your [ redacted ] model. Here's a brief explanation of how to implement this:

  1. Set up each agent as a separate function or class, using an LLM (like GPT-4) with appropriate prompts.
  2. Create a simple orchestration script that feeds the output of one agent into the next.
  3. Start with the Persona Generator Agent, creating a diverse set of Pakistani personas.
  4. Feed these personas into the Utterance Generator Agent to create raw utterances.
  5. Use the Dialogue Flow Agent to create multi-turn conversations from these utterances.
  6. Pass the generated content through the Labeling and Categorization Agent for proper tagging.
  7. Use the Quality Control Agent to review and filter the content.
  8. Finally, use the Dataset Compiler Agent to structure the final dataset.
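
Structurally, a “chain of agents” is less exotic than it sounds: each agent is a prompt template, and the orchestrator pipes one agent’s output into the next. A minimal sketch of the pattern, again with a hypothetical ask_llm helper and made-up prompts:

```python
def agent(instructions, ask_llm):
    """Each 'agent' is just an LLM call with a fixed role prompt."""
    return lambda payload: ask_llm(f"{instructions}\n\nInput:\n{payload}")

def run_pipeline(seed, stages):
    data = seed
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical wiring mirroring the steps above:
# personas   = agent("Generate 10 diverse personas as JSON.", ask_llm)
# utterances = agent("For each persona, write raw utterances as JSON.", ask_llm)
# quality    = agent("Filter out low-quality or off-topic items.", ask_llm)
# dataset    = run_pipeline("", [personas, utterances, quality])
```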

Before AI and LLMs, I would have just waited a few more weeks, hard-blocked on that dataset. With these tools, I was able to unblock myself. I also learned about the power of agent-based systems along the way.

Since the first draft of this post, I also saw this announcement from $work, and now I wish I had seen it earlier - it would’ve saved so much time!

Accelerating my productivity on side projects

For me, side projects are, by definition, projects where:

  • I want to work on them
  • I don’t have enough time to work on them, and when I do, it’s sporadic
  • I have ideas, but they come and go in bursts
  • I don’t have anyone to bounce ideas off of / check my work

AI is helpful here as a buddy to get things done.

Recently, I added a feature to my reading list archive to make it easier to share links. I’d been putting it off forever since I hadn’t touched the code in a while, and whenever I did have time, it was under 30 minutes, so I’d usually give up before starting. This time, I just asked Cursor to do it: “Add the capability to publish all unpublished posts on this results page in one button click.” It got through most of the boilerplate, had one bug that I was able to fix, and then I could be on my merry way.

I’ve used AI to help get over the initial hump and get started on projects - e.g., I am terrible at frontend/graphics work but have always wanted to make a game (I’m excited by game design principles). So I got it to help me with boilerplate graphics code and with debugging coordinate systems (I had an off-by-half-tile error I spent over a day on), which let me focus on the fun part of building my game.
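
For the curious, that class of bug typically comes from mixing up whether entities are anchored at a tile’s corner or its center. A sketch of the idea - I’m guessing at the specifics here, so TILE_SIZE and the exact fix are illustrative:

```python
TILE_SIZE = 32

def tile_to_pixel(tx, ty):
    # A buggy version anchors sprites at the tile's top-left corner:
    #   return tx * TILE_SIZE, ty * TILE_SIZE
    # Centering on the tile avoids the off-by-half error when pixel
    # positions are later rounded back to tile indices.
    return tx * TILE_SIZE + TILE_SIZE // 2, ty * TILE_SIZE + TILE_SIZE // 2

def pixel_to_tile(px, py):
    return px // TILE_SIZE, py // TILE_SIZE
```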

I’ve used AI to get me over the hump for things I don’t always want to do that are nonetheless important - e.g., writing documentation. I did that for rocket_csrf_guard, as I wanted the documentation ready before open sourcing it. I vetted all of it, but it’s much faster to edit, review, and fix documentation than to start writing from scratch with writer’s block. Without that, I wouldn’t have open sourced it, and then I wouldn’t have been able to write this blog post explaining a security idea that others found valuable.

I’ve also used AI to brainstorm: tossing stupid ideas back and forth and having it shoot them down, which is helpful when I’m too scared of sounding like an idiot to a human. It helps me figure out what to search for when I want well-sourced material. The best part is that I can use otherwise unproductive time for this - there’s time on the train or during a walk where I’d usually just zone out, but now I can send one question, zone out, check back after a while, fire off another question, and repeat.

I’ve tried Replit Agent to build some prototypes from scratch, and it got close, but not quite there. You can see more in a prior LinkedIn post. However, it was great, as I could code on my phone on the go (similar to the above).

Most recently, I’ve started using AI to review my code - the signal-to-noise ratio is up and down, but it’s better than no one reviewing my code. Hopefully, I’ll have more to share about this soon.

Real examples from my chat histories

Here are real examples of things I’ve had AI do for me, taken from Copilot / Cursor / ChatGPT / Claude histories. I haven’t included the entire back-and-forth conversations for everything but have left commentary at least early on. Hopefully, this gives a flavor for the types of tasks I’ve found it helpful for. Note that this is an unordered, brain-dump style list.

Copilot

1. For this post itself, I used AI to generate dummy data for that tool (this was my third attempt at this prompt):

I'm trying to do a demo and want to hide private information. Please change query_logs to instead return 30 rows of dummy data, with each row being a bit diverse, and reusing some service names. Generate realistic looking log messages though, and make them have potentially long stacktraces with realistic looking function names in Scala.

2. I had a test script for some work, and it was annoying to depend on a third-party library whenever I used it on a new machine. So:

Rewrite this code to only use the python standard library so that I don't need to import 'requests'

3. I wanted to debug a part of that script and share a one-liner with others to test. So:

Give me a curl command I can run to mimic this_redacted_function

4. The initial prompt I used to build the script shown in the demo above (a minimal sketch of the resulting interaction loop appears after this list):

I'm trying to write a python tool to debug some logs. Please write me something where I can give it a keyword, then it will query the log system (assume I have that part written) and then parse the results and display them in an interactive CLI tool, with one short line per row. It should let me use my arrow keys to go up and down and expand/contract lines to view the full messages.

5. Day-to-day coding cleanups:

The last 3 tests in this file have a lot of duplicated code. Could you please write a helper function and factor out the boilerplate?

6. Some code from last week where I couldn’t be bothered to write a parser:

I am trying to write an error parser for some scala error messages. Write me a function that takes a string like the one below, and returns a map from the filepath to the list of errors in there. Note that each error block is 5 lines long if it helps

That seems wrong? The first block that describes the actual error doesn't actually start with [error] right?

7. Related to the above - it is good at translating across languages (a sketch of the kind of parser this produces also appears after this list):

I'm trying to port the error parsing code in this Scala file to python3. Please give me a function that parses the error file given a path, and returns structured information about the error (different dataclasses for each error), generally following the structure in the attached file.

8. I was on-call and trying to quickly unblock some team that was having issues with Kubernetes access. I suspected the problem was in their code and not in my system, but I don’t know much about Kubernetes. So, I asked Copilot to translate their code to a kubectl command so I could confirm it worked directly, and it did:

Please convert redacted_function_name to a series of kubectl commands I can run directly
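
To give a flavor of item 4 above: the skeleton of such a tool is surprisingly small. A minimal sketch of the interaction loop using Python’s standard curses module (the real tool also queried logs, filtered frames, and color-coded output):

```python
import curses

def viewer(stdscr, rows):
    """rows: list of (summary, full_message). Up/Down to move, Enter to toggle."""
    curses.curs_set(0)
    selected, expanded = 0, set()
    while True:
        stdscr.erase()
        line = 0
        for i, (summary, full) in enumerate(rows):
            if line >= curses.LINES - 1:
                break  # naive clipping; a real tool would scroll
            attr = curses.A_REVERSE if i == selected else curses.A_NORMAL
            stdscr.addnstr(line, 0, summary, curses.COLS - 1, attr)
            line += 1
            if i in expanded:
                for detail in full.splitlines():
                    if line >= curses.LINES - 1:
                        break
                    stdscr.addnstr(line, 2, detail, curses.COLS - 3)
                    line += 1
        stdscr.refresh()
        key = stdscr.getch()
        if key == curses.KEY_UP:
            selected = max(0, selected - 1)
        elif key == curses.KEY_DOWN:
            selected = min(len(rows) - 1, selected + 1)
        elif key in (curses.KEY_ENTER, 10, 13):
            expanded ^= {selected}
        elif key == ord("q"):
            break

# curses.wrapper(viewer, [("row one", "full message one"), ("row two", "full message two")])
```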
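And for items 6-7, the ported result looks roughly like this (the error format, regex, and dataclass fields here are illustrative, not the real ones):

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CompileError:
    path: str
    line: int
    message: str

# Illustrative format: "[error] path/To/File.scala:42: message"
ERROR_RE = re.compile(r"^\[error\] (?P<path>\S+):(?P<line>\d+): (?P<msg>.*)$")

def parse_errors(text):
    """Return a map from file path to the list of errors found in it."""
    by_path = defaultdict(list)
    for raw in text.splitlines():
        m = ERROR_RE.match(raw)
        if m:
            by_path[m["path"]].append(
                CompileError(m["path"], int(m["line"]), m["msg"]))
    return dict(by_path)
```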

ChatGPT / Claude

1. Learning about various arcane minutiae of commonly known tools/code syntax:

given a path to a file, what bazel command can I use to tell me which target owns it?

I have a bazel target in a specific folder. How can I query the set of targets that directly depend on it that also live in the same folder?

I have a json object like the one below. Write me a jq expression that gets me all the values of the value key for the variable in the environment

I have a server running on a specific ip and port. How can I use nmap to confirm which versions of TLS it supports?

I am using comby, a static analysis tool that lets me analyze protobuf files. Please help me write a match expression that matches the use of a message type. My goal is to figure out which messages are still used in a protobuf file and which ones may be unused.

2. Various dumb coding questions so I can get started:

How can I HMAC a string in Java and encode the output as a Base64 string?

Write me a python script which searches a directory (with directories in it) for SPECIAL_STRING and lists out the files that contain it, grouped by the top level directory

Write some scala code that converts from lower_case_with_underscores to camelCase

how can I fix this scala 2.13 compiler error? [ insert complex one I’d never seen before ]

I am trying to clean up my scala codebase and migrate from an older API to a new one. Please write me a syntactic scalafix rule which identifies calls to OldApi.create that only have one argument passed, and rewrites those to additionally have an argument newFlag=true. Just give me the scala code.

The python code below is really slow. Make it faster.

this is extremely slow spark code, how can I speed it up? I have a list of arrays in my python code that I want to use to update a spark dataframe

I have the following SQL query which returns many rows, including multiple rows for each primary_key. Please update it so I can have a table which contains exactly one row per primary_key, using the fields from the earliest value of found_at for each row when grouped by the primary_key. But use the latest resolved_time

3. Brainstorming side project ideas: (picking one of the dozens of chats here)

I am thinking of writing a cricket management game similar to Cricket Captain or Football Manager. Which functionality do you think would be enjoyable for the average mobile gamer interested in cricket?

Where I’ve tried to use AI and failed

It’s not all fun and games. There were many prompts/dead ends where I used AI and gave up. Here are a few I thought were illustrative.

While working on my game, I thought I’d be lazy and see if I could get Copilot to write the pathfinding. Correctness matters there, but surely it had enough examples in its training set, and I could test it to see if it worked, right? Wrong. It had a couple of subtle off-by-one errors (one sign flip, one place it used < instead of <=) that only showed up in edge cases (rounding!). I learned my lesson, went back to a textbook, and refreshed my memory on how to write these. Probably for the better.

Given that I do many migrations, I also decided to try AI to do a whole migration for me, rather than using it to help write, e.g., a static-analysis-based migration. I haven’t quite been able to make it work, and I’ve had three 1-day hackathon attempts fail for various reasons (files being too large, hallucinated line numbers in patches, etc.). However, there is a silver lining: I got over some of these roadblocks during the latest hackathon at $work (just last week). I'm hoping to share more once I finalize my approach and run it on a few more examples to confirm it generalizes.

Reflections

Before reading this, please read the disclaimer at the top. Again, I’m recapping my personal experience and the lessons I’ve learned here - I cannot speak definitively.

My mental model of AI

I think of AI as an intern who:

  • Has encyclopedic knowledge (memorized facts about almost everything) => so I can ask them about anything, and they’ll know something.
  • Has predominantly been exposed to rote learning and not deep thought => so they might know how to do a well-specified task, but they cannot explain their train of thought or handle unfamiliar problems, and I need to be careful with assignments
  • Has amnesia and doesn’t remember what I told them an hour ago => so I always need to teach them from scratch
  • Is eager to impress => so they will always politely and confidently respond with something, even if it’s absolutely wrong.
  • Is drunk and/or high and gets things wrong half the time => so I always need to triple-check their work and never give them anything mission-critical

Going in with this set of assumptions has helped me change how I view AI and get a lot more out of it.

Properties of effective AI operators

These are some common properties I’ve seen in others who’ve used AI to greatly enhance their productivity (and subsequently some hints of them in myself, though not to that level). I make no claims about whether these are necessary or sufficient - I’m just pointing out a correlation.

  • They are experienced mentors. They have trained multiple interns and other engineers, so they know how to phrase questions, how to be critical of someone’s work, and how to see past answers that only seem correct.
  • They value their craft and tools. They’ve honed their craft (e.g., toolchain, dev environment, etc.) and take pride in the quality and efficiency of their work. AI is just another tool for improving themselves.
  • They are damn good at what they do. They know several deep technical domains inside and out, so they can tell when an answer is plausible and worth exploring further, or when the AI is sending them down a dead end. They know how to test and validate responses.

Problems I think AI is well suited for

From my experience (and you might have figured this out from the examples), the most important question is: “does it matter if the thing I build is not 100% correct?” If not, I’ll consider AI. If it does matter, I ask myself: “is it easy for me to validate that the answer is 100% correct?” If yes, I’ll still consider AI. If not, this is a problem that’s not suitable for AI for me at this time (see the next section).

There are a few other things I ask myself when deciding:

  • Is it something I hate doing? I’ll use AI to get over the equivalent of writer’s block.
  • Am I doing something hundreds of people have done before? If so, I’ll use AI - it’s likely to have memorized a technical solution.
  • Is it some technical minutiae, like the right syntax for a bazel command? I’ll try it out. Worst case, the AI gets it wrong, and I lose ten seconds.
  • Am I the only primary user of this result/thing? If so, I have a lower quality bar for things I build for myself. If it helps my workflow, I’ll try it out.

Things I don’t use AI for

Note that this doesn’t distinguish between “I’ll never use AI for this,” “I won’t use AI for this just yet,” and “I just haven’t been smart enough to get this to work.”

First and foremost, if I’m working on a problem where correctness matters and I can’t readily validate that the AI’s solution is correct (because it’s hard to test automatically, or I don’t know enough about the domain), I run away from it. I care about getting things right. This means I avoid using AI for any security-related or systems-building work, and that’s OK. I do enough other work that using AI there saves my mental energy for when correctness does matter.

Beyond that:

  • Anything requiring super complex reasoning (I haven’t gotten it to work)
  • Things that require multi-shot, complex agent systems (I just haven’t yet found an effective workflow that I can use at $work)
  • Anything requiring larger context windows (e.g., I can migrate small snippets of code, but nothing significant).
  • Writing. It’s a sacred art form to me. I will sometimes use it to critique my writing (though I’m more likely to use Grammarly, which I also often disregard because it cramps my style), but every word on my blog is artisanally hand-crafted by me.
  • Reading. Reading is also sacred. If you know me, you know I read a lot. I don’t want to read an AI-generated summary. I will read the whole thing so I can form my own opinions (yes, I realize there’s a contradiction between this and the existence of algorithmic feeds).

Tips for getting started

Honestly, the biggest tip is just to try it out. I tried it a few times and gave up because it sucked. But then it improved, I gave it a more serious shot, and I experimented a bit more before it all clicked for me.

Understand which problems it works well for and which it doesn’t. Don’t apply it to the wrong thing, or you’ll get frustrated by the lack of results.

This will sound like a cop-out or gatekeeper-y, but I highly recommend working on your core skills first. You can do okay and improve using AI, but it takes skill to know whether it’s wasting your time or not. Improving your mentorship skills will help you be a better person in general, too. Most importantly, though, start caring a lot about the quality of your work and be proud of it. Once you strive to improve however you can, you’ll realize AI is just one of many tools you can use on that journey.

I’m a student. Should I still bother learning to code?

I need a better answer here; only time will tell. But here are my initial thoughts.

If you’re a junior engineer/student reading this, I recommend trying out AI and learning it - you’ll be asked how you use AI when you apply for jobs. But focus on the fundamentals as if AI didn’t exist (except, say, for learning). Sure, you can use AI to write solutions for your assignments, but you won’t learn how things work - which will only harm you in the long term. Get your fundamentals right, and then use AI to supercharge your skills. In short, on the technical side, I recommend:

  • Knowing your language of choice very well
  • Being an expert at data structures / algorithms
  • Debugging (a vastly underrated skill in general)
  • Always asking “why” - if you ask AI for an explanation, don’t take the answer as a given; try to understand the concept behind it and why something works the way it does

In a world with AI, I expect “soft” skills (I hate this term) to be much more important, even for junior engineers:

  • Knowing how to work as part of a team
  • Understanding what you need to build and why
  • Time management and prioritization
  • Communication
  • Planning
  • Note-taking
  • Learning from others, and teaching others
  • Context switching

I could go on, but that’s the rough gist (and I’ll try to get back to blogging about those at some point).

Conclusion

This was a really long post. I don’t have a solid conclusion. I will just say:

  • Try AI and see if it works for you
  • Be conscious of and correct for the cases where it can get things wrong
  • Focus on honing your craft and your skills - AI hasn’t changed that

If you’ve read this far - thank you! As always, I welcome any feedback and suggestions.