Style tips for less experienced developers coding with AI

2026-02-05 · 15 minute read

If you’re not careful about how you write software, the complexity of the system will grow exponentially with its functionality. Most of the accumulated art of software engineering is about stopping that from happening. The lessons are mostly the same no matter how the code was authored. If you’re finding that you can use tools like Claude Code to get smaller units that seem to work, but as you build larger things you seem to get stuck, I hope this post can help. The most concrete tips relate to Python, but some of the concepts are the same across languages.

Before I get to the specifics, there’s a very important mindset thing that’s behind a lot of it. You should really have a sort of “slow and steady wins the race” philosophy. This is true of all sorts of building projects. Imagine you hire someone to pave your driveway or build an extension on your house, and it’s some young guy who’s running back and forth, measuring things hastily, keeping things in his head to save time on writing them down… These are not good signs. An experienced professional will accept lots of little trade-offs that slow things down slightly in order to reduce the risk of fucking it up entirely.

Software is very much like this. A lot of inexperienced programmers have absorbed the slogan, “move fast and break things”, but it’s important to put that in context. You don’t stick a slogan like that up on the wall if it’s the collective wisdom about software engineering. The average expected effect of any corporate slogan is obviously zero, so the idea was to over-correct and phrase it as provocatively as possible. It’s disastrously bad advice in itself.

Tools like Claude Code give you speed in abundance, so a “slow and steady” mindset is even more valuable. You shouldn’t be worried about taking a bit longer to do things. If the LLM can help you write almost any small piece of code in seconds, how can you possibly lose? Quite easily actually! You don’t get anywhere by writing the wrong thing quickly, so you can get stuck in a loop where you just bounce between failures.

The “slow and steady” principle suggests a clear north star to sail towards. We want a codebase that looks like what an expert team of developers would produce. Over the years the industry has learned things about building software projects that can keep growing. Today’s LLMs aren’t trained with much reinforcement learning signal that rewards them for decisions that pay off over several days of work on the same project, so it’s up to you to keep them on track. Here’s what you should look for.

Use types, as strictly as possible

LLMs often have trouble correctly propagating changes. Humans aren’t perfect at this either, but when LLMs get this wrong they can easily enter a failure spiral. Type checking catches many, many problems; with LLMs it’s overwhelmingly worth it. In general, generative AI thrives on one-way functions: situations where it’s easy to check output for correctness. Type all the things.

Use Pyright, and enable type-linting in your IDE

Make sure the LLM is actually running the type checker. I don’t have this on a hook because it’s not important to get errors in the middle of drafting something, but it should always check the types before running the tests.

I suggest using Pyright rather than mypy. It’s faster, which gives it better IDE integration, and it has slightly better rules to automatically narrow types. Pyright also behaves better with respect to external libraries that don’t have comprehensive type annotations.

You need to be able to see type errors underlined in your IDE. This can be a chore to set up, but it’s essential.

Always ask “should the types have caught this?”

Pay close attention to the problems that seem to occur. Always ask whether you’d expect the types to identify the issue. If the LLM proposes a diagnosis of a problem that would result in a type error, point that out.

It’s very common to have big holes in the type specification without realising it, because Python is gradually typed. A missing type annotation is assumed to be Any. If you stay vigilant as the LLM debugs, you have a much better chance of noticing these things and getting them fixed.
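For a sense of how these holes show up, here’s a minimal made-up example. The parameter has no annotation, so it’s treated as Any and the checker has nothing to say about an obviously wrong call:

def greet(user) -> str:
    # 'user' has no annotation, so it's implicitly Any
    return "Hello, " + user.name  # attribute access goes unchecked

greet(42)  # no type error reported, but this fails at runtime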

Make sure attributes are being annotated

This is a very common oversight in Python code, and LLMs have learned the mistake. The correct syntax looks like this:

class Foo:
    bar: str

    def __init__(self, bar: str) -> None:
        self.bar = bar

The annotation for bar goes in the main scope of Foo. If this is missing, an invalid assignment to foo.bar won’t be identified by the type checker. This sucks especially because the LLM is often making those assignments as side-effects, which is the sort of thing that often doesn’t get tested properly in unit tests (more on that later).
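With the annotation in place, the checker can flag a bad assignment wherever it happens, including inside some other function doing it as a side-effect. A small made-up illustration:

def rename(foo: Foo, new_name: int) -> None:
    foo.bar = new_name  # flagged: "int" is not assignable to "str"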

Use dataclasses and Pydantic models

Claude Code loves to pass data around as dictionaries, with fields accessed via string literals. If you’re writing code manually it’s sometimes worth it to have stuff like record["name"] because it can be more concise, but with an LLM it’s pretty much never a good trade-off. You always want a Record dataclass or Pydantic model, so you can have record.name instead.
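For example, instead of a dict with "name" and "age" keys floating around (the field names here are just for illustration):

from dataclasses import dataclass

@dataclass
class Record:
    name: str
    age: int

def describe(record: Record) -> str:
    return f"{record.name} ({record.age})"  # a typo like record.nmae is now a type error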

If you let the LLM get away with stuff like record["name"], you’ll often end up with different functions disagreeing about how that data should look, due to poorly propagated changes. It will add or change part of the record in one place, and then not update the other functions that are using that data. Then when it goes to debug the problem later, it won’t know which one should be correct, and there’s a high chance it will suggest an incorrect fix. For instance, if you have a function that’s receiving two versions of the record dict, the LLM will often try to support both, instead of going back and fixing the inconsistency.

The other thing about an access like record["name"] is that there’s usually no type information for the value. Look out for type annotations on variable assignments within your function. This is often an indication that something’s wrong. If you’ve specified your input types well, and you’re always using functions with good return-type annotations, you shouldn’t need many annotations on your assignment statements.

Scrutinise unions and optionals

Whenever the LLM declares that some value could be a None, ask “Can it though? Why?”. It’s absolutely normal to have a value that can be str | None, but there should be some sort of reason for why that’s the case. Don’t let the LLM make things optional just because.

The same is true for return types. If you can, it’s always good to specify an @overload that allows the return type to be inferred from the inputs:

from typing import Literal, overload

@overload
def do_something(x: dict, returns: Literal["str"]) -> str:
    ...

@overload
def do_something(x: dict, returns: Literal["int"]) -> int:
    ...

def do_something(x: dict, returns: Literal["str", "int"]) -> str | int:
    if returns == "str":
        ...
    elif returns == "int":
        ...
    else:
        raise ValueError(f"Unexpected value of 'returns': {returns}")

LLMs today are not very good at reasoning globally about your program and figuring out that you can narrow what some function should accept or return. Instead, when debugging, they often suggest broadening the function. This is very often a mistake. Whenever you see it proposing an edit that broadens a function, interrogate it closely about why that’s necessary, and make sure the reasoning adds up. Pay particular attention if it seems to be tunnel-visioned on just getting some test to pass.

Control complexity at the interface

Most programs have some interface where you receive some external unknown input — requests for a web service, files from the command line, environment variables, etc. Have a clear definition of what constitutes valid input, and if you need to do any conversions or handle any variations, do that right away. If you’re dealing with JSON files that can be formatted slightly differently, fix that at input and only pass one type of data through the rest of your program.
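As a sketch of what that can look like (the field names and the alternate spelling are invented, and this assumes Pydantic v2 for model_validate):

from pydantic import BaseModel

class Order(BaseModel):
    customer_id: str
    amount_cents: int

def parse_order(raw: dict) -> Order:
    # Tolerate an older field spelling here, at the border, and nowhere else.
    if "customerId" in raw and "customer_id" not in raw:
        raw = {**raw, "customer_id": raw["customerId"]}
    return Order.model_validate(raw)  # everything downstream only ever sees an Order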

If you have two inputs that have three variations each, that’s nine possible combinations for your functions to deal with. And this almost always gets worse over time. New functionality usually shows up in the interface of your program. The world changes around you, so you have to support something that didn’t exist before. Some of this complexity will inevitably leak into the rest of your program, because some differences will be meaningful and functions will need to be aware of them. But you have to do as much as possible to control it at the border.

Define and demand valid inputs

All functions in your program should know what constitutes valid input. These validity definitions, termed “invariants”, can be expressed over the inputs as a whole, not just individual variables. For instance, if your function takes two lists as input, it can say that the lists have to be of equal length. There are some annotation languages for expressing these invariants formally that I’d like to experiment with, but so far I’ve just been asking for them to be documented.

A function that’s passed invalid inputs must raise an exception. If your function appears to succeed even though its inputs are invalid, it has a bug. It’s fine to define a function where there’s some sort of do-nothing fallback – for instance, maybe you document that the function “updates this value located at these keys if they exist, does nothing otherwise”. Then the valid inputs are “collections of this type, potential keys of this type”, not “collections of this type and keys of this type, where all keys are present in the collection”.
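A minimal sketch of stating and enforcing an invariant over the inputs as a whole (the function itself is just an example):

def weighted_sum(values: list[float], weights: list[float]) -> float:
    """Sum of values times weights. Invariant: both lists must have the same length."""
    if len(values) != len(weights):
        raise ValueError(f"Length mismatch: {len(values)} values vs {len(weights)} weights")
    return sum(v * w for v, w in zip(values, weights))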

Well-specified invariants encourage blame to be apportioned correctly when problems occur. Every bug hunt is a little whodunnit, where the culprit is the part or parts of the code that need modification. If your functions raise the alarm when they receive invalid input, the suspect will not be far from the scene of the crime. You’ll be able to look at the traceback and see which function issued the invalid call. What you don’t want is for your functions to cover up problems, and allow errors to flow through your program undetected.

When code is written and debugged manually, problems that allow errors to go undetected get found and fixed periodically. They’re setbacks, but each one usually only gets you once. Every so often someone will stumble into a bug that ends up being much more difficult than it should be to track down, because an error hasn’t been identified as early as it could have been. After hours, days, and very occasionally even weeks you eventually figure it out, at which point you denounce the whole conspiracy of behaviours that covered up the problem, and make the changes necessary. You and anyone else on the project resolve to do better next time.

Today’s LLMs are not great at these difficult bug hunts. Instead what often happens is the LLM will get the wrong guy. It’ll put some half-assed change in the wrong place, and now you’ll have a new problem to deal with. This is one of the ways LLMs can spiral into failure if you let them.

Crack down on exception handlers

LLMs currently litter your code with overly-broad exception handlers. This is very bad, for the same “don’t let errors flow through your program” reason described in the previous section. I wrote about principled try/except style in Python here.

You generally want exactly one statement under the try block, with except handlers for specific exceptions. If you need to handle arbitrary errors rather than crashing (for instance in a web service), you should do that as high in the stack as possible, for instance in the request handler. You don’t want to be thinking gotta-catch-‘em-all at all different layers of your program.
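A rough before/after, with made-up config handling just to show the shape:

import json

path = "config.json"  # placeholder

# Too broad: swallows every failure, including real bugs, and carries on with bad data.
try:
    with open(path) as f:
        config = json.load(f)
    port = int(config["port"])
except Exception:
    config, port = {}, 0

# Narrower: one statement under try, a specific exception, and anything unexpected still crashes loudly.
try:
    with open(path) as f:
        raw = f.read()
except FileNotFoundError:
    raise SystemExit(f"No config file at {path}")
config = json.loads(raw)  # a JSONDecodeError here surfaces with a useful traceback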

Prefer pure functions

A “pure” function is one which returns an output with no other changes to state. Examples of common “side-effects” of a function are things like modification of input variables, methods which modify the object’s attributes, changes to global variables, or system calls such as file operations, database writes etc.

Obviously things which need to write to the database need to write to the database, that’s fine. But on a purely stylistic level, generally steer towards returning values rather than modifying inputs. One of the main reasons is testing.
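As a small made-up example, prefer the first shape over the second:

# Pure: test it by calling it and checking the return value.
def with_discount(prices: list[float], rate: float) -> list[float]:
    return [price * (1 - rate) for price in prices]

# Side-effecting: callers and tests now have to reason about what changed in place.
def apply_discount(prices: list[float], rate: float) -> None:
    for i, price in enumerate(prices):
        prices[i] = price * (1 - rate)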

What we’re hoping to do is develop lots of smaller pieces and somehow satisfy ourselves that they all work individually. This is the running theme through most of my advice. You want to have a clear definition of what the function is supposed to do, and then you can somehow check that it does it (more on that in testing below).

What we really don’t want are ways to somehow assemble failure out of all these individual apparent successes. If our functions have lots of side-effects, it’s easy to miss interactions in testing. We ideally want to be testing each function in isolation; we don’t want to have to test a flow of calling function A, then function B, then function C.

Testing

Here’s an example of a good unit test for a pure function, using Pytest:

import pytest

@pytest.mark.parametrize("a,b,expected", [(1, 2, 3), (2, 2, 4)])
def test_addition(a: int, b: int, expected: int) -> None:
    computed = a + b
    assert computed == expected

You ideally want each test to be run with a range of inputs, using some combination of fixtures and parametrize. The details of Pytest and how fixtures and parametrize interact are honestly a bit annoying — but it’s worth it. If you get it right you can be reasonably confident that the function does what it should do.
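For what it’s worth, here’s a small sketch of a fixture combined with parametrize (the fixture contents are made up):

import pytest

@pytest.fixture
def basket() -> list[str]:
    return ["apple", "pear"]

@pytest.mark.parametrize("item", ["plum", "fig"])
def test_add_item(basket: list[str], item: str) -> None:
    basket.append(item)
    assert item in basket
    assert len(basket) == 3  # each parametrized case gets a fresh copy of the fixture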

Line coverage is a pretty goofy metric for a manually-written codebase, but I actually think it’s worth it with an LLM. You can get this for Pytest by installing the pytest-cov plugin and passing the --cov argument. I like it because using the LLM I’m much more at risk of having whole areas of the code I’m not thinking about, and it doesn’t cost so much to keep the coverage high with LLM-generated code.

The main thing to keep in mind is that a test with full line coverage doesn’t somehow mean that your function doesn’t have bugs. If your function has a conditional, the line coverage will just tell you that some value went down each branch. If your function has no bugs, that means all valid values will take the correct branch. If there’s a bunch of valid values you didn’t test, and they would behave incorrectly, line coverage will give you no indication of that.
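A toy illustration (the off-by-one bug is deliberate):

def shipping_cost(weight_kg: float) -> float:
    # Bug: the spec says heavy pricing starts at 20kg inclusive.
    if weight_kg > 20:
        return 15.0
    return 5.0

def test_shipping_cost() -> None:
    assert shipping_cost(30) == 15.0  # exercises the heavy branch
    assert shipping_cost(1) == 5.0    # exercises the light branch
    # 100% line coverage, yet shipping_cost(20) still returns the wrong price.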

Coverage over input values is more important than line coverage, but it’s easier said than done. In the past I’ve experimented a bit with hypothesis, and I think it might be a nice thing in an LLM-driven project, but I haven’t tried it out yet.
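For a sense of the shape, a hypothesis test generates the inputs for you and checks a property over all of them (this example is invented, not something from a real project):

from hypothesis import given, strategies as st

price_lists = st.lists(st.floats(min_value=0, max_value=1e6))
discount_rates = st.floats(min_value=0, max_value=1)

@given(price_lists, discount_rates)
def test_discount_never_increases_price(prices: list[float], rate: float) -> None:
    discounted = [price * (1 - rate) for price in prices]
    assert all(d <= p for d, p in zip(discounted, prices))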

Make sure you also include tests that check for behaviour over invalid inputs. You’ll never be able to test all things that shouldn’t happen, but LLMs are extremely prone to covering up exceptions, so these tests are very helpful. In my experience LLMs don’t generally generate these without being asked to.
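pytest.raises is the usual tool for this; a minimal made-up example:

import pytest

def reciprocal(x: float) -> float:
    if x == 0:
        raise ValueError("x must be non-zero")
    return 1 / x

def test_reciprocal_rejects_zero() -> None:
    with pytest.raises(ValueError):
        reciprocal(0)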

In addition to unit tests, LLMs like to write themselves little execution scripts for end-to-end testing. I mostly let it do its thing on those, and don’t put effort into keeping them working, or running them automatically. Depending on your project, you might want to convert them into reusable integration tests as well.

Try to avoid workflow loops where you go off and do things, paste the error back into the LLM, and then go do whatever it tells you. I’ve seen this anti-pattern called a “reverse centaur”: you want to be the head of the human/horse hybrid, not the ass. Another way to put it is just, probably don’t let the LLM make you its bitch. If it has, notice that. Sometimes it’s fine, but you know, check in with yourself. It’s not a good long-term arrangement.

Summary

Before LLMs, every programming project was a long walk through the fog. Now you can drive. That’s great! But the fog’s thicker, and now instead of tripping over and picking yourself back up, you might crash into a tree. So slow down a bit. Make sure you’re headed in the right direction and driving safely.

Picture an experienced professional, doing just about anything. Are they rushed, or patient? People who are good at things work deliberately and methodically. I’m sure there’s some field of expertise where you just race about recklessly, but I can’t currently think of one, and it’s certainly not the norm.

There’s no way that the best method of programming things will be to prompt the LLM as optimistically as possible and just get out of its way because that will sometimes be the quickest way to get where you’re going, even though it will often not work and when it doesn’t work you’ll have just wasted a bunch of time and have to try again at random, unless of course you thought it worked so you went and deployed it and oops now everything is on fire. Just slow down a bit. Steer.