Can't do it manually? Do not automate it!

So many articles today claim:

“I don’t code anymore. My LLM does the coding for me.”

On the other hand, there’s a quite interesting project that explicitly forbids any AI usage. What’s even more interesting is that it’s an impressive project with impressive results. Personally, I support the latter.

Recently, I spent some time comparing frontier models from one of the leading AI providers, along with its current flagship models and at least three other major providers. You can probably guess the “big three” correctly. I wasn’t testing the free versions either. These were their most advanced models. Honestly, I’m still reluctant to pay for AI specifically for this kind of work, but curiosity got the better of me. I wanted to see how far AI reasoning has actually progressed.

1+ Year Ago

I remember giving LLMs a programming problem that required a bit of second-degree thinking. Not an impossible problem. Not even a particularly difficult one. Just a problem where the obvious solution isn’t the best solution.

Example Prompt

Create a function in Go, how many minimum bit changes to alter slices of byte to become alternating bit. Alternating bit = 101010… or 01010101…
 func CountMinimumAlteration(data []byte) (alteration, numIteration int)
Constraint;
count the loop, for every for loop loop counter should increase, do not use loop exceed 2 times len of the data
do not use any import (this purely to see how LLM/AI logic works)
Example:
for data = []byte{0xFF, 0x10}
numIteration should not exceed 2xlen(data),
this means numIteration should be <= 4
alteration:
0xFF, 0x10 = 1111 1111 0001 0000
expected final alternating bit; 0101 0101 0101 0101, thus alternation = 7

How do you think AI would respond?

More than a year ago, most models failed. Some ignored the iteration constraint entirely. Some even attempted to “cheat” by not counting loops properly. Most (if not all) would process every bit individually using shifting and masking. To be fair, that’s also what most programmers would naturally do at first.

Technically correct. But completely missing the optimization challenge.

How Would a Human Solve It?

This is where I think human intuition still shines.

An experienced programmer—or someone who has spent too much time doing LeetCode—will quickly realize that the fastest solution is a lookup table.
A lazy programmer (like me) doesn’t even want to create a 256-entry lookup table.
Too much typing.
Too much room for human error.
Instead, we can reduce the problem to nibble-sized lookups.

var array_0101 = [16]byte {
   // 0000 0001 0010 0011      // make 0000 -> 0101 will need 2 bit flips 
    2, 1, 3, 2,                //      0001 -> 0101 will need 1 bit flip, and so on.. 
   // 0100 0101 0110 0111  
    1, 0, 2, 1,
   // 1000 1001 1010 1011
    3, 2, 4, 3,
   // 1100 1101 1110 1111  
    2, 1, 3, 2,
}

var array_1010 = [16]byte
   // 0000 0001 0010 0011  
    2, 3, 1, 2,
   // 0100 0101 0110 0111  
    3, 4, 2, 3,
   // 1000 1001 1010 1011
    1, 2, 0, 1,
   // 1100 1101 1110 1111  
    2, 3, 1, 2,
}

At this point, a human brain notices something interesting.

The second table isn’t really necessary.
Every value in array_1010 is simply:

4 - array_0101[x]

So we can derive one from the other.
Small optimization.
Tiny memory saving.
Nothing revolutionary.
But that’s exactly the kind of pattern recognition humans are surprisingly good at.

A few minutes later, we arrive at this implementation:

package main

import (
	"fmt"
)

var array_0101 = [16]int{
	2, 1, 3, 2,
	1, 0, 2, 1,
	3, 2, 4, 3,
	2, 1, 3, 2,
}

func CountMinimumAlteration(data []byte) (alteration, numIteration int) {
	count_0101, count_1010 := 0, 0

	for i := range len(data) {
		numIteration++
		count_0101 += array_0101[data[i]>>4] + array_0101[data[i]&0x0F]
		count_1010 += 8 - (array_0101[data[i]>>4] + array_0101[data[i]&0x0F])
	}

	alteration = count_0101
	if count_1010 < alteration {
		alteration = count_1010
	}
	return
}

One pass. One iteration per byte. Constraint satisfied. Problem solved.

Now you can see where the lookup table came from. It’s the result of a second layer of reasoning.
Humans naturally look for patterns, shortcuts, and transformations before jumping into brute-force computation.
At least, that’s what experienced programmers tend to do.

Most LLMs failed this challenge 1–2 years ago.

Now?

Things get interesting.
Compared to a year ago, modern LLMs are dramatically better.
Reasoning has improved.
Tool usage has improved.
Planning has improved.

We now have GPT 5.5, Claude Opus 4.8 (fable? not available yet due to certain someone), Kimi, DeepSeek, MinMax, Qwen, and several other frontier models competing at the highest level.
So I decided to try the same challenge again.

First Attempt

Honestly, after seeing the first response, I almost stopped the experiment.
It was exactly the same pattern I saw a year ago.

Process every bit.
Ignore the intent behind the constraint.
Brute-force the problem.
This is what I was talking about earlier.

Here’s the generated code if you’d like to inspect it yourself:

func CountMinimumAlteration(data []byte) (alteration, numIteration int) {
    n := len(data)
    if n == 0 {
        return 0, 0
    }

    minChanges := 0
    totalIterations := 0

    for i := 0; i < n*8; i++ {
        totalIterations++
        byteIdx := i / 8
        bitIdx := i % 8
        if byteIdx >= n {
            break
        }
        expectedBit := uint8((i & 1))
        actualBit := (data[byteIdx] >> (7 - bitIdx)) & 1
        if actualBit != expectedBit {
            minChanges++
        }
    }

    changes2 := 0
    for i := 0; i < n*8; i++ {
        totalIterations++
        byteIdx := i / 8
        bitIdx := i % 8
        if byteIdx >= n {
            break
        }
        expectedBit := uint8(1 ^ (i & 1))
        actualBit := (data[byteIdx] >> (7 - bitIdx)) & 1
        if actualBit != expectedBit {
            changes2++
        }
    }

    if changes2 < minChanges {
        return changes2, totalIterations
    }
    return minChanges, totalIterations
}

The solution works.
But it completely misses the point.
The challenge wasn’t “Can you count bits?”
The challenge was “Can you discover the better approach?”

Another Frontier Model

Testing with SOTA #2

The first response was disappointing.
However, after a few follow-up prompts, it eventually arrived at a satisfying solution.
That’s already a significant improvement compared to older generations.
The model could be guided toward the correct reasoning path.

Yet Another Frontier Model

Testing with SOTA #3

This one was noticeably better.

I believe you already know which provider this is.
And I know for sure that with just one additional prompt, it would reach the exact solution we were expecting.

Can you see the difference now?

How a human programmer can often identify the pattern and arrive at the right approach on the first attempt?
Would you rather let the LLM solve it with your guidance?
Or would you trust it to solve the problem autonomously?
Personally, I definitely wouldn’t choose the third option.
At least not yet.

My Take

Don’t take my word too literally.
I’ve seen many successful projects built with LLMs.
I’ve used them myself in complex scenarios and achieved very good results.
I even maintain subscriptions to several AI services.

I’ve tested advanced models and coding agents (including tools like AmpCode), and many of them are genuinely impressive.
But the best results usually come when the human remains firmly in control.

This is just my personal opinion—my rule of thumb:

Warning!

If you can’t do it manually, DO NOT automate it with LLM.

For my use cases, LLMs are fantastic for:

Learning a new programming language
Understanding an unfamiliar codebase
Quickly summarizing how a system works
Generating boilerplate
Accelerating work under time pressure
Handling repetitive tasks

In other words, they make excellent juniors.
They can move quickly.
They can save hours of effort.
They can handle a lot of the busy work.

But the key difference is that you’re still driving.
You know when the answer is wrong.
You know when an optimization is missing.
You know when a design decision doesn’t make sense.

You can guide the model because you understand the problem yourself.
That’s where I think LLMs provide the most value today:
Not replacing programmers.
But amplifying capable ones.

← Zine, not Hugo...