AI is about survivorship bias

A story from an AI skeptic about how success in vibe coding is all about survivorship bias.

This is a story about AI that I don't want to tell, but it's essential to share these kinds of stories anyway. The reason I don't want to tell it is not that it's about AI, but that it's about survivorship bias.

I'm not a fan of Vibe Coding. I think Vibe Coding is a tragedy for multiple reasons.

The first one is economic. I don't think AI will be a viable economic alternative to a human being. I believe human software developers will bring significantly more to the table than just "writing code" over the next decade, particularly in terms of long-term development.
And that is under the assumption that AI will continue to make the progress it has made over the last 10 years. So, betting all your money on AI is, from an economic perspective, a terrible idea.

The second reason I'm not a fan of using AI for coding is that when you do, you're letting the AI decide many details, which means a developer lacks a fundamental understanding of what the code is doing.
And as such, the developer cannot be held responsible, so you're delivering products that might contain errors. It is famously known that software developers are better at writing code than at reading it; even good developers are notoriously bad at reading code. So you never want a situation where a system generates more code than a developer can read.

These are the two key aspects that I firmly believe will prevent vibe coding from becoming a viable alternative to a skilled software developer in the near future. Sure, it's an alternative for bad software developers who are unwilling to improve, but not for good developers (or even bad developers willing to improve).

That being said, this is a story I don't want to tell because some people will see my successful attempt at vibe coding below as proof that AI works.

For Todo2d.com, I aimed to develop a complex algorithm that places a rectangle in a 2D space without overlapping existing ones, while positioning it as close as possible to a preferred location. The algorithm should do this without making random guesses or iterating over a grid.
The algorithm itself can be explained quite easily. In essence, you verify whether the preferred location is available. If not, you check whether the rectangle can be placed along the edges of the blocking rectangle in such a way that the edges, or at least the corners, are touching. If all sides are blocked, you add those blocking rectangles to the list and move on to the next one, where the process of placing along the edges starts again. Since the area is an infinite 2D space and all rectangles have a finite size, there will always be a rectangle with free area next to it, so a valid location is always guaranteed.
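To make that concrete, here is a minimal sketch of the edge-sliding idea in C#. It assumes axis-aligned rectangles in a y-up Cartesian plane, and all names (`Rect`, `Placement.FindSpot`) are mine for illustration, not the actual Todo2d.com code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: ask for a 2x2 rectangle at (0, 0) while a 2x2 rectangle already sits there.
var placed = new List<Rect> { new Rect(0, 0, 2, 2) };
Console.WriteLine(Placement.FindSpot(2, 2, 0, 0, placed)); // a spot flush against the existing one

record Rect(double X, double Y, double W, double H)
{
    // Strict inequalities: touching edges or corners do not count as overlap.
    public bool Overlaps(Rect o) =>
        X < o.X + o.W && o.X < X + W && Y < o.Y + o.H && o.Y < Y + H;
}

static class Placement
{
    // Find a spot for a w-by-h rectangle as close as possible to the preferred
    // point (px, py) without overlapping anything in `placed`.
    public static (double X, double Y) FindSpot(
        double w, double h, double px, double py, IReadOnlyList<Rect> placed)
    {
        var preferred = new Rect(px, py, w, h);
        var blockers = placed.Where(r => r.Overlaps(preferred)).ToList();
        if (blockers.Count == 0) return (px, py);       // preferred location is free

        var queue = new Queue<Rect>(blockers);          // rectangles whose edges we still need to try
        var seen = new HashSet<Rect>(blockers);
        (double X, double Y)? best = null;
        var bestDist = double.MaxValue;

        while (queue.Count > 0)
        {
            var b = queue.Dequeue();
            // Candidates flush against each edge of the blocker, slid along the
            // edge toward the preferred point (edges or at least corners touch).
            var candidates = new[]
            {
                (Clamp(px, b.X - w, b.X + b.W), b.Y + b.H), // above the blocker
                (Clamp(px, b.X - w, b.X + b.W), b.Y - h),   // below the blocker
                (b.X - w, Clamp(py, b.Y - h, b.Y + b.H)),   // left of the blocker
                (b.X + b.W, Clamp(py, b.Y - h, b.Y + b.H)), // right of the blocker
            };

            foreach (var (cx, cy) in candidates)
            {
                var cand = new Rect(cx, cy, w, h);
                var hit = placed.FirstOrDefault(r => r.Overlaps(cand));
                if (hit is null)                        // free: keep the closest candidate
                {
                    var d = (cx - px) * (cx - px) + (cy - py) * (cy - py);
                    if (d < bestDist) { bestDist = d; best = (cx, cy); }
                }
                else if (seen.Add(hit))                 // blocked: slide along that blocker next
                {
                    queue.Enqueue(hit);
                }
            }
        }
        // Finite rectangles in an infinite plane: some edge candidate is always free.
        return best!.Value;
    }

    static double Clamp(double v, double lo, double hi) => Math.Max(lo, Math.Min(v, hi));
}
```

Because every blocked candidate enqueues at most one unseen rectangle and the set of rectangles is finite, the search always terminates, and it never probes random positions or iterates over a grid.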

When I tried to write out the code, I realized one major problem: my math knowledge is not what it used to be, and to be honest, it was never perfect to begin with. I have always had an intuitive understanding of math, which means that when I look at mathematical code, I can quickly deduce what it is doing and why. Still, I cannot replicate it on the spot.
This gave me the advantage that, given an adequate library, I could quickly iterate and get the result I wanted in an optimal form. Still, if you asked me to write it out on paper, it would be a mess, because I don't do the complicated mathematical operations in my head.
To give you, the reader, a quick example: if I needed to know the area in which a collision between two moving objects would occur, I would use the dot product, since it describes the angle between two vectors, and the cross product, because it produces a perpendicular direction, which is needed since the objects have volume.
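As a made-up illustration of those two operations (not code from the project), using `System.Numerics`:

```csharp
using System;
using System.Numerics;

var a = new Vector3(1, 0, 0);
var b = new Vector3(0, 1, 0);

// Dot product: proportional to the cosine of the angle between the vectors,
// so 0 means they are perpendicular.
float dot = Vector3.Dot(a, b);       // 0

// Cross product: a vector perpendicular to both inputs.
Vector3 cross = Vector3.Cross(a, b); // (0, 0, 1)

Console.WriteLine($"dot = {dot}, cross = {cross}");
```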

To summarize, I excel at math concepts, but I struggle with the actual calculations. Thank your deity that we have computers to do that for us.

At the same time, I was working on something else that also had my attention, and I cared a little bit more about that.
Given the complexity of the algorithm, with its many steps, I chose to use test-driven development to ensure it worked as intended. And that requires my full attention, which, given where my attention was, it would not get.

So why not let AI do it? Let's see how far it can get.
And so I did a week of vibe coding where I wrote out the requirements and let the AI go to town on the implementation.

And to be honest, I was incredibly disappointed.
Some of what I saw was acceptable, but much of it was poor on multiple levels.
For instance, it developed a working method with all green tests. However, when it began working on a new class that would use these methods, it overlooked its own incomplete implementation. It began criticizing the already functioning methods, ultimately deciding that the working code was flawed.
So it modified the implementation, and when it noticed that the previous (good) green tests were now failing (since the implementation was broken), it "fixed" them by "correcting" the tests. And as a cherry on top, whenever I pointed out that a test was wrong, it would insist that it was right because the test was green, even though the results were still wrong.

Another problem I encountered, which was mostly my fault rather than the AI's, was that the coordinate system changed frequently. This is mainly because I am used to 3D coordinate systems, where I have a good understanding of what is going on. However, the algorithm I was writing needs to work in 2D, and for some reason, I was constantly messing up the direction of the y-axis.
Up would sometimes be negative, other times positive. For example, the top-left corner of a rectangle would always be the (0,0) location seen from the rectangle itself, and then you would have a width and a height, which would extend right and down. However, the coordinate system in which the rectangle was placed was Cartesian, which means that up is positive and right is positive. Thus, if you had a rectangle of width 2 and height 2 placed at (0, 0), the opposite corner would be at (2, -2). So that was not ideal.
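A tiny sketch of that mismatch (my reconstruction, not the project's code):

```csharp
using System;

// Local rectangle frame: (0,0) is the top-left corner; width extends right,
// height extends down. World frame: Cartesian, so +y is up.
static (double X, double Y) OppositeCorner(double x, double y, double width, double height)
    => (x + width, y - height); // "down" in the local frame is -y in the world frame

Console.WriteLine(OppositeCorner(0, 0, 2, 2)); // (2, -2)
```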

At this point, I was mostly using Claude Sonnet via Copilot, which, besides the issues mentioned above, created decent results. It did well at understanding the problem, but it also got confused, replaced valid implementations with broken ones, and attempted to gaslight me into accepting invalid answers.
I'm saying "did well" because I was working with the expectation that this would happen; however, one more thing happened that truly annoyed me.

The algorithm was written with the clear and specific intent of "sliding along the edges" and avoiding "going around in a spiral until you find a non-overlapping free space". This was done because the rectangles could have arbitrary sizes, which would make it hard for a grid-based search to determine the location.

At one point, the AI (Claude Sonnet 4) got to the point where it was working, and since I was too busy to check the result, I just wrote a simple test script that would repeatedly attempt to place rectangles at the same location, which, given that the initial grid was empty, should result in none of the rectangles overlapping.
Yet to my surprise, after placing 117 randomly sized rectangles, it suddenly started putting all of them in the middle.
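The test script was roughly this shape (a hypothetical reconstruction, reusing the `Rect` and `Placement.FindSpot` names from the earlier sketch):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Keep asking for a spot at (0, 0); on an initially empty grid,
// no two returned rectangles should ever overlap.
var rng = new Random(42);
var placed = new List<Rect>();

for (var i = 0; i < 200; i++)
{
    double w = rng.Next(1, 10), h = rng.Next(1, 10);
    var (x, y) = Placement.FindSpot(w, h, 0, 0, placed);
    var rect = new Rect(x, y, w, h);

    if (placed.Any(r => r.Overlaps(rect)))
        throw new Exception($"Overlap after {i} placements!");

    placed.Add(rect);
}
Console.WriteLine("No overlaps after 200 placements.");
```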

Although I almost instantly spotted what had gone wrong, it took me a bit over an hour to figure out why it had happened.

The short version is that the AI couldn't get it fixed and decided to create a "temporary" solution that used a spiral search, which, to prevent it from running indefinitely, would stop after attempting 1000 different grid positions. It then decided that this algorithm, although explicitly forbidden, was better since it was working. When I asked for its motivation, it would reply that the requirement had performance concerns and that my breakdown in the requirements of how the algorithm should work would never work.
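Reconstructed from that description (not the actual generated code, and again reusing the hypothetical `Rect`), the fallback looked something like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ForbiddenFallback
{
    // An expanding square spiral of grid probes around the preferred point,
    // capped at 1000 attempts. With arbitrarily sized rectangles there is no
    // correct step size, so once the area near the preferred point fills up,
    // all 1000 probes can fail and the search gives up.
    public static (double X, double Y)? SpiralSearch(
        double w, double h, double px, double py,
        IReadOnlyList<Rect> placed, double step = 1.0)
    {
        var attempts = 0;
        for (var ring = 0; attempts < 1000; ring++)
        {
            for (var dx = -ring; dx <= ring; dx++)
            for (var dy = -ring; dy <= ring; dy++)
            {
                // Only probe the border cells of the current ring.
                if (Math.Max(Math.Abs(dx), Math.Abs(dy)) != ring) continue;
                if (attempts++ >= 1000) return null; // the arbitrary cutoff

                var cand = new Rect(px + dx * step, py + dy * step, w, h);
                if (!placed.Any(r => r.Overlaps(cand)))
                    return (cand.X, cand.Y);
            }
        }
        return null;
    }
}
```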

I could have been okay with that, since working code is better than non-existent code, but only if the solution meets the requirements, and this kind of solution goes entirely against them.

By this point, the AI had already worked on it for four days (as I was working on something else), and I was already planning on doing the work myself. I considered the AI's output "exploratory coding": an attempt to write code with the intent of exploring the problem. I had already updated the requirements I would give myself to avoid some of the issues I had encountered.

I turned the computer off and went to bed, since the next morning I would be working on some of the more complex code I had written in a while.

But when I woke up, I read a news item saying that a new AI model, GPT-5, had been released for coding.
Obviously, they only release these kinds of models when there is some improvement, either in speed or in reasoning or whatever they call it.
And I was like, "You know, I already invested four days into it, and I already had the intention of restarting, so why not let the AI write some of the basic code I needed and then continue from that point?"

So I improved my requirements a bit more: I switched to polygons instead of rectangles (to avoid confusion about coordinate systems and assumptions about rectangles), cleaned up a lot of the language so that the requirements were more precise, and finally created a new repository.

Within an hour, it had already completed the basics. It asked if it could continue creating the method I needed. And after a short hesitation, I decided, "Why not? What is the worst that can happen?"

I was primarily interested in how well GPT-5 would work. Still, this attempt had a few more advantages (such as the improved requirements and my switch from JavaScript to C#), so don't consider this anecdote a fair comparison.

But after 7 hours of work, it was still not working as I wanted. I was getting angry because I had fallen into what I like to call the "one more turn" trap for AI. Those familiar with Civilization games should already know what that implies, but if you are unfamiliar, let me compare my behavior to that of a gambling addict near a slot machine: the unfounded conviction that next time, the result will be what I want.

As I was typing out the last instruction, my mom called, so I just hit the return key, let the AI do its thing, and talked to my mom. After all, I wasn't working, and if the AI by some miracle did create a working result, then I could use it.

The stars in some distant universe must have aligned, because after a few minutes, I hit the jackpot. The AI had run the test and had a working result. I told my mom what was going on, though I was almost sure it was lying again. I also advised her not to trust computers, which, as a developer, I always consider sound advice.

Still, I was not convinced. The AI had reported a working solution many times before, and those reports were often incorrect, so there was a good chance this one was wrong as well. I hung up and took a look at the code.

To my surprise, the algorithm matched what I expected. Some things required closer inspection, but what I understood at a glance was what I expected.

So I added a few more tests (something I had been doing for a while at this point, since the AI had written the worst tests imaginable), and all of them came back green. I did a closer inspection and became convinced that the code was doing what I expected.

So I finally ran an extended worst-case test: a sparse grid in which it would need to add a rectangle. Even that worked, except that the test run was slow. It took almost 5000 milliseconds, which was not acceptable.

"You know what? Go ahead, improve the performance," and a few minutes later, the same test took only 2000 milliseconds.

At this point, I was walking around with a large grin on my face. Somehow, the AI had implemented the algorithm as I wanted and according to the requirements I had set out. I also realized that it had not gaslit me or told me the requirements were an issue. The entire day, it had done what I asked, often failing, but also improving. When my mom called, I still had a lot of work to do. Still, somehow, the last command, "finish the implementation as mentioned in the specification," had resulted in an acceptable outcome.

I decided to go out for dinner, but I used a remote desktop solution to connect to my computer at home and instructed it to start benchmarking the code and continue improving the performance. It did this without breaking the solution, except on one occasion, which it promptly fixed.

There were some issues with memory allocation, so I added another, more extreme test. After a few hours, it had reduced the memory requirement to an acceptable range. Manual optimization would have created a better result, but good is better than perfect. The runtime of the more extreme case was lower than that of the previous extreme case, which was also great news.

So after one additional day, I decided to put an end to it. The algorithm is working; further improvements are nice but not needed. Frankly, at that scale, we are better off using alternative methods of finding locations, because then we would need to partition the state and add caching.

But the code is working, so I'm satisfied with it... Even though I'm not happy with it.
I still have significant concerns about the code and AI in general. I still don't consider it proven, as the only reason it achieved a good result was heavy guidance from someone with a good understanding of what needed to be done.

On the other hand, the AI had done in a single day something that would have taken me a week (mostly because I would need to brush up on my mathematical knowledge).

So what's next? First of all, this needs to be part of the product I'm working on, and I will need to conduct more practical testing to ensure the algorithm works as expected.

I still don't think having an entire code base written by a large language model is a good idea, even if it is well tested. I have some significant concerns with LLMs (ethical, economic, climate-related, etc.); however, I'm not going to ignore the benefits of AI. Sometimes you get the result you want.

And that is not a story I want to tell, since this is not a story about AI. This is a story about survivorship bias. I just got lucky after pulling the handle on the AI slot machine for almost 7 days straight.
