On Twitter I had a thread going this year in which I tried to reflect on bugs that I found throughout the year, how to avoid this kind of bug, what can be learned, etc. I will port this idea over to here and see how it goes in the future (I'm still both here and on Twitter, we'll see how that goes).
Recently I fixed a bug in PyPy's time.strftime. It was using some unicode helper function that takes as argument a byte buffer with some utf-8 encoded string, as well as the number of code points. strftime was using this API wrong and passing the number of bytes instead.
After finding the bug we tried to make this API more robust by having a check in the function that counts the codepoints in the byte buffer and complains if that is different from the second argument. This shouldn't be on by default for performance reasons, but it's on during testing.
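A rough sketch of what such a check could look like, in plain Python. The names wrap_utf8, count_codepoints and DEBUG_CHECKS are made up for illustration and are not PyPy's actual helpers:

```python
DEBUG_CHECKS = True  # hypothetical switch: on in tests, off in production


def count_codepoints(utf8: bytes) -> int:
    # Every code point starts with exactly one byte that is not a
    # continuation byte (continuation bytes look like 0b10xxxxxx).
    return sum(1 for b in utf8 if (b & 0xC0) != 0x80)


def wrap_utf8(utf8: bytes, n_codepoints: int) -> str:
    """Hypothetical helper: takes utf-8 bytes plus the claimed code point count."""
    if DEBUG_CHECKS:
        actual = count_codepoints(utf8)
        if actual != n_codepoints:
            raise ValueError(
                f"caller claimed {n_codepoints} code points, buffer has {actual}"
            )
    return utf8.decode("utf-8")
```

With the check enabled, passing the byte length for a non-ASCII string fails loudly instead of silently producing a wrong result.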
The reason why the bug got away for so long is that if you test only with ASCII chars it works, because number of bytes == number of codepoints in that case. Lesson: write tests with wider ranges of characters.
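To make the ASCII blind spot concrete, here is a small illustration (the sample strings are my own picks, not from the actual test suite) of how the two lengths only agree for ASCII input:

```python
def byte_and_codepoint_lengths(s: str) -> tuple[int, int]:
    """Return (number of utf-8 bytes, number of code points) for s."""
    return len(s.encode("utf-8")), len(s)


# Only the pure-ASCII sample has matching lengths, so only the other
# samples would have caught a bytes-vs-code-points mixup.
samples = ["Monday", "März", "月曜日", "🙂"]
for s in samples:
    n_bytes, n_codepoints = byte_and_codepoint_lengths(s)
    print(repr(s), n_bytes, n_codepoints, n_bytes == n_codepoints)
```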
Another bug, this time in itertools.tee: tee has an optimization that uses a __copy__ method on the iterator if it has one, instead of carefully using its generic implementation. However, PyPy got it wrong and copied the iterable instead of the iterator.
https://foss.heptapod.net/pypy/pypy/-/issues/3852
This works in simple tests, but in more complicated situations it gives nonsense.
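A sketch of the kind of test that exercises the __copy__ path: the iterator is partly consumed before calling tee, so both results have to continue from the current position. Countdown is a made-up class for illustration, not the reproducer from the issue:

```python
import itertools


class Countdown:
    """Hypothetical iterator that can copy its current position."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

    def __copy__(self):
        # The copy starts from the current state, not from the beginning.
        return Countdown(self.n)


def tee_after_one_step():
    it = Countdown(3)
    first = next(it)  # consume one item before tee-ing
    a, b = itertools.tee(it)
    return first, list(a), list(b)
```

Both halves should come out as [2, 1], whether tee takes the __copy__ shortcut or falls back to its generic implementation.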
Another bug, this time in socket.recv, also present in CPython. On Linux, if you pass MSG_TRUNC as a flag to socket.recv (which calls recv in its implementation), the underlying recv call will return the size of the packet, not the number of bytes written into the output buffer.
https://foss.heptapod.net/pypy/pypy/-/issues/3864
This confused the logic in socket.recv: it leads to an assertion error in PyPy (trying to read too many chars from the output buffer) and to garbled characters in CPython. Fixed in PyPy by not reading more than the buffer size from the buffer in that case.
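The idea behind the fix can be sketched without real sockets. Here fake_raw_recv is a made-up stand-in for the C-level recv, and recv_clamped is a hypothetical wrapper, not PyPy's actual code:

```python
def fake_raw_recv(packet: bytes, buf: bytearray, msg_trunc: bool) -> int:
    """Stand-in for C recv(): fills buf and returns a length.

    With msg_trunc set it reports the full packet size, even though
    only len(buf) bytes were actually written into buf.
    """
    written = min(len(packet), len(buf))
    buf[:written] = packet[:written]
    return len(packet) if msg_trunc else written


def recv_clamped(packet: bytes, bufsize: int, msg_trunc: bool = False) -> bytes:
    buf = bytearray(bufsize)
    reported = fake_raw_recv(packet, buf, msg_trunc)
    # Never read more than the buffer holds, whatever length was reported.
    valid = min(reported, bufsize)
    return bytes(buf[:valid])
```

Without the min(), a reported length of 11 against a 4-byte buffer would mean reading 7 bytes that were never written.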
CPython bug: https://github.com/python/cpython/issues/69121
Someone could fix this! Probably not super hard.
I learned again that I know nothing about network programming :-(
Fixed a bug in PyPy's 3.9 parser (based on the new PEG parsing approach introduced in CPython 3.9). The parser would report a valid generator expression in a function call as lacking parentheses, but only if there is another syntax error further down in the file. E.g.:
f(x for x in y)
if a:
pass
PyPy would report the error on line 1 (which is fine), not on line 3, where the actual error is.
The bug was an oversight: an 'if' in the logic got left out when porting from CPython. Shows that error cases are often not tested enough?
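A regression test for this kind of thing can be quite small. A minimal sketch using the builtin compile (the helper name error_line is my own):

```python
SOURCE = "f(x for x in y)\nif a:\npass\n"


def error_line(source: str):
    """Return the line number of the reported syntax error, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass
        return exc.lineno
    return None


# The generator expression on line 1 is valid; the error belongs to
# line 3, where the body of the 'if' is not indented.
assert error_line(SOURCE) == 3
assert error_line("f(x for x in y)\n") is None
```

Checking the reported line number, and not just that some SyntaxError is raised, is what would have caught this.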