Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
What does it take to test a sorting routine? (2010) (reprog.wordpress.com)
27 points by ColinWright on Aug 16, 2022 | hide | past | favorite | 17 comments


Imandra [1] has an excellent literate programming walkthrough that hits on the same issues and plugs the holes one by one using formal methods, ultimately proving merge sort correct.

[1] https://docs.imandra.ai/imandra-docs/notebooks/verifying-mer...


To add to the article's ideas, I would also test by brute force all sequences made from a finite set of values for small lengths. For example:

Length 0: [].

Length 1: [0].

Length 2: [0,0], [0,1], [1,0], [1,1].

Length 3: [0,0,0], [0,0,1], [0,0,2], [0,1,0], [0,1,1], [0,1,2], ... [2,2,2]. (3^3 = 27 arrays in total)

Length 4: (4^4 = 256 arrays in total)

Length 5: (5^5 = 3125 arrays in total)

Length 6: (6^6 = 46656 arrays in total)

Length 7: (7^7 = 823543 arrays in total)

Length 8: (8^8 = 16777216 arrays in total)


You definitely want to add test cases for arrays with mixed-type values and for non-arrays.

Yeah, the latter should be caught by an isArray() check... which is the whole point.


As I understand it this was a multilingual exercise. In some languages "mixed type values" isn't a thing so you can't test it and your users can't get it wrong.

Likewise for non-arrays.

It's notable that one of the participants had decided empty arrays shouldn't be a thing, so far as I know programming languages all admit the idea of the empty array although perhaps some of them don't distinguish between empty arrays and various other empty data structures.


Interestingly it apparently takes a sorting routine to test sorting routines.

It's not the first time I ended up wondering whether testing something rigorously is actually easier than implementing it correctly.


> it apparently takes a sorting routine to test sorting routines

You don't have to sort to check sortedness.

    boolean isSorted(int[] original, int[] sorted) {
        if (original.length != sorted.length)
            return false;
        for (int i = 0; i < original.length; i++) {
            if (i > 0 && sorted[i] < sorted[i - 1])
                return false;
            if (countInstances(original[i], original) != countIntasnces(original[i], sorted))
                return false;
        }
        return true;
    }
    
    
    int countInstances(int value, int[] array) {
        int count = 0;
        for (int v : array) {
            if (v == value)
                count++;
        }
        return count;
    }
I know that bubble sort, selection sort, and insertion sort can be implemented in about half the code as this. But this code is read-only and is more obviously correct than trying to prove an algorithm that mutates the array.


So far as I can tell,

isSorted([0, 0, 0], [4, 20, 0, 0, 0]) is true

(the code doesn't care about values in sorted which never occur in original and doesn't check these arrays are the same length)

But

isSorted([10, 5, 0], [0, 5, 10]) is false

(for some reason the code checks that original is sorted, rather than sorted...)

That's OK though, the original exercise exists because most programmers can't get even basic algorithms right, so you're helping to demonstrate that.


I fixed two silly mistakes thanks to your comment!

Added at the beginning of isSorted():

    if (original.length != sorted.length)
        return false;
Changed the wrong ordering check:

     if (i > 0 && original[i] < original[i - 1])
> most programmers can't get even basic algorithms right, so you're helping to demonstrate that

Now this puts my entire body of published work into question...


It occurs to me that the "you're helping" might read as snark and it wasn't really intended that way, or at least, not as snark at you, but rather all of us.

It's silly to reject help from e.g. peer review, static analysis, smarter / safer languages. If you look at some tool and say "What kind of idiot would make the mistakes this catches?" the answer is almost certainly "idiots like you".


Additionally your issorted is n^2 which may not finish if you're testing an nlogn sorting algorithm.


Does it? I'm seeing two O(n) processes: one to confirm order and the other to confirm permutation of set.

In almost-Lua pseudocode:

   function isOrdered(array, predicate)
      -- zero and one length arrays are trivially ordered
      if len(array) < 2 then return true end
      local ordered = true
      for i = 0, len(array) - 1 do
         ordered = ordered and predicate(array[i], array + 1)
      end
   
      return ordered
   end
   
   function popCount(array)
      local pops = {}
      for item in iterate(array) do
         pops = pops[item] and pops[item] + 1 or 1
      end
   
      return pops
   end
Then you compare the two popcounts, if they're the same and the result is in-order then something made them that way.


Many problems have tricky, performant solutions and simple, slow solutions. Comparing to a simple reference implementation is a very good way to test.

For some problems it is much easier to check that the result is correct than to find the result (that's what P and NP are about for example). Sorting is one of these problems. You can check that one list is a permutation of another list in linear time. Comparing against a library implementation of sorting is just simpler in many languages.


Why not just do two tests: check if every element is present exactly once or some variation of it depending on what is considered unique; do the `for i in (0..array.size-2) { assert array[i] <= array[i+1] }` as was suggested. That way you don't need to test against a known good sorting algorithm.


That reads like a perfect case for property based testing.


Why does the post have so many pictures of sushi? Am I missing something?


There’s something fishy going on.


AFAIK it's just a style/branding thing.

And he likes sushi.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: