tHog

The Python whitespace thing

2015-01-21

I'm going to end this controversy once and for all. Is the syntactic whitespace of Python a good thing or a bad thing? It's neither; the best syntax is actually found in some relatively obscure languages.

First of all, I've used Python since about 1999/2000. It is the first language that really got me hooked on programming. It is the modern equivalent of BASIC in that I would recommend it as anyone's first programming language. Yet it is a real, powerful language I still choose by default, and professionals also use it extensively in places like Google. It is the closest thing to pseudocode I have ever seen: it is the only language where I can write down a substantial idea and have it run perfectly straight away.

The whitespace controversy

Pros

  1. Programmers make extensive use of whitespace anyway to make the code more readable. Using both syntactical punctuation for the compiler, and spaces for humans, is redundancy, and redundancy is bad, mmmkay?
  2. Leaving out superfluous punctuation makes the code much cleaner and easier to read (see below). Python is designed to be easy for humans to read and write. The code is more compact as you don't waste space or attention on punctuation, and it is easier to handle a larger whole.

Cons

  1. Artistic flexibility is lost, as Python strongly pushes stylistic choices like one statement per line (semicolons are allowed, but frowned upon). In the Python world, this is regarded as a Good Thing™: there should be one clear way to do one thing, as opposed to the Discordian philosophy of Perl. It is a matter of style and habituation, but you might also ask: do you want to build production code like you build bridges, or like you write poetry?
  2. As block indentation can be done by tabs or any number of spaces, conflicts may arise as different people edit the same code with different settings on different editors. This is a genuine technical problem; the syntax is too fragile for diverse social collaboration. Could you write the equivalent of the Linux kernel in Python?
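
The tab/space conflict in the second point is easy to reproduce. Here is a minimal sketch, assuming Python 3, which rejects indentation whose meaning depends on the tab width (Python 2 was more lenient). The helper name check_indentation and the two sample snippets are made up for illustration.

```python
def check_indentation(source):
    """Try to compile a snippet and report how its indentation fares."""
    try:
        compile(source, "<example>", "exec")
    except TabError as err:
        # TabError is a subclass of IndentationError, so catch it first.
        return f"TabError: {err.msg}"
    except IndentationError as err:
        return f"IndentationError: {err.msg}"
    return "ok"


# One editor indented with a tab, another with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"
# The same block indented consistently with four spaces.
consistent = "if True:\n    x = 1\n    y = 2\n"

print(check_indentation(mixed))
print(check_indentation(consistent))
```

The two lines in the first snippet look identical in an editor that shows tabs as eight columns, which is exactly why this kind of breakage is hard to spot by eye.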

Syntax aside, I think Python's real power lies in its Batteries Included philosophy: a metric bazillion libraries for different tasks are available by default. Once you're an advanced enough programmer not to care too much about superficial syntactic issues, you'll probably appreciate this. At the same time, Python is probably not the perfect language for your specific task, but its universality goes a long way.

[2022-04-09] One more pro from a Slashdot comment: "Every single line starts with an explicit mark indicating the block it belongs to. With the braces you have to parse backwards and count up all the braces."
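
The point in that comment can be made concrete: with indentation-based syntax, the nesting depth of a line can be read from that line alone. Here is a small sketch, assuming space-only indentation at a fixed width of four. The function name block_depths and the sample snippet are made up for illustration.

```python
def block_depths(source, indent_width=4):
    """Return (depth, text) for each non-blank line, using only its own leading spaces."""
    depths = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no block information
        leading = len(line) - len(line.lstrip(" "))
        depths.append((leading // indent_width, line.strip()))
    return depths


sample = (
    "for x in xs:\n"
    "    if x:\n"
    "        print(x)\n"
    "    total += x\n"
)

for depth, text in block_depths(sample):
    print(depth, text)
```

No earlier line has to be inspected to find a line's depth. With braces, the same question means counting every opening and closing brace that came before.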

The best syntax so far, feat. a little whitespace

For a long time, Fortran 90 was the language with the most plain and simple syntax I knew. It was also the best/fastest language I knew for doing math, which is what computers generally do; the scientifically minded syntax allowed smart compilers to parallelize even my most embarrassing code, years before the hoi polloi were sold SMP under the "dual core" slogan. It was my other favourite language besides Python.

Unfortunately, Fortran is not a very nice all-round language. However, Julia has managed to combine the best of Fortran and Python into a rather decent whole. The syntax is apparently inherited from Fortran, but there is a sense of Python underneath. To see for yourself, here is the main loop from my DNaLS code in Python...

for x in range(xmin, xmax):
    if x % report_int == 0:
        workstate = text_encode([x, xmax, nmin, nmax, checksum])
        print(workstate)
        
    qrks = qrklist(x)

    if len(qrks) > 0:
        for n in nlist:
            # Solutions are now handled within root_find_diff
            cs = root_find_diff(x, n, qrks)
            checksum = (checksum + cs) % checksum_mod
  
...and in Julia:

for x = xmin:(xmax - 1)
    if x % report_int == 0
        # Fix the potentially negative modular checksum
        checksum += checksum_mod
        workstate = text_encode([x, xmax, nmin, nmax, checksum])
        print(workstate, "\n")
    end

    qrks = qrklist(x)

    if size(qrks)[1] > 0
        for n in nlist
            # Solutions are now handled within root_find_diff
            cs = root_find_diff(x, n, qrks)
            checksum = (checksum + cs) % checksum_mod
        end
    end
end
  
In many cases, porting Python to Julia simply involves removing the colon and adding the end statement. This marks the crucial difference in whitespace handling: Fortran/Julia only cares about line breaks, not indentation. The "end" is redundant given the human-readable indentation, but personally, I find it helps to have a clear marker that closes the mental parentheses. Besides, Python has an ugly punctuation character where Fortran/Julia has nothing.
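
One porting detail in the Julia loop goes beyond colons and ends: the comment about a potentially negative checksum. Python's % takes the sign of the divisor, whereas Julia's % is a truncated remainder that takes the sign of the dividend, so a straight port can go negative whenever cs is negative. That this is the reason for the fix is an assumption here; the code itself does not say. Here is a minimal Python sketch of the difference, where truncated_rem is a hypothetical helper mimicking the truncated behaviour and the numbers are purely illustrative.

```python
def truncated_rem(a, b):
    """Remainder that takes the sign of the dividend (hypothetical helper)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


checksum_mod = 1000  # illustrative value, not taken from the original code
cs = -7              # illustrative negative contribution

floored = cs % checksum_mod                  # Python's %: sign follows the divisor
truncated = truncated_rem(cs, checksum_mod)  # truncated: sign follows the dividend
shifted = truncated + checksum_mod           # adding the modulus restores a non-negative value

print(floored, truncated, shifted)  # prints: 993 -7 993
```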

Now, Julia isn't the perfect all-round language either. For example, the quirks of its different numerical types have driven me nuts from the beginning, but (a) that is probably to be expected from a numerically oriented language and (b) it is still being heavily developed. The JIT compilation issues are likewise of category (b).

On punctuation and ugliness

I consider punctuation ugly when it makes things visually messy without providing any essential function. Of course, languages with C-style punctuation do have their reasons, chiefly the ability to put multiple statements on a single line. It is interesting to think about why we would want that, given that an overwhelming majority of programmers write with neat line breaks and indentation.

Incidentally, we have punctuation in natural languages for pretty much the same reason. Without punctuation,

we could simply write one clause per line
and then we would not even need capitals to begin sentences
but the downside would be that
it would take a lot more paper to print books
and it might also break the flow of reading

Source code is quite different from books. It actually benefits from having plenty of space and line breaks in the right places, because we don't read it like we read a book. In my experience, things often get ugly and inefficient if we insist on real-world or dead-tree models for our computing.


Risto A. Paju