This article is part of a series on computing:
- Classic Computing and Encoding (to come soon)
- On the Choice of Programming Languages
- Human Computer Interfaces (to come later)
What about Programming Languages?
For the layperson, computer programming is full of mystique. Perhaps the only vague notions they have concern viruses and hacking around security, and programmers themselves are usually seen as a special breed, probably not quite human.
Programmers construct "apps", a word most of us have heard.
"App" is the abbreviation of application, which itself is a relatively recent word for program.
A program is a sequence of instructions for a computer, and it is expressed by writing down those instructions using an artificial "language" called a programming language.
Note that the phrase "sequence of instructions" is somewhat misleading: some of the instructions may be decisions or may indicate repetition. A program is therefore not just a list of steps for the computer to carry out; it also tells the machine how to make choices. Such choices are based on the situation at the time the program is used, not at the time it is written.
Human versus computer languages
We know what human language is: words (a dictionary) and rules to form sentences (grammar). Human languages are a little fuzzy, and they have sentences that are simple descriptive statements (It rains), others that are questions (Does it rain?) and yet others that are orders (Shut the door!).
Computer programming languages also have a dictionary and grammar, but they usually have only orders as sentences. They are normally very precise: each order will do exactly what it is meant to do and do it exactly the same way each time.
Programming language sentences are called statements, though they are in fact orders.
As I stated earlier, programs can make decisions based on what is happening outside them, so computers can do more than just pre-determined things. The discussion of the intellectual power of computers is a different topic, however.
A difficult concept
On first contact with computer programming, perhaps the most difficult idea to grasp is that of memory. Humans of course use memory all the time, but we no longer notice it. Nearly every action we take needs memory: even when you adjust your watch you need to read the time off another clock, remember it, and set it into your watch. There is a short moment during which you rely on memory.
In programs memory has to be used explicitly: values, whether numbers, text or images, must be stored in the computer's memory before they can be manipulated. Any intermediate value that will be needed later in the process must also be stored explicitly. For such storage the computer uses containers, called "variables" in the jargon.
Here is a simple example that is highly illustrative of the different way humans think about such storage, or rather, ignore it: swapping two objects in space. We will start with the computer version.
Suppose some container A holds the value 4 and another container B holds the value 10. You now want to swap the values, to make A hold 10 and B hold 4. The sequence of instructions absolutely needs a third, intermediate container, used only temporarily. Let's call it T.
Then the value of B, which is 10, can be assigned to (copied into) T, followed by copying the value of A, being 4, into B (thereby obliterating the 10 that was there), and finally copying the value from T, 10, into container A.
We now have achieved the swap, and also have a copy of one of the values in container T, but that is irrelevant.
Does this happen in real life too? Absolutely! Just try to make two of your dinner guests swap chairs.
One of them has to get up and take some temporary space, away from the table, then the other guest moves from their chair to the now vacant one, and finally the first guest can sit in the other chair.
If there is not enough room, i.e. if the temporary third space does not exist, then you have a real problem (try to swap seats in a car). But we never think of "storing the first guest in a temporary memory container", because empty space is usually in plentiful supply.
Containers and Assignments
One programming language statement is extremely common: the assignment.
Its purpose is to put a value in a container (assign a value to a variable).
As we saw in the previous section, operations on containers and values are at the core of what programs do.
Crucial here is that containers can be created by the programmer and are given names (identifiers) of the programmer's invention.
Let's see how the swap discussed earlier would be written in a few different programming languages. We need two assignments to set up the values of A and B, and three more to do the swap. We first use the programming language LiveCode:
put 4 into A
put 10 into B
put B into T
put A into B
put T into A
And this is how it would be written in Algol:
A := 4;
B := 10;
T := B;
B := A;
A := T
As a final example this is what it looks like in the language php:
$A = 4;
$B = 10;
$T = $B;
$B = $A;
$A = $T;
All three examples would make the computer do the same things, but it is shockingly obvious that the way the orders are expressed is different; the languages shown here are as different from each other as English, Russian and Chinese.
The rest of this essay deals with these differences and what makes it important to choose the right language for the job at hand.
One important note though: look at the statement put 4 into A. The words put and into are part of the LiveCode language dictionary; 4 is a value, which is not in the dictionary; and A is an identifier invented by the programmer as the name of a container, also not in the dictionary (just like your own name is not in any language dictionary).
The grammar part of the statement is that you must write
put <value> into <container>
where I used angle brackets around those things that may vary but must be in those places and of that nature.
That is not dissimilar from rules of natural languages, where the grammar for
The child eats a banana
is something like
The <subject> eats a <object>
A little history
When electronic computers were first built at the end of World War II, they were not only large and slow compared to today's machines, but they had very little memory and their programmers, obviously, had very little experience.
The problems solved usually had to do with physics and mathematics. Programs were put into the machines by hand, setting the bits of the instructions into memory one by one. This sounds extremely tedious, but there was no interaction between the software and the computer's users: programs (or "apps" if you want) mainly did calculations and produced lists of numbers. The purpose of an app was also extremely well known, and the programs were short. The computer was just a much, much faster calculator, driven in an automatic way.
Programming languages came into being only a decade later, in the 1950s. By that time computers had acquired more memory, and there were more users who wanted to do more complex calculations. These users needed a better way to put their "apps" into the machine: they needed to express their programs in a notation, a "language", that was closer to what they used in their field of expertise.
Indeed, computer users of this new generation were experts in other subjects than computers and they did not want to waste their time with details irrelevant to the application (this is still one of the major goals of LiveCode: provide access to computing without having to learn details unrelated to the problem).
The brilliant insight behind programming languages is that the computer can be programmed to translate the language statements into its own instructions by itself, so the process of programming then has two steps: writing the program in some language, followed by translating that program into the computer's hardware instruction set.
The first successful computer programming language was FORTRAN which is short for "FORmula TRANslation system".
The basic idea of FORTRAN was to let physicists and mathematicians write formulae more or less the way they were used to. For example, if they had an equation
y = x² + 21.5x - 354
they could type something like this:
Y = X*X + 21.5*X - 354
which was not exactly the same, but looked close enough (the differences arose mainly from the need to use simple electric typewriters and to specify the multiplication operator explicitly). Note that the mathematical form is an equation, but the computer form is an assignment statement: it gives the operations to perform with the value of X and then tells the computer to store the value so obtained into a variable called Y.
In the first step the computer would translate this statement into its own instructions. Depending on the particular machine, the above example might have given a sequence somewhat like this:
LOAD X
LOAD X
MULTIPLY
LOAD 21.5
LOAD X
MULTIPLY
ADD
LOAD 354
SUBTRACT
STORE Y
The translation process from the typed formula to the sequence of machine instructions is done mechanically by a program called a compiler.
Compilers can do more than produce machine code: they can also detect some simple errors. For example, 354 would pass as being a number, but 3S4 should be rejected, and the compiler can figure that out and tell the programmer of the error.
However, mistakenly typing a + instead of a - would not be detected.
3S4 is not a valid number syntax and that error is therefore called a syntax error. But using the + instead of the - cannot be detected easily by any compiler: it would have to know the programmer's intention or even understand the problem to be solved by the program. Such errors are called semantic errors, and they are the real stuff for debugging, the correction of program errors.
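To make the distinction concrete, here is a small sketch of my own in C (not one of the languages discussed so far): the first mistake is caught by the compiler, the second compiles and runs happily but computes the wrong thing.
#include <stdio.h>

int main(void) {
    /* double bad = 3S4;        a syntax error: 3S4 is not a valid number, so the compiler refuses it */
    double y = 354 + 21.5;      /* a semantic error if the formula required 354 - 21.5: it compiles and runs, but the result is wrong */
    printf("%f\n", y);
    return 0;
}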
Compiling code in the 1950s and 60s took time and bug correction was tedious. Today compiling is so fast one does not notice that it happens. Debugging can still be difficult and slow.
But let us note an important decision that was taken by the designers of the FORTRAN programming language:
to let physicists and mathematicians write formulae more or less the way they were used to
That well-meant decision introduced the first quite bad aspect of programming languages that remains with us today: the syntax of the assignment statement.
The formula above defining Y as a function of X looks quite OK. Programs are mainly used to do repetitive work, and that means quite often having to count how many times a certain operation has been performed so as to know when to stop. For that, programmers had to write statements like this one to manipulate a counter:
I = I + 1
Obviously that is a nonsensical mathematical formula. No value can be equal to itself plus one. The = sign in FORTRAN does not mean equality at all! It does not mean that the left hand side is equal to the right hand side of the formula, as it does in mathematics. It is not symmetric, it has a definite direction: I=I+1 effectively means "take the value of memory location I, add 1 to it, and deposit the resulting value into the memory location I again". It would have been much better to write
I + 1 → I
which would also have followed the sequence in which the computer does its operations. And there would have been no confusion with equality.
LiveCode does in fact write that statement in the better way:
put I + 1 into I
Or even more simply and concisely:
add 1 to I
The main reason for the bad FORTRAN syntax was to try not to frighten the mathematicians with something unfamiliar; another was that the → symbol did not occur on typewriters and card punch machines.
The first reason might seem a psychologically good one, but all early books on programming invariably started out by explaining that I=I+1 does not mean what the mathematician expects, and they went to great lengths pointing out the difference. It is not clear that the I + 1 → I syntax, given an equally simple explanation, would have confused the mathematicians of the time any more; I'm willing to bet that they would have accepted the arrow notation easily and gotten on with the job.
Unfortunately, most programming languages have kept the FORTRAN syntax for assigning new values to variables. Some have attempted to make the difference clearer: Algol used
I := I + 1
which at least shows that the assignment operation is not symmetric.
A bad consequence of choosing = for the assignment is that you now have to think of a different symbol for testing equality. FORTRAN used a quite ugly solution: if you wanted to make a decision depending on the equality of two values A and B, you would write something like:
IF (A .EQ. B) GO TO 56
That would test the values of A and B for equality and skip to the statement labeled 56 if true. To separate the test from the action that followed, parentheses were used: another early choice that is still with us in many programming languages. In LiveCode the above choice would be written as:
if A = B then …
C-like languages (javascript, php, perl, java, …) use == to test for equality:
if (A == B) …
However, as these languages also allow assignment as an expression returning a value, they will allow you to write
if (A = B) …
which is not an error, but means something entirely different: take the value of B, put it into A, which changes the value of A; the result of that assignment operation is the value of A, and that is then tested for being equal to zero or not. That is completely different from testing whether A is equal to B!
Books teaching programming in C-like languages always point out that typing = instead of == is one of the most common mistakes made.
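The trap is easy to reproduce. Here is a minimal C sketch of my own showing both spellings; most compilers will at best warn about the second one, yet it is legal code:
#include <stdio.h>

int main(void) {
    int a = 4, b = 10;

    if (a == b)                      /* comparison: false, since 4 is not 10 */
        printf("a equals b\n");

    if (a = b)                       /* assignment: a becomes 10, and the non-zero result counts as true */
        printf("a is now %d\n", a);  /* prints: a is now 10 */

    return 0;
}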
Note also that these design decisions were made in the 1950s and are still in use, whereas no-one today would consider buying a computer designed in those days.
On Referencing
You may have noticed that when we write
put X + 1 into X
we are in fact using X with two different meanings. The first X should be read as "the value stored in the container X" whereas the second X means "the container X". Specialists of natural and computer languages know this phenomenon as the problem of referencing.
In most programming languages referencing is not an issue, as the second type of reference only occurs in assignment statements.
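In C-like notation the same statement puts the container on the left and the value on the right; here is a tiny sketch of my own with the two roles spelled out:
#include <stdio.h>

int main(void) {
    int x = 5;
    x = x + 1;                /* the x on the right is the value stored in x (5); the x on the left is the container that receives 6 */
    printf("%d\n", x);        /* prints 6 */
    return 0;
}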
Nevertheless, some programming languages have very confusing ways of dealing with referencing. In Ada, a language designed for robust programming, an attempt was made (again well-meant) to ease the programmer's work by changing some kinds of references into other kinds automatically (especially concerning pointers, but we don't need to go into details here). The motivation was much like that of FORTRAN's attempt not to frighten the mathematicians. And as in FORTRAN's case, the resulting evil was probably worse than the perceived problem it tried to solve.
AppleScript has a big problem with referencing. Say we have a number of operations to do on an object X, and let's generally write this as operations-on-X.
There are many cases in which the statement
operations-on-X
gives different results from the sequence
set Y to X
operations-on-Y
and some of the ways may raise errors while others would not.
A typical example of such a surprise is this snippet, which works:
set x to current date
get year of x
But this snippet gives an error:
get year of current date
And to my great surprise this ugly snippet does work:
get year of (current date)
AppleScript syntax has roots similar to those of LiveCode, but the problem of confusing references makes it quite difficult to write working AppleScripts; it leads to a large number of questions posted on forums, with even more solutions that limp along, not truly working and not truly giving an answer either.
On Weak Typing
Today's popular programming languages let you happily mix types of data: if you append a number to a string of characters, the number will be converted to a string of characters automatically and the result produced without signalling an error. That may be what you want most of the time, but often it leads to subtle bugs or undesired effects.
LiveCode essentially has only one type of data: the character string, and thus strong typing does not give much of an advantage.
Languages with multiple types of data should do some type checking to avoid unnecessary transformations and to help the programmer find unintended ones.
I will not here go into the deeper aspects of strong typing; suffice it to say that in my experience I have been much helped by compilers that insisted on explicit conversions, refusing to accept my woolly thinking and forcing me to be precise, rather than trying to guess at conversions and producing wrong output which then leads to long debugging sessions.
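As an illustration of the explicitness I mean, here is a small C sketch of my own: appending a number to a string of characters has to be written out, so no conversion can happen behind my back.
#include <stdio.h>

int main(void) {
    char message[32];
    int count = 42;

    /* The conversion from number to text is explicit and visible: */
    snprintf(message, sizeof message, "items: %d", count);
    printf("%s\n", message);   /* prints: items: 42 */

    return 0;
}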
Here is an example of which I do not really know whether it is a problem of referencing, weak typing, or simply bad tests for conversions. It is a piece of AppleScript that reloads the page currently displayed in the Safari browser:
tell application "Safari"
set ThePage to URL of front document
open location ThePage
end tell
It works, but is not what we would expect to write, i.e.:
tell application "Safari"
open location the URL of front document
end tell
While the above does not produce an error, it does not do what you think it should do; when you run it, it apparently does nothing. However the following piece does the same thing as the first one:
tell application "Safari"
open location (the URL of front document as text)
end tell
No wonder there are thousands of sites giving advice on writing even very small snippets like the above ones: it is all very confusing and impossible to remember unless you spend your day programming and testing AppleScript.
A Common Worry that should not be one
Above we discussed a number of programming-language syntax problems that have a very early origin. Most of them were attempts to relieve a difficulty that the language designers perceived might be encountered by the programmers using the language.
Very often a secondary motivation was to avoid extra typing by the programmers. This was especially the case for C-like languages, where it was in fact the main motivation, but the referencing problems of Ada and the assignment statement of FORTRAN also suffer from this worry of having to type extra characters.
Any programmer knows that there is a phase of writing program code, followed by a usually much longer phase of testing and debugging.
During the debugging phase the programmer looks at the code a lot, changing things here and there, but most of the time is spent reading the code that is already written.
This second phase may be shortened and made less painful in a number of ways. Writing good documentation of what the program statements are intended to achieve, using meaningful identifiers, and placing comments at difficult statements are the three most productive remedies.
All three of those remedies however mean typing more characters, not fewer! One of LiveCode's goals is to make the language easy to read, avoiding the need for lots of comments.
This was observed very early on: computing theoreticians and practitioners like Dijkstra, Wirth and others pointed out decades ago that much more time is spent reading code than writing it. They used this fact to argue that a good language design could profitably reduce debugging time even if it meant a lot more code had to be written first. Their proposal was to favour readability of the code by typing more characters, rather than typing less and being left with programs that are difficult to read and understand.
Letting the Language do the Work
Several languages that followed FORTRAN tried to put "readability over writability" into practice.
Some, like COBOL, became very verbose. Others like PASCAL were clear yet concise but had other problems.
No programming language is perfect, but we should perhaps here spend some time on the various attempts that were made to reduce debugging time.
The two major features that help the programmer are required declarations and strict typing. Both also have drawbacks though.
Declarations
Required declarations mean that the programmer must list all the variables to be used at the start of the code. Using a name without declaring it at the start leads to a syntax error and the compiler will simply not let it pass. Usually it's not just a list of identifiers: each declaration also states the type of the variable.
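In a language with required declarations this looks, for example, like the following C sketch (mine, for illustration only): every name is introduced with its type before use, and an undeclared name is refused at compile time.
#include <stdio.h>

int main(void) {
    int count;            /* declared as a whole number */
    double price;         /* declared as a real number */

    count = 3;
    price = 4.5;
    /* total = count * price;    using the undeclared name "total" would be a compile-time error */
    printf("%f\n", count * price);

    return 0;
}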
The drawback of required declarations is that over the development of a program the use of variables changes, and some may be left unused while others may change their type.
LiveCode gives the programmer a choice between requiring declarations or not.
Strict Typing
One advantage of variable declarations is that there is a well-defined place to look at the definitions, but another one is that the compiler can use the definitions to check that variables are used in the intended way. For example, if I declare X to be a whole number, but then later on I write a line of code that puts a character value into it, the compiler can tell me that I probably did not intend to do that.
The drawback of strict typing is that some operations become very verbose. Say I want to have a whole number i to count with, and I want to produce the i-th multiple of some number, say 3.45. In a strictly typed language I cannot simply write
X := i*3.45
because i is an integer while 3.45 is not: it is a real number. What should the result be, integer or real? I then need to specify some conversion, most probably using a function to convert the integer to a real number before the multiplication:
X := ConvertToReal(i)*3.45
This is ridiculously complicated. For this reason most strictly typed languages allow automatic "widening" of types: since all integers are also real numbers, it is OK to let the compiler do the "upward" conversion instead of having to write the call to a function. The inverse is usually not allowed, but once again we do not need to go into detail here.
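C, for instance, widens in exactly that direction; a small sketch of my own:
#include <stdio.h>

int main(void) {
    int i = 3;
    double x;

    x = i * 3.45;          /* i is widened to a real number automatically; x becomes 10.35 */
    i = (int)x;            /* the reverse direction loses the fraction, so careful code makes it explicit with a cast */
    printf("%f %d\n", x, i);

    return 0;
}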
A General Principle — the First Criterion
In the 1970s, when I designed P+, a language for process control, I tried to let the compiler do as much checking as it possibly could before delivering a program to be run (and then debugged, perhaps after it had caused some damage to the machinery it controlled).
The motivation was that anything about the program that the compiler could understand, it could also check, and so point out potential errors to the programmer.
Ideally, the program specification itself should be used by the compiler. If we lived in this ideal world with good artificial intelligence, then we would not have to write programs at all. Just telling the compiler a specification would be sufficient. I could write:
Print a list of the first 10'000 prime numbers
and the compiler would be able to write the program for that.
But we don't live in that world. Instead, I need to add quite a lot of stuff if I want to understand the program myself. Anything that I need to produce but that the compiler cannot read and understand is called documentation. That includes diagrams, comments, specifications, proofs, assertions and so on. In the ideal world there should be none of it: the program code itself should be its own documentation. This is the goal of declarations, strict typing, long identifiers, and other syntax constructs, but they replace only a part of the documentation.
While we do not live in the world of self-documenting program code, it is still possible to design the programming language so that it is eminently readable to humans. That indeed will help a lot: if the language is so clean and clear that I do not have to write much documentation, then I can concentrate on the program's functions and be productive.
The most important criterion for a good programming language perhaps is:
The syntax should minimise the need for documentation
The Second Criterion
We will have to read our code a lot, poring over it to find bugs. The smallest distraction will waste time and effort. We will not achieve speed-reading, as we may when reading the prose of a novel, but we may perhaps design the language so that it resembles prose more than it resembles gobbledygook. I personally find that the use of special signs and punctuation hinders the flow of my reading. Punctuation of course is actually meant to interrupt: it signals the end of a sentence, a pause inside a sentence and so on. But too much use of quotes, commas, parentheses (and especially nested parentheses) is disruptive. If we can do without them, so much the better for our reading.
Nested parentheses are such a nuisance that most modern program editors will highlight matching pairs automatically as you type code. That must be an indication that our eyes and brains are not suited to finding matching parentheses. I would therefore conjecture that it is easier to read
if b is not zero and a/b<c then …
than it is to read
if ( ($b!=0)&&($a/$b<$c)) { …
Note that the second line uses ((())!&&$$$$ as characters from the top row of the keyboard, whereas the first line uses none.
Those top-row characters are also the ones used in cartoons to indicate swearing or unprintable words.
My second criterion for a good programming language is then:
The fewer non-alphanumeric characters are needed, the better.
The Third Criterion
When we read code we need to know what it means. If I cannot remember the syntax of a certain kind of loop construct then I must look it up in the language reference manual.
My first programming language was PL/I. It was a laudable (once more well-meant) attempt by IBM to produce a better language, one that would combine the virtues of FORTRAN and COBOL. It was the end of the 1960s and computers were just about getting big and fast enough to begin to explore human-machine interaction. PL/I was aimed at both the professional and the casual programmer. But in its zeal to cover everything, its syntax had so many possible meanings depending on context that it was impossible to write even the smallest program without having the language manual open beside you. Constant lookup of features was necessary to avoid unintended effects.
The design of PASCAL was a strong reaction to that phenomenon. Niklaus Wirth's first motivation for his design of PASCAL was: to be able to keep the entire syntax and semantics of the language in one's head, i.e. without need for reference to a manual. It was a huge success, and it died only because unlike FORTRAN it did not evolve fast enough. While it lasted, PASCAL was certainly a pleasure to work with. This experience leads me to my third criterion for a good language:
A good language should minimise the need for looking in the language reference manual
What will happen?
Let us come back to the decision statement
if b is not zero and a/b<c then …
Suppose b is zero. Then if we try to divide a by b, we will divide by zero, and that will crash the program.
But if b is zero, then it does not matter what the value of a/b is, because the whole expression will be false anyway. This is always so because
p and q
is false if either one of them is known to be false. It's not necessary to know them both. It's only if p is true that we also need to look at q.
In other words, the and operator should simply skip computing its second part if the first one already is false.
LiveCode and some other languages do explicitly define the and operator to skip in the way discussed, and hence there will be no problem dividing by zero if b is zero, because it will not be attempted in the first place.
A similar effect is true for the or operator: p or q is true as soon as p is true and we do not have to look at q.
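C and its descendants define && and || to short-circuit in exactly this way; here is a small sketch of my own showing that the division is never attempted when b is zero:
#include <stdio.h>

int main(void) {
    double a = 12.0, b = 0.0, c = 3.0;

    if (b != 0 && a / b < c)               /* && stops at the first false operand, so a / b is never evaluated here */
        printf("the quotient is small\n");
    else
        printf("either b is zero or the quotient is not small\n");

    if (b == 0 || a / b < c)               /* likewise || stops at the first true operand */
        printf("no division was needed to decide this\n");

    return 0;
}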
The operation of the switch statement is more difficult, but it similarly suffers if what happens in certain cases is not well-defined.
Many language implementations do not define precisely what operators, statements and functions do: you have to find out what the compiler does by trying it out. This type of "experimental programming" is a huge waste of time, and it also makes it difficult to port programs from one system to another even if they use the same language.
Similarly, the results of all functions should be well-defined for all values of the arguments.
My fourth criterion says this:
The result of all syntactic constructs should be well-defined
Happily LiveCode satisfies this criterion extremely well, but many C-like languages, especially the interpreted ones, do not.
Last but not Least
We observe that as programmers we read code a lot. But which code? If today I make a web site, I will need knowledge of html for the content, of css for the styling, probably of php and javascript for actions on the server and client side, and perhaps even sql if a database is involved. Worse, there may be processing of data to be done off-line, for which any language might be needed to write an application. And I am not even talking about scripting the operating system or some scriptable applications.
That means not only a lot of reading of code, but potentially switching between six or more different programming languages!
The problem is compounded by the fact that some syntactic constructs are required in one language and forbidden in another. The use of the semicolon or the $-sign comes to mind.
Then there are the errors that are more subtle, where syntax checking does not help. I once lost an entire afternoon chasing what looked like a very strange bug: I got results that were sometimes the ones I expected, and sometimes completely wrong. After inspecting the code (remember: reading and reading and reading code) I thought the problem resided in the data. But after lengthy examination that appeared not to be so. Finally I spotted my error: instead of "." I had typed "+" for the concatenation of two character strings. Php uses "." but javascript uses "+", and the operation I wanted sat in a piece of code written in php. The php interpreter simply assumed I wanted to add, converted the strings to numbers, and then performed the addition. By coincidence the strings to be concatenated were indeed often composed of digits. I would not have lost that afternoon if both languages had used the same operator. My error would also have been detected with strong typing, which would have refused the addition of strings at syntax check.
Given that I need many different pieces of cooperating code, I will also have to write a lot of documentation.
Why can I not use the same programming language throughout? After all, I will most likely write the entire documentation across all cooperating code in one natural language: English.
What if I could use an English-like language to do almost all of the programming too?
It is not impossible: to build a website, LiveCode cannot be used for html and css, but it can be used for all the rest. That is an enormous advantage: as a programmer I can now concentrate entirely on the algorithms to get the job done, and there are no bugs that remain obscure to me due to my imperfect knowledge of several different languages, or simply a coincidental confusion. There is no longer the need for experimental programming when it is impossible to find a complete definition of what a construct or function does. I no longer need to invoke tons of weird text processing functions: I can use the chunk syntax of LiveCode, and everyone knows that a lot of text processing goes on in today's code for the web.
That leads to my fifth and final criterion for a good programming language:
The more areas of application a language covers, the better it is.
LiveCode Examples
You may look at some LiveCode examples; I still dabble in programming.