Joseph Wilk

Programming bits and bobs

The Aesthetics of Density

Programming languages can be described as Dense.

What does dense mean?

Closely compacted in substance. Having the constituent parts crowded closely together:

What does it mean for a programming language to be dense?

I consider there to be two axes for the density of programming languages:

Density of syntax: the syntax is very dense/compact.

For example, Assembly has a very dense syntax: abbreviated commands, short register names, and so on. It takes a lot of code to express simple things.

A simple “for” loop in Assembly

mov cx, 3
startloop:
   cmp cx, 0
   jz endofloop
   push cx
loopy:
   Call ClrScr
   pop cx
   dec cx
   jmp startloop
endofloop:
   ; Loop ended
   ; Do whatever you have to do here
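For comparison, here is a rough sketch of the same three-pass loop in Ruby. `clear_screen` is a hypothetical stand-in for the ClrScr routine; it only counts its calls so the example stays self-contained.

```ruby
# Hypothetical stand-in for ClrScr: it just counts how often it is called.
$clear_count = 0

def clear_screen
  $clear_count += 1
end

# The counter, compare, jump and decrement steps collapse into one line.
3.times { clear_screen }

puts $clear_count
```

Each token in the Assembly version is short, but it takes many of them to say "do this three times".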

Density of expression: the means of expressing simple concepts or solutions is very compact.

This is a little fuzzier than syntax. It can depend on what you are trying to express, and languages often provide many different ways to express something. For example, string processing in Erlang is a lot messier than in, say, Ruby. Paul Graham measures this form of density by the number of elements.

As an example, PROLOG scores highly in expressive density. One of the main reasons is that when you give up control of execution (imperative style) and instead describe the problem (declarative style), you increase the expressive density.

The Towers of Hanoi in Prolog:

move(1,X,Y,_) :-
    write('Move top disk from '),
    write(X),
    write(' to '),
    write(Y),
    nl.
move(N,X,Y,Z) :-
    N>1,
    M is N-1,
    move(M,X,Z,Y),
    move(1,X,Y,_),
    move(M,Z,Y,X).
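To make the recursion easier to follow, here is a sketch of the same algorithm in Ruby. The method name `hanoi` and the peg names are assumptions made for illustration; the argument order mirrors the Prolog clause `move(N,X,Y,Z)`.

```ruby
# Returns the list of moves as [from, to] pairs.
# Argument order follows the Prolog clause: move(N, X, Y, Z).
def hanoi(n, from, to, via)
  return [[from, to]] if n == 1

  first_part  = hanoi(n - 1, from, via, to)  # move(M, X, Z, Y)
  middle_move = [[from, to]]                 # move(1, X, Y, _)
  last_part   = hanoi(n - 1, via, to, from)  # move(M, Z, Y, X)

  first_part + middle_move + last_part
end

hanoi(3, "left", "right", "centre").each do |from, to|
  puts "Move top disk from #{from} to #{to}"
end
```

The structure is the same three steps: move the smaller stack out of the way, move the largest disk, then move the smaller stack back on top.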

Programming languages fit along a spectrum within these forms of density. Ruby provides the means to express concepts with very dense syntax. Just look at Ruby Golf (solving a problem with the smallest possible number of characters), for example:

def fizzbuzz(n)
n%3<1&&f="Fizz";n%5<1?"#{f}Buzz":f||n.to_s
end
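For contrast, here is a less dense sketch that should behave the same way for integer input. The name `fizzbuzz_readable` is an assumption, chosen so it does not clash with the golfed version.

```ruby
# Same results as the golfed version for integers, spelled out step by step.
def fizzbuzz_readable(n)
  divisible_by_three = (n % 3).zero?
  divisible_by_five  = (n % 5).zero?

  if divisible_by_three && divisible_by_five
    "FizzBuzz"
  elsif divisible_by_three
    "Fizz"
  elsif divisible_by_five
    "Buzz"
  else
    n.to_s
  end
end

puts (1..15).map { |n| fizzbuzz_readable(n) }.join(" ")
```

The golfed version packs the same four cases into a single line by leaning on `&&`, the ternary operator and a local variable that is `nil` unless assigned.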

It is also always possible to build a DSL within a programming language to maximise density.

Where does Density fit with Literate Programming?

Dense syntax moves code away from being an easily accessible form of documentation.

Density of expression can move code away from being easily accessible as documentation. For example, do you understand how that PROLOG Towers of Hanoi works?

The more focused a language is on expressive/syntactical density, the further it moves the art of programming away from literate programming, where we focus on our code being the documentation. Much like writing an essay:

Instead of writing code containing documentation, the literate programmer writes documentation containing code.

Ross Williams. FunnelWeb Tutorial Manual, pg 4.

The readability of the code to humans is the priority.

Under the literate programming paradigm, the central activity of programming becomes that of conveying meaning to other intelligent beings rather than merely convincing the computer to behave in a particular way.

Ross Williams. FunnelWeb Tutorial Manual, pg 4.

Density within our heads

One could argue that dense code can still be literate in style. It's just that you have to fit all of the programming language's syntax into your head. It's not unrealistic to ask developers to know the syntax/API of their language, though holding it all in memory when it's particularly dense can be challenging.

If you're a Clojure programmer you might have a good understanding of this code as documentation:

(def ^{:dynamic true
       :doc "some doc here"}
     *allow-default-prerequisites* false)

And if you’re a Ruby or Perl programmer you might read this with ease:

$!.is_a?(MonkeyError)
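For readers who do not carry Ruby's special variables in their heads: `$!` holds the exception currently being handled. A minimal sketch, in which `MonkeyError` and the helper `monkey_error_raised?` are hypothetical names defined only for this example:

```ruby
# Hypothetical exception class, defined here only for the demo.
class MonkeyError < StandardError; end

# Runs the block and reports whether it raised a MonkeyError.
def monkey_error_raised?
  yield
  false
rescue StandardError
  # Inside a rescue clause, $! is the exception being handled.
  $!.is_a?(MonkeyError)
end

puts monkey_error_raised? { raise MonkeyError, "too many monkeys" }
puts monkey_error_raised? { raise ArgumentError, "not a monkey" }
```

The first call prints `true` and the second prints `false`.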

Can dense languages be a good idea?

“The quantity of meaning compressed into a small space by algebraic signs, is another circumstance that facilitates the reasonings we are accustomed to carry on by their aid.”

  • Charles Babbage, quoted in Iverson’s Turing Award Lecture

Is there a trade-off in moving to a more dense form of expression in helping shape the way we think and the kind of thoughts we have?

How easy is it to hold a dense language in our heads, remembering all the syntax in order to easily read code?

Regular Expressions

While regular expressions are not a programming language, they are one of the best examples of a very dense language, both syntactically and expressively, whose syntax has persisted through many programming languages.

Is that a sign that regular expressions have succeeded in encoding pattern matching on text?

Write Once

Do you understand this pattern?

/^[\w]$/

How about we push the complexity level and try some of the more esoteric symbols in regular expressions:

Do you know what this does?

/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
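One way to find out is to run it against a few strings. A small sketch in Ruby follows; the constant name `EMAIL` and the sample addresses are made up for illustration.

```ruby
EMAIL = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

samples = ["joe@example.com", "not an email", "joe@example.museum"]

samples.each do |text|
  # match? returns true or false without building a MatchData object.
  puts "#{text.inspect}: #{EMAIL.match?(text)}"
end
```

The last sample is rejected because `{2,4}` caps the final label at four letters, a detail that is easy to miss when reading the pattern.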

How about this?

Full email detection regular expression (RFC822)

While regular expressions are very well suited to small patterns, with a very dense language our ability to parse complex statements is reduced.

This has a knock-on effect for maintenance: it's read-only, and even then it's not easy to read.

Readability

In fact, it's considered bad practice to write a regular expression of the form above. It's understood that it's hard to read, and hence programmers have to add to the dense language to increase readability:

/
^                                             # start of string
(                                             # first group start
  (?:
    (?:[^?+*{}()[\]\\|]+                      # literals and ^, $
     | \\.                                    # escaped characters
     | \[ (?: \^?\\. | \^[^\\] | [^\\^] )     # character classes
          (?: [^\]\\]+ | \\. )* \]
     | \( (?:\?[:=!]|\?<[=!]|\?>)? (?1)?? \)  # parenthesis, with recursive content
     | \(\? (?:R|[+-]?\d+) \)                 # recursive matching
     )
    (?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )?   # quantifiers
  | \|                                        # alternative
  )*                                          # repeat content
)                                             # end first group
$                                             # end of string
/

This is definitely not literate programming: comments and code are clearly separate things.

Named capture groups are also an optional feature that improves readability and documents the pattern.

user_regexp = %r{
   (?<username> [a-z]+ ){0}

   (?<ip_number> [0-9]{1,3} ){0}
   (?<ip_address> (\g<ip_number>\.){3}\g<ip_number> ){0}

   (?<admin> true | false ){0}

   \g<username>:\g<ip_address>:\g<admin>
 }x
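A short usage sketch, assuming Ruby 1.9 or later. The pattern is repeated under the constant name `USER_REGEXP` so the block is self-contained; `parse_user` and the sample line are hypothetical, added only to show how the names pay off when reading results.

```ruby
USER_REGEXP = %r{
   (?<username> [a-z]+ ){0}

   (?<ip_number> [0-9]{1,3} ){0}
   (?<ip_address> (\g<ip_number>\.){3}\g<ip_number> ){0}

   (?<admin> true | false ){0}

   \g<username>:\g<ip_address>:\g<admin>
 }x

# Returns the named parts as a hash, or nil when the line does not match.
def parse_user(line)
  match = USER_REGEXP.match(line)
  return nil unless match

  { username: match[:username], ip_address: match[:ip_address], admin: match[:admin] }
end

p parse_user("alice:192.168.0.1:true")
```

Each `(?<name> ... ){0}` defines a named sub-pattern without matching anything itself, and `\g<name>` calls it where it is needed.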

Mnemonics

Our memory also struggles to find mnemonics or associations to remember the full vocabulary of regexps:

#Some easy ones
\w #word
\s #space

#Harder ones
(?<!pat)
(?<=pat)
(?!pat)
(?=pat)
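As a memory aid, here is a sketch of the four harder forms in action in Ruby. The sample strings and constant names are made up for illustration.

```ruby
TEXT   = 'foobar foobaz'
PRICES = '$42 and 17'

LOOKAROUND_DEMO = {
  # Lookahead: "foo" only where "bar" follows.
  '(?=pat)'  => TEXT.index(/foo(?=bar)/),
  # Negative lookahead: "foo" only where "bar" does not follow.
  '(?!pat)'  => TEXT.index(/foo(?!bar)/),
  # Lookbehind: digits only where "$" comes directly before.
  '(?<=pat)' => PRICES[/(?<=\$)\d+/],
  # Negative lookbehind: a whole number with no "$" directly before.
  '(?<!pat)' => PRICES[/(?<!\$)\b\d+/]
}

LOOKAROUND_DEMO.each { |syntax, result| puts "#{syntax} => #{result.inspect}" }
```

The two lookaheads report where in the string they matched (index 0 and index 7), while the two lookbehinds return the matched digits ("42" and "17").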

Reducing the Density of Regular Expressions

Creating a DSL for parsing text is a huge domain. The power of regular expressions is very clear.

Yet there have been attempts in various languages to move regular expressions towards a more verbose form to improve readability.

Regexp::English

The Perl community has attempted to provide a more English, verbose syntax for regular expressions:

Regexp::English provides an alternate regular expression syntax, one that is slightly more verbose than the standard mechanisms

Let's look at an example:

        use Regexp::English;

        my $re = Regexp::English
                -> start_of_line
                -> literal('Flippers')
                -> literal(':')
                -> optional
                        -> whitespace_char
                -> end
                -> remember
                        -> multiple
                                -> digit;

        while (<INPUT>) {
                if (my $match = $re->match($_)) {
                        print "$match\n";
                }
        }
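Read clause by clause, the chain above appears to correspond to a conventional pattern along the following lines. This is a sketch in Ruby rather than Perl, and the mapping from method names to regexp syntax is an assumption based on those names, not output from the module. `FLIPPERS` and `flipper_count` are hypothetical names.

```ruby
# start_of_line, literal('Flippers'), literal(':'),
# optional whitespace_char, then remember multiple digit.
FLIPPERS = /^Flippers:\s?(\d+)/

# Returns the remembered digits, or nil when the line does not match.
def flipper_count(line)
  line[FLIPPERS, 1]
end

puts flipper_count("Flippers: 12")
```

Twelve lines of method chain against one line of pattern: the same information, at very different densities.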

Better?

Loving the Density of Regular Expressions

Clearly there has been some recognition among developers that regexps could be improved by being more verbose. It's interesting that these attempts are considered failures. It would imply that the majority of developers prefer dense regexps.

“you can document them with comments, named capture groups, composing them from well-named variables. of course, no one does those things.” Tom Stuart
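As a sketch of what "composing them from well-named variables" could look like in Ruby, here is a simple email pattern split into parts. The constant names are assumptions chosen for illustration.

```ruby
LOCAL_PART = /[A-Z0-9._%+-]+/i
DOMAIN     = /[A-Z0-9.-]+\.[A-Z]{2,4}/i

# Interpolating a Regexp into a regexp literal embeds it as a group that
# keeps its own flags, so the /i on each part still applies.
EMAIL_FROM_PARTS = /\b#{LOCAL_PART}@#{DOMAIN}\b/

puts EMAIL_FROM_PARTS.match?("joe@example.com")
```

The names carry the documentation that the dense form leaves out, at the cost of a few extra lines.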

In the Perl community some people have given up completely on the humans and their dense, hard to maintain regular expressions. They create tools to decode the density automatically:

use YAPE::Regex::Explain;

print YAPE::Regex::Explain->new(qr/this.*(?:that)?(?!another)/)
      ->explain;

Outputs:

The regular expression:

(?-imsx:this.*(?:that)?(?!another))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  this                     'this'
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    that                     'that'
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    another                  'another'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Some snippets of developers' thoughts about regular expressions:

“Love them, probably because they’re a form of arcane magic and they make me feel special for being able to control them”

“I think they are geek candy … sometimes used to show off maximum geekness”

“Love them because they make me look clever and LIKE A H4XX0rrrrr!”

The Aesthetics of Density

Regular expressions have succeeded and they are one of the few very dense languages to have done so.

The density of regular expressions makes the initial barrier to getting started, and to becoming an expert, high. They are far from what we imagine in a literate programming style, yet once you have the syntax in your head, movement becomes fluid: you think in regular expressions. Dense languages with a very limited common syntax set allow rapid experimentation. Without practice, dense languages quickly drop from your mind and you struggle to fit the problem into the right expressive form.

There is an undeniable beauty in the density of regular expressions, in both syntax and expression.

Finding prime numbers using a single Regexp:

/^1?$|^(11+?)\1+$/
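The trick relies on writing the number in unary. A minimal sketch in Ruby; the names `PRIME_CHECK` and `prime?` are assumptions for illustration.

```ruby
PRIME_CHECK = /^1?$|^(11+?)\1+$/

# Writes n in unary ("1" repeated n times) and tests it against the pattern.
# ^1?$ matches 0 and 1. ^(11+?)\1+$ matches when the string splits into two
# or more copies of a group of at least two 1s, which means n is composite.
# So n is prime exactly when the pattern does not match.
def prime?(n)
  !PRIME_CHECK.match?("1" * n)
end

primes = (1..20).select { |n| prime?(n) }
p primes
```

The backreference `\1+` does the division: the regexp engine tries every group size until one divides the string evenly or it runs out of candidates.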

It’s also drink-absinthe-and-cut-off-your-own-ear crazy.

I wrote this PROLOG code for my thesis. I have no idea how it works now and it would take me about a month of playing with it to get back to a state where the dense language was back in my head and I could express ideas in the PROLOG way.

I spent over a month adding nothing more than a single “!” mark in the code.

abdemo_holds_ats([holds_at(F,T)|Gs],R1,R3,N1,N3,D) :-
     !,
     abdemo([holds_at(F,T)],R1,R2,N1,N2,D),

     %cut added Joseph Wilk 16/03/2004
     !,
     abdemo_holds_ats(Gs,R2,R3,N2,N3,D).

I still feel it's some of the most beautiful code I’ve written. Why?

I revel in the expressive density. Bending my brain to express my thoughts in the densely expressive PROLOG way.

I guiltily dip into the syntactical density because it’s like the detailing on the brush strokes of a painting.

Would I ever write this code in a production system that developers would have to maintain? Hell no.

Would I consider this literate programming? Hell no. Just look at the 100s of lines of comments.

But for the sake of art and realising a form of flow I’ve not encountered since, I would happily revel in the aesthetics of density.

Michael Wolf “Architecture of Density no.36”: http://www.flickr.com/photos/worldeconomicforum/6751247749/
