Cover to cover (to cover to cover to cover…)

Tonight I finished a project I’ve been working on for years: reading all of the Harry Potter stories to my family, all seven volumes, all 4,167 pages. Correction: I read almost all of them. Jonah took over for one chapter a little while ago when my voice was too hoarse, like Sam to my Frodo.

I am so, so happy that I prohibited the boys from seeing any of the movies before reading the books; and I feel so, so sorry for anyone who’s only ever seen the movies and thinks they know the stories. Naturally there are a lot of great scenes and there is a lot of rich detail that never made it into the movies, but that is true with any novel adapted for the screen. Also as with any adaptation, some of the changes made by the filmmakers were improvements on those unavoidable parts of the books that were not great or rich.

But the films do worse than leave out scenes and detail — they get key characters wrong. (Or at least, numbers 1 through 6 do. None of us has yet seen parts 7a or 7b.) The film Dumbledore may be a powerful wizard, but he has none of the book Dumbledore’s warmth, deep wisdom, or mischief. And Harry Potter may stir himself to heroic action on film when the need arises, but there is no sense of the fierceness of purpose to which he is so often roused in the books.

I’m sure countless Harry Potter fan sites have debated these things endlessly so I won’t belabor them, but I can’t let it go before giving one example to illustrate my complaint. Spoilers ahead.

Staring contest

Here in a nutshell is why Steven Spielberg was the wrong choice for bringing Stanley Kubrick’s unfinished project A.I. to the big screen.

A Google image search for “Kubrick stare” turns up:

A Google image search for “Spielberg stare” turns up:

D

This is the first sentence of the five hundredth post on gee bobg. This sentence was quick, but not quick enough to be the first sentence. This sentence is content merely to be in the first paragraph.

This sentence starts out wondering why the title of this post is “D,” but ends by remembering that D is the Roman numeral for 500. This sentence asks why the Roman numeral for 500 is D. This sentence doesn’t know but guesses it has something to do with the Latin root “demi,” which means half, as in half a thousand.

There was a sentence before this one, but it went off to Wikipedia to check out the “demi” hypothesis.

This is what the Bob-o-matic has to say on the occasion of the five hundredth post on gee bobg:

It’s weird when you organize government from another family that has the Bush west wing. Leia’s not Luke’s sister, but a few times to use? You bet! What’s to insult the girls again, finally? The only time in 1979. Hey, I know that slapstick is dead. Now an annual nationwide two-week sale. The star-spangled banner: not a multiple of one adventure. And eventually made my way to go ahead. And production values! But to have some green figs, yogurt, and more experienced for her, and why? To make your selection, Sir Topham Hatt, or you’re caught. Reindeer that fly? Or would I have been Matty? Who would even find such a film that I never saw? Even one drop was too easy to be confused with Dr. No Kidding. Clicking the hang of it. There it is: the first several days. I was the first. Votes, in the meantime…

This sentence points out that the occasion of the five hundredth post on gee bobg comes within just a couple of weeks of the fifth anniversary of the first post. This sentence asserts it is the last sentence of the five hundredth post on gee bobg and reminds you not to regard the following parenthetical remark as part of the five hundredth post’s main content. (Tip of the hat to David Moser’s “This Is the Title of This Story, Which Is Also Found Several Times in the Story Itself.”)

How (and why) to program, part 1

This entry is part 1 of 2 in the series How (and why) to program

On May 15th, listeners to the NPR program Weekend Edition were given this challenge by puzzlemaster Will Shortz:

Create a 4-by-4 crossword square with four four-letter words reading across and four different four-letter words reading down. Use the word “nags” at 1 across and the word “newt” at 1 down. All eight words must be common, uncapitalized words, and all 16 letters must be different.

Here is the starting grid as described by Will Shortz:

1 N 2 A 3 G 4 S
5 E
6 W
7 T

This puzzle is a ripe one for solving (some might say cheating at…) by computer. All we need is a list of four-letter words and a way to test every combination of words in the grid for validity; so let’s get to it. For this example I will use the Python programming language for its conciseness and readability.

First, the list of words. On Linux and Mac OS X (and most other Unix-based operating systems) there is a file called /usr/share/dict/words, or sometimes /usr/lib/dict/words or /usr/dict/words, which is nothing but a more-or-less comprehensive list of English words, one per line. We’ll read that file to get our list of four-letter words, discarding words that are shorter or longer. In fact, since we know that every word in the grid begins with the letters in NEWT and NAGS — namely, N, E, W, T, A, G, and S — we can even discard four-letter words not beginning with one of those seven letters. And we can discard N words too, because NEWT and NAGS are already given; we don’t need to search for N words. The words we keep will be stored in six different “sets,” one for each of the six remaining starting letters.

Here’s the first section of our code: creating our six empty word-sets and giving each one a name.

a_words = set()
g_words = set()
s_words = set()
e_words = set()
w_words = set()
t_words = set()

Now to get the words we want out of the “words” file. First we “open” the file to get a special placeholder we’ll call wordlist.

wordlist = open('/usr/share/dict/words')

We’re going to read one line of the file at a time. The placeholder remembers our position in the file in between reads.

To read one line of the file at a time and do something with it, we’ll write a “loop” that begins like this:

for word in wordlist:
  ...stuff...

This will do stuff once for each line of the file, with word referring to the contents of that line. Unfortunately the first line of that stuff is a necessary but confusing bookkeeping detail:

for word in wordlist:
  word = word[:-1]
  ...more stuff...

This confusing bit of code is here because each line of the file — each line of every file, in fact — ends with an invisible “newline” character that means “this line is over, start a new one.” Since we want to deal only with the visible content of the line while doing more stuff, we need to discard that newline character, and

word = word[:-1]

is how you do that. (For our purposes right now it’s not terribly important how that works, but you can read it roughly as, “replace word with everything-in-word-except-the-last-character.”)
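To see the slice in action, here is a small sketch that uses made-up lines of text rather than the real file. It also shows rstrip("\n"), an alternative that this post's program does not use: it removes the newline only when one is present, which matters if the very last line of a file happens to lack one.

```python
# A made-up line, as it might come out of the words file.
line = "newt\n"

# Everything except the last character: the newline is dropped.
word = line[:-1]

# Alternative: strip a trailing newline only if one is present.
last_line = "nags"          # hypothetical final line with no newline
safe_word = last_line.rstrip("\n")
chopped = last_line[:-1]    # the slice would lose a real letter here

print(word, safe_word, chopped)
```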

We’ll begin more stuff with a test to make sure that the word we’re looking at is four letters long:

for word in wordlist:
  word = word[:-1]
  if len(word) == 4:
    ...the rest of the stuff...

Here, len(word) means “the length of word” (i.e., the number of characters it contains), and == is for testing whether two things are equal. (A single = means “make this equal that,” as we’ve already seen a few times.) The rest of the stuff will only happen if len(word) is 4.

If it is 4, then we want to save the word in the correct set — either the set of A words, or the set of G words, etc. Here’s what the complete loop looks like:

for word in wordlist:
  word = word[:-1]
  if len(word) == 4:
    first_letter = word[0]
    if first_letter == 'a':
      a_words.add(word)
    elif first_letter == 'g':
      g_words.add(word)
    elif first_letter == 's':
      s_words.add(word)
    elif first_letter == 'e':
      e_words.add(word)
    elif first_letter == 'w':
      w_words.add(word)
    elif first_letter == 't':
      t_words.add(word)

After determining that len(word) was 4, we gave the name first_letter to the first letter of word (which is written as word[0], because most things in programming are counted beginning at 0 rather than at 1). We then tested first_letter to see if it was a, and added word to a_words if it was; if it wasn’t, we tested it to see if it was g, and added word to g_words if so; etc. The strange word elif appearing in the code above is simply Python’s abbreviation for “else if.”

If first_letter isn’t one of the six we care about — or if word isn’t 4 characters long to begin with — then nothing happens and we loop back around to the for word in wordlist line to get the next word.
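For the curious, the chain of elif tests can be written more compactly with a dictionary that maps each starting letter to its set. This is only a sketch of an alternative, not the approach the rest of this post uses; the names words_by_letter and sample_lines, and the tiny sample list itself, are made up for illustration.

```python
# Hypothetical stand-in for the lines of the words file.
sample_lines = ["achy\n", "grip\n", "sumo\n", "zebra\n", "nags\n", "typo\n", "a\n"]

# One empty set per starting letter we care about.
words_by_letter = {letter: set() for letter in "agsewt"}

for line in sample_lines:
    word = line[:-1]
    # Keep only four-letter words whose first letter is one of our six.
    if len(word) == 4 and word[0] in words_by_letter:
        words_by_letter[word[0]].add(word)
```

The six separately named sets are easier to follow on a first read, which is why the main program sticks with them.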

After reading all the lines from wordlist, the loop exits and the program does whatever comes next, which is:

wordlist.close()

This is the way to say we’re finished using that placeholder and don’t need it anymore. Doing so releases resources, such as memory space, that the computer can now use for other purposes. (In a little program like this, that doesn’t matter much, so it’s OK to leave out wordlist.close(). But in large programs you can “leak” memory and other resources if you do things like fail to close files when you’re finished with them.)
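Python also offers a way to have the file closed automatically: the with statement. The sketch below is an aside and not part of this post's program. So that it runs on any computer, it first writes a tiny made-up word file to a temporary location instead of relying on /usr/share/dict/words.

```python
import os
import tempfile

# Create a small made-up word file so the example runs on any computer.
path = os.path.join(tempfile.mkdtemp(), "words")
with open(path, "w") as f:
    f.write("achy\ngrip\nsumo\n")

four_letter_words = set()
# The file is closed automatically when the indented block ends.
with open(path) as wordlist:
    for word in wordlist:
        word = word[:-1]
        if len(word) == 4:
            four_letter_words.add(word)

# wordlist.closed is True here, with no explicit close() call.
```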

OK. We’ve got our lists of candidate A words, G words, S words, E words, W words, and T words. Now the strategy will be to try every possible combination of A, G, and S words for 2, 3, and 4 Down; and then, for each of those possible combinations, check that the resulting words in 5, 6, and 7 Across make sense; and additionally that no letter is repeated anywhere in the grid.

To try every combination of A, G, and S words, first we start by trying every A word:

for a_word in a_words:
  ...stuff...

As we’ve already seen, this is a loop that will do stuff once for each entry in a_words (with a_word referring to that entry each time through the loop). Now if we nest another loop inside this loop, like so:

for a_word in a_words:
  for g_word in g_words:
    ...more stuff...

then more stuff will happen once for each possible combination of a_word and g_word. That is, first a_word will be aced (let’s say), and while a_word is aced, g_word will range from gabs to gyro; and when the g_words loop is finished, a_word will advance to aces, and g_word will again range from gabs to gyro, and so on through all the four-letter A words, each one running through all of the four-letter G words, like the digits of an odometer.
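Here is the odometer effect in miniature, using two tiny made-up lists instead of the real word sets. (Lists are used here because they keep their order. A Python set makes no promise about the order in which a loop visits its entries, so the real program will not necessarily go alphabetically.)

```python
a_sample = ["aced", "aces"]
g_sample = ["gabs", "gyro"]

pairs = []
for a_word in a_sample:
    for g_word in g_sample:
        # The inner loop runs all the way through for each a_word.
        pairs.append(a_word + " " + g_word)

for pair in pairs:
    print(pair)
```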

From that you should be able to guess that we need to nest another loop inside our nested loop, this one for the S words.

for a_word in a_words:
  for g_word in g_words:
    for s_word in s_words:
      ...test this combination...

Now to test this combination once for each possible combination of A word, G word, and S word. The first test is to see whether the words created by placing a_word in 2 Down, g_word in 3 Down, and s_word in 4 Down result in sensible words at 5 Across, 6 Across, and 7 Across.

Let’s construct the word at 5 Across — the E word — from the second letters of a_word, g_word, and s_word.

e_word = 'e' + a_word[1] + g_word[1] + s_word[1]

(Remember that counting the letters in a word begins at 0, so the second letter of each word is numbered 1.)

Let’s construct the W word and the T word the same way — w_word from the third letter of each Down word, and t_word from each Down word’s fourth letter.

w_word = 'w' + a_word[2] + g_word[2] + s_word[2]
t_word = 't' + a_word[3] + g_word[3] + s_word[3]

Now if a_word is ammo and g_word is gulp and s_word is shot, then e_word will be emuh, w_word will be wmlo, and t_word will be topt, which don’t make sense! But we can easily check whether the Across words make sense by seeing if they can be found in the e_words, w_words, and t_words sets that we created earlier. So:

for a_word in a_words:
  for g_word in g_words:
    for s_word in s_words:
      e_word = 'e' + a_word[1] + g_word[1] + s_word[1]
      w_word = 'w' + a_word[2] + g_word[2] + s_word[2]
      t_word = 't' + a_word[3] + g_word[3] + s_word[3]
      if e_word in e_words and w_word in w_words and t_word in t_words:
        ...remaining test...

If we get as far as the remaining test (which is to ensure that no letter is duplicated anywhere in the grid), we know that e_word and w_word and t_word are real words.
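The ammo, gulp, shot example above can be checked in isolation. The three small sets of Across words here are made up for the illustration; the real program uses the sets read from the words file.

```python
a_word, g_word, s_word = "ammo", "gulp", "shot"

e_word = "e" + a_word[1] + g_word[1] + s_word[1]
w_word = "w" + a_word[2] + g_word[2] + s_word[2]
t_word = "t" + a_word[3] + g_word[3] + s_word[3]

# Tiny stand-ins for the real e_words, w_words, and t_words sets.
e_words = {"ecru", "epic"}
w_words = {"whim", "wolf"}
t_words = {"typo", "tofu"}

sensible = e_word in e_words and w_word in w_words and t_word in t_words
print(e_word, w_word, t_word, sensible)
```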

We can ensure no letter is duplicated by using another set called letters_used. The plan: go through the letters of each Down word one by one, checking whether the letter is in the set. If it’s not, then add it to the set and move to the next letter. If it is, then we’ve already seen that letter once and it’s duplicated.

We know the first Down word is newt, so we can create our set with those letters in it.

letters_used = set(['n', 'e', 'w', 't'])

(We could also have created this set like this:

letters_used = set()
letters_used.add('n')
letters_used.add('e')
letters_used.add('w')
letters_used.add('t')

which matches the way we created and used the other sets above, but the first way does the same thing more concisely.)
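Python has two even shorter spellings of the same set, shown here only as an aside. The variable names are made up so that the three versions can be compared side by side.

```python
from_list = set(['n', 'e', 'w', 't'])
from_braces = {'n', 'e', 'w', 't'}   # a set written with curly braces
from_string = set('newt')            # a string is a sequence of letters

print(from_list == from_braces == from_string)
```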

Now to use that set to find duplicates.

any_duplicates = False
for letter in a_word + g_word + s_word:
  if letter in letters_used:
    any_duplicates = True
    break
  else:
    letters_used.add(letter)

Here, break means “leave the loop immediately.” There’s no need to keep looping once we know there are duplicates.

By the time the loop finishes, either we’ve looped through all the letters and found no duplicates, or we exited the loop with break because we did find duplicates. We check any_duplicates to see which of those two things happened. If it’s True there were duplicates; but if it’s False then we’ve found a valid solution and can print it to display it to the user.

if any_duplicates == False:
  print(a_word, g_word, s_word, e_word, w_word, t_word)
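As an aside, there is a shorter way to ask the same question: put all sixteen letters into a set and see whether any of them got merged. This sketch is not what the program in this post does, and the helper name has_duplicates is made up here.

```python
def has_duplicates(a_word, g_word, s_word):
    # The grid's 16 letters are newt plus the three Down words.
    letters = "newt" + a_word + g_word + s_word
    # A set keeps one copy of each letter, so it is shorter if any repeat.
    return len(set(letters)) < len(letters)

print(has_duplicates("achy", "grip", "sumo"))
print(has_duplicates("ammo", "gulp", "shot"))
```

The letter-by-letter loop has one advantage over this version: it can stop at the first repeated letter it meets.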

To recap, here’s the complete program.

a_words = set()
g_words = set()
s_words = set()
e_words = set()
w_words = set()
t_words = set()

wordlist = open('/usr/share/dict/words')

for word in wordlist:
  word = word[:-1]
  if len(word) == 4:
    first_letter = word[0]
    if first_letter == 'a':
      a_words.add(word)
    elif first_letter == 'g':
      g_words.add(word)
    elif first_letter == 's':
      s_words.add(word)
    elif first_letter == 'e':
      e_words.add(word)
    elif first_letter == 'w':
      w_words.add(word)
    elif first_letter == 't':
      t_words.add(word)

wordlist.close()

for a_word in a_words:
  for g_word in g_words:
    for s_word in s_words:
      e_word = 'e' + a_word[1] + g_word[1] + s_word[1]
      w_word = 'w' + a_word[2] + g_word[2] + s_word[2]
      t_word = 't' + a_word[3] + g_word[3] + s_word[3]
      if e_word in e_words and w_word in w_words and t_word in t_words:
        letters_used = set(['n', 'e', 'w', 't'])
        any_duplicates = False
        for letter in a_word + g_word + s_word:
          if letter in letters_used:
            any_duplicates = True
            break
          else:
            letters_used.add(letter)
        if any_duplicates == False:
          print(a_word, g_word, s_word, e_word, w_word, t_word)

Put this program in a file named puzzle.py and run it with python puzzle.py. On my computer, /usr/share/dict/words contains 470 four-letter words starting with a, 359 starting with g, and 634 starting with s, and it took about two minutes and a quarter for this program to test all of the 470×359×634 combinations. (We could make this program much faster, but at the expense of clarity and simplicity.) In the end it produced this solution:

achy grip sumo ecru whim typo

which is the same solution that Will Shortz gave on the air a week later.

(Actually, my computer produced two solutions, because /usr/share/dict/words includes a lot of questionable junk in its quest to be comprehensive. The other solution was “achy goup sld. ecol whud typ.”)

Update: As a couple of friends have pointed out, the Mac OS X version of /usr/share/dict/words does not include all the words needed to find the solution! I ran mine on Linux, which does.
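On the remark above that the program could be made much faster: one possible approach, sketched here as an assumption and not as anything that was actually run for this post, is to reject a bad pair of A and G words before looping over the S words at all. The function name solve and the small word sets are made up for illustration, and every word is assumed to be exactly four lowercase letters.

```python
def solve(a_words, g_words, s_words, e_words, w_words, t_words):
    # Every two-letter fragment that could sit in the middle of an
    # Across word, e.g. "cr" from "ecru" (hypothetical speed-up).
    e_mid = {w[1:3] for w in e_words}
    w_mid = {w[1:3] for w in w_words}
    t_mid = {w[1:3] for w in t_words}
    solutions = []
    for a in a_words:
        if len(set("newt" + a)) < 8:
            continue  # a repeated letter already; skip this A word
        for g in g_words:
            if len(set("newt" + a + g)) < 12:
                continue
            # Skip the S loop unless all three Across words can still work.
            if (a[1] + g[1] not in e_mid or a[2] + g[2] not in w_mid
                    or a[3] + g[3] not in t_mid):
                continue
            for s in s_words:
                e = "e" + a[1] + g[1] + s[1]
                w = "w" + a[2] + g[2] + s[2]
                t = "t" + a[3] + g[3] + s[3]
                if (e in e_words and w in w_words and t in t_words
                        and len(set("newt" + a + g + s)) == 16):
                    solutions.append((a, g, s, e, w, t))
    return solutions

found = solve({"achy", "ammo", "aced"}, {"grip", "gulp", "gabs"},
              {"sumo", "shot", "sags"}, {"ecru", "epic"},
              {"whim", "wolf"}, {"typo", "tofu"})
print(found)
```

Most pairs of A and G words fail one of the early tests, so the innermost loop runs far less often. The price is exactly the loss of simplicity mentioned above.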

Counterintuitive counting

You know those greedy media companies who are so afraid of relaxing control over every aspect of a book’s/song’s/movie’s commercial life that they crush innovators trying to add value, sue the fans on whom their livelihood depends, subvert public policy, poison international treaties, and spend more money on lawyers and lobbyists than on, you know, actually producing more of the content they guard so jealously?

I have a data point for them.

Fourteen years ago I wrote a book called Writing GNU Emacs Extensions. It was a technical book on an obscure topic, with a decidedly limited audience. I had some modest income from it for a few years, then the royalties petered out to less than nothing — not surprising, since it had become very outdated with respect to the latest versions of Emacs, the technology it described.

Around that time, eBooks were starting to get big. My publisher, O’Reilly, asked whether I would consent to making my book available online, at a lower sale price, in a format that would all but ensure the spread of pirated copies. Why not, I thought, and agreed.

Sure enough, pirated copies of my book now exist on the web, but something else happened — sales began to increase again. My royalties went from a low point of negative $1.71 for 3q2006 to a whopping $78 in the check I received yesterday! …OK, $78 isn’t exactly whopping, but it’s positive — much more so than -1.71 — and represents a sales volume for my old, outdated, limited-appeal book that O’Reilly and I haven’t seen in probably ten years.

So greedy media companies, relax your intellectual property sphincters, let the content flow, and enjoy the multiplier effect that comes from making your titles more available instead of less. For as Princess Leia™ warned, “The more you tighten your grip, Tarkin,™ the more star systems will slip through your fingers.” (® and © 1977 Lucasfilm Ltd. and 20th Century Fox.)

What a decade and a quarter can do

Halloween, 1998: I take a weekend trip to San Diego with my girlfriend, Andrea, and a few other friends. Money’s tight, in part because I haven’t drawn a salary from my struggling startup company for over two years, but that’s OK: our friends, most of whom are graduate students, are all broke too, so we crash in the living room of a couple we’ve come to see. We visit the San Diego Zoo, where I meet and feed a baby giraffe. My friend Paul captures the event (and much else from the weekend) on an amazing new device: an SLR camera body that has been partially hollowed out and fitted with a digital sensor and a small LCD display on the back (since the prism is gone and the viewfinder no longer works). It’s borrowed from the university where he works, which custom-built it for about ten thousand dollars. He shares the resulting digital photos with the rest of us by putting them on his department’s web server, but only temporarily because they take up so much disk space that he has to delete them after a few weeks.

1.25 decades later: Andrea is now my wife. We visit San Diego again. Thanks in part to income from my startup company, we have the means to stay in a hotel, and not only to visit the San Diego Zoo but to spring for their Safari Park’s “Roar and Snore” overnight camping experience. For his part, Paul is now an Academy-Award-winning computer-graphics researcher. Digital cameras on a par with his custom-built experimental rig from 1998 can now be had for around a hundred bucks and are so ubiquitous that they’ve all but killed the consumer film business. Disk space, likewise, is cheap enough that the company I work for has made a lucrative business out of giving away essentially unlimited amounts of it for free. I meet and feed not one giraffe, but two…

…and two amazing people who didn’t even exist 1.25 decades ago feed some giraffes too.

Anything I can do they can do better

This morning my sons will complete a five-week course in cartooning that they’ve been taking at the San Francisco studio of the awesome Sirron Norris.

I took some Saturday-morning cartooning classes when I was around their age, and now that they have too, there is nothing left that I could do as a kid that they can’t. But there’s plenty that they can do that I can’t, even as an adult.

| | My kids | Me as a kid | Me as an adult |
|---|---|---|---|
| Cartooning | yes | yes | yes |
| Swimming | yes | yes | yes |
| Bicycling | yes | yes | yes |
| Succeeding at school/work | yes | yes | yes |
| Inexhaustible capacity for enjoying Star Wars and Lego | yes | yes | no |
| Playing baseball | yes | no | kinda |
| Playing a musical instrument | yes | no¹ | kinda |
| Martial arts | yes | no | no |
| Choral singing | yes | no | no |
| Gymnastics | yes | no | no |
| Skiing | yes | no | no |

¹ “Hot Cross Buns” on the recorder doesn’t count.

It’s like they’re not ordinary kids. They’re… superkids!

Over their darnedest heads

Jonah and Archer, ages 9 and 7, recently had occasion to watch Bill and Ted’s Excellent Adventure. They thought it was hilarious and quoted it around the house for a few days.

A week or so after they saw it there was this conversation between them:

Archer: What was Bill and Ted’s favorite number again?
Jonah [authoritatively]: Sixty-eight.
Archer [puzzled]: Sixty-eight?
Jonah: Yep. Sixty-eight. …It might have been seventy-eight.
Archer: I don’t get what’s funny about that.
Jonah: Me neither. But it was definitely either seventy-eight or sixty-eight.