Who owns culture?

At the recent Webby awards, Steve Wilhite, the primary creator of the GIF image format, showed gratitude for his award by berating the public for not pronouncing “GIF” the way he does. A pretty silly thing to be upset about, as is my being upset with his pique. But it is one more example of a larger problem: who owns and controls our culture?

Culture—things like language, music, art, food, social customs and rituals—is a creation of the human mind, and often individual pieces of it are created by individual people. Those people certainly deserve credit when their creations become popular. Our government even has laws like copyright intended to encourage the creation of certain kinds of culture by giving their creators a limited commercial monopoly on them. But many people mistakenly take that as support for the idea that creators "own" their creations in some way, and have an absolute right to control how they are used.

A culture cannot possibly grow with such a crippling restriction. Thankfully, many things like mathematical and scientific discoveries, food recipes, and athletic techniques are not subject to such monopolies, or our culture would grind to a halt completely. Culture depends on a thriving public domain. The “public domain” is the art, music, and literature of a culture that is not owned or controlled, like the music of Bach and Mozart, the works of Shakespeare and Dickens, the art of Michelangelo, the inventions of Archimedes. These artists lived in a world where copyright and patent did not exist, but even after these were created, their limited terms ensured that eventually all works would pass into the public domain and enrich our culture by allowing homage, parody, remixing, and other creative uses that the original creator might never have imagined.

Patents are still, thankfully, limited, so technology can still progress by building on the past. But Congress has over the years extended the term of copyright to ludicrous lengths, passing a new copyright law coincidentally every time Walt Disney's “Steamboat Willie” cartoon of 1928 is in danger of slipping into the public domain. Walt Disney himself died in 1966, so copyrights aren't really encouraging him to create more. At this point they're basically corporate welfare for Disney, a company that has made much of its fortune exploiting fairy tales and other public-domain stories. Woe unto the artist who wants to use the image of Mickey Mouse in any way not approved of by Disney: lawyers will descend, ensuring that no one is allowed to enrich our culture in this way unless they also enrich Bob Iger and company.

What could be more a part of American culture than singing “Happy Birthday”? Well, if you do that in a restaurant, or in a movie, be prepared for Time Warner to vigorously defend the rights of its creators, Patty and Mildred Hill, who wrote it in 1893 and are both long dead. There is considerable doubt as to whether this copyright claim is legally sound, but there is no question that people have been and continue to be sued over it.

But back to GIF. Wilhite created the original format in 1987 as a way to transmit images over the CompuServe network. The original version wasn't quite up to the job, so a group of graphics programmers on CompuServe (including me, CIS 73407,2030) convened to update it. (For a real walk down memory lane, check out this paper I wrote at the time. It's a detailed explanation of a graphics technique written in plain text, before GIF, before the Web.) We produced a specification for GIF 89a, which is what has been used since. After the specification was complete, the powers that be at CompuServe decided to add a paragraph declaring that the acronym should be pronounced with a soft G, thereby confusing pictures with peanut butter. Even at the time, this was a bone of contention. I and many other people who had already been using GIF for some time had always pronounced it with the hard G; after all, it stands for “graphics”. But our objections were ignored.

That's fine—I don't really care how you pronounce it. But I do care that CompuServe, and Wilhite, think they have some right to tell you and me how we should pronounce it. It's a word. It's part of our language. It's in many dictionaries now. And all of those dictionaries—absolutely correctly—include both pronunciations. Because that's how people pronounce the word, and words belong to the people who use them, not to the people who created them.

So stand up to corporate hijacking of our language. Sing Happy Birthday in a restaurant. Call your company a Mickey Mouse operation. Xerox something on your Canon photocopier. And trade GIFs on the net, pronouncing them any way you like. It's our language, our culture, not theirs.

Representing playing cards in software

There are several different ways to represent playing cards in software, each with its own benefits, drawbacks, and best application. I want to outline these, and explain why I chose the particular integer representation used in the OneJoker card library.

A standard deck from Copag, popular in casino poker rooms.

What is a card?

The set of cards in a standard Anglo-American deck is the Cartesian product of two sets: 13 ranks and 4 “French” suits, each card having one of each. Operations on cards typically involve comparing the ranks of two cards based on an ordering dependent on the game, and comparing suits for equality. Many games also use one or more jokers, which have neither rank nor suit. Decks of cards today are manufactured with two jokers, one of which is typically printed in black only, and the other in color (or distinguished in some similar way). Games that distinguish between these often call them “red” and “black” by analogy to suited cards.

Whatever representation is used, it is useful to be able to get the rank or suit of a card as a small integer that can be used to index lookup tables and compute sums. Direct comparison of ranks and identifying sequences are also common, but ordering varies from one game to another, so this should be done carefully. If an application is designed for one game, choosing a representation suited to that game will be handy. For example, poker applications should choose a rank numbering that gives the lowest number to deuces and the highest number to aces so that ranks can be compared directly.

Text

While interactive games will display cards to the user as graphical images and accept input from a mouse, other applications that use playing cards must at some point acquire input and produce output as text for humans. A common and effective method is to use one digit or letter for the card's rank and another for the suit: 2c, 9h, Qd, As, etc. The letter T is usually used for tens to keep these strings uniform. This is a good way to save card information in text files, to communicate them over network protocols, and so on. It is common to use JK to represent the joker. It is not common at all to distinguish between the red and black jokers, though some games require it. I recommend using JR for the red joker when the distinction matters, and JK for the black (or when the distinction doesn't matter).

In the spirit of the networking axiom “Be conservative in what you produce, liberal in what you accept”, I recommend that cards be consistently written in this two-character format, uppercase rank and lowercase suit, with a space between cards when representing a list or set. When reading such a list, one can be more liberal by accepting case differences, extra whitespace, no whitespace, or even 10 if uniformity is not required. If such text is for human consumption only (such as running text on a web page or printed book not likely to ever be read by a program), one might use the Unicode suit symbols (♣ ♦ ♥ ♠) as well as red and black text, but these are awkward for use in 8-bit data formats.
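As a rough illustration of that axiom, here is a minimal Python sketch of a liberal reader and a conservative writer for the two-character format. The function names parse_cards and format_cards are hypothetical, and the sketch assumes that JK and JR are the only joker spellings to accept.

```python
import re

# One card: a joker (JK or JR), or a rank (2-9, T or 10, J, Q, K, A)
# followed by a suit letter. Leading whitespace is optional, so
# run-together input such as "AsKd" is also accepted.
_CARD = re.compile(r"\s*(?:(J[KR])|(10|[2-9TJQKA])([CDHS]))", re.IGNORECASE)

def parse_cards(text):
    """Liberal reader: return a list of canonical two-character card strings."""
    cards, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = _CARD.match(text, pos)
        if m is None:
            raise ValueError(f"unrecognized card text: {text[pos:]!r}")
        joker, rank, suit = m.groups()
        if joker:
            cards.append(joker.upper())
        else:
            rank = "T" if rank == "10" else rank.upper()
            cards.append(rank + suit.lower())
        pos = m.end()
    return cards

def format_cards(cards):
    """Conservative writer: canonical tokens separated by single spaces."""
    return " ".join(cards)
```

With these definitions, format_cards(parse_cards("as KD10h jk")) produces "As Kd Th JK". A real program would convert each token to its internal representation at this point rather than keep the strings.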

Using such a text representation of cards internally for code that runs a game or simulation is always a bad idea. There is no programming language or application I know of for which such an internal representation does not lead to loss of performance and excessive memory usage. Converting other representations to strings for output is always trivial and fast. Converting from input strings may be a tiny bit harder, but it is still simple, and even programs using string representations will have these same complications dealing with irregular inputs and such. So it is always better to represent cards internally with a different representation and convert them for input and output as needed.

Objects

In object-oriented languages, using objects to represent cards is reasonably efficient for most uses. Operations on cards often involve comparing ranks and suits separately, so the card object should have two member variables for rank and suit. Rank should be an integer or an integer-like class (such as an enumeration class) that can do ordered comparisons. Suits are generally only compared for equality, so they can be integers, enumerations, or pointers to one of four static suit objects. Identifying a card object as a joker can be done with an additional flag, or else it can be assigned a unique rank.

Such a representation is fairly compact, so it will not cause excessive memory use. It should be pointed out, though, that even object-oriented languages typically have efficient “primitive” types like integers, and so it might make sense for some applications to forgo objects in favor of one of the integer representations below for extra performance. One might still have a card class with static functions that operate on these integers for clarity. A good example is the Pokerstove application in C++ which uses Card, Rank, and Suit objects for I/O and some functions, but computes different internal representations from them when needed for performance.

If you want to keep extra information in the card object, you can avoid the cost of copying larger objects by keeping a single collection of 52 static card objects and using pointers to these as the cards that get manipulated at runtime.
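A minimal Python sketch of this object approach follows. The names Rank, Suit, Card, DECK, and CARD are illustrative choices, not taken from any particular library, and jokers are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum

# Ranks are an integer-like enumeration so they support ordered comparison;
# numbering starts at 2 so that aces compare highest, as in poker.
Rank = IntEnum(
    "Rank",
    "TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN JACK QUEEN KING ACE",
    start=2,
)

# Suits only need equality comparison, so a plain enumeration is enough.
Suit = Enum("Suit", "CLUBS DIAMONDS HEARTS SPADES")

@dataclass(frozen=True)
class Card:
    rank: Rank
    suit: Suit

# One shared, immutable object per card. Hands and decks hold references
# to these 52 objects rather than copies.
DECK = tuple(Card(r, s) for r in Rank for s in Suit)
CARD = {(c.rank, c.suit): c for c in DECK}
```

A hand is then just a list of references, for example [CARD[Rank.ACE, Suit.SPADES], CARD[Rank.KING, Suit.SPADES]], and any extra per-card data can be added to the Card class without making hands more expensive to copy.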

Bitmaps

If the card games being simulated involve sets of cards with no duplicates, and for which the order of cards in a set is not important, one can represent a set of cards as a single 64-bit integer in which each bit indicates the presence or absence of one particular card in the set. In addition to being the most compact representation for sets, this can speed up many complex calculations. If the bit positions are chosen so that each 16-bit subset of the value represents one suit, and 13 of those 16 bits indicate the ranks, then the 16-bit sub-integers can be used directly for comparisons as well, speeding up calculations further.
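The layout described above can be sketched in Python as follows. The helper names are hypothetical, and the choice of which suit occupies which 16-bit lane is an assumption made for the example. Python integers are arbitrary-precision, but every mask built here fits in 64 bits.

```python
RANKS = "23456789TJQKA"   # bit 0 = deuce ... bit 12 = ace within each suit lane
SUITS = "cdhs"            # lane 0 = clubs ... lane 3 = spades (assumed order)

def card_bit(name):
    """Mask with the single bit set for a card written like 'As' or 'Td'."""
    rank, suit = RANKS.index(name[0]), SUITS.index(name[1])
    return 1 << (16 * suit + rank)

def make_set(names):
    """Build a set mask from space-separated card names."""
    mask = 0
    for name in names.split():
        mask |= card_bit(name)
    return mask

def suit_ranks(mask, suit):
    """Extract the 13 rank bits of one suit as a small integer."""
    return (mask >> (16 * suit)) & 0x1FFF

def count(mask):
    """Number of cards in the set."""
    return bin(mask).count("1")
```

Union and intersection of sets are single | and & operations, and membership is a test like hand & card_bit("Ks"). Because higher ranks occupy higher bits, comparing suit_ranks values as plain integers compares the cards of a suit from the highest rank down, which is the kind of direct comparison the paragraph above refers to.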

As noted, this does not preserve the order of cards, so if you want to do something like shuffle a deck, you'll have to represent the deck as an array of these masks, with each member having one bit set, and then OR them together into a hand as they are dealt. This may be slower than dealing with arrays of machine-size integers. This representation also makes using lookup tables indexed by rank or suit difficult. Also, since no duplicates are allowed, this method cannot be used for games that require duplicate cards such as Pinochle and Canasta.

This representation is most useful for single-purpose applications doing very complex calculations on fixed-sized sets of cards. The venerable pokersource library uses bitmaps to evaluate poker hands, and it is quite effective.

Bitfields

Because the typical 32-bit integer size of most machines is much larger than necessary to identify a card, we can use groups of bits within an integer to store information about the card. Specifically, two bits for suit, four for rank, and the rest for flags or anything else the application might need. This is similar to treating an integer as an object with member variables stored in a very compact way. The well-known Suffecool/Senzee poker hand evaluator uses this method to store along with each card one of 13 prime numbers used in its calculations.

This gives us some of the advantages of the object representation while being more compact. This speeds up applications that need to move and copy many cards from place to place, such as blackjack simulations. A blackjack simulation might use 4 bits to store the 1 to 10 numerical value of a card to avoid some branching in the innermost loop that computes a hand value (though you'll still need to deal with aces specially). Getting ranks and suits out of our numbers requires only fast bit-masking operations to get numbers suitable for indexing lookup tables.
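Here is a small Python sketch of a bitfield card for the blackjack case mentioned above. The field layout (suit in bits 0-1, rank in bits 2-5, blackjack value in bits 6-9) and all names are assumptions chosen for illustration; ranks run from 0 for deuces to 12 for aces.

```python
SUIT_MASK = 0x3                    # bits 0-1: suit, 0..3
RANK_SHIFT, RANK_MASK = 2, 0xF     # bits 2-5: rank, 0 = deuce .. 12 = ace
VALUE_SHIFT, VALUE_MASK = 6, 0xF   # bits 6-9: blackjack value, 1..10
ACE = 12

def pack(rank, suit):
    """Pack rank, suit, and the precomputed blackjack value into one integer."""
    value = 1 if rank == ACE else min(rank + 2, 10)   # aces stored as 1
    return (value << VALUE_SHIFT) | (rank << RANK_SHIFT) | suit

def rank_of(card):
    return (card >> RANK_SHIFT) & RANK_MASK

def suit_of(card):
    return card & SUIT_MASK

def hand_total(cards):
    """Sum the stored values; aces still need the usual special case."""
    total = sum((c >> VALUE_SHIFT) & VALUE_MASK for c in cards)
    has_ace = any(rank_of(c) == ACE for c in cards)
    # Count one ace as 11 when that does not bust the hand.
    return total + 10 if has_ace and total <= 11 else total
```

The inner sum needs no branching on rank because the value was computed once, when the card was packed. Only the ace adjustment remains outside the loop.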

Integers

Finally, there is what is probably the simplest representation of all, but no less powerful if done correctly: simply assigning a small integer value to each card. One can see software in which cards are ordered the way they are when you open a typical new deck of cards, which is Ac, 2c, 3c, ... Kc, Ad, 2d, ... Qs, Ks. This is a bad idea for two reasons. First, getting a numerical rank and suit from a number requires an expensive division by 13. Second, even after that, aces will usually have to be special-cased to move them to their usual high rank.

Better is to order the cards in the standard poker “high card by suit” ordering, which is 2c, 2d, 2h, 2s, 3c, 3d, ... Ks, Ac, Ad, Ah, As. This has many advantages. First, you can separate rank and suit with fast bit masking (in fact, this ordering is essentially a bitfield representation with suit as the low order bits). Also, one can often compare or sort cards by rank without even separating the ranks just by comparing the values themselves. Likewise, comparing ranges of ranks can be done by comparing ranges of values (the “10 count” cards in blackjack, for example, are the range 32 to 47).
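A minimal Python sketch of this ordering, with cards numbered 0 to 51, shows how little code is needed. The function names are hypothetical.

```python
RANKS = "23456789TJQKA"
SUITS = "cdhs"

def rank_of(card):
    """Rank 0 (deuce) .. 12 (ace): the high-order bits."""
    return card >> 2

def suit_of(card):
    """Suit 0 (clubs) .. 3 (spades): the two low-order bits."""
    return card & 3

def card_name(card):
    return RANKS[card >> 2] + SUITS[card & 3]

def is_ten_count(card):
    """Tens, jacks, queens, and kings occupy the contiguous range 32..47."""
    return 32 <= card <= 47
```

Sorting a list of these integers with the language's ordinary sort puts the cards in rank order with no custom comparison, and card_name maps 0 to "2c" and 51 to "As".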

This representation is ideal for indexing lookup tables. The values that one might store in a bitfield or object, for example, can simply be fetched from a small lookup table with almost no performance hit. Sets of cards (hands, decks, discard piles, etc.) are simply arrays of integers, for which many programming languages are highly optimized. Duplicate values are no problem, so games like Pinochle and things like 6-deck blackjack shoes need no special handling.

The OneJoker card library uses this representation with a minor change: I add one, so cards have the values 1 to 52 rather than 0 to 51 (the values 53 and 54 are used for jokers). The need for an occasional -1 is not a significant performance hit, and it can often be avoided entirely by adding one element to lookup tables. Being able to use 0 as a “null” value is also very handy in the C language.
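The following Python sketch illustrates the 1-to-52 numbering and the padded lookup table. It is not the OneJoker API: every name here is hypothetical, and it shows only the idea described above.

```python
NULL_CARD = 0        # 0 is free to mean "no card"
JOKERS = (53, 54)    # the two joker values

# Lookup table padded with one extra element at index 0, so a card value
# indexes it directly with no subtraction.
CARD_NAMES = ["--"] + [r + s for r in "23456789TJQKA" for s in "cdhs"]

def rank_of(card):
    """Rank 0 (deuce) .. 12 (ace) for cards 1..52; this is the occasional -1."""
    return (card - 1) >> 2

def suit_of(card):
    """Suit 0 (clubs) .. 3 (spades) for cards 1..52."""
    return (card - 1) & 3

def is_joker(card):
    return card in JOKERS

def cards_held(slots):
    """Count the occupied slots in a fixed-size hand where 0 marks an empty slot."""
    return sum(1 for c in slots if c != NULL_CARD)
```

For example, a five-slot hand holding two cards can be written [52, 48, 0, 0, 0], and CARD_NAMES[52] is "As". With this offset the ten-count range in blackjack becomes 33 to 48.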

While any one particular application might be faster with a different representation, this simple one is very fast for the vast majority of applications, and can be easily converted to others when needed, so it is probably ideal for a general-purpose library.