# Information theory

## Macy Conferences on Cybernetics

During the foundational era of cybernetics, Norbert Wiener, John von Neumann, Claude Shannon, Warren McCulloch, and dozens of other distinguished researchers met at annual conferences sponsored by the Josiah Macy Foundation to formulate the central concepts that, in their high expectations, would coalesce into a theory of communication and control applying equally to animals, humans, and machines. Retrospectively called the Macy Conferences on Cybernetics, these meetings, held from 1943 to 1954, were instrumental in forging a new paradigm. To succeed, they needed a theory of information (Shannon's bailiwick), a model of neural functioning that showed how neurons worked as information-processing systems (McCulloch's lifework), computers that processed binary code and that could conceivably reproduce themselves, thus reinforcing the analogy with biological systems (von Neumann's specialty), and a visionary who could articulate the larger implications of the cybernetic paradigm and make clear its cosmic significance (Wiener's contribution). The result of this enterprise was nothing less than a new way of looking at human beings. Henceforth, humans were to be seen primarily as information-processing entities who are essentially similar to intelligent machines.[1]

## Information theory

Claude Shannon, a theorist working at Bell Laboratories, defined a mathematical quantity he called information and proved several important theorems concerning it in his classic paper "A Mathematical Theory of Communication" published in the Bell System Technical Journal in July and October 1948.

Shannon's theory defines information as a probability function with no dimensions, no materiality, and no necessary connection with meaning. It is a pattern, not a presence. The theory makes a strong distinction between message and signal. Lacan to the contrary, a message does not always arrive at its destination. In information theoretic terms, no message is ever sent. What is sent is a signal. Only when the message is encoded in a signal for transmission through a medium--for example, when ink is printed on paper or when electrical pulses are sent racing along telegraph wires--does it assume material form. The very definition of "information," then, encodes the distinction between materiality and information that was also becoming important in molecular biology during this period.[2]

Why did Shannon define information as a pattern? The transcripts of the Macy Conferences indicate that the choice was driven by the twin engines of reliable quantification and theoretical generality.[3]

Compared to other definitions, Shannon's approach had advantages that turned out to incur large (and mounting) costs when his premise interacted with certain predispositions already at work within the culture. Abstracting information from a material base meant that information could become free-floating, unaffected by changes in context. The technical leverage this move gained was considerable, for by formalizing information into a mathematical function, Shannon was able to develop theorems, powerful in their generality, that hold true regardless of the medium in which the information is instantiated.[4]

## The meaning(lessness) of information

The triumph of information over materiality was a major theme at the first Macy Conference. John von Neumann and Norbert Wiener led the way by making clear that the important entity in the man-machine equation was information, not energy. Although energy considerations are not entirely absent (von Neumann discussed at length the problems involved in dissipating the heat generated from vacuum tubes), the thermodynamics of heat was incidental. Central was how much information could flow through the system and how quickly it could move. Wiener, emphasizing the movement from energy to information, made the point explicitly: "The fundamental idea is the message ... and the fundamental element of the message is the decision." Decisions are important not because they produce material goods but because they produce information. Control information, and power follows.[5]

But what counts as information? Shannon defined information as a probability function with no dimensions, no materiality, and no necessary connection with meaning. Like Shannon, Wiener thought of information as representing a choice. More specifically, it represents a choice of one message from among a range of possible messages. Suppose there are thirty-two horses in a race, and we want to bet on Number 3. The bookie suspects the police have tapped his telephone, so he has arranged for his clients to use a code. He studied communication theory, and he knows that any message can be communicated through a binary code. When we call up, his voice program asks if the number falls in the range of 1 to 16. If it does, we punch the number "1"; if not, the number "0." We use this same code when the voice program asks if the number falls in the range of 1 to 8, then the range of 1 to 4, and next the range of 1 to 2. Now the program knows that the number must be either 3 or 4, so it says, "If 3, press 1; if 4, press 0," and a final tap communicates the number. Using these binary divisions, we need five responses to communicate our choice.[6]
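The bookie's questioning scheme can be sketched in Python. This is only an illustration under stated assumptions: the function names `encode_choice` and `decode_presses` are hypothetical, and the number of horses is assumed to be a power of two so that every choice takes the same number of key presses.

```python
def encode_choice(choice: int, n: int = 32) -> list[int]:
    """Return the 1/0 key presses that single out `choice` among 1..n.

    At each step the caller is asked whether the number lies in the lower
    half of the remaining range: 1 means yes, 0 means no.
    """
    if not 1 <= choice <= n:
        raise ValueError("choice must lie between 1 and n")
    low, high = 1, n
    presses = []
    while low < high:
        mid = (low + high) // 2      # upper end of the lower half
        if choice <= mid:
            presses.append(1)        # "yes, it is in the lower half"
            high = mid
        else:
            presses.append(0)        # "no, it is in the upper half"
            low = mid + 1
    return presses


def decode_presses(presses: list[int], n: int = 32) -> int:
    """Recover the chosen number from the key presses (the bookie's side)."""
    low, high = 1, n
    for bit in presses:
        mid = (low + high) // 2
        if bit == 1:
            high = mid
        else:
            low = mid + 1
    return low


if __name__ == "__main__":
    presses = encode_choice(3)
    print(presses, "->", decode_presses(presses))
```

For horse Number 3 the presses follow the sequence described in the text: yes to 1 to 16, yes to 1 to 8, yes to 1 to 4, no to 1 to 2, and a final 1 for "if 3, press 1."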

<wikitex>How does this simple decision process translate into information? First let us generalize our result. Probability theory states that the number of binary choices $C$ necessary to uniquely identify an element from a set with $n$ elements can be calculated as follows:

$C = \log_2 n$

In our case,

$C = \log_2 32 = 5$,

the five choices we made to convey our desired selection. (Hereafter, to simplify the notation, consider all logarithms taken to base 2.) Working from this formula, Wiener defined information $I$ as the log of the number $n$ of elements in the message set:

$I = \log n$

This formula gives $I$ when the elements are equally likely. Usually this is not the case; in English, for example, the letter "e" is far more likely to occur than "z." For the more general situation, when the elements $s_1, s_2, s_3, \ldots, s_n$ are not equally likely, and $p(s_i)$ is the probability that the element $s_i$ will be chosen,

$I(s_i) = \log 1/p(s_i) = -\log p(s_i)$

This is the general formula for information communicated by a specific event, in our case the call to the bookie. Because electrical engineers must design circuits to handle a variety of messages, they are less interested in specific events than they are in the average amount of information from a source, for example, the average of all the different messages that a client might communicate about the horse race. This more complex case is represented by the following formula:

$I = -\sum_{i=1}^{n} p(s_i)[\log p(s_i)]$,

where $p(s_i)$ is the probability that the message element $s_i$ will be selected from a message set with $n$ elements ($\sum$ indicates the sum of terms as $i$ varies from 1 to $n$).[7] </wikitex>
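The two formulas above can be checked numerically with a short Python sketch. The function names `self_information` and `average_information` are hypothetical, chosen for illustration, and the sketch assumes base-2 logarithms (as the text does) and a probability list that sums to 1.

```python
import math


def self_information(p: float) -> float:
    """Information in bits carried by one event of probability p: -log2 p."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return -math.log2(p)


def average_information(probs: list[float]) -> float:
    """Average information in bits of a source: -sum of p * log2 p.

    Elements with probability 0 contribute nothing and are skipped.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Thirty-two equally likely horses: both measures reduce to log2 32.
    print(self_information(1 / 32))
    print(average_information([1 / 32] * 32))
    # An unevenly distributed source carries less information on average.
    print(average_information([0.5, 0.25, 0.25]))
```

With thirty-two equally likely horses, both functions give 5 bits, matching the five key presses in the bookie example. For an uneven distribution the average falls below $\log n$.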

We are now in a position to understand the deeper implications of information as it was theorized by Wiener and Shannon. Note that the theory is formulated entirely without reference to what information means. Only the probabilities of message elements enter into the equations. Why divorce information from meaning? Shannon and Wiener wanted information to have a stable value as it moved from one context to another. If it was tied to meaning, it would potentially have to change values every time it was embedded in a new context, because context affects meaning. Suppose, for example, you are in a windowless office and call to ask about the weather. "It's raining," I say. On the other hand, if we are both standing on a street corner, being drenched by a downpour, this same response would have a very different meaning. In the first case, I am telling you something you don't know; in the second, I am being ironic (or perhaps moronic). An information concept that ties information to meaning would have to yield two different values for the two circumstances, even though the message ("It's raining") is the same.[8]

To cut through this Gordian knot, Shannon and Wiener defined information so that it would be calculated as the same value regardless of the contexts in which it was embedded, which is to say, they divorced it from meaning. In context, this was an appropriate and sensible decision. Taken out of context, the definition allowed information to be conceptualized as if it were an entity that can flow unchanged between different material substrates. A simplification necessitated by engineering considerations may become an ideology in which a reified concept of information is treated as if it were fully commensurate with the complexities of human thought.[9]

Shannon himself was meticulously careful about how he applied information theory, repeatedly stressing that information theory concerned only the efficient transmission of messages through communication channels, not what those messages mean. Although others were quick to impute larger linguistic and social implications to the theory, he resisted these attempts. Responding to a presentation by Alex Bavelas on group communication at the eighth Macy Conference, he cautioned that he did not see "too close a connection between the notion of information as we use it in communication engineering and what you are doing here ... the problem here is not so much finding the best encoding of symbols ... but, rather, the determination of the semantic question of what to send and to whom to send it." For Shannon, defining information as a probability function was a strategic choice that enabled him to bracket semantics. He did not want to get involved in having to consider the receiver's mindset as part of the communication system. He felt so strongly on this point that he suggested Bavelas distinguish between information in a channel and information in a human mind by characterizing the latter through "subjective probabilities," although how these were to be defined and calculated was by no means clear.[10]

### Donald MacKay's information theory

Not everyone agreed that it was a good idea to decontextualize information. At the same time that Shannon and Wiener were forging what information would mean in a U.S. context, Donald MacKay, a British researcher, was trying to formulate an information theory that would take meaning into account. At the seventh conference, he presented his ideas to the Macy group. The difference between his view and Shannon's can be seen in the way he bridled at Shannon's suggestion about "subjective probabilities." In the rhetoric of the Macy Conferences, "objective" was associated with being scientific, whereas "subjective" was a code word implying that one had fallen into a morass of unquantifiable feelings that might be magnificent but were certainly not science. MacKay's first move was to rescue information that affected the receiver's mindset from the "subjective" label. He proposed that both Shannon and Bavelas were concerned with what he called "selective information," that is, information calculated by considering the selection of message elements from a set. But selective information alone is not enough; also required is another kind of information that he called "structural." Structural information indicates how selective information is to be understood; it is a message about how to interpret a message--that is, it is a metacommunication.[11]

To illustrate, say I launch into a joke and it falls flat. In that case, I may resort to telling my interlocutor, "That's a joke." The information content of this message, considered as selective information (measured in "metrons"), is calculated with probability functions similar to those used in the Shannon-Wiener theory. In addition, my metacomment also carries structural information (measured in "logons"), for it indicates that the preceding message has one kind of structure rather than another (a joke instead of a serious statement). In another image MacKay liked to use, he envisioned selective information as choosing among folders in a file drawer, whereas structural information increased the number of drawers (jokes in one drawer, academic treatises in another).[12]

Since structural information indicates how a message should be interpreted, semantics necessarily enters the picture. In sharp contrast to message probabilities, which have no connection with meaning, structural information was to be calculated through changes brought about in the receiver's mind. "It's raining," heard by someone in a windowless office, would yield a value for the structural information different from the value that it would yield when heard by someone looking out a window at rain. To emphasize the correlation between structural information and changes in the receiver's mind, MacKay offered an analogy: "It is as if we had discovered how to talk quantitatively about size through discovering its effects on the measuring apparatus." The analogy implies that representations created by the mind have a double valence. Seen from one perspective, they contain information about the world ("It's raining"). From another perspective, they are interactive phenomena that point back to the observer, for this information is quantified by measuring changes in the "measuring instrument," that is, in the mind itself. And how does one measure these changes? An observer looks at the mind of the person who received the message, which is to say that changes are made in the observer's mind, which in turn can also be observed and measured by someone else. The progression tends toward the infinite regress characteristic of reflexivity. Arguing for a strong correlation between the nature of a representation and its effect, MacKay's model recognized the mutual constitution of form and content, message and receiver. His model was fundamentally different from the Shannon-Wiener theory because it triangulated between reflexivity, information, and meaning. In the context of the Macy Conferences, his conclusion qualified as radical: subjectivity, far from being a morass to be avoided, is precisely what enables information and meaning to be connected.[13]

The problem was how to quantify the model. To achieve quantification, a mathematical model was needed for the changes that a message triggered in the receiver's mind. The staggering problems this presented no doubt explain why MacKay's version of information theory was not widely accepted among the electrical engineers who would be writing, reading, and teaching the textbooks on information theory in the coming decades. Although MacKay's work continued to be foundational for the British school of information theory, in the United States the Shannon-Wiener definition of information, not MacKay's, became the industry standard.[14]

### Tzannes' attempt to revise MacKay's information theory

Not everyone capitulated. As late as 1968, Nicolas S. Tzannes, an information theorist working for the U.S. government, sent Warren McCulloch a memorandum about his attempt to revise MacKay's theory so that it would be more workable. He wanted to define information so that its meaning varied with context, and he looked to Kotelly's context algebra for a way to handle these changes quantitatively. In the process, he made an important observation. He pointed out that whereas Shannon and Wiener define information in terms of what it is, MacKay defines it in terms of what it does. The formulation emphasizes the reification that information undergoes in the Shannon-Wiener theory. Stripped of context, it becomes a mathematical quantity weightless as sunshine, moving in a rarefied realm of pure probability, not tied down to bodies or material instantiations. The price it pays for this universality is its divorce from representation. When information is made representational, as in MacKay's model, it is conceptualized as an action rather than a thing. Verblike, it becomes a process that someone enacts, and thus it necessarily implies context and embodiment. The price it pays for embodiment is difficulty of quantification and loss of universality.[15]

## Notes

1. Hayles 1999, p. 7
2. Hayles 1999, p. 18
3. Hayles 1999, p. 18
4. Hayles 1999, p. 19
5. Hayles 1999, pp. 51-52
6. Hayles 1999, p. 52
7. Hayles 1999, pp. 52-53
8. Hayles 1999, p. 53
9. Hayles 1999, pp. 53-54
10. Hayles 1999, p. 54
11. Hayles 1999, pp. 54-55
12. Hayles 1999, p. 55
13. Hayles 1999, pp. 55-56
14. Hayles 1999, p. 56
15. Hayles 1999, p. 56