The Catalytic Potential of Entropy
You may have casually used the word "entropy" to describe a world that looks on the verge of collapse, or at least one that is less predictable, where order is in decline. But if we really dive into the word, we might get some insight into how entropy defines how we use language and technology to communicate with one another, and whether a cooling cup of coffee in front of a movie helps us gain that insight.
Entropy forms the backbone of both natural and computer science, two disciplines responsible for many powerful technologies of the modern world.
In thermodynamics, entropy is the key tenet of the discipline's Second Law. The First Law of Thermodynamics states that energy can neither be created nor destroyed; only its state can change. The Second Law, also known as the Entropy Law, states that in an isolated system (one that exchanges neither energy nor matter with its surroundings), energy tends to spread out until it is evenly distributed. In practice, this means that in a small energy "system" involving a room and a mug of coffee, the mug of coffee will eventually become room temperature. At that point, the system achieves thermodynamic equilibrium: energy is evenly distributed.
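The mug-and-room example can be sketched numerically. The following is a minimal illustration, not a physical model of a real kitchen: it assumes the mug and the room are the only two bodies in an isolated system, that each has a single uniform temperature, and that heat flows in proportion to the temperature difference. The function name `equilibrate`, the heat capacities, and the transfer coefficient `k` are all hypothetical values chosen for illustration.

```python
def equilibrate(t_mug, t_room, c_mug, c_room, k=0.05, steps=2000):
    """Exchange heat between two bodies until their temperatures converge.

    c_mug and c_room are heat capacities; k is a hypothetical heat-transfer
    coefficient per step. All values are illustrative assumptions.
    """
    for _ in range(steps):
        q = k * (t_mug - t_room)  # heat flows from the hotter body to the cooler one
        t_mug -= q / c_mug        # the mug loses that heat
        t_room += q / c_room      # the room gains exactly the same amount
    return t_mug, t_room


# A hot mug (80 degrees) in a much larger, cooler room (20 degrees).
mug, room = equilibrate(t_mug=80.0, t_room=20.0, c_mug=1.0, c_room=50.0)
print(round(mug, 3), round(room, 3))
```

Both temperatures settle at the same value, just above the room's starting temperature, and the total energy (heat capacity times temperature, summed over both bodies) is the same at the end as at the start: nothing was destroyed, it was only spread out.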
In information theory, on the other hand, entropy refers to the rate at which information is communicated in a message. That's pretty abstract, but it makes more sense when considering language. Say you're searching for The Dark Knight Rises on Netflix. You want the desired result to pop up after entering just one or two words, so assuming the order doesn't matter, you have to decide which words communicate the most information, or reduce uncertainty, to the greatest extent. You'd probably want to enter "Dark Knight" rather than "The Dark," since there are fewer movie titles that feature Batman than there are about darkness. Different word combinations communicate the message at different rates; that's what information entropy seeks to define.
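One rough way to make the search example concrete is to count how many titles survive each query. The sketch below assumes a small hand-picked catalog (`CATALOG`), treats every title as equally likely, and uses a hypothetical helper, `bits_remaining`; it is not a description of how Netflix search actually works.

```python
import math

# Small hand-picked catalog, for illustration only.
CATALOG = [
    "The Dark Knight Rises",
    "The Dark Knight",
    "The Dark Tower",
    "The Dark Crystal",
    "Dark Shadows",
    "Dark City",
    "The Darkest Hour",
    "The Green Knight",
]


def bits_remaining(query, titles):
    """Return the uncertainty (in bits) left after a query, plus the matching titles.

    A title matches when it contains every query word as a whole word.
    Assumes at least one title matches and all titles are equally likely.
    """
    words = query.lower().split()
    matches = [t for t in titles if all(w in t.lower().split() for w in words)]
    return math.log2(len(matches)), matches


print(math.log2(len(CATALOG)))                 # uncertainty before typing anything
print(bits_remaining("dark knight", CATALOG))  # fewer titles left
print(bits_remaining("the dark", CATALOG))     # more titles left
```

With this catalog, the eight titles start at 3 bits of uncertainty. "Dark Knight" leaves two candidates (1 bit), while "The Dark" leaves four (2 bits), so the first query removes twice as much uncertainty as the second.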
So how are these two seemingly disparate ideas similar? Understanding a bit more about how they work both in practice and in theory will illuminate their relationship, and its importance for our evolution and technological advancement.
Thermodynamic energy in a concentrated, usable form is considered ordered; energy in a distributed, unavailable form is considered disordered. What does this mean? For fossil fuels, it means that a lump of coal is ordered and usable, but once you burn it and convert it to mechanical work (the desired end) or heat (a byproduct), its energy becomes disordered and unusable. The tricky part is that entropy only moves in one direction, from ordered to disordered; you can't gather the dissipated heat back up into a lump of coal. So, in a way, entropy forms the basis of scarcity in natural resources.
Food, a form of ordered energy, sustains one of the most beautiful systems of all: life. As established, entropy moves in only one direction, toward equilibrium, unless acted upon by a force outside the system. Thus, plants and animals survive by gathering available, ordered energy from the environment and then emitting waste stripped of nutrients. But continuously passing energy through our bodies eventually degrades them, causing us to break down and die. And after death, bodies decompose and dissipate into the surrounding environment, like heat from a mug dissipating into a room, to reach thermodynamic equilibrium. In this way, entropy is responsible not only for material scarcity, but for scarcity of time.
Theoretically, entropy will halt its steady march only once it has brought about the heat death of the universe: the end of time and the ultimate end state. Everything on Earth and in space will eventually expand, explode, die, and distribute its energy evenly through what is really the biggest isolated system of all: the universe. In this way, entropy is a universal law similar to gravity: it operates at both the smallest and largest scales of biophysics.
Before delving into how information theory uses entropy, it's helpful to establish a crucial fact about information itself: The informative value of a communicated message depends on the degree to which its content is surprising. A more surprising message has more informational gain.
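Information theory makes "surprise" measurable: an outcome with probability p carries -log2(p) bits, and entropy is the average surprise over all possible outcomes. A minimal sketch of both quantities follows; the function names `surprisal` and `shannon_entropy` are chosen here for illustration.

```python
import math


def surprisal(p):
    """Information content, in bits, of an outcome with probability p (0 < p <= 1)."""
    return -math.log2(p)


def shannon_entropy(probs):
    """Expected surprisal, in bits, of a discrete probability distribution."""
    return sum(p * surprisal(p) for p in probs if p > 0)


print(surprisal(0.5))               # a fair coin flip: 1 bit
print(surprisal(1 / 8))             # a 1-in-8 event: 3 bits
print(shannon_entropy([0.5, 0.5]))  # fair coin: 1 bit on average
print(shannon_entropy([0.9, 0.1]))  # biased coin: less than 1 bit on average
```

The rarer the outcome, the more bits it carries, and a source whose outcomes are nearly certain (the biased coin) has low entropy because most of its messages are unsurprising.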
Informational entropy helps a great deal in machine learning, where computer systems use algorithms and statistical models to perform tasks via patterns and inference rather than explicit instructions from a human. To continue with the language example, when laying out words letter by letter to compose a message, some letters, such as "E," will appear more frequently than others. But paradoxically, since "E" is so common, it communicates less information: More words have "E"s in them than "X"s or "Z"s. So, the event of a letter "Z" rather than "E," just like searching "dark knight" instead of "the dark," reduces uncertainty, or entropy, at a higher rate, because there's more surprise.
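The letter example can be checked on a toy sample. The sketch below estimates each letter's probability from its share of all letters in a short, hypothetical word list (`WORDS`); a realistic estimate would need a large corpus, so the exact numbers here mean little and only the ordering matters.

```python
import math
from collections import Counter

# Hypothetical sample; far too small for real letter statistics.
WORDS = ["the", "dark", "knight", "rises", "zebra", "entropy", "energy", "letter"]


def letter_surprisal(words):
    """Bits of surprisal per letter, from each letter's share of all letters in the sample."""
    counts = Counter("".join(words))
    total = sum(counts.values())
    return {ch: -math.log2(n / total) for ch, n in counts.items()}


bits = letter_surprisal(WORDS)
# List letters from most to least surprising.
for ch in sorted(bits, key=bits.get, reverse=True):
    print(ch, round(bits[ch], 2))
```

In this sample "z" appears once and "e" appears eight times, so "z" lands near the top of the list and "e" at the bottom: the common letter carries the least information each time it shows up.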
How does machine learning use entropy? Decision trees, a common algorithm, select one of many different attributes (also known as features or independent variables) to repeatedly split samples into subsets. At each split, the algorithm selects one attribute to split the sample on, and continues to do so until all subsets are pure, or, in other words, until each individual sample in a subset shares the same classification.
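One common way to pick the attribute at each split is information gain: the entropy of the labels before the split minus the weighted entropy of the subsets after it. The sketch below assumes that criterion (as used in ID3-style trees; other trees use different measures) and a tiny made-up dataset whose attribute names (`has_k`, `has_e`) and labels are purely illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy removed by splitting the rows on the value of one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute whose split removes the most entropy."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))


# Hypothetical samples: does each word contain a "k" or an "e"?
rows = [
    {"has_k": True, "has_e": True},
    {"has_k": True, "has_e": False},
    {"has_k": False, "has_e": True},
    {"has_k": False, "has_e": False},
]
labels = ["batman", "batman", "other", "other"]

print(best_attribute(rows, labels))
```

Here splitting on `has_k` produces two pure subsets (a gain of 1 bit), while splitting on `has_e` leaves both subsets as mixed as before (a gain of 0), so the tree would split on `has_k` first.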
If our sample were a group of words, and our features the letters in those words, the algorithm would split the group of words based on their inclusion or exclusion of letters. So, if we used each letter in the alphabet, then the branches of the tree at the very bottom would include each word in our sample. In practice, we limit the purity of the tree (for example, by capping how deep it can grow) to avoid overfitting, so the algorithm can generalize to words it hasn't seen before.
We would expect the decision tree to come to the conclusion that the presence of a rare letter such as "Z" decreases entropy at the highest rate, based on the distribution of the letters in the sample. When searching for a new word, it would split the sample on the letter "Z": those words that include "Z," and those that do not. If we give the algorithm the task of searching for The Dark Knight Rises, it would prioritize words based on the rarity of the letters in the sample it was trained on. It would check for a "Z," "Q," and so on down the list of importance, until it finally found that "K" is pretty valuable. Then, it would be programmed to recommend "Dark Knight" rather than "The Rises."
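Two related quantities are worth keeping apart when reasoning about letter splits: how surprising it is to find a letter in a word, and how much a yes/no question about that letter tells us on average. The sketch below computes both for a hypothetical eight-word sample (`WORDS`), assuming every word is equally likely to be the one we are looking for; `binary_entropy` and `letter_report` are names chosen for illustration.

```python
import math

# Hypothetical sample; each word is assumed equally likely to be the target.
WORDS = ["the", "dark", "knight", "rises", "zebra", "entropy", "energy", "letter"]


def binary_entropy(p):
    """Expected bits gained from a yes/no question answered 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def letter_report(words, letter):
    """Return (share of words containing letter, bits if present, expected bits of asking)."""
    p = sum(letter in w for w in words) / len(words)
    bits_if_present = -math.log2(p) if p > 0 else float("inf")
    return p, bits_if_present, binary_entropy(p)


for letter in "zkte":
    print(letter, letter_report(WORDS, letter))
```

In this sample, finding a "z" is worth 3 bits because only one word in eight has one, but most of the time the answer is "no," so asking about "z" yields only about half a bit on average. A letter found in half the words, like "t" here, yields a full bit per question. A rare letter is decisive when it appears; a tree that scores splits by average gain will tend to favor more evenly balanced ones.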
So, how do these two interpretations of entropy lead to the same purpose? In short, information theory identified another, essential way that humans reduce entropy: by communicating. When we communicate a message, we reduce uncertainty about the world, and this had an indelible impact on the evolution of language and social organization.
Our hunter-gatherer ancestors used language both to acquire ordered energy, via coordinated hunting for food, and to avoid being killed by rival groups or predators. When communicating under pressure, it was essential for them to reduce entropy as efficiently as possible: Shouting "Lion!" or "Run!" is much more effective than saying, "There is a lion sneaking up behind you; run away!" The short warning reduces entropy more efficiently by reducing 1) the uncertainty about whether or not you're in danger (informational entropy) and 2) the chance of being eaten (energy entropy). Efficient communication reduces the probability space of all possible events, allowing us to act more quickly and effectively.
Our goal is to find ordered sources of energy and resist the influence of entropy on our bodies. In communication, we minimize entropy by finding information and reducing uncertainty. We've invented technologies to help us with both: we use machines to expend energy and computers to communicate vast amounts of information. Maximizing the returns of technology requires an understanding of both the physical and digital domains, and of the powerful law that connects them.