The Outlier Dictionary of Chinese Characters is the only resource available for learners which makes use of the latest research on character etymology coming out of Mainland China and Taiwan. It explains not only what each character means, but why it means that—why it looks the way it does, and how its form is related to its pronunciation and meaning.
But what is Chinese character etymology? It’s usually defined something like “the story of a character’s origin and development”, but that definition doesn’t give the full picture. Be that as it may, if we accept this definition for the moment, the first question someone learning Chinese should ask is “How much of that story do I need to know to effectively learn Chinese characters?” There is no single answer to this question, because there is no single way to learn anything. There are many different learning styles, each learner has a unique background and way of viewing the world, etc. Having said that, there are also similarities in how we learn and there are principles of effective learning that apply to all of us as human beings. As such, my answer to the question of how much of the story you need to know is: it depends. I’ve identified six aspects of character etymology that are pertinent to learning Chinese characters:
#1: Identifying the functional components in a given character.
#2: Identifying how the functional components function.
#3: Identifying corrupted components.
#4: Identifying the meaning that a given character was invented to represent and its relationship to the character form.
#5: Restoring the pictorial-quality of character components.
#6: The full story.
In this post, I explain aspects #1 to #3, that is, I explain what each one is and its level of importance. Aspects #4 to #6 will be explained in part 2 of this post (it’s a constant battle to keep posts short!). As it turns out, there is a core of things one needs to know about a given character in order to learn it effectively, and then there are things which are interesting (well… to me anyway!) to know, but not strictly speaking necessary.
According to the Outlier philosophy, the goals of character learning are predictability and long-term recall. Predictability means that when you come across a new character within a meaningful context, you can make intelligent guesses about the range of sounds and the range of meanings that character might have. Ideally, you could use that knowledge to make a connection with a word or words in the spoken language. Long-term recall refers to the ability to recall a character form long after it’s been learned, by way of understanding how Chinese characters as a system represent sound and meaning. Since spoken words are combinations of sound and meaning, you can use these two clues in conjunction with understanding characters on a systemic level to pluck your memory strings and recall a character’s form. This is accomplished by understanding the functional components of each character and by understanding how characters work on the system level.
Studies have shown that native Chinese speakers have an intuition about how a given unknown character may sound or what it may mean, but they often find it difficult to articulate. This intuition comes from learning thousands of characters. It is a reflection of the logic inherent to the Chinese writing system. And, it is imperfect. It also takes a long time to acquire. With our character dictionary, our aim is to instill the abilities required for long-term recall and predictability from day one, but that can only be done if we understand characters on their terms, not on ours. That is, we need to have as accurate a picture as possible of how characters actually work, and do away with all the misconceptions that are out there. Now, let’s look at the six aspects of etymology:
Aspect #1: Identifying the functional components in a given character.
This is by far the most important aspect of etymology. Knowing how a character represents sound and meaning is the basis for understanding Chinese characters as a system. It is also the basis for being able to detect real (as opposed to superficial) relationships between characters: sound and meaning relationships. And, last but not least, it’s the key to understanding individual characters. So, understanding what a character’s functional components are is crucial for all learners, and the functional components (in combination with Aspect #2) are what make predictability possible for beginners (when you need it the most!). And, while they aren’t the only route to long-term recall, they are the most effective and they have the most positive side-effects.
Example: The functional components for 識¹ shì “to know” are 言 yán “speech” and 戠 zhí “to gather together”. 言 is the meaning component and 戠 is the sound component. People oftentimes view this character as 言 + 音 + 戈 and then create a story to combine the meanings “speech” + “sound” + “lance”, but doing so hides the sound connections between 識 and other characters that share the sound component 戠 zhí: 職 zhí, 織 zhī, 幟 zhì. Creating a new story for how meaning is represented in this character not only obscures the real way meaning is expressed, it gives a false impression as to how Chinese characters represent meaning in general. Not to mention, an infinite number of stories can be created for any one character, but only a story based upon the functional components will bring the benefit of seeing (from an early stage) the real sound and meaning connections between characters.
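For readers who like to see this as data, here is a minimal Python sketch of the idea in the example above. It records each character with its reading and its sound component, then groups characters by sound component so the shared sound series becomes visible. The names ENTRIES, group_by_sound_component and strip_tone are made up for this illustration and are not part of the Outlier dictionary; the four entries are simply the ones mentioned in the text.

```python
import unicodedata
from collections import defaultdict

# (character, reading, sound component), taken from the example in the text.
ENTRIES = [
    ("識", "shì", "戠"),
    ("職", "zhí", "戠"),
    ("織", "zhī", "戠"),
    ("幟", "zhì", "戠"),
]


def group_by_sound_component(entries):
    """Map each sound component to the (character, reading) pairs that use it."""
    series = defaultdict(list)
    for char, reading, sound in entries:
        series[sound].append((char, reading))
    return dict(series)


def strip_tone(reading):
    """Drop tone marks so readings can be compared by bare syllable.

    Note: this removes every combining mark, so it would also turn ü into u.
    That does not matter for the readings used here.
    """
    decomposed = unicodedata.normalize("NFD", reading)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


series = group_by_sound_component(ENTRIES)
for sound, members in series.items():
    syllables = sorted({strip_tone(reading) for _, reading in members})
    print(sound, members, syllables)
```

Grouping by 戠 puts all four characters in one series whose readings differ only in tone and in sh versus zh. If the same characters were keyed on 音 or 戈 instead, that grouping would tell you nothing about how they sound.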
Aspect #2: Identifying how the functional components function.
In other words, how sound components express sound and how meaning components express meaning. While knowing what the functional components are in a character is very important, so is understanding how they function in that character. For instance, most people do not distinguish between a component expressing meaning by way of meaning vs. expressing meaning by way of form.
What does that mean exactly? Each functional component has three attributes: form, meaning and sound (or pronunciation). Take 自 zì “self” for example. Its form is a picture of a person’s nose. Its meaning is “self” and its sound is zì. If 自 expresses meaning by form, then the meaning it expresses has to do with “nose”, as in 鼻 bí “nose”, 臭 chòu “to stink” and 嗅 xiù “to smell”. So, in the characters 鼻, 臭 and 嗅, 自 gives meaning by form. Characters that have components that express meaning by meaning appeared rather late in the game and as such, they are small in number.
Example: 歪 wāi “not straight, crooked (literally 不正)”. It’s easily seen that this character is based upon the combination of the meanings 不 “not” and 正 “straight”. 不’s form is either “part of a plant” or “roots of a plant” and 正’s form is “a foot marching towards a city”. These two forms obviously have nothing to do with the meaning of 歪. So, in 歪, 不 and 正 give meaning by meaning. Most meaning components give meaning by form and only a minority give meaning by meaning, yet most people interpret every character as if its components gave meaning by meaning. Worse yet, they don’t consider the original meanings of the components, but depend rather on their modern meanings. This way of thinking is almost guaranteed to be inaccurate (read: does not help with predictability, long-term recall or seeing real connections between characters).
Understanding how components express meaning and sound is very important to understanding both how individual characters work and how characters work as a system.
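The distinction in Aspect #2 can also be sketched as a small data model. In the Python below, each component carries the three attributes described above (form, meaning, sound) plus a label for how it functions in one particular character, and a helper returns whichever attribute the component actually contributes. The class and function names are hypothetical, chosen only for this sketch. The readings bù and zhèng for 不 and 正 are filled in by me and do not come from the example text.

```python
from dataclasses import dataclass
from enum import Enum


class Function(Enum):
    MEANING_BY_FORM = "meaning by form"
    MEANING_BY_MEANING = "meaning by meaning"
    SOUND = "sound"


@dataclass(frozen=True)
class Component:
    glyph: str
    form: str           # what the component is a picture of
    meaning: str        # what the component means as a word
    sound: str          # how the component is pronounced
    function: Function  # how it functions in one specific character


def contribution(component):
    """Return the attribute the component contributes to its character."""
    if component.function is Function.MEANING_BY_FORM:
        return component.form
    if component.function is Function.MEANING_BY_MEANING:
        return component.meaning
    return component.sound


# 自 as it functions in 鼻, 臭 and 嗅: the picture of a nose is what matters.
zi_in_bi = Component("自", "nose", "self", "zì", Function.MEANING_BY_FORM)

# 不 and 正 as they function in 歪: their word meanings are what matter.
# The readings bù and zhèng are added for completeness (not from the text).
bu_in_wai = Component(
    "不", "part of a plant or roots of a plant", "not", "bù",
    Function.MEANING_BY_MEANING,
)
zheng_in_wai = Component(
    "正", "a foot marching towards a city", "straight", "zhèng",
    Function.MEANING_BY_MEANING,
)

for comp in (zi_in_bi, bu_in_wai, zheng_in_wai):
    print(comp.glyph, comp.function.value, "->", contribution(comp))
```

The function label belongs to the component as used in a specific character, not to the component in general. The same 自 could be given a different label in another character.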
Aspect #3: Identifying corrupted components.
Technically speaking, this should be part of Aspect #1, but since most people aren’t familiar with this concept, I’ll handle it as a full aspect. Character corruption means that a “character changes form in such a way that the original form intended by the inventor of that character is altered” (from an earlier post). In other words, things aren’t always what they seem. There are several advantages to knowing whether a component is corrupted:
1. To clear up misunderstandings and answer questions such as:
What does 往 wǎng “toward” have to do with 主 zhǔ “master, owner”?
The answer is: nothing, except a superficial connection. The right side of 往 was originally a different form² (character image not reproduced here), made up of a foot (representing “to go”) and 王 wáng, which gives the sound. The 彳 was added later to emphasize movement. The “foot” got corrupted into 丶, making the right side look like 主.
How do the two mountains 山 shān in 出 chū “to go out” represent the notion “to go out”?
Answer: they don’t. They are corruptions. “To go out” was originally represented by a foot (now written 止 zhǐ, which means “stop” in modern Chinese) walking out of a cave; through the process of stylization, it came to look like two mountains on top of one another.
2. To give your mind closure. If you know that a given component is a corruption, you know that it is not adding a sound or meaning to the character. There’s no need to look any further for an explanation.
3. To give you a clearer understanding of how modern characters work. Predictability and long-term recall come quickest and most efficiently by understanding characters on an individual as well as systemic level. If you reinterpret corrupted components with your own meaning, you’re simply adding noise to the system.
But isn’t this just making the whole thing more complicated and harder to learn? I would argue no. The number one rule for memorizing anything is understanding. The more you understand the object of learning, the easier it is for you to remember that thing. Understanding which character components are corrupted is increasing your understanding. And, it’s not necessary to know the whole story behind the corruption. The main thing is knowing that the corrupted component is not giving a sound or meaning to the character.
Example: 高 gāo “tall”: It’s enough that you know that 亠 tóu “lid, cover”, 口 kǒu “mouth” and 冋 jiǒng have nothing to do with why 高 looks the way it does. It is actually just a picture of a tall building and has a 口 on the bottom to distinguish it from 京 jīng, which is also a picture of a tall building. Of course, you do need to remember that these components are necessary for correctly writing 高, but you can’t use them to understand why 高 looks the way it does.
Example: 粦 lín “ghost fire”: Since 粦 is only used as a sound component in modern Chinese, it’s enough to know how to write it and its pronunciation. However, understanding what it was originally a picture of is interesting to some people (and can aid in remembering how to write it). If you’re one of those people (I know I am!), then you need to remember that 米 mǐ “rice” isn’t giving a meaning or sound, it’s merely a placeholder for a picture of a corpse that is on fire (okay, so knowing the real story isn’t always more pleasant!) and that 舛 is a picture of two feet.
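As a rough illustration of why it helps to flag corruption, the Python sketch below tags each visible component of 往 and 出 with a role and then filters out the corrupted ones. What remains is the set of components worth looking to for sound or meaning. The table DECOMPOSITIONS and the function functional_components are invented for this example, and the role labels are my shorthand for what the text says about these two characters.

```python
# Each visible component is tagged "meaning", "sound" or "corrupted".
# The tags follow the explanations of 往 and 出 given in the text.
DECOMPOSITIONS = {
    "往": [("彳", "meaning"), ("丶", "corrupted"), ("王", "sound")],
    "出": [("山", "corrupted"), ("山", "corrupted")],
}


def functional_components(char):
    """Return the components of char that contribute sound or meaning."""
    return [
        (glyph, role)
        for glyph, role in DECOMPOSITIONS[char]
        if role != "corrupted"
    ]


for char in DECOMPOSITIONS:
    print(char, functional_components(char))
```

For 往 the filter keeps 彳 and 王 and drops the corrupted foot, so nothing points toward 主. For 出 the result is empty, which is the signal to stop looking for an explanation in the two 山 shapes and simply learn the form as a whole.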
Today, we discussed the first three aspects of etymology: identifying the functional components in a given character, identifying how the functional components function, and identifying corrupted components. Each of these is important for understanding the logic of how Chinese characters work, both individually and as a system, and each is equally important for effective character learning, that is to say, for learning characters in a way that gets you to long-term recall and predictability as quickly as possible. We also saw that etymology isn’t simply a matter of telling stories about character origins, but is actually tied into the very guts of how Chinese characters do the job of expressing sound and meaning. In our next post on this topic, we’ll cover the remaining three aspects of etymology: #4) Identifying the meaning that a given character was invented to represent and its relationship to the character form; #5) Restoring the pictorial-quality of character components; and #6) “The full story”.
¹ In the PRC, the standard reading for this character is shí.
² Character form from 小學堂.