What is Etymology and Is it Useful for Learning Chinese Characters? (Part 1)

[Image: 金文]

The Outlier Dictionary of Chinese Characters is the only resource available for learners which makes use of the latest research on character etymology coming out of Mainland China and Taiwan. It explains not only what each character means, but why it means that: why it looks the way it does, and how its form is related to its pronunciation and meaning.

But what is Chinese character etymology? It’s usually defined as something like “the story of a character’s origin and development”, but that definition doesn’t give the full picture. Be that as it may, if we accept this definition for the moment, the first question someone learning Chinese should ask is “How much of that story do I need to know to effectively learn Chinese characters?” There is no single answer to this question, because there is no single way to learn anything. There are many different learning styles, and each learner has a unique background and way of viewing the world. Having said that, there are also similarities in how we learn, and there are principles of effective learning that apply to all of us as human beings. As such, my answer to the question of how much of the story you need to know is: it depends. I’ve identified six aspects of character etymology that are pertinent to learning Chinese characters:

#1: Identifying the functional components in a given character.
#2: Identifying how the functional components function.
#3: Identifying corrupted components.
#4: Identifying the meaning that a given character was invented to represent and its relationship to the character form.
#5: Restoring the pictorial quality of character components.
#6: The full story.

In this post, I explain aspects #1 to #3, that is, I explain what each one is and its level of importance. Aspects #4 to #6 will be explained in part 2 of this post (it’s a constant battle to keep posts short!). As it turns out, there is a core of things one needs to know about a given character in order to learn it effectively, and then there are things which are interesting (well… to me anyway!) to know, but not strictly speaking necessary.

According to the Outlier philosophy, the goal of character learning is predictability and long-term recall. Predictability refers to being able, when you come across a new character within a meaningful context, to make intelligent guesses about the range of sounds and the range of meanings that character might have. Ideally, you could use that knowledge to make a connection with a word or words in the spoken language. Long-term recall refers to the ability to recall a character form long after it’s been learned, by way of understanding how Chinese characters as a system represent sound and meaning. Since spoken words are combinations of sound and meaning, you can use these two clues, in conjunction with understanding characters on a systemic level, to pluck your memory strings and recall a character’s form. This is accomplished by understanding the functional components of each character and by understanding how characters work on the system level.

Studies have shown that native Chinese speakers have an intuition about how a given unknown character may sound or what it may mean, but they often find it difficult to articulate. This intuition comes from learning thousands of characters. It is a reflection of the logic inherent to the Chinese writing system. And, it is imperfect. It also takes a long time to acquire. With our character dictionary, our aim is to instill the abilities required for long-term recall and predictability from day one, but that can only be done if we understand characters on their terms, not on ours. That is, we need to have as accurate a picture as possible of how characters actually work, and do away with all the misconceptions that are out there. Now, let’s look at the six aspects of etymology:

Aspect #1: Identifying the functional components in a given character.

[Image: development of the character 象]

This is by far the most important aspect of etymology. Knowing how a character represents sound and meaning is the basis for understanding Chinese characters as a system. It is also the basis for being able to detect real (as opposed to superficial) relationships between characters: sound and meaning relationships. And, last but not least, it’s the key to understanding individual characters. So, understanding what a character’s functional components are is crucial for all learners, and they (in combination with Aspect #2) are what make predictability possible for beginners (when you need it the most!). And, while they aren’t the only route to long-term recall, they are the most effective and they have the most positive side-effects.

Example: The functional components for 識 shì [1] “to know” are 言 yán “speech” and 戠 zhí “to gather together”. 言 is the meaning component and 戠 is the sound component. People oftentimes view this character as 言 + 音 + 戈 and then create a story to combine the meanings “speech” + “sound” + “lance”, but doing so hides the sound connections between 識 and other characters that share the sound component 戠 zhí: 職 zhí, 織 zhī, 幟 zhì. Creating a new story for how meaning is represented in this character not only obscures the real way meaning is expressed, it gives a false impression as to how Chinese characters represent meaning in general. Not to mention, an infinite number of stories can be created for any one character, but only a story based upon the functional components will bring the benefit of seeing (from an early stage) the real sound and meaning connections between characters.
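For readers who think in code, the sound connection in this example can be sketched as a small lookup. This is only an illustrative sketch, not how the dictionary stores its data: the names `ENTRIES`, `group_by_sound_component`, and `sound_range` are hypothetical, and the only data used is the four characters and readings given in the example above.

```python
import unicodedata
from collections import defaultdict

# (character, sound component, reading), taken from the example above.
# The tuple layout is an assumption made for this sketch.
ENTRIES = [
    ("識", "戠", "shì"),
    ("職", "戠", "zhí"),
    ("織", "戠", "zhī"),
    ("幟", "戠", "zhì"),
]


def group_by_sound_component(entries):
    """Map each sound component to the (character, reading) pairs that use it."""
    series = defaultdict(list)
    for char, sound_component, reading in entries:
        series[sound_component].append((char, reading))
    return dict(series)


def strip_tone(reading):
    """Drop tone marks from a pinyin syllable, e.g. 'zhí' -> 'zhi'.

    Note: this also strips the dots of 'ü', which does not matter for
    the syllables used here.
    """
    decomposed = unicodedata.normalize("NFD", reading)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sound_range(pairs):
    """Return the sorted set of toneless syllables seen in one sound series."""
    return sorted({strip_tone(reading) for _, reading in pairs})


series = group_by_sound_component(ENTRIES)
print(series["戠"])
print(sound_range(series["戠"]))
```

Splitting 識 into 言 + 音 + 戈 would give no key under which these four characters fall together, which is the point of the example: the grouping only exists if 戠 is treated as one functional component.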

Aspect #2: Identifying how the functional components function.


In other words, this aspect is about how sound components express sound and how meaning components express meaning. While knowing what the functional components are in a character is very important, so is understanding how they function in that character. For instance, most people do not distinguish between a component expressing meaning by way of meaning vs. expressing meaning by way of form.

What does that mean exactly? Each functional component has three attributes: form, meaning and sound (or pronunciation). Take 自 zì “self” for example. Its form is a picture of a person’s nose. Its meaning is “self” and its sound is zì. If 自 expresses meaning by form, then the meaning it expresses has to do with “nose”, as in 鼻 bí “nose”, 臭 chòu “to stink”, and 嗅 xiù “to smell”. So, in the characters 鼻, 臭 and 嗅, 自 gives meaning by form. Characters that have components that express meaning by meaning appeared rather late in the game and as such, they are small in number.

Example: 歪 wāi “not straight, crooked (literally 不正)”. It’s easily seen that this character is based upon the combination of the meanings 不 “not” and 正 “straight”. 不’s form is either “part of a plant” or “roots of a plant” and 正’s form is “a foot marching towards a city”. These two forms obviously have nothing to do with the meaning of 歪. So, in 歪, 不 and 正 give meaning by meaning. Most meaning components give meaning by form and only a minority give meaning by meaning, yet, most people interpret characters to all be meaning by meaning. Worse yet, they don’t consider the original meanings of the components, but depend rather on their modern meanings. This way of thinking is almost guaranteed to be inaccurate (read: does not help with predictability, long-term recall or seeing real connections between characters).

Understanding how components express meaning and sound is very important to understanding both how individual characters work and how characters work as a system.
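The distinction between “meaning by form” and “meaning by meaning” can also be written down as a tiny data model. The sketch below is hypothetical (the `Role`, `Component`, `BREAKDOWNS`, and `characters_using` names are invented for illustration and are not the dictionary’s format), and it lists only the components discussed in the examples above, not full breakdowns of each character.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MEANING_BY_FORM = "meaning by form"
    MEANING_BY_MEANING = "meaning by meaning"
    SOUND = "sound"


@dataclass(frozen=True)
class Component:
    glyph: str
    role: Role
    contributes: str  # what the component adds in this particular character


# Partial breakdowns: only the components mentioned in the text above.
BREAKDOWNS = {
    "鼻": [Component("自", Role.MEANING_BY_FORM, "picture of a nose")],
    "臭": [Component("自", Role.MEANING_BY_FORM, "picture of a nose")],
    "嗅": [Component("自", Role.MEANING_BY_FORM, "picture of a nose")],
    "歪": [
        Component("不", Role.MEANING_BY_MEANING, "not"),
        Component("正", Role.MEANING_BY_MEANING, "straight"),
    ],
}


def characters_using(glyph, role, breakdowns=BREAKDOWNS):
    """List the characters in which `glyph` plays the given role."""
    return [
        char
        for char, components in breakdowns.items()
        if any(c.glyph == glyph and c.role is role for c in components)
    ]


print(characters_using("自", Role.MEANING_BY_FORM))
print(characters_using("不", Role.MEANING_BY_MEANING))
```

Recording the role next to each component, rather than just the component itself, is what keeps 自 “nose” in 鼻 from being read as 自 “self”.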

Aspect #3: Identifying corrupted components.

[Image: mo corruption]

Technically speaking, this should be part of Aspect #1, but since most people aren’t familiar with this concept, I’ll handle it as a full aspect. Character corruption means that a “character changes form in such a way that the original form intended by the inventor of that character is altered” (from an earlier post). In other words, things aren’t always what they seem. There are several advantages to knowing whether a component is corrupted:

1. To clear up misunderstandings and answer questions such as:

What does 往 wǎng “toward” have to do with 主 zhǔ “master, owner”?
The answer is: nothing, except a superficial connection. The right side of 往 is actually 往主 [2], which is a foot (representing “to go”) and 王 wáng which gives the sound. The 彳 was added later to emphasize movement. The “foot” got corrupted into 丶, making 往主 look like 主.

How do the two mountains 山 shān in 出 chū “to go out” represent the notion “to go out”?

Answer: they don’t. They are corruptions. 出 “to go out” was originally represented by a foot (now written 止 zhǐ, which means “stop” in modern Chinese) walking out of a cave; through the process of stylization, it came to look like two mountains on top of one another.

2. To give your mind closure. If you know that a given component is a corruption, you know that it is not adding a sound or meaning to the character. There’s no need to look any further for an explanation.

3. To give you a clearer understanding of how modern characters work. Predictability and long-term recall come quickest and most efficiently by understanding characters on an individual as well as systemic level. If you reinterpret corrupted components with your own meaning, you’re simply adding noise to the system.

But isn’t this just making the whole thing more complicated and harder to learn? I would argue no. The number one rule for memorizing anything is understanding. The more you understand the object of learning, the easier it is for you to remember that thing. Understanding which character components are corrupted increases your understanding. And, it’s not necessary to know the whole story behind the corruption. The main thing is knowing that the corrupted component is not giving a sound or meaning to the character.

To use some examples that have already appeared in our blog (also, check out our two posts on corruption here and here):

Example: 高 gāo “tall”: It’s enough that you know that 亠 tóu “lid, cover”, 口 kǒu “mouth” and 冋 jiǒng have nothing to do with why 高 looks the way it does. It is actually just a picture of a tall building and has a 口 on the bottom to distinguish it from 京 jīng, which is also a picture of a tall building. Of course, you do need to remember that these components are necessary for correctly writing 高, but you can’t use them to understand why 高 looks the way it does.

Example: 粦 lín “ghost fire”: Since 粦 is only used as a sound component in modern Chinese, it’s enough to know how to write it and its pronunciation. However, understanding what it was originally a picture of is interesting to some people (and can aid in remembering how to write it). If you’re one of those people (I know I am!), then you need to remember that 米 mǐ “rice” isn’t giving a meaning or sound, it’s merely a placeholder for a picture of a corpse that is on fire (okay, so knowing the real story isn’t always more pleasant!) and that 舛 is a picture of two feet.

Today, we discussed the first three aspects of etymology: identifying the functional components in a given character, identifying how the functional components function, and identifying corrupted components. Each of these is important for understanding the logic of how Chinese characters work, both individually and as a system, and each is equally important for effective character learning, that is to say, for learning characters in a way that gets you to long-term recall and predictability as quickly as possible. We also saw that etymology isn’t simply a matter of telling stories about character origins, but is actually tied into the very guts of how Chinese characters do the job of expressing sound and meaning. In our next post on this topic, we’ll cover the remaining three aspects of etymology: #4) Identifying the meaning that a given character was invented to represent and its relationship to the character form; #5) Restoring the pictorial quality of character components; and #6) “The full story”.


[1] In the PRC, the standard reading for this character is shí.

[2] Character form from 小學堂.

5 comments on “What is Etymology and Is it Useful for Learning Chinese Characters? (Part 1)”

  1. 朱真明

    I’ve taken this quote from “A LEXICON OF CLASSICAL CHINESE” Volume 1, version 14.

    “Etymology is not meaning

    With the minor exceptions of newly-coined technical terms and newly-introduced loan words, no word ever has a limited sense that can be called “fundamental” or “original.” Words are used in extended senses every day. An extension that is widely adopted becomes another acceptation of the word. There is never a point in the history of any language at which it has only a core vocabulary without variants or doublets or extended senses. Consider these examples from English:

    check (= “draft on a demand deposit”)
    Check that! (= “Let me correct my error.”)
    Check! (= “That’s correct.”)
    checker (= “game piece”)
    Check! (= “Guard your king.”)
    checker (= “tallyman”)
    Check! (= “Bring the bill, please.”)
    Checkers (name of a dog)

    The syllable check in all of the above examples derives from the same word, Persian šāh “king”. Would someone learning English be helped to master these idioms by knowing that? Would a fluent speaker asked to Check out this book be enabled by this etymology to decide whether the request is to examine the book or to withdraw it from a library?

    Etymology is an essential part of the historical study of language. For learning to read a language to draw information from texts written in it, etymology is a distraction. It can be of use in following the discourse of certain learned writers who pay careful attention to derivations of words they use. Most writers do not; nor should most readers. It is better practice to make out a word’s meaning from examples of it in the works of your author and contemporaries. Typically, etymology only comes into play when a reader may be tempted to take a word in a sense that was not yet widely adopted at the date of the text.”

    Do you have any comments on this argument? I am asking not about long-term recall of characters but about the ability to determine the appropriate meaning of characters in various contexts. Given the various changes to meanings and usages over the centuries, how does etymology teach one to predictably determine the meaning of a given character?

    • Outlier Linguistic Solutions

      Keep in mind that the author of that quote is discussing “word etymology”. When we say “etymology”, we usually mean “character etymology”. It is always a good idea to keep “characters” (i.e., things you can see which are used to record spoken words) separate from “words” (i.e., sound and meaning combinations that we use when we talk). There is no equal sign in between characters and words.

      Another issue to keep in mind, especially when reading ancient texts, is that there is no one-to-one correlation between characters and words. While a given character may often represent a given word (and that word probably has several different senses), it is not true that that character always represents a given spoken word. There is a process called 通假 tōngjiǎ in which characters are used simply for their sound. This is similar to, though not exactly the same as, the modern Chinese situation where characters are used to transliterate the names of people, countries, etc. Take 馬來西亞 Mǎláixīyǎ (PRC: Mǎláixīyà) “Malaysia” for example. The name does not mean “Horse-come-west-asia”. The characters here are used only for their sound. This can get really confusing in words like 新西蘭 Xīnxīlán “New Zealand” where 新 “new” is used for its meaning, while 西 & 蘭 are used for their sounds. In Taiwan, people usually say 紐西蘭 Niǔxīlán which is sound only.

      When trying to figure out what a character represents in an ancient text (i.e., which spoken word and which sense of that word), you always have to take its form, sound and meaning into account (here “meaning” refers both to how it is used in the context that you are looking at and to the possible senses that are normally represented by the given character). Ignore any of these and you court disaster. If a character is being used for its sound, then obviously the character’s etymology doesn’t play a role (unless you count correctly identifying the sound component as part of etymology). I would argue, though, that knowing a character’s original meaning (if that meaning is indeed knowable) can be of some value, both for recognizing that meaning or extensions of it and for recognizing 假借 jiǎjiè meanings, which are related to a character form only via sound.

      Ex. 花 huā, whose original meaning was “flower, blossom”, has the extended meanings “pattern, design” and “fireworks”, but also “to spend”, which is a 假借 meaning (i.e., not related to the character form except via pronunciation). If you know the original meaning “flower” (which is easily connected to the character form via its semantic component 艹 “grass; plant”), and you know something about how word meanings evolve, then connecting the meanings “pattern” (i.e., an abstraction of what a ‘flower’ looks like), “design” (an abstraction of ‘pattern’), and “fireworks” (whose explosions can look similar to flowers) becomes easier, and those meanings become easier to remember. “To spend”, on the other hand, isn’t related to “flower”. Rather, its sound was similar to or the same as that of the word for “flower”, so the character 花 was borrowed to represent it.

      In summary, I would argue that knowing the original meaning of a character is helpful in recognizing whether a given character is being used to represent a spoken word that is either the original meaning or a meaning which evolved from it; or if it is only being used for its sound. It’s also helpful to know at least the very basics of how word meanings evolve.

  2. Eric

    First: I think this project is awesome and I’m extremely excited to learn more and see where it all goes (Congrats on Kickstarter success!). I’m certain the world of Chinese learners will benefit greatly, and I’m certain you and other smart teachers/learners will think of excellent ways to make character learning more efficient with this wealth of new information.

    That being said, I’m a bit skeptical about the claims above that this will improve learning early on. Specifically, I don’t think merely having additional information, even if it’s better information, will necessarily make learning better or faster for beginners.

    Let’s take the sound component in 識 as an example. Following your argument, it will be easier in the long run for a beginning learner to memorize this character as 言 + 戠 zhí rather than as 言+音+戈. This means 1) memorizing a more complex chunk (戠 rather than 音+戈) and 2) learning its relationship to an extra sound that, at the moment of learning, is probably irrelevant to the meaningful word(s) being learned (presumably 認識 rènshi ‘to know’). My sense is that this is likely to be harder than learning the simpler (perhaps already recognizable) 音+戈 and, perhaps, a silly story. Also, I’d guess that learning the sound component will not be as immediately useful as learning 音+戈, since a superficial, but probably more common pattern that beginners will meet is: ‘stuff’+戈 (e.g. 或, 戴). Your main objection to the latter approach seems to be the need to ‘unlearn’ incorrect patterns/stories, but I feel like this is a red herring. First, superficial or not, the pattern ‘stuff’+戈 is a real pattern and it’s a good memory hook. Second, do we have evidence of anyone struggling to learn the correct character composition because they first learned a silly story/superficial pattern? I suppose it’s also worth noting that it’s not necessarily an either/or choice. One could learn all of this stuff at once. The issue is really: how much should one learn *at first*? Is it worth learning a bunch of initially ‘extra’ stuff with the hope that it will make more sense later? My guess is that, for most learners, it won’t matter if you point out the sound components early, the relationship will not sink in until they know multiple characters that illustrate it, and in the meantime they’re possibly wasting time or getting overloaded with unnecessary information at a time when the whole idea of characters is still very much new and strange.

    To be clear, I have no doubt that sound components will be useful at a later point–I’m fully on board with that, and I always tried to point those connections out to my students once they became relevant. I just think most learners I’ve had in classrooms would be overwhelmed by the excess of info early on, or at least they wouldn’t retain it.

    So, in short, I’m really psyched about your work–particularly with sound components–just dubious that it will all be useful for beginners. I certainly hope I’m wrong!

    • Outlier Linguistic Solutions

      Thanks for commenting! Skepticism is welcome! To address your concerns:
      We aren’t providing a specific learning/teaching method here. We will do a workbook that will address those issues, but the dictionary doesn’t do this directly. That is by design. The claim we are making is this: we are providing the user with reliable and correct information (to the extent that such is possible) about how Chinese characters actually work. Further, we claim that if you learn characters based upon how they actually work, you will be able to use sound and meaning clues of words/characters that you want to write/read/type to recall those forms or make intelligent guesses about characters you haven’t learned yet. You will obtain these abilities much faster and much more accurately than if you learn in some way that hides or doesn’t make explicit sound/meaning connections. We are providing the means to understand how characters actually work, not telling you the mechanics of how to go about learning them.

      To address your example:
      We are not saying that you can’t make up a story for 言+音+戈 to learn 識. What we are saying is that if you learn the character using such a story, it’s best for your learning to first understand how the character works: it is made up of semantic component 言 “speech” and sound component 戠 zhí. This can be accomplished by simply reading our explanations and component breakdown. No one is saying you have to memorize that 戠 is pronounced zhí. In fact, since 戠 is not used in modern Chinese, you’d be better off simply knowing what range of sounds it can represent than its actual pronunciation. Once again, we expect the user to read the sound formulas (these aren’t in the demo, but will be in the actual dictionary), not memorize them.

      Also, a beginning learner probably won’t get it right away. But, if they read (not memorize) the component breakdowns each time they learn a character, their learning experience will at some point catch up to the theory. For instance, if 識 is the first character I learn, I might not know what a sound component even is. Later, when I learn 職 and 織, also by reading their component breakdowns, I’ll most likely notice the pattern. Then, when I’m reading some authentic materials, and I come across 幟 in a meaningful context, I’ll be able to guess that it has something to do with textile materials (cloth, silk, etc.) and is probably pronounced zhi, chi, or shi. For people that enjoy memorizing that kind of thing, of course, that’s fine too.

      >> Your main objection to the latter approach seems to be the need to ‘unlearn’ incorrect patterns/stories

      No, our main objection is learning characters without understanding how they actually work. The reason for this is that it hides actual sound/meaning connections. Those connections are the key to timely acquisition of long-term recall and predictive ability.

      >> Also, I’d guess that learning the sound component will not be as immediately useful as learning

      I have to disagree here. Learning what the sound component is in a given character is crucial and sets the learner up to understand how most Chinese characters work (i.e., having a sound and a semantic component). According to our way of thinking, it’s better to learn each character as a member of a system, not as something in isolation (which is what silly stories do when left to their own devices). Each character you learn should reinforce knowledge of the system as a whole as it’s the system that will allow you to recall characters long-term and make predictions about characters you haven’t learned yet.

      >> The issue is really: how much should one learn *at first*? Is it worth learning a bunch of initially ‘extra’ stuff with the hope that it will make more sense later?

      One should learn the absolute minimum amount possible. We aren’t suggesting anyone learn anything ‘extra’. That’s why we separate the Essentials version from the Expert version. We don’t even require anyone to memorize all of the Essentials data, merely to read it.

      >>My guess is that, for most learners, it won’t matter if you point out the sound components early, the relationship will not sink in until they know multiple characters that illustrate it,

      Agreed that they won’t get it initially, but, that doesn’t mean pointing out the sound component is bad. To the contrary, pointing it out (but not requiring it to be memorized) is good. It is explicitly pointing out a pattern that the learner will see for the rest of their character learning career. We point out the patterns for each character because learners will get that much faster than if we don’t point them out. I totally agree that there is some critical amount of learning experience before things start to click. Abstractions can only be understood if you have mastered some critical number of concrete examples. That’s why we focus on functional components (things doing a real job within a character) and not abstract character types.

    • 苑安雄

      If I could do it all over again, I would learn all of the characters used as phonetic components before any of the others, regardless of frequency (including characters like 戠). It would be a lot of work right off the bat, and with very little payoff in the beginning, but I play the long game. The foundation I’d have built for predicting character readings would have proven indispensable later on. Just learning those characters as a meaningful syllabary would have already well acquainted me with semantic components. But hey, I can only speak for myself, and many of your students might prefer playing the short game (most of my peers in high school and university sure did).
