¡Feliz Año Nuevo!
This past semester, while teaching and spending time with my new granddaughter (see below), I’ve been making incremental progress on my second book. As you might be able to tell from the description on that page, this project involves less research than ¿Por qué?, though more creativity. Nevertheless, right now I’m in the middle of a substantial research arc for the new book. It involves etymology (word origins), and is particularly satisfying because it redoes, in a more principled fashion, an analysis I did for Question #38 of ¿Por qué?: “Where does Spanish vocabulary come from?”
For that analysis, I looked up the etymologies of 500 Spanish words randomly chosen from a standard Spanish etymological dictionary. I found that roughly one-third of them were “native”, i.e. they descended from Vulgar Latin, one-third were later borrowings from Latin, and one-third were borrowed from other languages. But I wondered whether the dictionary’s selection of words might have biased my results toward borrowings.
Since then, I’ve learned about the World Loanword Database (WOLD) project, in which linguists researched the origins of roughly 1500 words, corresponding to an agreed-upon set of core meanings, in 41 different languages from around the world. Martin Haspelmath, one of the co-editors of the WOLD project (with Uri Tadmor), also co-edited the World Atlas of Linguistic Structures (WALS), which I found invaluable while writing ¿Por qué? That makes him one of my favorite linguists.
The data for each of these languages are available on the WOLD website, and the results were published in a very expensive book. (Maybe if such books were cheaper, more people would buy them. Ahem.) The #1 borrower was Selice Romani (a variety of Romani), with 73% loanwords, and the #41 borrower was Mandarin Chinese, at 1.2%.
Because WOLD aimed for broad coverage of the world’s languages, it included only one Romance language: Romanian. I decided to apply the WOLD methodology to Spanish myself so that I could compare Spanish to other languages in terms of its borrowing patterns. The 1500 WOLD meanings are divided into different semantic categories, such as clothing and cognition, so I’ll be able to make this comparison within each of these categories as well as overall.
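For the curious, the bookkeeping behind this kind of comparison can be sketched in a few lines of Python. This is only a minimal illustration, not my actual workflow or WOLD's official scoring: the function name `borrowing_rates`, the row layout, and the placeholder words are all made up for the example, and it assumes a simple unweighted count in which each word is either a loanword or not.

```python
from collections import defaultdict

def borrowing_rates(entries):
    """Return (overall_rate, {field: rate}) for rows of (field, word, is_loanword).

    Assumption: a plain unweighted count, one row per word. The list must not
    be empty, or the overall rate would divide by zero.
    """
    totals = defaultdict(int)  # words seen per semantic field
    loans = defaultdict(int)   # loanwords seen per semantic field
    for field, _word, is_loanword in entries:
        totals[field] += 1
        if is_loanword:
            loans[field] += 1
    by_field = {field: loans[field] / totals[field] for field in totals}
    overall = sum(loans.values()) / sum(totals.values())
    return overall, by_field

# Made-up placeholder rows: the words and flags are NOT real results.
sample = [
    ("Clothing", "word_1", True),
    ("Clothing", "word_2", False),
    ("Clothing", "word_3", True),
    ("Cognition", "word_4", False),
    ("Cognition", "word_5", False),
    ("Cognition", "word_6", True),
]

overall, by_field = borrowing_rates(sample)
print(f"overall: {overall:.0%}")
for field, rate in sorted(by_field.items()):
    print(f"{field}: {rate:.0%}")
```

With real data, each row would pair a WOLD meaning's semantic category with the Spanish word chosen for it and a yes/no judgment on whether that word is a borrowing.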
Honestly, you have to be a total nerd to perform this kind of analysis voluntarily, and a little crazy to do it twice. The 500-word analysis took roughly forever, and tripling this volume is, of course, taking even longer, especially since this time around I’ve had to choose the best Spanish word (or words) to fit each WOLD meaning, whereas in my first analysis, the 500 Spanish words were already in Spanish! I’m hoping that my results will be interesting enough to write up as a journal article, perhaps for Hispania.
I don’t have final results yet: right now, I’m double-checking the etymologies. However, the borrowing rate appears to be much lower than in my first analysis: roughly one-third borrowings, versus the two-thirds I found before (counting both later borrowings from Latin and borrowings from other languages). This still puts Spanish safely in WOLD’s category of “high borrowers,” roughly on a par with Japanese (35%) though lower than English (41%) and Romanian (42%).
In advance of definitive results, the point of this blog post is to share with you some of the interesting aspects of Spanish vocabulary (or, more precisely, non-vocabulary: meanings NOT expressed in Spanish) that I’ve come across while selecting the Spanish words for the WOLD meanings.
Some of these missing meanings, as you might expect in a project that deliberately spans the globe, are culturally specific. For example, kinship terms in some languages are more sex-specific than in Spanish (or English). Your tío ‘uncle’ is a tío whether he’s your father’s brother or your mother’s, but some languages have separate terms for these relationships. Similarly, your yerno ‘son-in-law’ is a yerno whether you are a man or a woman, but some languages encode this difference. Looking beyond kinship terms, WOLD meanings include such culturally specific words as ‘manioc bread’, ‘grass skirt’, ‘men’s house’, ‘digging stick’, ‘net bag’, ‘fish poison’, and ‘fish trap.’ Of course one can figure out ways to convey such meanings in Spanish, but it would be stretching things to call them standard Spanish phrasal expressions on a par with, say, oso hormiguero ‘anteater’ or dejar caer ‘to drop.’
Three other categories of ‘missing meanings’ are of particular interest to me.
- A few WOLD meanings express grammatical categories that Spanish lacks: a single gender-neutral pronoun meaning ‘he, she, or it,’ separate versions of ‘we’ that distinguish ‘you and I’ from ‘they and I,’ and distinct negatives that appear before nouns versus elsewhere, like English no versus not.
- Some WOLD meanings seem to be randomly missing in Spanish. Maybe it’s because I’m not a native speaker, or my dictionary skills aren’t as good as I thought they were, but I don’t think there’s a standard way to express “the forked branch” (WOLD 8.74) or “for a long time” (WOLD 14.332) in Spanish. You can certainly say ir a casa, but do those three words add up to a “standard phrasal expression”, as discussed above, for ‘to go or return home’ (WOLD 10.58)? (Note that volver a casa and regresar a casa are equally plausible.) Likewise, Spanish has a standard idiomatic phrase, llevar a cuestas, to express carrying something on your back (WOLD 10.613), but not carrying ‘in hand’ (WOLD 10.612), ‘on head’ (WOLD 10.614), or ‘under the arm’ (WOLD 10.615). And why does Spanish, like English, lack a general term for a ‘child-in-law’ (WOLD 2.6411), forcing the awkward expression yerno o nuera (‘son- or daughter-in-law’)?
- Finally, Spanish collapses many lexical distinctions found in other languages, as implied by their inclusion in the list of WOLD meanings. I was already aware of many of these, such as ‘in’ and ‘on’ (both en), ‘do’ and ‘make’ (hacer), ‘say’ and ‘tell’ (decir), and ‘afternoon’ and ‘evening’ (tarde). As I worked through the WOLD list I became aware of many others, including ‘cut’ and ‘chop’ (cortar), ‘fault’ and ‘blame’ (culpa), ‘bend’ and ‘fold’ (doblar), ‘paddle’ and ‘oar’ (remo), ‘spade’ and ‘shovel’ (pala), ‘livestock’ and ‘cattle’ (ganado), and ‘mud’ and ‘clay’ (barro). Other such collapsed distinctions are shared by English, such as needle/aguja for both sewing and pine needles, day/día for ‘not night’ and ‘24 hours’, believe/creer for ‘trust’ and ‘opine’, and weave/tejer for both fabric and baskets (or hair).
I look forward to sharing the results of my analysis when I’m done. In the meantime, I hope you find the above of interest.