Monthly Archives: July 2022

Something borrowed, something blue

For the last few years I’ve had a research project about Spanish word origins on the back burner. This summer I’ve resurrected the project, and it is simmering nicely: I have now finished the first major stage.

The focus of the project is Spanish borrowings, or loanwords: words in Spanish that originated in other languages. The project applies to Spanish the methodology from Martin Haspelmath and Uri Tadmor’s World Loanword Database (WOLD) project. Beginning in 2004, Haspelmath and Tadmor organized a team of linguists to collect data on loanwords in forty-one languages around the world. In 2009 they published their results in a book, Loanwords in the World’s Languages: A Comparative Handbook (De Gruyter), and the contributing linguists shared their data on the WOLD website.

My goals in this project are:

  1. To compare Spanish to the forty-one languages in the WOLD project, in terms of (i) its percentage of loanwords, and (ii) these words’ characteristics, such as their part of speech.
  2. To quantify the relative contributions of different source languages to Spanish vocabulary. I already did this for my first book, using a random sample of five hundred words from a standard Spanish etymological dictionary. But that sample may have skewed toward more recherché vocabulary.
  3. To address various issues involved in etymological research, in Spanish and in general.

More about the WOLD project

In order to obtain comparable results across the WOLD languages, all participating linguists started with the same list of 1460 core meanings: ‘house,’ ‘mother,’ ‘go,’ and so on. Each linguist identified ‘their’ language’s words for these meanings, then traced the origins of those words using a standardized set of guidelines. I have now completed the first of these two steps for Spanish. It raised all sorts of interesting issues, which I will discuss in my next blog post.

One goal of the WOLD project was to compare the frequency of borrowing in different languages. In other words, of the core meanings, how many were expressed in each language by loanwords? As shown in the table below, borrowing rates ranged from 1.2% for Mandarin Chinese to 62.7% for Selice Romani. Yaron Matras’s review of the WOLD Handbook in the journal Language points out that these two languages are spoken in radically different environments. Speakers of Mandarin “show little or no bilingualism”; the language has “a status as a majority language, a powerful standard, and a sociopolitically dominant population.” In contrast, Selice Romani is associated with “universal multilingualism, a minority language status, the absence of a written standard, and sociopolitical marginalization.”

Romanian, the only Romance language in the project, fell into the “high borrowers” category (25.9% to 45.6%), as did English. My previous research (see above) placed Spanish in the “very high borrowers” category, with roughly one-third “native” vocabulary (from Vulgar Latin), one-third later borrowings from Latin, and one-third words from other languages. It will be interesting to see whether this holds up for a WOLD-based lexicon.

Borrowing type | Languages (in increasing order of % loanwords)
“Low borrowers” (1.2–9.7%) | Mandarin Chinese, Old High German, Manange, Ket
“Average borrowers” (10.7–22.4%) | Otomi, Seychelles Creole, Gawwada, Hup, Oroqen, Hawaiian, Kali’na, Iraqw, Q’eqchi’, Wichí, Zinacantán Tzotzil, Malagasy, Dutch, Kanuri, White Hmong, Mapudungun, Hausa, Lower Sorbian
“High borrowers” (25.9–45.6%) | Takia, Thai, Yaqui, Swahili, Vietnamese, Sakha, Archi, Imbabura Quechua, Kildin Saami, Bezhta, Indonesian, Japanese, Ceq Wong, Saramaccan, English, Romanian, Gurindji
“Very high borrowers” (51.7–62.7%) | Tarifiyt Berber, Selice Romani
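For readers who like to see the arithmetic, here is a minimal Python sketch of how a loanword percentage over a meaning list might be computed and placed into the bands of the table above. It rests on assumptions that are mine, not WOLD’s documented procedure: each meaning is recorded as a list of flags, one per word (True = borrowed); a meaning with several words splits its weight equally among them; and the band cut-offs are simply the lowest percentage observed in each band, so values falling in the table’s gaps (e.g. 24%) go to the band below. The function names are hypothetical.

```python
# Minimal sketch: a WOLD-style loanword percentage and borrowing band.
# Assumptions (for illustration only, not WOLD's exact method):
#   - each meaning maps to a list of flags, one per word (True = borrowed);
#   - a meaning with several words splits its weight of 1 equally among them;
#   - band cut-offs are the lowest percentage observed in each band of the
#     table, so values in the table's gaps fall into the band below.

BANDS = [  # (lower bound in %, label), checked from highest to lowest
    (51.7, "very high borrower"),
    (25.9, "high borrower"),
    (10.7, "average borrower"),
    (0.0, "low borrower"),
]


def loanword_percentage(meanings: dict[str, list[bool]]) -> float:
    """Share of meanings expressed by loanwords, as a percentage."""
    # Skip meanings for which no word was recorded.
    counted = [words for words in meanings.values() if words]
    if not counted:
        raise ValueError("no meanings with recorded words")
    # Each meaning contributes the fraction of its words that are borrowed.
    borrowed = sum(sum(words) / len(words) for words in counted)
    return 100 * borrowed / len(counted)


def borrowing_category(percentage: float) -> str:
    """Map a loanword percentage to one of the four bands."""
    for lower_bound, label in BANDS:
        if percentage >= lower_bound:
            return label
    raise ValueError("percentage must be non-negative")


if __name__ == "__main__":
    # Toy data, not real results. gato 'cat' counts as borrowed because WOLD
    # counts borrowings into Latin; madre 'mother' and ir 'go' are inherited.
    # The last entry is a hypothetical meaning with one borrowed, one native word.
    toy = {
        "cat": [True],
        "mother": [False],
        "go": [False],
        "hypothetical meaning": [True, False],
    }
    pct = loanword_percentage(toy)
    print(pct, borrowing_category(pct))  # 37.5 high borrower
```

Under these cut-offs, a vocabulary that is roughly one-third native, as in the earlier estimate for Spanish mentioned above, comes out at about 67% borrowed and lands in the “very high borrower” band.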

Another goal of the WOLD project was to learn more about borrowing in general. The research confirmed several generally accepted principles about borrowings:

  • Function words were borrowed less than content words (nouns, verbs, adjectives, and adverbs). Overall, 12% of function words were borrowed, compared to 25% of content words.
  • Nouns were more likely to be borrowed (31%) than other types of content words (14-15%).
  • Borrowing was most common for cultural vocabulary, such as religion, clothing, housing, law, social and political relations, agriculture, food, and warfare; and least common for personal vocabulary, such as sense perception, spatial relations, body parts, and kinship.


My interest in the WOLD methodology dates from 2018, when I was starting to work on my second book, Bringing Linguistics into the Spanish Language Classroom. The book is organized around five themes, or “essential questions,” including “How is Spanish different from other languages?” and “How is Spanish similar to other languages?” I thought it would be interesting to compare Spanish to the WOLD languages so that I could say either “Spanish has borrowed more words than most other languages” or “Spanish has borrowed a typical number of words.” (I was confident that Spanish would be a “high borrower.”)

I originally imagined that I could research this topic in a couple of weeks, but soon ran into methodological issues such as:

  • Should word pairs like hijo and hija (‘son/daughter’) be counted as two separate words, even though they are just masculine and feminine forms of the same word?
  • WOLD linguists could identify multiple words for a single meaning. How far should this be taken for Spanish? How does one draw the line between synonyms and dialectal variants?
  • In tracing word origins, the WOLD guidelines count a word as borrowed if it entered the language at any point in the language’s history. This would include, for instance, words borrowed into Classical or Vulgar Latin, such as gato ‘cat.’ (Vulgar Latin cattus is believed to be Afro-Asiatic in origin and replaced the original Latin feles.) This guideline rubbed me the wrong way. Shouldn’t Spanish begin with Vulgar Latin?

After three months of a futile quick-and-dirty run at these issues, I decided to put the project on my back burner and eventually do a more thorough job that would hopefully yield publishable results. So…here we are.

La esclava blanca

For years I’ve intended to watch a telenovela, or Spanish-language soap opera. Like many people who learned Spanish as a second language, I find that listening is my weakest skill. I figured that sitting through hours of Spanish dialogue would help me.

A few months ago I finally took the plunge and watched La esclava blanca (‘The White Slave’) on Netflix. It was so much fun that I ended up binge-watching all sixty-two episodes.* This was bad for my physical health, except for the episodes I watched while exercising. It also wasn’t as good for my listening skills as I had hoped, since I watched it with Spanish subtitles. Now that I know the plot, perhaps I should rewatch it without subtitles…but I’d rather move on to a different series.

There was much to admire in La esclava blanca. The cast was terrific, especially the villain, who was played by a handsome Spanish actor with the improbable but delightful name Miguel de Miguel. His character was vile yet undeniably charming. The star-crossed lovers at the center of the plot were brave and bold. Over the course of the series the side characters became more compelling and interesting as they grew and changed, often in surprising ways. Finally, the story’s setting, in Colombia toward the end of that country’s slavery era, was engrossing. The show made it abundantly clear that slavery was a poison in Colombian society, harming not only the enslaved Blacks (obviously) but also their legal owners. As the show progressed, and the slave owners became more and more desperate to protect their way of life, they descended deeper and deeper into pure evil. The ultimate fate of Miguel de Miguel’s character illustrates this path most graphically. You’ll have to watch the show to find out more. Really, his last scene is a doozy.

The one thing that bothered me about the series is that the white heroine and the mixed-race hero, rather than the enslaved people of pure African descent, drove the movement toward liberation. This is an example of what is known as the “white savior” trope, in which a white person leads or rescues a minority. Other examples are the movie Glory, which stars Matthew Broderick as the white commander of a Black regiment on the Union side in the Civil War, and Avatar, in which only a brave white human (an ex-Marine played by Sam Worthington) can rescue the blue Na’vi humanoids and their homeland on a verdant moon.

My complaint is not original. A Google search for “esclava blanca white savior” will find many other critiques along these lines.

So while I truly enjoyed this telenovela, the “white savior” issue stops me from recommending it with full enthusiasm.

Of course, I found much of linguistic interest in the series. I don’t know to what extent the features noted below are specific to Colombian Spanish.

  • First and foremost, I am convinced that I heard some instances of words whose initial h was aspirated rather than silent. Two I wrote down, both in episode 33, were Qué va, hombre (shortly after 17:00) and Hola, Jesús (after 36:30). I’ve searched but haven’t found this described anywhere as a feature of Colombian Spanish.
  • As with the n-word in English, the white and Black characters in La esclava blanca use negro/negra differently. For the whites it is an insulting noun, often followed by the adjective asqueroso ‘disgusting.’ For the Blacks it is affect-free, like man or bro in English.
  • Speaking of man, the Black characters also use hombre when talking with pals. (I don’t remember whether the white characters do this too.) At 29:20 in episode 57, Julián even calls his girlfriend hombre, which amused me.
  • The actors frequently drop the word a in sentences like Miguel va a comer ‘Miguel is going to eat,’ saying instead Miguel va comer. This makes perfect sense: the adjacent a vowels in va and a have simply merged. The a is retained with other forms of ir, as in Miguel y Elena van a comer, where no vowel precedes it.
  • At 13:20 in episode 59 a character says Usted verá que no es lejos. I noticed other uses of ser instead of estar to describe location.
  • At 43:45 in episode 61 I learned a new verb, engatusar, meaning ‘to con, deceive.’ According to Juan Corominas it has an unusual etymology that blends three roots: encantusar ‘to deceive with witchcraft,’ engatar (from gato) ‘to deceive with affection,’ and engaratusar ‘to deceive with praise.’
  • Finally, while I have yet to realize my ambition of visiting a voseísta country like Colombia (i.e., one whose speakers use the pronoun vos instead of, or along with, tú), I really enjoy hearing voseo! In La esclava blanca I especially relished vos commands followed by pronouns, since these are identical to their tú equivalents except for the stressed syllable. Some examples are perdoname (instead of perdóname) and tranquilizate (instead of tranquilízate), whose te threw me for a loop until I checked the conjugation.

As always, I welcome comments. I would especially appreciate hearing from anyone who is familiar with Colombian Spanish.


*When I started watching the show I had no idea that it was so long. At first I figured that it would be fairly short because an important wedding was scheduled to take place in a few days. When the wedding kept being postponed I checked and saw that I still had dozens of episodes to go. By then it was too late to stop watching: I was thoroughly hooked.