Recent gleanings from David Crystal

As a first step in research for my new book I have continued to work my way through David Crystal’s inspirational The Story of English in 100 Words. I hit the 25-word mark yesterday, and thought this would be a good time to blog about what I’ve learned since my previous post, which covered Crystal’s Introduction and his first few words.

To begin with, I’ve learned lots of fun facts about my own language! Here are a few:

  • The word out serves as a verb, adverb, exclamation, preposition, adjective, and noun. Wow!
  • The word street was one of the first borrowings into English from Latin. It was applied to the paved, straight roads that the Romans built, while the earlier Anglo-Saxon word weg (now ‘way’) was relegated to older paths.
  • The groom of bridegroom was originally guma, a somewhat poetic Old English word for ‘man’. Speakers substituted the similar-sounding groom when guma dropped out of normal usage. Groom originally meant ‘boy’ but had acquired its current equestrian meaning by the time of this substitution.
  • The only cookery words that come from Old English, not French, are grind and dough. Who knew?
  • Many legal expressions like goods and chattels, fit and proper, and will and testament, originally combined Germanic and Latin words so that they would be widely understood.
  • The game of Monopoly caused American jail (apparently descended from Parisian French) to overtake the (apparently Norman) gaol outside of the U.S.
  • Middle English didn’t have separate words for ‘spring’ and ‘summer’, but merged them both into sumer, as in the famous English round (song) Sumer is icumen in, Lhude sing cuccu.
  • We think of wee as Scottish, but it originated in Northern England, not over the border in Scotland.

—————–

As I’d hoped, Crystal’s book is giving me interesting ideas for my own book about Spanish, or at least questions:

  • Does Spanish have phrases with deliberately bilingual origins, like goods and chattels in English?
  • What word can I discuss that is only found in legal Spanish? (Maybe the future subjunctive…)
  • Crystal includes the title dame. What Spanish title should I discuss?
  • I should definitely include some Spanish word pairs that consist of a newer word and an earlier one from a different source (as in street and way). Some possibilities are abarca/sandalia, simiente/semilla, vianda/comida, and/or hostal/hotel. Of course, sometimes a new borrowing completely overwrites an earlier word, for example French té, which ousted Portuguese cha (itself borrowed from Mandarin).
  • Apparently pork meant ‘penis’ in American slang of the 1930s. Are any Spanish slang words for body parts worth discussing?
  • Speaking of slang words for body parts, Crystal points out that cunt is taboo enough to sometimes be referred to as “the c-word”, which can also mean ‘cancer’. English also has the “f-word” and the “n-word”. Does Spanish have any such words that, like Voldemort, must not be named? I will include some taboo words in any case.
  • The lack of separate words for ‘spring’ and ‘summer’ in early English reminds me of the lack of separate modern Spanish words for ‘evening’ and ‘night’. Were there earlier Spanish uber-terms that eventually split into two words? (And what can I call them instead of “uber-terms”, which I just invented? Portmanteau means something else, doesn’t it?) Did this happen, in recorded history, with color terms, which tend to proliferate as languages evolve? (See e.g. Guy Deutscher’s Through the Language Glass, one of my favorite linguistics books written for a general audience.)
  • English slang frequently uses negative words as positives, e.g. wicked, mad, insane, crazy, and of course bad. Is there anything like this in Spanish?
  • Crystal’s discussion of phrasal verbs like take out and take away, which are distinctively English (perhaps Germanic, more generally?) reminded me that I should definitely discuss pronominal verbs (reflexive and otherwise), which are distinctively Spanish.
  • Crystal points out that the word count, as opposed to countess, was initially avoided because its earlier pronunciation was similar to that of cunt. This reminded me of Tom Lathrop’s assertion that the verb jugar, originally jogar, ‘evolved’ its u to make the verb sound less like joder. Is this assertion taken seriously? Are there other examples of this avoidance process in the history of Spanish?

Unrelated to Crystal’s book, I’ve also made notes to myself to include

  • Words that illustrate spelling changes, like saqué or empecemos. Maybe the best way to do this is with a word like cero or cebra.
  • Diminutives, augmentatives, and so on. Spanish has a wealth of derivational suffixes! Likewise I should include interesting prefixes such as re-, which can be an intensifier (rebueno) or a repeater (rehacer).
  • The full range of parts of speech.
  • Words from a broad range of semantic categories such as the 24 used in the WOLD project.

On a final note, the New York Times “Connections” game, which I play every morning — my current streak is 69! — recently included the word loanword. Many commenters on the Connections blog complained that this word was too obscure. So did my husband. NOT ME!!!

Something borrowed, something blue

For the last few years I’ve had a research project about Spanish word origins on the back burner. This summer I’ve resurrected the project, and it is simmering nicely: I have now finished the first major stage.

The focus of the project is Spanish borrowings, or loanwords: words in Spanish that originated in other languages. The project applies to Spanish the methodology from Martin Haspelmath and Uri Tadmor’s World Loanword Database (WOLD) project. Beginning in 2004, Haspelmath and Tadmor organized a team of linguists to collect data on loanwords in forty-one languages around the world. In 2009 they published their results in a book, Loanwords in the World’s Languages: A Comparative Handbook (De Gruyter), and the contributing linguists shared their data on the WOLD website.

My goals in this project are:

  1. To compare Spanish to the forty-one languages in the WOLD project, in terms of (i) its percentage of loanwords, and (ii) these words’ characteristics, such as their part of speech.
  2. To quantify the relative contributions of different source languages to Spanish vocabulary. I already did this for my first book, using a random sampling of five hundred words from a standard Spanish etymological dictionary. But that sample may have skewed toward more recherché vocabulary.
  3. To address various issues involved in etymological research, in Spanish and in general.

More about the WOLD project

In order to obtain comparable results across the WOLD languages, all participating linguists started with the same list of 1460 core meanings: ‘house,’ ‘mother,’ ‘go,’ and so on. Each linguist identified ‘their’ language’s words for these meanings, then traced the origins of those words using a standardized set of guidelines. I have now completed the first of these two steps for Spanish. It raised all sorts of interesting issues, which I will discuss in my next blog post.

One goal of the WOLD project was to compare the frequency of borrowing in different languages. In other words, of the core meanings, how many were expressed in each language by loanwords? As shown in the table below, borrowing rates ranged from 1.2% for Mandarin Chinese to 62.7% for Selice Romani. Yaron Matras’s review of the WOLD Handbook in the journal Language points out that these two languages are spoken in diametrically different environments. Speakers of Mandarin “show little or no bilingualism”; the language has “a status as a majority language, a powerful standard, and a sociopolitically dominant population.” In contrast, Selice Romani is associated with “universal multilingualism, a minority language status, the absence of a written standard, and sociopolitical marginalization.”

Romanian, the only Romance language in the project, fell into the “high borrowers” category (25.9% to 45.6%), as did English. My previous research (see above) placed Spanish in the “very high borrowers” category, with roughly one-third “native” vocabulary (from Vulgar Latin), one-third later borrowings from Latin, and one-third words from other languages. It will be interesting to see whether this holds up for a WOLD-based lexicon.

Borrowing type | Languages (in increasing order of % loanwords)
“Low borrowers” (1.2 – 9.7%) | Mandarin Chinese, Old High German, Manange, Ket
“Average borrowers” (10.7 – 22.4%) | Otomi, Seychelles Creole, Gawwada, Hup, Oroqen, Hawaiian, Kali’na, Iraqw, Q’eqchi’, Wichí, Zinacantán Tzotzil, Malagasy, Dutch, Kanuri, White Hmong, Mapudungun, Hausa, Lower Sorbian
“High borrowers” (25.9 – 45.6%) | Takia, Thai, Yaqui, Swahili, Vietnamese, Sakha, Archi, Imbabura Quechua, Kildin Saami, Bezhta, Indonesian, Japanese, Ceq Wong, Saramaccan, English, Romanian, Gurindji
“Very high borrowers” (51.7 – 62.7%) | Tarifiyt Berber, Selice Romani
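
For readers who like to see the arithmetic, here is a minimal Python sketch of how a language's loanword percentage could be sorted into the four categories in the table. The cut-offs of 10%, 25%, and 50% are assumptions: they are round numbers chosen to fall in the gaps between the observed ranges, not values taken from the WOLD data. The function names are hypothetical too.

```python
# Assumed cut-offs (not from the source): round numbers that fall in the gaps
# between the observed ranges (9.7/10.7, 22.4/25.9, 45.6/51.7).
CUTOFFS = [
    (10.0, "low borrower"),
    (25.0, "average borrower"),
    (50.0, "high borrower"),
]


def loanword_percentage(borrowed: int, total: int) -> float:
    """Percentage of items in a word list that are loanwords."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= borrowed <= total:
        raise ValueError("borrowed must be between 0 and total")
    return 100.0 * borrowed / total


def borrowing_category(percent_loanwords: float) -> str:
    """Map a loanword percentage (0-100) to one of the four categories."""
    if not 0.0 <= percent_loanwords <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    for upper_bound, label in CUTOFFS:
        if percent_loanwords < upper_bound:
            return label
    return "very high borrower"


if __name__ == "__main__":
    # The two extremes reported for the WOLD languages.
    print("Mandarin Chinese:", borrowing_category(1.2))
    print("Selice Romani:", borrowing_category(62.7))
    # Earlier estimate for Spanish: roughly one-third native vocabulary,
    # so roughly two-thirds of the words count as borrowed.
    print("Spanish (earlier estimate):", borrowing_category(loanword_percentage(2, 3)))
```

With these assumed cut-offs, the earlier estimate for Spanish (about two-thirds borrowed) lands in the same category as Selice Romani, which is the claim a WOLD-based word list would test.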

Another goal of the WOLD project was to learn more about borrowing in general. The research confirmed several generally accepted principles about borrowings:

  • Function words were borrowed less than content words (nouns, verbs, adjectives, and adverbs). Overall, 12% of function words were borrowed, compared to 25% of content words.
  • Nouns were more likely to be borrowed (31%) than other types of content words (14-15%).
  • Borrowing was most common for cultural vocabulary, such as religion, clothing, housing, law, social and political relations, agriculture, food, and warfare; and least common for personal vocabulary, such as sense perception, spatial relations, body parts, and kinship.
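
The percentages in the list above are per-category tallies: within each category, the share of words that are loanwords. A minimal Python sketch of that tally follows. It assumes each entry records a word class and a loanword flag. The entries are invented placeholders, not WOLD data or results for Spanish, and `borrowing_rates` is a hypothetical helper name.

```python
from collections import defaultdict

# Invented placeholder entries: (label, word class, is_loanword).
# These illustrate the bookkeeping only; they are not research data.
TOY_ENTRIES = [
    ("w1", "noun", True),
    ("w2", "noun", False),
    ("w3", "noun", False),
    ("w4", "noun", False),
    ("w5", "verb", False),
    ("w6", "verb", False),
    ("w7", "function", False),
    ("w8", "function", False),
]


def borrowing_rates(entries):
    """Return {word class: percentage of entries flagged as loanwords}."""
    totals = defaultdict(int)
    borrowed = defaultdict(int)
    for _label, word_class, is_loanword in entries:
        totals[word_class] += 1
        if is_loanword:
            borrowed[word_class] += 1
    return {wc: 100.0 * borrowed[wc] / totals[wc] for wc in totals}


if __name__ == "__main__":
    for word_class, rate in borrowing_rates(TOY_ENTRIES).items():
        print(f"{word_class}: {rate:.1f}% borrowed")
```

The same tally works for semantic fields (religion, kinship, and so on) if the second element of each entry is a field label instead of a word class.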

Motivation

My interest in the WOLD methodology dates from 2018, when I was starting to work on my second book, Bringing Linguistics into the Spanish Language Classroom. The book is organized around five themes, or “essential questions,” including “How is Spanish different from other languages?” and “How is Spanish similar to other languages?” I thought it would be interesting to compare Spanish to the WOLD languages so that I could say either “Spanish has borrowed more words than most other languages” or “Spanish has borrowed a typical amount of words.” (I was confident that Spanish would not be a “low borrower.”)

I originally imagined that I could research this topic in a couple of weeks, but soon ran into methodological issues such as:

  • Should word pairs like hijo and hija (‘son/daughter’) be counted as two separate words, even though they are just masculine and feminine forms of the same word?
  • WOLD linguists could identify multiple words for a single meaning. How far should this be taken for Spanish? How does one draw the line between synonyms and dialectal variants?
  • The WOLD guidelines count a word as borrowed if it entered the language at any point in the language’s history. This would include, for instance, words borrowed into Classical or Vulgar Latin, such as gato ‘cat.’ (Vulgar Latin cattus is believed to be Afro-Asiatic in origin, and replaced the original Latin feles.) This guideline rubbed me the wrong way. Shouldn’t Spanish begin with Vulgar Latin?

After three months of a futile quick-and-dirty run at these issues, I decided to put the project on my back burner and eventually do a more thorough job that would hopefully yield publishable results. So…here we are.