Glossary¶
This is a glossary of terminology as used in the intelligent dictionary app. It combines operational terms used within the dictionary code, general linguistic terms, and terminology used by specific approaches to describing certain languages.
Terms¶
analysis¶
also, linguistic analysis or linguistic breakdown.
An ordered set of the lemma and morphosyntactic features that can describe an inflected wordform.
It minimally consists of:
at least one lemma
at least one feature, stating the wordform’s word class
Example¶
One possible linguistic analysis of the wordform “sabía” in Spanish is:
saber+V+Past+1Sg
In other words, the breakdown is:
It’s a form of saber (the lemma)
It’s a verb
It’s past-tense
Its actor is first-person singular
Contains¶
1 or more lemmas
1 or more morphosyntactic features
Describes¶
1 wordform; note, a single wordform can have multiple distinct analyses.
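As a rough illustration, the analysis string from the example above can be split into its lemma and its features. This is a minimal sketch, not morphodict's own parser: the names Analysis and parse_analysis are hypothetical, and it assumes an analysis with exactly one lemma followed only by +-separated suffix features.

```python
from typing import NamedTuple, Tuple


class Analysis(NamedTuple):
    """Hypothetical container for one analysis of a wordform."""

    lemma: str
    features: Tuple[str, ...]


def parse_analysis(text: str) -> Analysis:
    """Split a string such as 'saber+V+Past+1Sg' into lemma and features.

    Assumption: the lemma comes first and every feature follows a '+'.
    """
    lemma, *features = text.split("+")
    if not lemma or not features or "" in features:
        raise ValueError(f"need one lemma and at least one feature: {text!r}")
    return Analysis(lemma, tuple(features))


analysis = parse_analysis("saber+V+Past+1Sg")
print(analysis.lemma)     # the lemma
print(analysis.features)  # word class first, then the other features
```

Because a single wordform can have several distinct analyses, a real lookup would return a list of such objects rather than just one.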
conjugation¶
A type of inflectional category for the verb word class.
conjugator¶
(informal) a tool that generates a paradigm.
N.B.: people ask for a conjugator even when they want to generate noun wordforms!
declension¶
A type of inflectional category for the noun word class.
definition¶
One of possibly several meanings of the head.
Part of¶
Describes¶
1 head
derivational breakdown¶
A derivational breakdown of a wordform lists the morphemes that make up the wordform.
Example¶
atahkw + is + iw is the derivational breakdown of acâhkosiwiw
star + let + ify is the derivational breakdown of the coined English word starletify (to make something a little star)
derivational paradigm¶
The collection of all possible derived forms belonging to a lemma.
Part of¶
1 lemma
Contains¶
1 or more wordforms
derived form¶
A new wordform created from a lemma; this new wordform has a separate lemma with its own inflectional paradigm. A derived wordform can belong to a different word class than the original source stem.
Part of¶
dictionary¶
???
dictionary entry¶
The main content of a dictionary. Consists of the head (in one or more orthographical representations), the word class, and the definitions.
Part of¶
Contains¶
1 head
1 or more definitions
1 word class, if the head is a word form
See also¶
dictionary source¶
An edited repository of dictionary entries. A dictionary source has at least one of the following:
an editor/editors
an author/authors
A dictionary source provides one or more dictionary entries.
A dictionary source may have other bibliographic metadata, like a book or a publication.
indeclinable particle¶
(In Plains Cree linguistics) The word class of terms that do not inflect. Often abbreviated as Ipc.
Is a¶
inflectional category¶
A more detailed categorization of a word class. Things that belong to the same inflectional category have the same affix set.
Examples¶
NI-1
VTA-n
NDA-4w
⚠️ A deprecated synonym exists — This was formerly also called an inflectional class, but that term is now deprecated.
inflectional paradigm¶
The collection of inflected wordforms belonging to a lemma. Informally known as the conjugations.
Part of¶
1 lemma
Contains¶
1 or more wordforms
general word class¶
Superclass of word class. Does not contain inflectional categories.
General word classes are not detailed enough to tell you how their members inflect. A word class, on the other hand, tells you enough to be able to inflect its members.
Consists of¶
1 or more word classes
In Plains Cree¶
gloss¶
Note: use translation instead!
Sometimes a sloppy synonym for translation. More specifically, a gloss is a one-to-one mapping between one language and another, often accompanied by relevant tags for morphosyntactic features. Glosses are more specific and less “fluent” than a translation.
head¶
The highest level structure of a dictionary. Each head is listed alphabetically (with derivations (phrases on the wordform) coming after the ‘root’ listing).
inflected form¶
???
lacuna¶
“Gaps” in a paradigm. Any form that does not exist in a paradigm. For example, the English word “pants”:
Singular | Plural
---|---
— | pants
Pants doesn’t have a singular form! There’s “pant leg”, but no “*pant”. This is a lacuna: a gap in the paradigm, where you would otherwise expect a valid form.
language pair¶
Each dictionary gives target language definitions for entries written in a specific source language.
The specific combination of source and target languages in a dictionary application is called a language pair, e.g., the language pair of Plains Cree to English for itwêwina.
In filenames and throughout the morphodict code, to distinguish between different dictionary applications, the abbreviation sssttt is used, where sss and ttt are the 3-character ISO 639-3 language codes for the source language and target language, respectively, of the dictionary.
For example, for the morphodict Plains Cree-to-English dictionary this is crkeng, and many code and data file paths will contain the string crkeng. For example, you will find files such as the test dictionary at the path ../src/crkeng/resources/dictionary/crkeng_test_db.importjson
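The sssttt convention can be sketched in a few lines. The helper names language_pair_code and split_language_pair are hypothetical and are not taken from the morphodict code; the sketch only checks the shape of the codes (three lowercase ASCII letters), not whether they are registered ISO 639-3 codes.

```python
from typing import Tuple


def _check_code(code: str) -> None:
    """Reject anything that is not three lowercase ASCII letters."""
    if not (len(code) == 3 and code.isascii() and code.isalpha() and code.islower()):
        raise ValueError(f"not a 3-letter lowercase language code: {code!r}")


def language_pair_code(source: str, target: str) -> str:
    """Concatenate source and target codes into the sssttt form."""
    _check_code(source)
    _check_code(target)
    return source + target


def split_language_pair(pair: str) -> Tuple[str, str]:
    """Split an sssttt string back into (source, target)."""
    if len(pair) != 6:
        raise ValueError(f"expected 6 characters, got {pair!r}")
    source, target = pair[:3], pair[3:]
    _check_code(source)
    _check_code(target)
    return source, target


print(language_pair_code("crk", "eng"))
print(split_language_pair("crkeng"))
```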
lemma¶
The base form of a word form; this is a form chosen to depict the basic representation of the paradigm. Often the least structurally and semantically marked form. Unlike a stem or root, a lemma is always a valid word form.
In a dictionary, the definitions of a lemma implicitly provide definitions for the inflected forms of the lemma.
If a term is defined in a dictionary, its head will usually be the lemma. For example, you might not find a definition for “smartphones” in a dictionary of contemporary English; instead, you’ll find a definition for “smartphone” (the lemma), and “smartphones” is one of its inflected forms. However, non-lemma wordforms may also be heads in a dictionary, depending on context.
Whether non-lemma wordforms can have their own definitions is sometimes a point of controversy among linguists. Some would argue that providing a distinct definition for a non-lemma wordform implies that it is its own lexeme. But the counter-argument is that specific wordforms in a lexeme can have their own connotations, especially in morphologically complex languages, and not all of these connotations are necessarily distinct enough to create an entirely new lexeme.
morphodict does support having definitions for non-lemma wordforms.
Part of¶
lexeme¶
A related set of wordforms.
Other sources may also call this a lexical entry or lexical item.
meaning¶
???
morpheme¶
An indivisible part of language with meaning; a morpheme cannot be broken down into smaller parts without changing its meaning.
morphosyntactic feature¶
???
multicharacter symbol¶
In LEXC, a symbol in the FST’s alphabet that is realized in text form as multiple Unicode characters. These are used for tags, e.g., +V, +TA, +Err/Orth; and special symbols used in phonological rules, e.g., the t2 in nit2<nipa>n.
Note to FST implementors: since tags are always multicharacter symbols, if the FST output has all the symbols separated, then there is no need to parse the analysis to find tags.
For example, “nêpât” is transduced to the following ten symbols (separated by |):
IC+ | n | i | p | â | w | +V | +AI | +Cnj | +3Sg
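The note to FST implementors can be illustrated with the symbol sequence above: once the symbols arrive already separated, the multicharacter symbols are simply the ones longer than one character. This is a minimal sketch that assumes the output is available as a |-separated string; the function names are hypothetical. The text is normalized to NFC so that a letter such as â counts as a single code point.

```python
import unicodedata
from typing import List


def split_symbols(fst_output: str) -> List[str]:
    """Split '|'-separated FST output into individual symbols (NFC-normalized)."""
    return [unicodedata.normalize("NFC", s.strip()) for s in fst_output.split("|")]


def multichar_symbols(symbols: List[str]) -> List[str]:
    """Keep only the symbols that are longer than one code point."""
    return [s for s in symbols if len(s) > 1]


symbols = split_symbols("IC+ | n | i | p | â | w | +V | +AI | +Cnj | +3Sg")
print(len(symbols))                # ten symbols in total
print(multichar_symbols(symbols))  # the prefix and suffix tags
```

No string parsing of the analysis itself is needed here; the remaining single-character symbols, joined together, spell out the lemma portion.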
normatize¶
Write things according to the orthographical norm. A norm is implicitly and unconsciously decided by a community of writers. To normatize the spelling of something is to make it match the spelling expected by a community. A language may have many norms.
See also: standardize
e.g., the normative form of “alot” is “a lot”
e.g., the normative form of “icecream” is “ice cream”
e.g., the normative form of “atchakosuk” is “acâhkosak”
orthographical representation¶
???
paradigm layout¶
A formal specification that describes how to arrange (in a table) the inflections or derived wordforms of any lexemes belonging to a particular word class; or, how to arrange related wordforms in a table.
Subtypes:
- dynamic paradigm layout: a paradigm layout that has placeholders for the lemma or other morphosyntactic information that may be replaced when generating a rendered paradigm. These are the types of paradigm layouts used when describing an entire word class.
- static paradigm layout: a paradigm layout in which all forms are explicitly specified; there are no placeholders.
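The difference between the two subtypes can be sketched with a toy layout. Everything in this example is an assumption made for illustration: the ${lemma} placeholder syntax, the row labels, the tags, and the fill_layout helper are hypothetical and do not reflect morphodict's actual layout file format.

```python
from typing import List

Layout = List[List[str]]

LEMMA_PLACEHOLDER = "${lemma}"  # hypothetical placeholder syntax

# A dynamic layout: cells contain a placeholder instead of a concrete lemma,
# so the same layout can serve every lemma in a word class.
dynamic_layout: Layout = [
    ["Singular", "${lemma}+N+Sg"],
    ["Plural", "${lemma}+N+Pl"],
]


def fill_layout(layout: Layout, lemma: str) -> Layout:
    """Replace the lemma placeholder in every cell with a concrete lemma."""
    return [[cell.replace(LEMMA_PLACEHOLDER, lemma) for cell in row] for row in layout]


def is_static(layout: Layout) -> bool:
    """A static layout has no placeholders left to fill."""
    return not any(LEMMA_PLACEHOLDER in cell for row in layout for cell in row)


filled = fill_layout(dynamic_layout, "atim")
print(filled)
print(is_static(dynamic_layout), is_static(filled))
```

In this sketch the filled cells are analysis strings; producing a rendered paradigm would still require generating the wordform for each analysis.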
part of speech¶
⚠️ Deprecated — use word class instead.
The grammatical category to which a term belongs. Different parts of speech have different functions in a clause.
Part of¶
1 or more word classes
1 term
phrase¶
Multiple word forms that, together, have one meaning. A dictionary entry may use a phrase as a head.
Is composed of¶
2 or more word forms
Can be a¶
1 head
root¶
The smallest form of a term (a morpheme) on which all inflected forms are based. The root might not be a valid wordform.
For example, in English, childr- is the root of child and children.
In Plains Cree¶
*atimw- is the root of the lemma atim; however, it is not a valid wordform on its own. It can be inflected to create atim and atimwak.
mow- is the root of the lemma mowêw, and it also happens to be a valid inflected form of mowêw (an imperative form)
source language¶
In a unidirectional bilingual dictionary, the language of the head words.
Example: in Cree: Words, which gives a list of Cree head words with all definitions being English translations, the source language is Cree.
See also: target language.
standardize¶
Write things according to the orthographical standard. A standard is explicitly and consciously decided by an individual or body to be adopted by a greater community. A language may have many standards, or it might have no standard orthography. When there is one widely-adopted standard, then it is also the norm: then “standardize” and “normatize” are synonymous.
See also: normatize
tag¶
A multicharacter symbol that represents a linguistic feature.
In Plains Cree¶
In the Plains Cree FST, these tags either end with a + sign for prefixes (e.g., PV/e+), or start with a + sign for everything else (e.g., +N, +TA, +V).
General word class: +V, +N, +Ipc, +Prop
Word class: +TA, +TI, +VI, +I, +A
Whether a noun is dependent: +D
Tense: +Prs, +Fut, +Prt (really, denotes which tense preverb exists)
Order: +Ind, +Cnj
Subject: +1Sg, +3Pl, +4Sg/Pl, +5Sg/Pl
Object: +1SgO, +3PlO, +4Sg/PlO
The possessor of a noun: +Px1Sg, +Px2Sg, +Px4Sg
Preverbs: PV/e+, PV/kaa+
Reduplication: RdplW+, RdplS+
and many more!
See this document for more info: https://giellalt.uit.no/lang/crk/crk.html
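The placement of the + sign is enough to tell prefix tags from all other tags. The following is a minimal sketch of that rule only; tag_position is a hypothetical helper, and it does not check whether a tag actually exists in the Plains Cree FST.

```python
def tag_position(tag: str) -> str:
    """Classify a tag as 'prefix' (ends with +) or 'suffix' (starts with +)."""
    starts = tag.startswith("+")
    ends = tag.endswith("+")
    if ends and not starts:
        return "prefix"
    if starts and not ends:
        return "suffix"
    raise ValueError(f"not a recognizable tag: {tag!r}")


for tag in ["PV/e+", "RdplW+", "+V", "+TA", "+4Sg/PlO"]:
    print(tag, tag_position(tag))
```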
target language¶
In a unidirectional bilingual dictionary, the language of the definitions.
Example: in Cree: Words, which gives a list of Cree head words with all definitions being English translations, the target language is English.
See also: source language.
translation¶
A definition written in a different language than the head it is defining.
user query¶
also query, search string.
How the user writes their search intent, as a series of Unicode code points. This might be a messy, misspelled, strangely written string. It is the job of the intelligent dictionary to take this wild thing and make sense of it, returning results that satisfy the user’s search intent.
word class¶
Category of a set of terms that inflect in a similar way. Members of the same word class behave morphologically in a similar way to each other.
Contains¶
1 or more inflectional categories.
in Plains Cree¶
These are the word classes in Plains Cree:
NA: 🧑🏽 — animate noun
NI: 📘 — inanimate noun
NAD: 👤🧑🏽 — dependent animate noun
NID: 👤📘 — dependent inanimate noun
VII: 📘➡️ — intransitive inanimate verb
VAI: 🧑🏽➡️ — intransitive animate verb
VTI: 🧑🏽➡️📘— transitive inanimate verb
VTA: 🧑🏽➡️🧑🏽— transitive animate verb
More specific categorizations inside a word class are inflectional categories such as NI-1.
⚠️ A deprecated synonym exists — This was formerly also called a specific word class, but that term is now deprecated.
wordform¶
In linguistics, the different ways that a word can exist in a language. (Not to be confused with lemma – which is its own special type of wordform). A wordform must be able to exist by itself. Contrast this to morpheme and phrase.
stem¶
In linguistics, please use the term root instead.
In natural language processing and information retrieval, the stem is a potentially garbled form of the input term that aids in indexing a large number of related terms. Typically this involves using naïve heuristics to remove both inflectional and derivational affixes from the input term. The stem does not have to be linguistically meaningful, and the stem is often not a valid wordform.
For example, “connection” can be stemmed to “connect” using the Porter stemming algorithm.
Naïve stemming heuristics can be replaced with a linguistic analyzer that is able to return the lemma of a term. However, such an analyzer is not available for every language, and may not be necessary to create a satisfactory information retrieval system.
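To make the idea of a naïve heuristic concrete, here is a deliberately crude suffix stripper. It is not the Porter algorithm, which applies a much larger, ordered set of rules; the suffix list and the minimum stem length are arbitrary choices made for this sketch.

```python
# Suffixes are tried in order, so longer ones come before their shorter tails.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")
MIN_STEM_LENGTH = 3  # arbitrary guard against stripping very short words


def naive_stem(term: str) -> str:
    """Strip the first matching suffix, if enough of the term remains."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= MIN_STEM_LENGTH:
            return term[: -len(suffix)]
    return term


for term in ["connection", "connections", "connected", "nation", "sing"]:
    print(term, "->", naive_stem(term))
```

Related terms such as “connection” and “connected” end up under the same index key, while “nation” is reduced to “nat”, which is not a valid wordform. That is acceptable for indexing, but it is why a stem should not be confused with a lemma or a root.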
term¶
???