General presentation of the BRAID-Acq model

The BRAID-Acq model is a probabilistic computational model of self-teaching and reading acquisition that simulates incidental orthographic learning tasks. It is based on a single-route architecture that can read both known and novel words using an analogy-based procedure, relying solely on lexical knowledge, without any grapheme-to-phoneme conversion rules. This reading procedure is enabled by a visuo-attentional sub-model that gathers visual information without aligning to predefined psycholinguistic units. Its exploration mechanism selects the model's successive fixation positions so as to maximize the intake of visual and phonological information, rather than processing each grapheme individually. Additionally, a phonological attentional sub-model, coupled with its visual counterpart, roughly aligns orthographic and phonological positions without relying on a graphemic segmentation of the word. An extended learning mechanism allows the model to acquire and update both orthographic and phonological lexical knowledge. Finally, the model integrates a simplified representation of semantic context: context favors the identification of certain words throughout processing and enables the correction of pronunciation for phonologically familiar words, without compromising the pronunciation of phonologically unfamiliar words (pseudo-words). The source code of the BRAID-Acq model is available here, and my thesis manuscript (in French) is available here.

First fixation when the model decodes the word VILLE

Specificities of the BRAID-Acq model

An analogy-based reading mechanism

The reading procedure of the BRAID-Acq model is best described as "analogy-based" because it relies exclusively on lexical knowledge, without incorporating abstract knowledge of the spelling-sound relationship.

Handling the variety of incidental learning situations without external supervision

The BRAID-Acq model can learn orthographically novel words without knowing whether their phonological forms are familiar.

A context mechanism

The context mechanism of the BRAID-Acq model facilitates reading for words with prior phonological knowledge, while avoiding the lexicalization of words without such prior knowledge.

A top-down mechanism

In the BRAID-Acq model, reading is facilitated for words with prior phonological knowledge, even in the absence of context, thanks to a top-down mechanism.

Results to support our hypothesis

The BRAID-Acq model correctly reads most French words and manages all incidental learning situations.

The Analogy-Based Reading Mechanism of the BRAID-Acq Model

The reading procedure of the BRAID-Acq model is best described as "analogy-based" because it relies exclusively on lexical knowledge, without incorporating abstract knowledge of the spelling-sound relationship.

Hypothesis

We hypothesize that both known and novel words engage the same components of the model, in contrast with the theoretical framework of dual-route models (Pritchard et al., 2018; Ziegler et al., 2014). Specifically, we suggest that both known and novel words can be read using the same set of lexical knowledge. Additionally, we explore the hypothesis that segmenting a novel word into graphemes is unnecessary for accurate reading; instead, visuo-attentional fixations could allow the novel word to be processed in smaller parts. Thus, we propose that the processing of novel words does not depend on any fixed psycholinguistic unit.

Description of the mechanism

In the BRAID-Acq model, the reading procedure is both lexical, meaning it uses lexical knowledge, and sub-lexical, meaning it processes the stimulus in smaller parts. Reading a novel word involves several successive fixations. During each fixation, the model selects words that are orthographically similar to the stimulus over the portion currently being processed. The model then uses the phonological representations of these selected words to compute a likely pronunciation.
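
As a rough illustration, here is a minimal, non-probabilistic Python sketch of one such fixation. Everything in it (the toy `LEXICON`, the one-phoneme-per-letter alignment, the names `window_similarity` and `decode_window`) is my own simplification for exposition, not BRAID-Acq's actual representations or inference.

```python
from collections import defaultdict

# Toy lexicon: orthographic form -> one phoneme per letter position
# ("_" pads silent endings). This alignment is a strong simplification;
# BRAID-Acq uses probabilistic representations, not aligned lookup tables.
LEXICON = {
    "VILLE": ["v", "i", "l", "_", "_"],
    "FILLE": ["f", "i", "j", "_", "_"],
    "BILLE": ["b", "i", "j", "_", "_"],
    "VASTE": ["v", "a", "s", "t", "_"],
}

def window_similarity(stimulus, word, start, length):
    """Orthographic similarity restricted to the attended window."""
    return sum(a == b for a, b in zip(stimulus[start:start + length],
                                      word[start:start + length]))

def decode_window(stimulus, start, length):
    """One fixation: orthographically similar words vote, at each attended
    position where they share the stimulus letter, for their own phoneme."""
    votes = [defaultdict(float) for _ in range(length)]
    for word, phonemes in LEXICON.items():
        sim = window_similarity(stimulus, word, start, length)
        if sim == 0:
            continue
        for i in range(length):
            if stimulus[start + i] == word[start + i]:
                votes[i][phonemes[start + i]] += sim
    # Normalize each position into a probability-like distribution.
    return [{ph: w / sum(v.values()) for ph, w in v.items()} if v else {}
            for v in votes]

# One fixation over the first three letters of the pseudo-word FISTE:
print(decode_window("FISTE", start=0, length=3))
# -> [{'f': 1.0}, {'i': 1.0}, {'s': 1.0}], i.e. /fis/, by analogy alone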

One task, diverse situations: managing incidental learning without supervision

During incidental learning, readers encounter both words with and without prior phonological knowledge. Accurately learning both types of words, without knowing beforehand whether their pronunciation is familiar, is a challenge for both readers and computational models. For example, when encountering "debt," a reader is likely to decode it as /dɛbt/, while the actual pronunciation is /dɛt/. By attempting to match the decoded output with a phonologically familiar form, the reader can associate the orthographically novel word "debt" with the known pronunciation /dɛt/. But now imagine a novice English reader decoding the pseudo-word "shep" as /ʃɛp/. The reader must decide whether to associate it with an existing phonological form such as /ʃeɪp/ ("shape") or to treat it as completely novel, which requires creating new orthographic and phonological representations for "shep" and /ʃɛp/. This decision step poses a significant challenge for computational models because there is no easy way to differentiate between the two situations: in both cases, the pronunciation matches no entry in the lexicon. How, then, can one learn phonologically new words while remaining flexible enough to accommodate words with prior phonological knowledge?
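
To see why this is hard, consider a toy nearest-neighbor matcher over a phonological lexicon (my own illustration, not a BRAID-Acq component): the regularized decoding of "debt" and the decoding of the pseudo-word "shep" both land at edit distance 1 from their nearest lexical entry, so no fixed similarity threshold can separate "correct toward the lexicon" from "learn as new."

```python
def levenshtein(a, b):
    """Edit distance over phoneme sequences (insert/delete/substitute = 1)."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (pa != pb))
    return dp[-1]

# Toy phonological lexicon (phoneme lists; "eɪ" is a single diphthong).
PHON_LEXICON = {
    "debt":  ["d", "ɛ", "t"],
    "shape": ["ʃ", "eɪ", "p"],
}

def nearest_entry(decoded):
    """Return the closest lexical entry and its distance to the decoding."""
    word = min(PHON_LEXICON, key=lambda w: levenshtein(decoded, PHON_LEXICON[w]))
    return word, levenshtein(decoded, PHON_LEXICON[word])

# "debt" decoded by regularization as /dɛbt/: SHOULD be matched to /dɛt/.
print(nearest_entry(["d", "ɛ", "b", "t"]))   # ('debt', 1)
# Pseudo-word "shep" decoded as /ʃɛp/: should NOT be matched to "shape",
# yet it sits at exactly the same distance from its nearest neighbor.
print(nearest_entry(["ʃ", "ɛ", "p"]))        # ('shape', 1)
```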

Hypothesis

Unlike the choices made in two computational self-teaching models (Pritchard et al., 2018; Ziegler et al., 2014), and in accordance with behavioral data (Nation & Cocksey, 2009), we propose that knowledge of a word's phonological form and the presence of context are not essential for learning. They are, however, essential for managing the variety of incidental learning situations.

How does the model handle uncertainty with its semantic sub-model?

We believe context is the most efficient way to handle the fact that incidental learning involves both words with and without prior phonological knowledge.

Hypothesis

In accordance with the self-teaching theory, we hypothesize that context should: 1) guide reading towards a set of plausible words without favoring a specific lexical item, and 2) suggest rather than constrain, meaning that learning is not limited to the words suggested by contextual cues.

Description of the mechanism

Contrary to dual-route models of reading acquisition, context in the BRAID-Acq model plays a role throughout processing. During perceptual processing, context has two effects. First, the model prioritizes the identification of words belonging to the context. Second, context impacts phoneme identification, a process we term "pronunciation correction." This mechanism operates while reading words both with and without prior phonological knowledge. Only words that 1) belong to the semantic context and 2) have phonological representations closely matching the phonemes identified during decoding contribute to this pronunciation correction. Ultimately, when the stimulus belongs to the context and decoding is only partially incorrect, an online mechanism gradually corrects decoding errors, resulting in an accurate pronunciation.
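
The following sketch caricatures these two contextual effects under assumptions of my own (the boost factor, the edit-distance gate, all function names); BRAID-Acq implements both as probabilistic inference rather than hard thresholds.

```python
def levenshtein(a, b):
    """Edit distance over phoneme sequences (same helper as above)."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (pa != pb))
    return dp[-1]

def apply_context(word_probs, context_words, boost=4.0):
    """Effect 1: favor the identification of words belonging to the context
    (the boost value is invented for this sketch)."""
    scores = {w: p * (boost if w in context_words else 1.0)
              for w, p in word_probs.items()}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

def correct_pronunciation(decoded, phon_lexicon, context_words, max_dist=1):
    """Effect 2: only contextual words whose phonology closely matches the
    decoded phonemes may pull the pronunciation toward their own form."""
    candidates = [w for w in context_words if w in phon_lexicon
                  and levenshtein(decoded, phon_lexicon[w]) <= max_dist]
    if candidates:
        best = min(candidates,
                   key=lambda w: levenshtein(decoded, phon_lexicon[w]))
        return phon_lexicon[best]   # corrected pronunciation
    return decoded                  # pseudo-words keep their decoding

PHON_LEXICON = {"debt": ["d", "ɛ", "t"]}
CONTEXT = {"debt", "money"}

# Effect 1: context reweights word identification.
print(apply_context({"debt": 0.3, "dent": 0.7}, CONTEXT))
# -> "debt" now dominates despite weaker bottom-up evidence

# Effect 2: a partially wrong decoding of a contextual word is corrected...
print(correct_pronunciation(["d", "ɛ", "b", "t"], PHON_LEXICON, CONTEXT))
# -> ['d', 'ɛ', 't']
# ...while a pseudo-word with no close contextual match is left intact.
print(correct_pronunciation(["ʃ", "ɛ", "p"], PHON_LEXICON, CONTEXT))
# -> ['ʃ', 'ɛ', 'p']
```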

How to handle uncertainty even in the absence of context?

Real-life situations also involve reading without any meaningful context. The BRAID-Acq model therefore implements a mechanism to support novel word learning in this case as well.

Hypothesis

Phonological lexical knowledge supports the model's reading process in a "top-down" manner (from lexicon to phonemes).

Description of the mechanism

The model includes a top-down lexical mechanism modulated by phonological lexical familiarity: the more certain the model is that the stimulus belongs to the phonological lexicon, the more phonological lexical knowledge supports phoneme identification. Specifically, when the model "hesitates" between two phonemes and the stimulus is evaluated as sufficiently familiar to trigger top-down lexical processing, this support ultimately enables the model to make the correct decision.
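
As a caricature of this modulation, one can picture the top-down support as a familiarity-weighted mixture of bottom-up phoneme evidence and a lexical prediction. The linear blend and the numbers below are my own simplification; BRAID-Acq implements this as continuous Bayesian inference.

```python
def top_down_phoneme(bottom_up, lexical_prediction, familiarity):
    """Blend bottom-up phoneme evidence with the lexical prediction,
    weighted by the stimulus's estimated familiarity (0 = novel, 1 = known)."""
    phonemes = set(bottom_up) | set(lexical_prediction)
    mixed = {ph: (1 - familiarity) * bottom_up.get(ph, 0.0)
                 + familiarity * lexical_prediction.get(ph, 0.0)
             for ph in phonemes}
    z = sum(mixed.values())
    return {ph: p / z for ph, p in mixed.items()}

# The model "hesitates" between /b/ and /p/ at some position...
bottom_up = {"b": 0.45, "p": 0.55}
# ...while the phonological lexicon predicts /b/ for a familiar word.
lexical = {"b": 0.9, "p": 0.1}

print(top_down_phoneme(bottom_up, lexical, familiarity=0.8))
# -> /b/ now clearly wins: top-down support resolves the hesitation
print(top_down_phoneme(bottom_up, lexical, familiarity=0.1))
# -> stays close to the bottom-up evidence for an unfamiliar stimulus
```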

Results to support our hypothesis

Simulations with the BRAID-Acq model have demonstrated its ability to read novel words without relying on graphemic segmentation or grapheme-to-phoneme conversions. Even in the absence of context, top-down lexical feedback allows the model to correct some pronunciation errors. Moreover, when context is present, it significantly improves the model's reading accuracy for words with prior phonological knowledge, while preserving pronunciation accuracy for words without prior phonological associations. Thus, the model effectively handles the full variety of incidental learning situations!