
feat: add selectable multilingual spelling dictionaries #657

Open
FuJacob wants to merge 1 commit into main from codex/multilingual-spelling-dictionaries

FuJacob (Owner) commented Jun 9, 2026

Summary

  • Bundle licensed SymSpell frequency dictionaries for German, Spanish, French, Hebrew, Italian, and Russian, with source notices, full license texts, provenance, and checksums. English remains the default.
  • Add Writing settings that let users enable any installed dictionary or disable bundled dictionaries entirely.
  • Route each typo to at most one enabled language using surrounding-text detection, load indexes lazily, retain at most two with LRU eviction, and fall back to NSSpellChecker for ambiguous, cold, or unmatched cases.
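
As a rough illustration of the "retain at most two with LRU eviction" behavior, here is a minimal sketch. The type name `LRUIndexCache`, its methods, and the language-code keys are hypothetical and not taken from the PR; the real SymSpellCorrector also handles locking and background loading, which this sketch omits.

```swift
// Hypothetical sketch of a small LRU cache keyed by language code.
// Names are illustrative only and are not Cotabby's API.
struct LRUIndexCache<Index> {
    private let capacity: Int
    // Ordered from least recently used (first) to most recently used (last).
    private var entries: [(code: String, index: Index)] = []

    init(capacity: Int = 2) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    /// Language codes currently cached, least recently used first.
    var cachedCodes: [String] { entries.map { $0.code } }

    /// Returns the cached index and marks it as most recently used.
    mutating func index(for code: String) -> Index? {
        guard let position = entries.firstIndex(where: { $0.code == code }) else {
            return nil
        }
        let entry = entries.remove(at: position)
        entries.append(entry)
        return entry.index
    }

    /// Stores an index, evicting the least recently used entry when full.
    mutating func insert(_ index: Index, for code: String) {
        entries.removeAll { $0.code == code }
        if entries.count >= capacity { entries.removeFirst() }
        entries.append((code: code, index: index))
    }
}

// Usage: with capacity 2, touching "en" makes "de" the eviction candidate.
var cache = LRUIndexCache<[String]>()
cache.insert(["hello"], for: "en")
cache.insert(["hallo"], for: "de")
_ = cache.index(for: "en")
cache.insert(["bonjour"], for: "fr")
```

An array is enough here because the cache never holds more than two entries; a larger cache would likely want a dictionary plus a linked list.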

Validation

  • swiftlint lint --config .swiftlint.yml --quiet
  • xcodebuild ... build-for-testing ... CODE_SIGNING_ALLOWED=NO
  • 36 focused tests passed: language resolution, SymSpell cache/isolation, settings persistence, and bundled-resource validation.
  • Verified all seven dictionary resources are present in the built app and manually exercised representative corrections in each language.
  • The CI-equivalent full suite passed every test before stalling in the pre-existing ScreenshotContextGeneratorTests.test_generateContext_allNoiseOCRReturnsUnavailable; an independent main-worktree run was stalled at the same test.

Linked issues

None.

Risk / rollout notes

  • The six new dictionaries add about 10 MB to the repository/app resources.
  • Runtime memory stays bounded because only requested languages load and the cache retains at most two indexes.
  • Chinese is intentionally excluded because upstream does not identify the generated file source/license and Cotabby does not yet provide reliable non-whitespace word segmentation.
  • Existing users default to English; an explicit empty selection is preserved as system-spell-checker-only mode.
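
The distinction between "never configured" and "explicitly empty" can be sketched as follows. The function name, the `"en"` code, and the order-preserving de-duplication are assumptions for illustration; the PR's actual normalization lives in SpellingDictionaryCatalog and SuggestionSettingsStore and may differ.

```swift
// Hypothetical sketch: nil means the preference was never saved, while an
// empty array is an explicit "system spell checker only" choice.
func effectiveDictionaryCodes(stored: [String]?, installed: Set<String>) -> [String] {
    // Never saved (for example, an existing user after upgrade): default to English.
    guard let stored = stored else { return ["en"] }
    var seen = Set<String>()
    // Drop unknown codes and duplicates; an empty input stays empty.
    return stored.filter { installed.contains($0) && seen.insert($0).inserted }
}
```

Treating the stored value as optional keeps the empty selection from being silently reset to the default on the next launch.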

Greptile Summary

This PR bundles six new SymSpell frequency dictionaries (German, Spanish, French, Hebrew, Italian, Russian) alongside their license files, and adds a settings UI for enabling them. A new SpellingLanguageResolver uses Apple's NLLanguageRecognizer to pick the right index from surrounding text when multiple dictionaries are enabled; SymSpellCorrector loads indexes lazily and evicts the least-recently-used one when a two-entry cache limit is reached.
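
A minimal sketch of the routing decision described above, assuming the language hypotheses have already been computed (in the app they would presumably come from NLLanguageRecognizer). The function name, the string language codes, and the use of `>=` for the threshold comparison are assumptions; only the 0.55 value and the single-language fast path come from the review.

```swift
// Hypothetical sketch of the language-selection logic. `hypotheses` maps a
// language code to a confidence in 0...1 and is passed in so the logic is
// self-contained.
func resolveLanguage(
    hypotheses: [String: Double],
    enabled: [String],
    threshold: Double = 0.55
) -> String? {
    // Single-language fast path: the explicit selection wins without detection.
    if enabled.count == 1 { return enabled[0] }
    // Consider only enabled languages and take the most probable one.
    let best = hypotheses
        .filter { enabled.contains($0.key) }
        .max { $0.value < $1.value }
    // Below the threshold (or no candidate at all): report "ambiguous".
    guard let candidate = best, candidate.value >= threshold else { return nil }
    return candidate.key
}
```

Returning `nil` for the ambiguous case leaves the caller free to fall back to the system spell checker.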

  • Core language-selection logic (SpellingLanguageResolver, SymSpellCorrector) is clean and well-tested: lazy loading, LRU eviction, per-language cache isolation, and NSSpellChecker fallback all work correctly.
  • Settings wiring (SpellingDictionaryCatalog, SuggestionSettingsStore, SuggestionSettingsModel) correctly normalizes, persists, and migrates the new enabledSpellingDictionaryCodes preference alongside the existing typo-correction toggles.
  • Spanish dictionary resource is named es-100l (ending in the lowercase letter "l") while every other dictionary follows the *-100k convention; the mismatch is inherited from upstream but stands out to maintainers reading the bundle.

Confidence Score: 4/5

Safe to merge; all new language-routing, lazy-loading, and LRU-eviction logic is correct and well-tested.

The core SymSpell cache, language resolver, settings persistence, and NSSpellChecker fallback are all implemented correctly. The dictionary picker is always interactive even when the parent "Offer Corrections on Typo" toggle is off, which can mislead users into enabling dictionaries that have no runtime effect. The Spanish resource name inconsistency is inherited from upstream but worth a one-line comment.

Cotabby/UI/SpellingDictionaryPicker.swift — picker interactivity should be gated on offerTypoCorrections; Cotabby/Models/SpellingDictionaryCatalog.swift — Spanish resource name anomaly.

Important Files Changed

Filename | Overview
--- | ---
Cotabby/Services/Spelling/SymSpellCorrector.swift | Rewired to support multiple language indexes via an LRU cache; lazy background loading, lock discipline, and eviction logic are all correct
Cotabby/Support/SpellingLanguageResolver.swift | New file; single-language fast path and multi-language NL recognition with a conservative 0.55 confidence threshold are well-reasoned
Cotabby/Models/SpellingDictionaryCatalog.swift | New catalog model; normalization and stable ordering are correct, but Spanish resource name es-100l diverges from the *-100k naming pattern used by every other language
Cotabby/UI/SpellingDictionaryPicker.swift | New settings UI; toggles are always interactive even when the parent "Offer Corrections on Typo" setting is disabled, so enabled dictionaries silently have no effect
Cotabby/App/Coordinators/SuggestionCoordinator+Prediction.swift | Language-aware bestCorrection routing and NSSpellChecker fallback logic are clean and correctly integrated with TypoGate
Cotabby/Support/SuggestionSettingsStore.swift | New spellingDictionaryCodesDefaultsKey key and saveEnabledSpellingDictionaryCodes are correctly wired; normalize-on-save is slightly redundant but harmless
Cotabby/App/Core/CotabbyAppEnvironment.swift | Preload-language selection logic (prefer English, fall back to sole-enabled language, skip preload for broad multilingual sets) is correct
CotabbyTests/SymSpellCorrectorTests.swift | Good coverage of lazy-load nil, correction, capitalization transfer, dictionary isolation, and LRU eviction
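
The preload rule summarized for CotabbyAppEnvironment.swift (prefer English, fall back to a sole enabled language, otherwise skip) can be sketched like this. The function name and the `"en"` code are hypothetical, and the actual implementation may operate on a language enum rather than strings.

```swift
// Hypothetical sketch of choosing which dictionary, if any, to preload at launch.
func preloadLanguage(enabled: [String]) -> String? {
    // English is preferred whenever it is enabled.
    if enabled.contains("en") { return "en" }
    // A single non-English selection is unambiguous, so preload it.
    if enabled.count == 1 { return enabled[0] }
    // Empty or broad multilingual sets: load lazily on first use instead.
    return nil
}
```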

Sequence Diagram

sequenceDiagram
    participant SC as SuggestionCoordinator
    participant TG as TypoGate
    participant SLR as SpellingLanguageResolver
    participant SSC as SymSpellCorrector
    participant NSS as NSSpellChecker

    SC->>TG: handleTypoGate(rawContext, workID)
    TG->>SC: bestCorrection(word, precedingText)
    SC->>SLR: resolve(precedingText, word, enabledLanguages)
    alt single language enabled
        SLR-->>SC: language (explicit selection)
    else multiple languages, confident NL detection
        SLR-->>SC: language (from NLLanguageRecognizer)
    else ambiguous / empty
        SLR-->>SC: nil
        SC->>NSS: bestCorrection(word)
        NSS-->>SC: correction or nil
    end
    alt language resolved
        SC->>SSC: bestCorrection(word, language)
        alt index cached
            SSC-->>SC: correction or nil
        else index loading / not ready
            SSC-->>SC: nil (triggers background load)
            SC->>NSS: bestCorrection(word)
            NSS-->>SC: correction or nil
        end
    end
    TG->>SC: .correct(word, correctedWord) or .suppress or .proceed
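
The fallback order in the diagram can be approximated with the closure-based sketch below. The function signature and both closure parameters are hypothetical stand-ins for SymSpellCorrector and NSSpellChecker. Following the PR summary, an unmatched SymSpell lookup is also treated as a reason to consult the system checker.

```swift
// Hypothetical sketch of the correction fallback order.
func bestCorrection(
    for word: String,
    resolvedLanguage: String?,
    symSpell: (String, String) -> String?,  // (word, language); nil when cold or unmatched
    systemChecker: (String) -> String?      // stand-in for NSSpellChecker
) -> String? {
    // Use the bundled dictionary only when a language was resolved
    // and its index produced a correction.
    if let language = resolvedLanguage, let fix = symSpell(word, language) {
        return fix
    }
    // Ambiguous language, cold index, or no match: defer to the system checker.
    return systemChecker(word)
}
```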

Reviews (1): Last reviewed commit: "feat: add multilingual spelling dictiona..."

Greptile also left 2 inline comments on this PR.

Comment on lines +24 to +42
VStack(alignment: .leading, spacing: 8) {
    ForEach(SpellingDictionaryLanguage.allCases) { language in
        Toggle(
            language.settingsLabel,
            isOn: Binding(
                get: {
                    suggestionSettings.isSpellingDictionaryEnabled(language)
                },
                set: {
                    suggestionSettings.setSpellingDictionary(
                        language,
                        enabled: $0
                    )
                }
            )
        )
        .toggleStyle(.checkbox)
    }
}

P2 The dictionary picker is always interactive regardless of whether typo corrections are actually being offered. SymSpellCorrector is only consulted when offerTypoCorrections is true (gated via TypoGate in handleTypoGate), so a user who has "Offer Corrections on Typo" turned off can tick every dictionary checkbox with no observable effect. A .disabled(!suggestionSettings.offerTypoCorrections) modifier on the VStack makes this dependency explicit in the UI, matching how the "Offer Corrections on Typo" toggle itself is already disabled when its own parent toggle is off.

Suggested change
VStack(alignment: .leading, spacing: 8) {
    ForEach(SpellingDictionaryLanguage.allCases) { language in
        Toggle(
            language.settingsLabel,
            isOn: Binding(
                get: {
                    suggestionSettings.isSpellingDictionaryEnabled(language)
                },
                set: {
                    suggestionSettings.setSpellingDictionary(
                        language,
                        enabled: $0
                    )
                }
            )
        )
        .toggleStyle(.checkbox)
    }
}
.disabled(!suggestionSettings.offerTypoCorrections)

switch self {
case .english: return "frequency_dictionary_en_82_765"
case .german: return "de-100k"
case .spanish: return "es-100l"

P2 The Spanish resource name es-100l (letter l) diverges from the *-100k pattern every other language follows (de-100k, fr-100k, he-100k, it-100k, ru-100k). The actual file on disk is also es-100l.txt, so this works correctly at runtime, but anyone expecting es-100k when browsing the bundle or the resource directory will look for the wrong file. A comment on the case would prevent future confusion.

Suggested change
case .spanish: return "es-100l"
// Upstream SymSpell file uses "l" (not "k") — matches the filename es-100l.txt in the bundle.
case .spanish: return "es-100l"
