El-Sozduk Language Data

Structured Kyrgyz Corpora for AI, Search, and Language Technology

El-Sozduk develops structured Kyrgyz language corpora as commercial data products for AI, multilingual search, lexicography, and language technology.

What It Is

El-Sozduk is the largest online dictionary resource for the Kyrgyz language. We are expanding beyond dictionary publishing into commercial Kyrgyz language corpora prepared as structured datasets for professional use.

Our first release is a Kyrgyz–Russian structured corpus based on the Yudakhin dictionary and transformed into a reviewed dataset with segmented bilingual units and metadata.

The lexical authority of the corpus comes from the Yudakhin dictionary. The structured transformation, segmentation, metadata design, and delivery as a usable language data product are the work of El-Sozduk.

What the Corpus Includes

The corpus is being converted into structured bilingual segments such as:

- Senses
- Usage examples
- Idioms
- Proverbs
- Compounds
- Stable expressions
- Homonym groups
- Sense-level distinctions

Each segment is linked to lexical metadata and prepared for downstream AI and language technology workflows.
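
To make the idea of a segment linked to lexical metadata concrete, here is a minimal sketch of how one such record might be modeled and serialized as a JSON line. Every field name, the `SegmentType` values, the `Segment` class, and the `demo-0001` identifier are illustrative assumptions and do not describe the actual El-Sozduk schema or delivery format.

```python
import json
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import Optional


class SegmentType(str, Enum):
    """Hypothetical labels mirroring the segment kinds listed above."""
    SENSE = "sense"
    USAGE_EXAMPLE = "usage_example"
    IDIOM = "idiom"
    PROVERB = "proverb"
    COMPOUND = "compound"
    STABLE_EXPRESSION = "stable_expression"


@dataclass
class Segment:
    """One bilingual unit plus lexical metadata (assumed layout, not the real schema)."""
    segment_id: str
    headword: str
    segment_type: SegmentType
    kyrgyz: str
    russian: str
    homonym_group: Optional[int] = None  # distinguishes homonymous headwords
    sense_number: Optional[int] = None   # sense-level distinction within an entry
    metadata: dict = field(default_factory=dict)

    def to_json_line(self) -> str:
        """Serialize to a single JSON line, keeping Cyrillic text readable."""
        record = asdict(self)
        record["segment_type"] = self.segment_type.value
        return json.dumps(record, ensure_ascii=False)


# Illustrative record only: "китеп" (Kyrgyz) corresponds to "книга" (Russian), "book".
example = Segment(
    segment_id="demo-0001",
    headword="китеп",
    segment_type=SegmentType.SENSE,
    kyrgyz="китеп",
    russian="книга",
    sense_number=1,
)
print(example.to_json_line())
```

A flat record of this kind keeps each bilingual unit self-describing, so a consumer could filter by segment type or headword without re-parsing dictionary entries.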

Why It Matters

Kyrgyz is a low-resource language, and high-quality structured datasets with linguistic depth remain scarce.

This corpus is designed to support:

The value of the corpus is not limited to simple dictionary pairs. It includes context-rich material such as examples, idioms, proverbs, and multi-word units.

Current Status

Demo set: 1,000 reviewed segments
Full corpus (projected): 80,000–100,000 segments
Workflow: Human-in-the-loop review
Delivery: Staged releases for early partners

Delivery Formats

Available or planned delivery formats include:

For early-stage collaboration, dataset delivery is available through controlled sample access and staged releases.

Commercial Access

We currently offer:

Commercial licensing terms depend on scope, format, delivery stage, and intended use.

Interested in Kyrgyz Language Data?

Request a demo or contact El-Sozduk to discuss evaluation access and early partnership options.

Or email directly: data@el-sozduk.kg