Now accepting early access requests

Structured Kyrgyz language data for AI, search, machine translation, lexicography, and research

The El-Sozduk Kyrgyz–Russian Lexical Corpus, based on the Yudakhin Kyrgyz–Russian Dictionary — segmented, normalized, and enriched with linguistic metadata.

  • ~85,000 structured bilingual segments
  • 40K+ dictionary entries processed
  • 8 segment types
  • JSONL structured delivery
Built by El-Sozduk · The largest online Kyrgyz dictionary platform · Licensing and custom dataset development available

Access Options

Choose the level of access that fits your evaluation or project needs.

Free Sample

A small introductory sample for a first look at the corpus structure.

  • Mini segment file (JSONL)
  • Schema preview
  • For initial familiarization
  • Available on request

Commercial License

Licensed access to the full corpus (approximately 85,000 structured segments) under either the Standard or the Premium plan.

  • ~85,000 structured segments
  • Standard or Premium plan
  • Licensed for commercial use
  • Versioned releases
  • Dedicated support

Custom Corpus Development

Tailored datasets built to buyer specifications.

  • Custom scope, structure, and formats
  • Additional source integration by agreement
  • Priority delivery timeline
  • Direct collaboration with the El-Sozduk team

Standard or Premium

We offer two plans for the corpus, suited to different review and licensing needs.

Standard

Core schema with ready-to-use buyer-facing fields for practical review.

  • 15+ structured fields (core schema)
  • Ready-to-use buyer-facing fields
  • Simplified schema for practical evaluation and ingestion
  • Suitable for first technical and commercial review

Why El-Sozduk

El-Sozduk is a long-running Kyrgyz digital language initiative focused on practical language resources, structured lexical data, and language technology.

The dataset initiative is led by Chorobek Saadanbekov, founder of the Kyrgyz Translate Community, which led the effort to bring Kyrgyz into Google Translate. He is a long-term builder of Kyrgyz digital language infrastructure through El-Sozduk, Kyrgyz Wikipedia, and related language technology projects.

We work with Kyrgyz lexical materials as a domain-specific team with deep familiarity with the language, its structure, and its digital use cases — not as a generic data vendor.

What’s in the Corpus

The El-Sozduk Kyrgyz–Russian Lexical Corpus is segmented into distinct bilingual units with linguistic metadata and reviewed through a structured pipeline.

Production-Grade Pipeline

Structured segmentation and review pipeline with versioned releases and full metadata coverage.

  • 40K+ dictionary entries processed
  • ~85,000 structured bilingual segments
  • Full metadata coverage
  • Human-in-the-loop (HITL) review

Who Uses Structured Kyrgyz Data?

🤖 AI & LLM Training

Fine-tuning, evaluation sets, and grounding data for models covering Kyrgyz.

🔍 Multilingual Search

Kyrgyz–Russian bilingual indexing and retrieval for search engines and RAG systems.

🌐 Machine Translation

Parallel corpus data for low-resource MT pipelines involving Kyrgyz.

📚 Lexicography & Terminology

Structured dictionary data for terminology databases and lexicographic research.

🎓 NLP Research

Annotated Turkic language data for academic NLP and computational linguistics.

📊 Language Analytics

Structured data for usage pattern analysis, frequency studies, and corpus linguistics.

Delivery Formats

The corpus is delivered as structured JSONL, ready for integration into your pipeline. Other formats can be arranged by agreement as part of custom corpus development.
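As a rough illustration of what ingesting a JSONL delivery might look like, the sketch below reads records line by line and tallies them by segment type. The field names (`segment_type`, `kyrgyz`, `russian`) and the two sample records are invented placeholders, not the actual El-Sozduk schema; the schema preview supplied with a sample is the authority on real field names.

```python
import io
import json
from collections import Counter


def load_segments(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_type(segments, type_field="segment_type"):
    """Count records per value of type_field (a hypothetical field name)."""
    return Counter(seg.get(type_field, "unknown") for seg in segments)


# Invented two-record sample standing in for a delivered file.
SAMPLE = (
    '{"segment_type": "headword", "kyrgyz": "китеп", "russian": "книга"}\n'
    '{"segment_type": "example", "kyrgyz": "китеп окуу", "russian": "читать книгу"}\n'
)

segments = list(load_segments(io.StringIO(SAMPLE)))
counts = count_by_type(segments)
print(counts)
```

With a real file, the same functions would take a handle opened with `open(path, encoding="utf-8")` in place of the in-memory sample, so Cyrillic text is decoded consistently across platforms.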

Licensing

Corpus access and usage are provided under defined El-Sozduk terms. Broader commercial rights are discussed separately during qualified inquiries.

Request access or discuss a custom corpus

Tell us about your organization and how you plan to use Kyrgyz language data. We review every request and respond within 1–2 business days.

  • 📧 Quick Response: We reply within 1–2 business days
  • 📦 Free Sample Ready: Introductory data available for immediate review
  • 🤝 Flexible Terms: Licensing adapted to your use case and scale

Your information is handled confidentially and used only to process your inquiry.

📧 Email: elsozduk.kg@gmail.com
👤 Contact: Chorobek Saadanbekov