Publications Dataset
Officially published, legitimate, and copyrighted ebooks across major disciplines.
Web Novels Dataset
Hand-picked high-quality Chinese web novels; fully customizable for specific needs.
Parallel Corpus Dataset
Large-scale multilingual parallel sentence pairs and document-level parallel pairs.
Scripts Dataset
Chinese and English scripts across film/TV/stage/anime/comedy and more.
Art Auction Image–Text Dataset
Fine art and auction image-text pairs with rich metadata; 1080p images.
Educational Video–Text Dataset
Lecture videos with timestamped subtitles (formulas as LaTeX in Markdown, or written out phonetically in SRT).
Video Q&A Dataset
Each video includes 2 questions (MCQ + short answer) for reasoning and comprehension.
Text–Image Paired Dataset
Process-aligned image-text groups for instruction following and consistency tasks.
Front-End Coding Dataset
Runnable front-end projects: websites, dashboards, admin panels, games, and 3D scenes.
Professional Certification Exam Dataset
Exam questions across public security, civil service, medical, law, engineering, finance, and more.
K12 Exam Questions Dataset (With Images)
Multi-subject questions with structured fields for difficulty, analysis, subject, and more.
Programming Competition Problems Dataset
Algorithmic problems with multi-language solutions and optional images per item.
Multilingual Competition Questions Dataset
Real competition questions with structured fields for question, answer, and explanation.
Multilingual Handwritten OCR Dataset
OCR dataset covering 12 languages for text spotting and document/layout understanding.
English & LCTLs Academic Papers Dataset
Academic papers across disciplines and degrees with strong long-context coverage.
Technical Documentation Dataset
Up-to-date technical documentation covering workflows from installation through operation and maintenance.
3-Model High-Difficulty Test Questions Dataset
Undergrad+ science questions validated via three models with rich annotation fields.
High-Difficulty Physics Test Questions Dataset (ChatGPT o3)
Undergrad+ physics; o3 makes 10 reasoning attempts; average pass rate < 30%.
Original Informatics Competition Questions Dataset
NOI-level difficulty with a complete per-question package: code, tests, and solutions.
Structured High-Emotional-Intelligence Dataset
High-EQ framework with evaluation criteria and multi-turn dialogues across 500+ categories.