UV Scripts

Ready-to-run ML tools powered by UV - zero setup, maximum power

Run state-of-the-art ML workflows with a single command. From OCR to classification, all scripts work instantly with uv run.

What are UV scripts?

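UV scripts are self-contained Python scripts that declare their dependencies in inline metadata, a comment block at the top of the file standardized in PEP 723. Run uv run script.py and uv installs the declared dependencies automatically before executing the script.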
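To make the inline metadata concrete, here is a minimal sketch that embeds a hypothetical UV script as a string and extracts its metadata block. The regular expression follows the reference implementation in PEP 723; the example script, its datasets dependency, and the function name read_script_metadata are assumptions for illustration, not taken from any script in this collection.

```python
import re
from typing import Optional

# Pattern from the PEP 723 reference implementation: a "# /// <type>" line,
# one or more comment lines, then a closing "# ///" line.
INLINE_METADATA_REGEX = (
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical UV script, used only as input text here (it is never executed).
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "datasets",
# ]
# ///
from datasets import load_dataset

print(load_dataset("your-dataset"))
'''


def read_script_metadata(script: str) -> Optional[str]:
    """Return the TOML text of the 'script' metadata block, or None if absent."""
    matches = [
        m for m in re.finditer(INLINE_METADATA_REGEX, script)
        if m.group("type") == "script"
    ]
    if len(matches) > 1:
        raise ValueError("Multiple script blocks found")
    if not matches:
        return None
    # Strip the leading "# " (or bare "#") from every line of the block.
    return "".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in matches[0].group("content").splitlines(keepends=True)
    )


metadata_toml = read_script_metadata(EXAMPLE_SCRIPT)
print(metadata_toml)
```

The extracted text is plain TOML, so on Python 3.11 or later it could be passed to tomllib.loads to get the dependency list as a Python object. A tool such as uv reads this same block to decide what to install before running the file.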

Perfect for:

  • 🚀 GPU workflows on HF Jobs
  • 💻 Local processing on your machine
  • 🔄 Reproducible pipelines that work anywhere

🚀 Quick Example

# Extract text from images with state-of-the-art OCR (no local GPU needed!)
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
  your-images your-extracted-text

📚 Browse Scripts

| Script Collection | Description | GPU Required |
|---|---|---|
| ocr | Extract text from images with VLMs (LaTeX, tables, forms) | ✅ |
| classification | Text classification with guaranteed valid outputs | ✅ |
| dataset-creation | Create datasets from PDFs and files | ❌ |
| vllm | High-performance inference with vLLM | ✅ |
| synthetic-data | Generate high-quality synthetic data with CoT reasoning | ✅ |
| deduplication | Remove duplicates using semantic similarity | ❌ |
| openai-oss | Generate responses with visible reasoning traces | ✅ |

🎯 Why UV Scripts?

Zero Setup

No virtual environments to manage, no dependency conflicts, no installation steps. UV resolves and installs each script's dependencies automatically when you run it.

GPU Optimized

Seamlessly run on local GPUs or scale to cloud with HF Jobs. Same script, different compute.

🌟 Featured Scripts

OCR Any Document Dataset

Extract text from images with state-of-the-art accuracy:

# Handles LaTeX, tables, forms, handwriting
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
  your-images extracted-text

Deduplicate Datasets (CPU-Friendly!)

Remove duplicates using semantic similarity - no GPU needed:

# Fast semantic deduplication on CPU
uv run https://huggingface.co/datasets/uv-scripts/deduplication/raw/main/semantic-dedupe.py \
  your-dataset text your-dataset-clean \
  --method duplicates --threshold 0.9
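The implementation of semantic-dedupe.py is not shown here, so the following is only a sketch of the general idea behind threshold-based semantic deduplication: a row is dropped when its cosine similarity to an already kept row reaches the threshold. The toy two-dimensional vectors stand in for real text embeddings, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(embeddings: Sequence[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Return indices of rows to keep, scanning in order (greedy, O(n^2))."""
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        # Keep the row only if it is below the threshold against every kept row.
        if all(cosine_similarity(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy vectors (assumption): row 1 points almost the same way as row 0.
toy_embeddings = [
    [1.0, 0.0],
    [0.99, 0.05],
    [0.0, 1.0],
]
kept_rows = deduplicate(toy_embeddings, threshold=0.9)
print(kept_rows)
```

A higher threshold removes only near-identical rows, while a lower one also removes looser paraphrases. The pairwise loop is fine for a sketch but would not scale to large datasets.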

Generate Synthetic Training Data

Create high-quality synthetic data with chain-of-thought reasoning:

# Generate synthetic math problems with reasoning
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \
  --seed-dataset math-examples --output-dataset synthetic-math \
  --task-type reasoning --num-samples 1000

🚀 Getting Started with HF Jobs

Run any UV script on GPU infrastructure:

hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/[collection]/raw/main/[script].py \
  [args]

Choose your GPU flavor:

  • l4x1 - Good balance for most tasks
  • a10g-large - More memory for larger models
  • a100-large - Maximum performance
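The command template above can also be assembled programmatically, for example when launching several scripts from a notebook or a CI job. This is a minimal sketch, assuming the URL pattern shown in the template; the helper names script_url and build_job_command are hypothetical, and the code only builds and prints the command rather than running it.

```python
import shlex
from typing import List, Sequence

BASE_URL = "https://huggingface.co/datasets/uv-scripts"


def script_url(collection: str, script: str) -> str:
    """Fill in the [collection] and [script] slots of the raw-file URL."""
    return f"{BASE_URL}/{collection}/raw/main/{script}.py"


def build_job_command(
    collection: str,
    script: str,
    args: Sequence[str],
    flavor: str = "l4x1",
) -> List[str]:
    """Return the hf jobs invocation as an argument list (not executed here)."""
    return [
        "hf", "jobs", "uv", "run",
        "--flavor", flavor,
        script_url(collection, script),
        *args,
    ]


command = build_job_command(
    "ocr", "nanonets-ocr", ["your-images", "your-extracted-text"]
)
# shlex.join quotes arguments safely for display or copy-paste into a shell.
print(shlex.join(command))
```

Keeping the command as a list means it could be handed directly to subprocess.run without shell quoting issues, and switching compute is a matter of passing a different flavor such as a10g-large.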

📖 Learn More


UV Scripts is a community project showcasing the power of UV for ML workflows.
