Aesthetic Scorer
Personal aesthetic image scorer. Load ~1000 curated favorite images, extract features with SigLIP2 or DINOv3, compute a centroid, score any new image in milliseconds.
Supports two backends: SigLIP2 (taste + text scoring) and DINOv3 (taste-only, state-of-the-art features).
Setup
# Install dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install transformers safetensors pillow pillow-heif open-clip-torch
# SigLIP2 downloads on first run (~2GB, cached locally)
# DINOv3 downloads on first run (~14GB for ViT-7B)
Requirements:
- Python 3.11+
- CUDA 12
- PyTorch 2.1+
- 24GB VRAM (recommended; DINOv3 ViT-7B fits with int4 quantization)
Commands
extract — Build your centroid
# SigLIP2 (default)
python aesthetic_scorer.py extract "~/Photos/Favorites" --name "my-taste"
# DINOv3 (ViT-7B/16, the default and largest variant)
python aesthetic_scorer.py extract "~/Photos/Favorites" --model dinov3 --name "my-taste"
# DINOv3 with a smaller variant
python aesthetic_scorer.py extract "~/Photos/Favorites" --model dinov3 \
--model-id facebook/dinov3-vitb16-pretrain-lvd1689m --name "my-taste"
score — Score a single image
# Uses the newest centroid automatically (or specify one with --centroid)
python aesthetic_scorer.py score "~/Photos/image.jpg" # → 0.7324
# Score with a named centroid
python aesthetic_scorer.py score "~/Photos/image.jpg" --name "my-taste"
# Score with DINOv3 centroid
python aesthetic_scorer.py score "~/Photos/image.jpg" --model dinov3 --name "my-taste"
rank — Rank a folder of images
# Taste scoring with SigLIP2 centroid (default backend)
python aesthetic_scorer.py rank "~/Photos/ToScore" --centroid models/siglip2_2026-05-24.pt --top 20
# Taste scoring with DINOv3 centroid
python aesthetic_scorer.py rank "~/Photos/ToScore" --model dinov3 --centroid models/dinov3_2026-05-24.pt --top 20
# Rank with a named centroid (auto-discovers model type)
python aesthetic_scorer.py rank "~/Photos/ToScore" --name "my-taste" --top 20
# Rank using text prompts instead of a centroid (SigLIP2 only)
python aesthetic_scorer.py rank "~/Photos/ToScore" \
--positives "portrait|golden hour" \
--negatives "blurry|low quality" \
--top 20 --bottom 5
# Combined taste + text scoring (SigLIP2 only)
python aesthetic_scorer.py rank "~/Photos/ToScore" \
--centroid models/siglip2_taste.pt \
--positives "golden hour|sunset" --top 20
# Export ranked results to CSV
python aesthetic_scorer.py rank "~/Photos/ToScore" --csv results.csv --top 100
# Rename files with score prefix (img.jpg → [0.7324]_img.jpg)
python aesthetic_scorer.py rank "~/Photos/ToScore" --rename
# Siphon images above a score threshold to a target folder (dry-run first)
python aesthetic_scorer.py rank "~/Photos/ToScore" --siphon 0.85 --siphon-dest ./curated --dry-run
python aesthetic_scorer.py rank "~/Photos/ToScore" --siphon 0.85 --siphon-dest ./curated --move --flatten
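The combined taste + text mode shown earlier in this section percentile-ranks both scores before multiplying them. The sketch below illustrates that idea only; it is not taken from aesthetic_scorer.py. The function names and the exact percentile definition (fraction of images scoring at or below a value) are assumptions.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Map each value to the fraction of values <= it (assumed definition)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def combine_scores(taste, text):
    """Multiply percentile-ranked taste and text scores, image by image."""
    if len(taste) != len(text):
        raise ValueError("taste and text need one score per image")
    taste_pct = percentile_ranks(taste)
    text_pct = percentile_ranks(text)
    return [t * x for t, x in zip(taste_pct, text_pct)]


# Hypothetical raw similarities for four images.
taste = [0.62, 0.71, 0.55, 0.80]  # cosine to the taste centroid
text = [0.10, 0.30, 0.25, 0.05]   # cosine to the text prompts
combined = combine_scores(taste, text)
best = max(range(len(combined)), key=combined.__getitem__)
print(combined, "best image index:", best)
```

Ranking first puts the two signals on the same scale, so a narrow range of raw text similarities cannot be drowned out by a wider range of centroid similarities.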
list — List available centroids
python aesthetic_scorer.py list
test — Self-test
python aesthetic_scorer.py test # no GPU needed
Backend Selection
| Flag | Backend | Text Scoring | Feature Dim | Default Model |
|---|---|---|---|---|
| --model siglip2 | SigLIP2 (open_clip) | ✅ | 1536 | ViT-gopt-16-SigLIP2-384 |
| --model dinov3 | DINOv3 (transformers) | ❌ | 4096 | ViT-7B/16 |
Text prompts (--positives/--negatives) only work with SigLIP2. Using them with DINOv3 raises a clear error.
A DINOv3 variant can be selected with --model-id:
- facebook/dinov3-vit7b16-pretrain-lvd1689m (default, 7B params)
- facebook/dinov3-vitl16-pretrain-lvd1689m (304M params)
- facebook/dinov3-vitb16-pretrain-lvd1689m (86M params)
- facebook/dinov3-vits16-pretrain-lvd1689m (22M params)
Model Artifacts
All model files are stored in models/ (gitignored):
| File | Description |
|---|---|
| models/siglip2_YYYY-MM-DD_name.pt | SigLIP2 centroid + metadata |
| models/dinov3_YYYY-MM-DD_name.pt | DINOv3 centroid + metadata |
Centroids include model_type in metadata. Loading a SigLIP2 centroid with DINOv3 (or vice versa) raises a clear error.
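As an illustration of that check, here is a minimal sketch of load-time validation. The metadata keys model_type and feature_dim come from this README; the function name, the exception class, and the plain-dict metadata layout are hypothetical and may not match the real file format.

```python
class CentroidMismatchError(ValueError):
    """Raised when a centroid does not match the active backend."""


def validate_centroid(metadata, model_type, feature_dim):
    """Check centroid metadata against the active backend (hypothetical helper)."""
    found_type = metadata.get("model_type")
    if found_type != model_type:
        raise CentroidMismatchError(
            f"centroid was built with {found_type!r}, "
            f"but the active backend is {model_type!r}"
        )
    found_dim = metadata.get("feature_dim")
    if found_dim != feature_dim:
        raise CentroidMismatchError(
            f"centroid has feature_dim={found_dim}, expected {feature_dim}"
        )


# A SigLIP2 centroid loaded by the SigLIP2 backend passes silently.
siglip2_meta = {"model_type": "siglip2", "feature_dim": 1536}
validate_centroid(siglip2_meta, "siglip2", 1536)
```

Checking feature_dim as well as model_type would also catch a centroid built with a different variant of the same backend.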
Architecture
- SigLIP2: ViT-gopt-16-SigLIP2-384 (open_clip_torch, 1.87B params, 1536-dim features). Resolution: 384×384.
- DINOv3: ViT-7B/16 via Hugging Face transformers (7B params, 4096-dim features). Resolution: 518×518.
- Scoring: Features → L2-normalize → cosine to centroid → [0, 1]
- Text scoring (SigLIP2 only): Text-image cosine similarity. When combined with taste scoring, both scores are percentile-ranked and then multiplied.
- Centroid: Streaming mean with O(1) memory in the number of images, so it handles arbitrary dataset sizes
- Centroid validation: model_type and feature_dim are stored in metadata and validated on load
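The streaming-mean centroid can be sketched as an incremental update that keeps only one running vector. This is an illustration in plain Python, not the project's code: the function name is hypothetical, and whether the real implementation normalizes features before averaging is not stated here.

```python
def streaming_centroid(features):
    """Return the mean of an iterable of equal-length vectors.

    Only the running mean and a counter are kept, so memory depends on the
    feature dimension and not on how many images are processed.
    """
    mean = None
    count = 0
    for vec in features:
        count += 1
        if mean is None:
            mean = [float(x) for x in vec]
            continue
        if len(vec) != len(mean):
            raise ValueError("all feature vectors must have the same length")
        for i, x in enumerate(vec):
            # Incremental mean update: m_k = m_{k-1} + (x_k - m_{k-1}) / k
            mean[i] += (x - mean[i]) / count
    if mean is None:
        raise ValueError("no feature vectors were provided")
    return mean


# A generator stands in for features extracted one image at a time.
toy_features = (vec for vec in [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
centroid = streaming_centroid(toy_features)
print(centroid)
```

Because the input is consumed one vector at a time, the same loop works for a generator that yields features as images are read from disk.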
Output
All scores lie in [0, 1], where 1.0 means perfectly typical of your favorites. Higher scores are more like your training set.
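The sketch below shows one way a cosine similarity to the centroid can land in [0, 1]. The L2-normalize and cosine steps follow the Architecture section. The final mapping, (cosine + 1) / 2, is an assumption made for illustration, since this README does not say how the range is produced, and the function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def taste_score(feature, centroid):
    """Cosine similarity to the centroid, mapped into [0, 1]."""
    f = l2_normalize(feature)
    c = l2_normalize(centroid)
    cosine = sum(a * b for a, b in zip(f, c))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding drift
    # ASSUMPTION: a linear rescale from [-1, 1] to [0, 1]; the real tool may differ.
    return (cosine + 1.0) / 2.0


print(taste_score([1.0, 0.0], [1.0, 0.0]))   # same direction
print(taste_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(taste_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Under this assumed mapping, an image whose features point exactly along the centroid scores 1.0 and an unrelated (orthogonal) image scores 0.5.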