
Aesthetic Scorer

Personal aesthetic image scorer. Load ~1000 curated favorite images, extract features with SigLIP2 or DINOv3, compute a centroid, then score any new image in milliseconds.

Supports two backends: SigLIP2 (taste + text scoring) and DINOv3 (taste-only, state-of-the-art features).

Setup

# Install dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install transformers safetensors pillow pillow-heif open-clip-torch

# SigLIP2 downloads on first run (~2GB, cached locally)
# DINOv3 downloads on first run (~14GB for ViT-7B)

Requirements:

  • Python 3.11+
  • CUDA 12
  • PyTorch 2.1+
  • 24GB VRAM (recommended; DINOv3 ViT-7B fits with int4 quantization)

Commands

extract — Build your centroid

# SigLIP2 (default)
python aesthetic_scorer.py extract "~/Photos/Favorites" --name "my-taste"

# DINOv3 (ViT-7B/16, default biggest model)
python aesthetic_scorer.py extract "~/Photos/Favorites" --model dinov3 --name "my-taste"

# DINOv3 with a smaller variant
python aesthetic_scorer.py extract "~/Photos/Favorites" --model dinov3 \
  --model-id facebook/dinov3-vitb16-pretrain-lvd1689m --name "my-taste"

score — Score a single image

# Uses the newest centroid automatically (or specify one with --centroid)
python aesthetic_scorer.py score "~/Photos/image.jpg"          # → 0.7324

# Score with a named centroid
python aesthetic_scorer.py score "~/Photos/image.jpg" --name "my-taste"

# Score with DINOv3 centroid
python aesthetic_scorer.py score "~/Photos/image.jpg" --model dinov3 --name "my-taste"

rank — Rank a folder of images

# Taste scoring with SigLIP2 centroid (default backend)
python aesthetic_scorer.py rank "~/Photos/ToScore" --centroid models/siglip2_2026-05-24.pt --top 20

# Taste scoring with DINOv3 centroid
python aesthetic_scorer.py rank "~/Photos/ToScore" --model dinov3 --centroid models/dinov3_2026-05-24.pt --top 20

# Rank with a named centroid (auto-discovers model type)
python aesthetic_scorer.py rank "~/Photos/ToScore" --name "my-taste" --top 20

# Rank using text prompts instead of a centroid (SigLIP2 only)
python aesthetic_scorer.py rank "~/Photos/ToScore" \
  --positives "portrait|golden hour" \
  --negatives "blurry|low quality" \
  --top 20 --bottom 5

# Combined taste + text scoring (SigLIP2 only)
python aesthetic_scorer.py rank "~/Photos/ToScore" \
  --centroid models/siglip2_taste.pt \
  --positives "golden hour|sunset" --top 20

# Export ranked results to CSV
python aesthetic_scorer.py rank "~/Photos/ToScore" --csv results.csv --top 100

# Rename files with score prefix (img.jpg → [0.7324]_img.jpg)
python aesthetic_scorer.py rank "~/Photos/ToScore" --rename

# Siphon images above a score threshold to a target folder (dry-run first)
python aesthetic_scorer.py rank "~/Photos/ToScore" --siphon 0.85 --siphon-dest ./curated --dry-run
python aesthetic_scorer.py rank "~/Photos/ToScore" --siphon 0.85 --siphon-dest ./curated --move --flatten
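
The siphon step can be pictured roughly as follows. This is a minimal sketch, not the code in aesthetic_scorer.py: the function name `siphon`, its signature, and the exact meaning given here to --move, --flatten and --dry-run (copy by default, keep sub-folders unless flattened, treat the threshold as inclusive) are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def siphon(scores, root, dest, threshold, move=False, flatten=False, dry_run=True):
    """Copy (or move) every file scoring at or above `threshold` into `dest`.

    `scores` maps a file path to its score in [0, 1]. Returns the planned
    (source, target) pairs so a dry run can be inspected before committing.
    All names and semantics here are hypothetical.
    """
    root, dest = Path(root), Path(dest)
    planned = []
    # Highest scores first, so the plan reads like a ranking.
    for src, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            continue
        src = Path(src)
        # Assumed behavior: flatten drops sub-folders, otherwise mirror them.
        rel = Path(src.name) if flatten else src.relative_to(root)
        target = dest / rel
        planned.append((src, target))
        if dry_run:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        if move:
            # shutil.move also works across drives, unlike a plain rename.
            shutil.move(str(src), str(target))
        else:
            shutil.copy2(src, target)
    return planned
```

Returning the plan even on a real run makes it easy to log what was moved; a dry run touches nothing on disk.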

list — List available centroids

python aesthetic_scorer.py list

test — Self-test

python aesthetic_scorer.py test    # no GPU needed

Backend Selection

Flag              Backend                 Text Scoring   Feature Dim   Default Model
--model siglip2   SigLIP2 (open_clip)     Yes            1536          ViT-gopt-16-SigLIP2-384
--model dinov3    DINOv3 (transformers)   No             4096          ViT-7B/16

Text prompts (--positives/--negatives) only work with SigLIP2. Using them with DINOv3 raises a clear error.
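
A guard of this kind could look like the sketch below. The helper names `parse_prompts` and `check_text_scoring`, the error wording, and the choice to trim whitespace around each pipe-separated prompt are assumptions for illustration, not the tool's actual implementation.

```python
def parse_prompts(raw):
    """Split a pipe-separated prompt string into trimmed, non-empty prompts."""
    if not raw:
        return []
    return [part.strip() for part in raw.split("|") if part.strip()]


def check_text_scoring(model, positives=None, negatives=None):
    """Return (positives, negatives) as lists, rejecting prompts on DINOv3.

    Hypothetical helper: DINOv3 has no text encoder, so any prompt is an error.
    """
    pos, neg = parse_prompts(positives), parse_prompts(negatives)
    if (pos or neg) and model != "siglip2":
        raise ValueError(
            f"--positives/--negatives require --model siglip2, got {model!r}"
        )
    return pos, neg
```

Failing before any model is loaded keeps the error cheap, which matters when the backend is a multi-gigabyte download.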

A DINOv3 variant can be selected with --model-id:

  • facebook/dinov3-vit7b16-pretrain-lvd1689m (default, 7B params)
  • facebook/dinov3-vitl16-pretrain-lvd1689m (304M params)
  • facebook/dinov3-vitb16-pretrain-lvd1689m (86M params)
  • facebook/dinov3-vits16-pretrain-lvd1689m (22M params)

Model Artifacts

All model files are stored in models/ (gitignored):

File                                Description
models/siglip2_YYYY-MM-DD_name.pt   SigLIP2 centroid + metadata
models/dinov3_YYYY-MM-DD_name.pt    DINOv3 centroid + metadata

Centroids include model_type in metadata. Loading a SigLIP2 centroid with DINOv3 (or vice versa) raises a clear error.
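
The check on load might be sketched as below. The metadata is modeled as a plain dict with `model_type` and `feature_dim` keys; the function name `validate_centroid` and the exact layout of the .pt file are assumptions, since the README only says that these fields are stored and validated.

```python
def validate_centroid(meta, expected_model, expected_dim):
    """Raise ValueError if a centroid's metadata does not match the active backend.

    `meta` is assumed to be the metadata dict stored next to the centroid vector.
    """
    found_model = meta.get("model_type")
    if found_model != expected_model:
        raise ValueError(
            f"centroid was built with {found_model!r}, "
            f"but the active backend is {expected_model!r}"
        )
    found_dim = meta.get("feature_dim")
    if found_dim != expected_dim:
        raise ValueError(
            f"centroid has feature_dim={found_dim}, expected {expected_dim}"
        )
    return meta
```

Checking the dimension as well as the model type also catches a centroid built with a different DINOv3 variant, whose features would have a different size.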

Architecture

  • SigLIP2: ViT-gopt-16-SigLIP2-384 (open_clip_torch, 1.87B params, 1536-dim features). Resolution: 384×384.
  • DINOv3: ViT-7B/16 via HuggingFace transformers (7B params, 4096-dim features). Resolution: 518×518.
  • Scoring: features → L2-normalize → cosine similarity to the centroid → score in [0, 1]
  • Text scoring (SigLIP2 only): text-image cosine similarity. When combined with taste scoring, both scores are percentile-ranked and then multiplied.
  • Centroid: streaming mean with O(1) memory in the number of images, so datasets of any size can be processed
  • Centroid validation: model_type and feature_dim stored in metadata, validated on load
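
The centroid and scoring bullets can be illustrated with plain Python lists standing in for feature tensors. This is a sketch under stated assumptions: the class and function names are hypothetical, features are assumed to be L2-normalized before averaging, and the mapping from cosine similarity to [0, 1] is assumed to be (cos + 1) / 2, which the README does not specify.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class StreamingCentroid:
    """Running mean of unit vectors; memory does not grow with image count."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, features):
        unit = l2_normalize(features)
        self.count += 1
        if self.mean is None:
            self.mean = unit
        else:
            # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
            self.mean = [m + (u - m) / self.count for m, u in zip(self.mean, unit)]

    def result(self):
        if self.mean is None:
            raise ValueError("no features were added")
        return l2_normalize(self.mean)


def score(features, centroid):
    """Cosine similarity to a unit-length centroid, mapped to [0, 1] (assumed mapping)."""
    cos = sum(a * b for a, b in zip(l2_normalize(features), centroid))
    return min(1.0, max(0.0, (cos + 1.0) / 2.0))
```

Only the running mean and a counter are kept between images, so each feature vector can be discarded as soon as it has been folded in.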

Output

All scores are in [0, 1], where 1.0 means perfectly typical of your favorites. Higher means more like your training set.
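
When taste and text scores are combined, percentile-ranking both before multiplying keeps the product in [0, 1] regardless of how the raw similarities are distributed. The sketch below shows one way to do this; the function names and the particular percentile definition (the fraction of values less than or equal to each value) are assumptions, not necessarily what aesthetic_scorer.py uses.

```python
from bisect import bisect_right


def percentile_rank(values):
    """Map each value to the fraction of values <= it, in O(n log n)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def combine(taste_scores, text_scores):
    """Multiply percentile-ranked taste and text scores, image by image."""
    if len(taste_scores) != len(text_scores):
        raise ValueError("taste and text scores must cover the same images")
    taste_pct = percentile_rank(taste_scores)
    text_pct = percentile_rank(text_scores)
    return [a * b for a, b in zip(taste_pct, text_pct)]
```

Because the ranks depend on the whole batch, a combined score is only comparable within the folder that was ranked, not across separate runs.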