
Quick Start#

Get started with La Perf in just a few minutes!


Prerequisites#

Before running La Perf, ensure you have:

  • uv package manager (install command below)
  • Python 3.12+ - uv will install it automatically
  • Ollama - for LLM and VLM inference (optional)
  • LM Studio - for LLM and VLM inference (optional)
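
If you don't have uv yet, the official standalone installer from the uv documentation works on macOS and Linux (Windows users can use the PowerShell installer instead):

curl -LsSf https://astral.sh/uv/install.sh | sh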

Why uv?

La Perf uses uv for fast, reliable dependency management. It's significantly faster than pip and handles environment isolation automatically.


Installation#

1. Clone the repository#

git clone https://github.com/bogdanminko/laperf.git
cd laperf

2. (Optional) Configure environment variables#

La Perf works out of the box with default settings, but you can customize it:

cp .env.example .env
# Edit .env to customize settings

Common customizations (a combined example follows this list):

  • Change provider URLs - Use different OpenAI-compatible providers (vLLM, TGI, LocalAI)
  • Adjust dataset sizes - Change LLM_DATA_SIZE, VLM_DATA_SIZE, EMBEDDING_DATA_SIZE
  • Select backends - Use LM_STUDIO, OLLAMA, or BOTH for benchmarking
  • Customize models - Set different model names for your provider
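
For example, a .env that benchmarks both backends with explicit dataset sizes might look like this (variable names come from the list above; the values are illustrative, not recommendations):

LLM_BACKEND=BOTH
LLM_DATA_SIZE=10
VLM_DATA_SIZE=10
EMBEDDING_DATA_SIZE=3000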

Using a custom provider

To use vLLM or another OpenAI-compatible provider, point the LM Studio backend at its URL:

# In your .env file:
LLM_BACKEND=LM_STUDIO
LMS_LLM_BASE_URL=http://localhost:8000/v1
LMS_LLM_MODEL_NAME=Qwen/Qwen3-30B-Instruct
LLM_API_KEY=your-api-key-if-needed
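
The base URL above matches vLLM's default OpenAI-compatible server. As a sketch, assuming vLLM is installed, a matching server could be started with:

vllm serve Qwen/Qwen3-30B-Instruct --port 8000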

3. Install dependencies (optional)#

uv sync

This step is optional because uv run syncs the environment automatically on first use. Running uv sync explicitly will:

  • Create a virtual environment
  • Install all required dependencies
  • Set up the project for immediate use

Running Your First Benchmark#

Run all benchmarks#

Using make

make bench

Using uv

uv run python main.py

This will:

  1. Auto-detect your hardware (CUDA / MPS / CPU)
  2. Run all available benchmarks (all are pre-selected — you can toggle individual ones in the TUI using Space)
  3. Save the results to results/report_{your_device}.json

Hardware Detection

La Perf automatically detects your GPU and optimizes accordingly. No manual configuration needed!
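
If you want to check which accelerator will be picked up, a quick manual probe (assuming PyTorch is used for detection, as the CUDA / MPS / CPU options suggest) is:

uv run python -c "import torch; print('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')"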

Understanding Results#

After running benchmarks, you'll find:

  • JSON results in results/report_{device}.json
  • Plots in results/plots/
  • Summary tables in the terminal
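
For example, you can list these artifacts from the shell after a run:

ls results/            # report_{device}.json
ls results/plots/      # generated plots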

Generate Markdown Tables#

Run

make

or

make generate

This processes JSON results and generates markdown tables for the README.


Troubleshooting#

Out of memory#

If you encounter out-of-memory errors, create a .env file and adjust these settings:

cp .env.example .env

Then edit .env to reduce resource usage (a combined example follows this list):

  • Reduce batch size: EMBEDDING_BATCH_SIZE=16 (default: 32)
  • Reduce dataset size: EMBEDDING_DATA_SIZE=1000 (default: 3000)
  • Reduce LLM/VLM samples: LLM_DATA_SIZE=5 or VLM_DATA_SIZE=5 (default: 10)
  • Close other GPU-intensive applications
  • Use CPU mode for testing (slower but works)
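
Put together, a memory-friendly .env might contain (values taken from the suggestions above):

EMBEDDING_BATCH_SIZE=16
EMBEDDING_DATA_SIZE=1000
LLM_DATA_SIZE=5
VLM_DATA_SIZE=5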

Get Help#

Need help? Open an issue on the La Perf GitHub repository.