Benchmark inference speed

Times promptStreaming() over a few fixed prompts. Decode is output-token throughput from the first chunk to the last; prefill is input-token throughput up to the first streamed chunk. Token counts come from session.measureContextUsage(), which uses the model's real tokenizer.

idle