NAIRR Workshop · Jetstream2 Demo
What This Demo Is Doing — and Why
We take one AI model and run it three different ways to see which is fastest, lightest, and best — all on free cloud computing from NAIRR.
The big idea
An "AI model" is just a file full of math — the brain. But that brain can't do anything on its own; you need a piece of software to run it. There are several popular programs for this, and they're not all equally fast or efficient.
So we ask a simple, practical question: if we use the exact same AI brain, does it matter which program we run it with? We test three of the most popular ones and measure the difference.
What we're doing & why it matters
🔬 What we're doing
Running the same model (called Qwen3) through three programs — Ollama, llama.cpp, and vLLM — asking each the identical set of questions, and timing everything.
🎯 Why it matters
If you want to use AI in a classroom or research project, you have to pick a tool. This shows you, with real numbers, which one fits your needs — and proves you can do it on free NAIRR computing, no expensive hardware required.
The sequence — what happens, step by step
Pick one model & write the questions
We choose a single AI model and prepare a fixed list of questions, so every program gets the exact same test. Fair comparison starts here.
Load the model into the first program
We start Ollama, give it the model, and ask it our questions — recording how long each answer takes.
Repeat with the next program
We do the very same thing with llama.cpp, then vLLM. (vLLM needs a graphics card — on a basic instance it's skipped automatically.)
Measure everything
For each answer we track: how fast it types (words per second), how quickly it starts replying, how much memory it uses, and how good the answer is.
Put it all in one comparison table
Finally we line the three programs up side by side so the winner — and the trade‑offs — are obvious at a glance.
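The steps above can be sketched in a few lines of Python. This is a minimal sketch, not the demo's actual script: the names PROGRAMS, gpu_available, programs_to_run, measure, and comparison_table are made up for illustration, and looking for nvidia-smi is only one assumed way to detect a graphics card. It covers speed and time to first reply only; memory use and answer quality need separate tooling.

```python
import shutil
import time
from typing import Callable, Iterable

# Hypothetical description of the three programs; only vLLM is marked
# as needing a graphics card, as stated in the steps above.
PROGRAMS = [
    {"name": "Ollama", "needs_gpu": False},
    {"name": "llama.cpp", "needs_gpu": False},
    {"name": "vLLM", "needs_gpu": True},
]


def gpu_available() -> bool:
    """Assumption: an `nvidia-smi` tool on PATH means a GPU is usable."""
    return shutil.which("nvidia-smi") is not None


def programs_to_run(has_gpu: bool) -> list[str]:
    """Skip any program that needs a GPU when none is present."""
    return [p["name"] for p in PROGRAMS if has_gpu or not p["needs_gpu"]]


def measure(stream: Iterable[str],
            clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time one streamed answer.

    `stream` yields pieces of text as the program produces them.
    """
    start = clock()
    first = None
    pieces = []
    for piece in stream:
        if first is None:
            first = clock()  # moment the first piece of the reply arrived
        pieces.append(piece)
    end = clock()
    if first is None:  # empty answer: treat the end as the first reply
        first = end
    text = "".join(pieces)
    total = end - start
    return {
        "first_reply_s": first - start,
        "total_s": total,
        "words_per_s": len(text.split()) / total if total > 0 else 0.0,
        "answer": text,
    }


def comparison_table(results: dict[str, dict]) -> str:
    """Line the programs up side by side as plain text."""
    lines = [f"{'program':<12}{'first reply (s)':>18}{'words/s':>12}"]
    for name, r in results.items():
        lines.append(
            f"{name:<12}{r['first_reply_s']:>18.3f}{r['words_per_s']:>12.1f}"
        )
    return "\n".join(lines)
```

In a real run, stream would be the pieces of text coming back from each program for one question, and the results passed to comparison_table would hold one entry per program returned by programs_to_run(gpu_available()).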
The questions we ask (and what each one reveals)
"Hi, who are you?" — measures the bare minimum response time. How snappy is it?
"Why is the sky blue?" — an everyday explanation, easy to judge for quality.
"Write a program to find prime numbers." — can it produce correct, working code?
"Write a 400-word essay on photosynthesis." — the best test of sustained speed.
A two-step discount math problem. — does it actually think, or just guess?
"List the planets as structured data." — can it follow precise instructions?
Read a passage, then answer about it. — how well does it handle longer input?
Every question at the same time. — which program handles a "crowd" (like a full classroom) best?
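The last item, sending every question at the same time, can be sketched like this. It is a toy under stated assumptions: fake_ask is a stand-in that simply waits a moment, not a real call to Ollama, llama.cpp, or vLLM, and the function names are hypothetical. Only the questions quoted word for word above are included, since the exact math problem and reading passage are not given here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Only the prompts quoted verbatim in the list above.
PROMPTS = [
    "Hi, who are you?",
    "Why is the sky blue?",
    "Write a program to find prime numbers.",
    "Write a 400-word essay on photosynthesis.",
    "List the planets as structured data.",
]


def fake_ask(prompt: str) -> str:
    """Stand-in for a real request; it just waits briefly."""
    time.sleep(0.05)
    return f"(answer to: {prompt})"


def ask_one_by_one(ask, prompts):
    """Baseline: send each question only after the previous one finishes."""
    start = time.perf_counter()
    answers = [ask(p) for p in prompts]
    return answers, time.perf_counter() - start


def ask_all_at_once(ask, prompts):
    """The 'crowd' test: one worker thread per question, all sent together."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        # pool.map returns answers in the same order as the prompts.
        answers = list(pool.map(ask, prompts))
    return answers, time.perf_counter() - start
```

Calling ask_one_by_one(fake_ask, PROMPTS) and ask_all_at_once(fake_ask, PROMPTS) returns the same answers plus the elapsed time for each approach. With a real program in place of fake_ask, the gap between those two times hints at how well it copes with many users at once.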
🎓 The one thing to take away
By the end, you'll have seen — with real numbers — that there's a simple rule behind all of it:
The model's size mostly changes the quality of the answers.
Knowing that, you can confidently pick the right setup for your own classroom or research — and you just did it all on free, shared national computing.