No Cloud. No Internet. No Problem. Two Commands for Local LLM on Jetson Orin Nano

Hey guys, welcome back to the channel. Paul McWhorter here from TopTechBoy.com. Today, we aren’t just messing around with simple circuits or basic scripts—we are going to take that NVIDIA Jetson Orin Nano we rescued from the brink of destruction in the last video, and we are going to turn it into a completely sovereign, local thinking machine.

I don’t know about you, but I am tired of Big Tech telling me I need a credit card, a monthly subscription, and a constant high-speed internet connection just to make an AI model reply to a prompt. Today, we are going to do it completely naked. We are going to cut the cord, pull the ethernet, and run cutting-edge Large Language Models entirely on the local physical silicon of your Jetson Orin Nano.

And we are going to do it in exactly two commands. One to build the engine room, and one to fire up the mind.

Let’s get started.

The Hardware Architecture

Before we drop the code into the terminal, let’s understand exactly what we are building today. We are dealing with three core components working together in a unified system.

  • The Model (The Fuel): This is your raw neural network file (like Google Gemma or Meta Llama). It contains the weights, vocabulary, and potential intelligence. On its own, it’s just a massive, inert file sitting on your storage drive.

  • Ollama (The Engine Room): This is the heavy lifter. Ollama is a local execution framework that takes that raw model file, loads it into the Jetson’s unified memory, and runs it on the GPU’s CUDA cores. It handles the brutal mathematical calculations required to generate tokens.

  • The Terminal Chat (The Dashboard): This is your interface. It provides the clean command-line text box for you to type your prompts and prints the model’s responses back to you in real time.

The Two-Command Installation

Go ahead and fire up your Jetson Orin Nano, open a fresh terminal window, and get ready to type. Remember: copying and pasting makes you weak. Type these out like a real engineer so your hands learn the muscle memory.

Command 1: Install the Ollama Engine

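This command fetches the official install script from Ollama and runs it locally to set up the background system service on your host OS. You need an internet connection for this step. The one-liner published on ollama.com at the time of writing is: curl -fsSL https://ollama.com/install.sh | sh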

Command 2: Fire Up the Local Model

Once the installation script finishes, your engine room is live. Now, tell Ollama to pull down the 1-billion parameter Google Gemma 3 model and launch an interactive local chat session in one step: ollama run gemma3:1b

The moment you hit enter, your Jetson will download the model weights to your local drive, load them into the unified memory that the CPU and GPU share (the Orin Nano has no separate VRAM), and drop you into a clean prompt box. The download is the last step that needs a network connection. After that, type a question, hit enter, and watch your local silicon generate answers with zero cloud dependencies.
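
The terminal chat is not the only dashboard you can bolt onto the engine room. Because Ollama runs as a background service, a Python script on the same Jetson can send it prompts too. What follows is a minimal sketch, not part of the two-command install. It assumes Ollama's default local endpoint at http://localhost:11434/api/generate and the gemma3:1b tag from Command 2, and the helper names build_request and ask are made up for this illustration.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; the request never leaves the Jetson.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Package a prompt as a JSON POST for the local Ollama service."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt, timeout=120.0):
    """Send the prompt and return the generated text (needs Ollama running)."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

With the service running and the model already downloaded, calling ask("gemma3:1b", "What is a Jetson Orin Nano?") should return the answer as a plain string, with the network cable unplugged.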

Choosing the Right Mind for Your Machine

The beautiful part about setting up Ollama is that you aren’t locked into just one model. Different models have different parameter sizes and strengths. On the 8GB Jetson Orin Nano, you want to balance model size against your available hardware headroom to keep your generation speeds crisp.

Here are the verified, hardware-accelerated local models you can experiment with right out of the box:

Launch Command             | Model Family      | Parameter Count | Best Used For
ollama run gemma3:1b       | Google Gemma 3    | 1 Billion       | Ultra-fast responses, light footprint
ollama run llama3.2:1b     | Meta Llama 3.2    | 1 Billion       | High-efficiency conversational loops
ollama run phi4-mini:3.8b  | Microsoft Phi-4   | 3.8 Billion     | Heavy reasoning and coding logic
ollama run qwen3:4b        | Alibaba Qwen 3    | 4 Billion       | Structured data and multilingual logic
ollama run qwen3.5:4b      | Alibaba Qwen 3.5  | 4 Billion       | Advanced context processing
ollama run gemma3:4b       | Google Gemma 3    | 4 Billion       | Maximum analytical depth on Orin Nano

⚠️ Paul’s Engineering Note on Headroom

The 1B (1-Billion parameter) models are incredibly light and will run at lightning speed on the Orin Nano. If you want to push the machine harder for more complex reasoning, step up to the 3.8B or 4B models. Just keep an eye on your system resources—running a 4B model pushes close to the limits of the Orin Nano’s 8GB unified memory architecture, especially if you are running a heavy graphical desktop environment in the background!
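
To get a feel for why headroom matters, you can do the arithmetic yourself: memory for the weights is roughly the parameter count times the bytes stored per weight. The sketch below is a back-of-the-envelope estimate only. The 4-bit and 16-bit figures are assumptions chosen for illustration, the function name is hypothetical, and real model files will differ because they mix precisions and need extra room for the conversation context.

```python
def weight_memory_gib(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: 4 bits per weight stands in for a quantized model,
# 16 bits per weight for an unquantized half-precision one.
for label, size in [("1B", 1.0), ("3.8B", 3.8), ("4B", 4.0)]:
    quantized = weight_memory_gib(size, 4)
    half_precision = weight_memory_gib(size, 16)
    print(f"{label}: ~{quantized:.2f} GiB at 4-bit, ~{half_precision:.2f} GiB at 16-bit")
```

By this estimate a 4B model at 16 bits per weight would want about 7.45 GiB for the weights alone, which leaves almost nothing on an 8GB board once the operating system and desktop take their share. At 4 bits the same model drops to under 2 GiB of weights.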

To exit any active terminal chat session and return to your standard command prompt, simply type /bye and press Enter.

Homework Assignment

Alright, you have the hardware running, you have the engine installed, and you know how to switch out the minds of your machine. Now it’s time for your homework.

I want you to install both the gemma3:1b model and the heavier gemma3:4b model on your Jetson Orin Nano. Run them both through a test sequence: ask them to write a simple Python script, and then ask them a complex logic riddle.

I want you to observe the difference in quality of thought versus speed of generation. Is the 4-billion parameter model smart enough to justify the extra computation time on your hardware, or does the 1-billion parameter model give you the snappy responsiveness you need for a real-time edge application?
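
If you want a number for the speed side of that comparison, tokens per second is the usual yardstick. Here is a minimal sketch of the arithmetic, assuming you have a generated-token count and a generation time in nanoseconds, which is how Ollama's generate API reports its eval_count and eval_duration fields. The function name is hypothetical and the sample values are made-up placeholders, not measurements from any board.

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Generation speed from a token count and a duration in nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Placeholder values purely to show the arithmetic; substitute your own runs.
sample_runs = {
    "run_a": {"eval_count": 200, "eval_duration": 8_000_000_000},
    "run_b": {"eval_count": 200, "eval_duration": 25_000_000_000},
}
for name, stats in sample_runs.items():
    rate = tokens_per_second(stats["eval_count"], stats["eval_duration"])
    print(f"{name}: {rate:.1f} tokens/s")
```

Running ollama run with the --verbose flag should also print timing statistics after each answer, so you can grab real numbers without writing any code.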

Leave a comment below the video showing your results, tell me which model you prefer running natively on your bench, and I will see you guys in the next lesson!

AI on the Edge LESSON 21: Managing Multiple Windows in OpenCV on the Raspberry Pi

Hey everyone, Paul McWhorter here!

Welcome back to the AI on the Edge series!

In today’s lesson, we’re going to take an important next step in computer vision. We’re going to learn how to create, position, resize, and manage multiple windows at the same time using OpenCV on the Raspberry Pi.

This might sound simple, but it’s actually a very big deal. Once you can comfortably work with multiple windows, you can start building much more powerful vision applications — like having a main camera view, a processed view, zoomed-in sections, and debug windows all running at once.

In this lesson we create:

  • One large main camera window
  • A smaller color preview
  • A small grayscale version
  • Five tiny grayscale windows stacked on the side

This gives you a clean, organized workspace while the camera is running.
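
One way to keep a workspace like this organized is to work out every window's position and size first, then hand the numbers to OpenCV. The sketch below is not the code from the video. The window names, the 960x540 main size, the 10-pixel gap, and both function names are assumptions for illustration, and it ignores the title bars and borders your desktop adds around each window.

```python
def build_layout(main_w=960, main_h=540, origin=(0, 0), gap=10, tiny_count=5):
    """Return {window_name: (x, y, width, height)} for the workspace."""
    x0, y0 = origin
    small_w, small_h = main_w // 2, main_h // 2   # half-size previews
    tiny_w, tiny_h = main_w // 8, main_h // 8     # tiny side windows
    layout = {"main": (x0, y0, main_w, main_h)}
    # The two half-size previews sit side by side under the main window.
    below_y = y0 + main_h + gap
    layout["color_small"] = (x0, below_y, small_w, small_h)
    layout["gray_small"] = (x0 + small_w + gap, below_y, small_w, small_h)
    # The tiny grayscale windows are stacked in a column right of the main window.
    side_x = x0 + main_w + gap
    for i in range(tiny_count):
        tiny_y = y0 + i * (tiny_h + gap)
        layout[f"gray_tiny_{i + 1}"] = (side_x, tiny_y, tiny_w, tiny_h)
    return layout


def apply_layout(layout):
    """Create, size, and place each window (needs OpenCV and a desktop session)."""
    import cv2  # imported here so build_layout can be tried without OpenCV
    for name, (x, y, w, h) in layout.items():
        cv2.namedWindow(name, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(name, w, h)
        cv2.moveWindow(name, x, y)
```

Keeping the geometry in one dictionary means you can rearrange the whole desk by changing a couple of numbers instead of hunting through eight separate moveWindow calls.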


What You Learned in This Lesson

  • How to create multiple named windows with cv2.namedWindow()
  • How to resize windows using cv2.resizeWindow()
  • How to precisely position windows on your screen with cv2.moveWindow()
  • How to work with different resolutions of the same image (full size, half size, quarter size)
  • Converting between color and grayscale while running live video
  • Keeping the frame rate (FPS) high while several windows update at once

Mastering multiple windows is one of those foundational skills that separates basic OpenCV projects from more professional and useful vision systems.


Pro Tip: Play around with the window positions and sizes after you get it working. Try making one window much larger, or experiment with different layouts. This is your workspace — make it comfortable!


Ready for the next step? In the next lesson, we’re going to start doing something really cool — we’ll begin combining live video with drawn graphics and start creating interactive vision projects.

Keep building, keep learning, and I’ll see you in the next video!

In the lesson, we develop the code below:

 

AI on the Edge LESSON 20: Resizing, Moving, Converting and Tiling Video Frames in OpenCV

Welcome back to the AI on the Edge class series! In this lesson, we are diving deep into some of the most critical foundational skills you need when working with video streams on edge devices: Resizing, Moving, Converting, and Tiling video frames using OpenCV.

When you are developing real-world AI applications on the edge, you rarely just display a single camera feed. You often need to manipulate frames to feed them into your AI models, look at grayscale versions for edge detection, or arrange multiple windows on your desktop neatly so you can monitor your data visually.

If you want to follow along exactly as we do in the video, make sure you have your Raspberry Pi 5 set up with your Camera Module.

What We Cover in This Lesson

  • Fixed FPS Estimation: We continue using our robust low-pass filter formula to track smooth, non-jittery frames-per-second data directly on the video frame.

  • Creating Named Windows: Understanding how cv2.namedWindow() combined with cv2.WINDOW_GUI_NORMAL gives you absolute programmatic control over the placement of your displays.

  • Resizing & Moving Windows: How to accurately position multiple OpenCV windows on your screen using specific pixel coordinates, while allowing for the operating system taskbar and the title bars and borders the desktop adds around each window.

  • Frame Manipulation: Using cv2.resize() to scale down video frames and cv2.cvtColor() to transform the color space from BGR to grayscale.

  • Window Tiling: Arranging a main camera view, a scaled-down color view, and a scaled-down grayscale view in a perfect grid layout on your desktop.
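
The frame manipulation step above can be sketched in a few lines. This is a minimal illustration, not the complete lesson code. The helper names are hypothetical, the 0.5 scale factor is an assumption, and the per-pixel function only shows the standard luma weighting behind a BGR to grayscale conversion, so OpenCV's own result may differ by a rounding step.

```python
def scaled_size(width, height, factor):
    """Target (width, height) to pass to cv2.resize when scaling by factor."""
    return max(1, round(width * factor)), max(1, round(height * factor))


def bgr_pixel_to_gray(b, g, r):
    """Weighted sum of blue, green, and red that gives one gray level."""
    return round(0.114 * b + 0.587 * g + 0.299 * r)


def make_variants(frame, factor=0.5):
    """Return (small_color, small_gray) versions of a BGR frame."""
    import cv2  # needs OpenCV installed; imported lazily on purpose
    height, width = frame.shape[:2]
    small = cv2.resize(frame, scaled_size(width, height, factor))
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    return small, gray
```

Notice that cv2.resize wants the size as (width, height), while frame.shape gives you (height, width, channels). Mixing those up is one of the most common ways to end up with a stretched picture.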

The Complete Lesson 20 Code

Below is the complete Python code we developed during this lesson. It sets up your hardware camera stream, calculates running performance metrics, processes three distinct variations of the video feed, and tiles them cleanly on your screen.

 

AI on the Edge LESSON 19: Create a Bouncing Box in OpenCV On Raspberry Pi

Hey everyone, Paul McWhorter here!

Welcome back to the AI on the Edge series! In today’s lesson, we’re going to have some fun and take our first real steps into computer vision animation.

We’re going to create a colorful box that bounces around the screen like an old-school screensaver, while displaying a live FPS counter so we can see how well our Raspberry Pi is handling the workload.

Even though it looks simple, this project teaches you several foundational skills you’ll use again and again in computer vision:

  • Working with coordinates and drawing shapes in OpenCV
  • Creating smooth real-time animation
  • Detecting boundaries and reversing direction
  • Calculating and displaying live FPS

These are the same techniques you’ll build on later when we start doing object tracking, collision detection, and more advanced AI vision projects.
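
The heart of the bounce is just a position update followed by an edge check. Here is a minimal sketch of that logic, written without any camera or OpenCV dependency so you can test it on its own. The function name and the choice to clamp the box back inside the frame are assumptions for illustration, not necessarily how the video does it.

```python
def step_box(x, y, dx, dy, box_w, box_h, frame_w, frame_h):
    """Advance the box one frame and reverse direction at the edges."""
    x += dx
    y += dy
    # Left and right walls: clamp inside the frame and flip horizontal velocity.
    if x < 0:
        x, dx = 0, -dx
    elif x + box_w > frame_w:
        x, dx = frame_w - box_w, -dx
    # Top and bottom walls: same idea for vertical velocity.
    if y < 0:
        y, dy = 0, -dy
    elif y + box_h > frame_h:
        y, dy = frame_h - box_h, -dy
    return x, y, dx, dy
```

Call step_box once per frame, then draw the result with cv2.rectangle(frame, (x, y), (x + box_w, y + box_h), color, -1), where the thickness of -1 tells OpenCV to fill the rectangle.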


What You Learned in This Lesson

  • How to draw filled rectangles on a live video stream
  • How to move objects smoothly frame by frame
  • How to make objects “bounce” realistically off screen edges
  • A clean method for calculating and displaying FPS
  • Using variables to easily control size, position, speed, and color

This bouncing box may look basic, but once you understand how to do this, you can create all kinds of animated graphics that interact with what the camera sees.


Pro Tip: After you get it working, play around with the speed, box size, and colors. Try making multiple bouncing boxes with different speeds and directions — it’s a great way to experiment!


Ready for more? In the next lesson, we’re going to kick things up a notch and start working with multiple objects and more complex interactions.

Keep building, keep learning, and I’ll see you in the next video!

Paul McWhorter

For your convenience, this is the code we developed in the video.

 

AI on the Edge LESSON 18: Display Frames Per Second (FPS) on OpenCV Video Window

In today’s lesson, we add a clean, real-time Frames Per Second (FPS) counter directly onto our live OpenCV video window. Displaying FPS on screen is an essential tool for anyone working with camera-based AI projects on the Raspberry Pi. It gives you immediate feedback on your actual processing performance, helps with optimization, and makes your projects look more professional and polished.

In this lesson, we configure the Picamera2 library to run at 1280×720 resolution with a target of 60 frames per second. We then implement a smoothed FPS calculation using a weighted rolling average, which prevents the displayed value from jumping around wildly. Finally, we overlay the FPS text in the lower-left corner of the video frame using OpenCV’s putText() function, with font size and thickness that scale appropriately with the resolution.
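
The smoothing idea can be sketched as a simple low-pass filter: keep most of the old FPS value and blend in a little of the newest reading. This is an illustration under stated assumptions, not the lesson's exact code. The 0.9 weight, the class and function names, and the font thickness rule are all made up for the example.

```python
import time


class SmoothedFPS:
    """Low-pass filtered frames-per-second estimate."""

    def __init__(self, old_weight=0.9):
        self.old_weight = old_weight  # assumed weight; closer to 1 is smoother
        self.fps = 0.0
        self._last = None

    def update(self, now=None):
        """Call once per frame; returns the smoothed FPS."""
        now = time.perf_counter() if now is None else now
        if self._last is not None and now > self._last:
            instant = 1.0 / (now - self._last)
            if self.fps == 0.0:
                self.fps = instant  # seed the filter with the first reading
            else:
                self.fps = (self.old_weight * self.fps
                            + (1 - self.old_weight) * instant)
        self._last = now
        return self.fps


def text_style(frame_height, base_height=720):
    """Scale font size and thickness with resolution (illustrative values)."""
    scale = frame_height / base_height
    return scale, max(1, round(2 * scale))
```

In the capture loop you would call update() once per frame and pass the result to cv2.putText along with an origin near the bottom of the frame, such as (10, height - 10), since putText positions text by its lower-left corner.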

This technique forms an important foundation for future lessons, as we will continue adding more information and graphics directly onto the live video stream. Understanding how to efficiently display performance metrics is key to developing responsive and practical edge AI applications.

This is the code we develop in this lesson:

 

Making The World a Better Place One High Tech Project at a Time. Enjoy!