Skip to main content
LLMs

Step-by-step Guide to Fine-tune Qwen3

100% locally.

Avi Chawla
Avi Chawla
๐Ÿ‘‰

TODAY'S ISSUE

TODAYโ€™S DAILY DOSE OF DATA SCIENCE

โ€‹Step-by-step guide to fine-tune Qwen3โ€‹

Recently, Alibaba released Qwen 3, the latest generation of LLMs in the Qwen series with dense and mixture-of-experts (MoE) models.

Hereโ€™s a tutorial on how you can fine-tune it using Unsloth.

The code is available in this Studio: โ€‹Fine-tuning Qwen 3 locallyโ€‹. You can run it without any installations by reproducing our environment below:

The video below depicts inference using the HuggingFace transformers library on our fine-tuned model with and without thinking mode.

0:00
/0:36

Letโ€™s begin!


1) Load the model

We start by loading the Qwen 3 (14B variant) model and its tokenizer using Unsloth.

2) Define LoRA config

We'll use LoRA to avoid fine-tuning the entire model.

To do this, we use Unsloth's PEFT by specifying:

  • The model
  • LoRA low-rank (r)
  • Layers for fine-tuning, etc.

3-4) Load datasets

Next, we load a reasoning and non-reasoning dataset, over which we'll fine-tune our Qwen 3 model.

Check a sample from each of these datasets:

5-6) Prepare dataset

Before fine-tuning, we must prepare the dataset in a conversational format:

  • From the reasoning data, we select the solution and solution keys.
  • For non-reasoning data, we use a standardization method, which converts the data to the desired format.

Check the code below and a sample from each of these datasets:

Note: Use clear prompt-response pairs. Format the fine-tuning data so that each problem is presented in a consistent, model-friendly way. Typically, this means turning each math problem into an instruction or question prompt and providing a well-structured solution or answer as the completion. Consistency in formatting helps the model learn how to transition from question to answer.- Incorporate step-by-step solutions (Chain-of-Thought).- Mark the final answer clearly- Ensure compatibility with evaluation benchmarks

7) Define Trainer

Here, we create a Trainer object by specifying the training config, like learning rate, model, tokenizer, and more.

8) Train

With that done, we initiate training. The loss is decreasing with training, which means the model is being trained fine.

Check this code and output๐Ÿ‘‡

9) Inference

Below, we run the model via the HuggingFace transformers library in a thinking and non-thinking mode. Thinking requires us to set the enable_thinking parameter to True.

With that, we have fine-tuned Qwen 3 completely locally!

The code is available in this Studio: โ€‹Fine-tuning Qwen3 locallyโ€‹. You can run it without any installations by reproducing our environment below:

MCP

Build an MCP Server in 3 Steps

We found the easiest way to build an MCP server.

Just follow these 3 steps:

  • Use Gitingest to convert the entire FastMCP repo into LLM-ready text.
  • Download the text file.
  • Upload it to FactoryAI and specify the type of MCP server you want to build.

That's all! FactoryAI builds it for you.

We have attached a video walkthrough below:

0:00
/0:49

If you don't know about MCP servers, we covered them recently in the newsletter here:

PRIVACY-PRESERVING ML

โ€‹Train models on private data with federated learningโ€‹

Thereโ€™s so much data on your mobile phone right now โ€” images, text messages, etc.

And this is just about one userโ€”you.

But applications can have millions of users. The amount of data we can train ML models on is unfathomable.

The problem?

This data is private.

So consolidating this data into a single place to train a model.

The solution?

โ€‹Federated learningโ€‹ is a smart way to address this challenge.

The core idea is to ship models to devices, train the model on the device, and retrieve the updates:

But this isn't as simple as it sounds.

1) Since the model is trained on the client side, how to reduce its size?

2) How do we aggregate different models received from the client side?

3) [IMPORTANT] Privacy-sensitive datasets are always biased with personal likings and beliefs. For instance, in an image-related task:

  • Some devices may only have pet images.
  • Some devices may only have car images.
  • Some people may love to travel, and may primarily have travel-related images.
  • How to handle such skewness in data distribution?

โ€‹Learn how to implement federated learning systems (beginner-friendly) โ†’

Published on Jun 5, 2025