Submitting a Fine-Tuning Job: Organising the Workforce

Letters, Legacy, and Learning: Fine-Tuning LLMs Inspired by the Six Triple Eight (8 Part Series)

1 Six Triple Eight Redux: Fine-Tuning LLMs to Tackle Impossible Mail Mysteries of WWII
2 Exploratory Data Analysis: Digging Through the Backlog
3 Counting Tokens: Sorting Through the Details
4 Data Splitting: Breaking Down the Problem
5 Understanding the OpenAI JSONL Format: Organising the Records
6 Uploading Files to OpenAI: Passing the Baton
7 Submitting a Fine-Tuning Job: Organising the Workforce
8 Inference with Fine-Tuned Models: Delivering the Message

The Six Triple Eight relied on discipline and coordination to execute their mission. We’ll mirror this by creating and submitting a fine-tuning job, allowing the LLM to learn from our curated dataset.

Fine-Tuning with OpenAI

When you create a fine-tuning job via client.fine_tuning.jobs.create(), you submit your configuration and dataset to OpenAI for training. The key parameters and their purposes are described below.


Parameters Overview

model

  • Description: The pre-trained GPT model you wish to fine-tune.
  • Examples: "gpt-3.5-turbo", "davinci-002", "gpt-4o-mini".

training_file

  • Description: The file ID of an uploaded JSONL file containing your training data.
  • Note: Obtain this ID by uploading your dataset with the Files API and storing the file_id.
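
A minimal sketch of that upload step, written as a helper that takes the SDK client (the file path is a placeholder):

```python
def upload_training_file(client, path):
    """Upload a JSONL dataset via the Files API and return its file ID."""
    with open(path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="fine-tune")
    return uploaded.id
```

The returned ID (e.g. `train_id = upload_training_file(client, "train.jsonl")`) is what you pass as training_file.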

hyperparameters

  • Description: A dictionary specifying the fine-tuning hyperparameters.
  • Key Fields:
    • batch_size: Number of examples per batch (auto by default).
    • learning_rate_multiplier: Scale factor for the learning rate (auto by default).
    • n_epochs: Number of epochs (passes through the entire dataset).

suffix

  • Description: A custom string (up to 18 characters) appended to the fine-tuned model name.

seed

  • Description: Integer for reproducibility.
  • Usage: Ensures the same randomization and consistent training results across runs.

validation_file

  • Description: The file ID of a JSONL file containing your validation set.
  • Optional: but recommended for tracking overfitting and ensuring the model generalizes well.

integrations

  • Description: A list of integrations (e.g., Weights & Biases) you want enabled for the job.
  • Fields: Typically includes type and integration-specific configurations.

client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",
    training_file="train_id",
    hyperparameters={
        "n_epochs": 1
    },
    validation_file="val_id"
)
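
The minimal call above can be extended with the optional parameters described earlier. A sketch wrapping the submission in a helper (the suffix, seed value, and W&B project name are placeholder choices, not values from this series):

```python
def submit_job(client, train_id, val_id):
    """Submit a fine-tuning job with explicit hyperparameters and optional fields."""
    job = client.fine_tuning.jobs.create(
        model="gpt-3.5-turbo",
        training_file=train_id,
        validation_file=val_id,
        hyperparameters={
            "batch_size": "auto",               # let OpenAI pick the batch size
            "learning_rate_multiplier": "auto", # let OpenAI pick the LR scale
            "n_epochs": 3,
        },
        suffix="six-triple-eight",  # appended to the fine-tuned model name
        seed=42,                    # reproducible training across runs
        integrations=[
            {"type": "wandb", "wandb": {"project": "six-triple-eight"}}
        ],
    )
    return job.id
```

Keeping the job ID that comes back is important: every monitoring call below takes it as input.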


Managing Fine-Tuning Jobs
The call below retrieves up to 10 of your fine-tuning jobs.

client.fine_tuning.jobs.list(limit=10)
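
The list call returns job objects you can iterate over; a small sketch that reduces them to (id, status) pairs for a quick overview:

```python
def summarize_jobs(client, limit=10):
    """Return (id, status) pairs for the most recent fine-tuning jobs."""
    return [(job.id, job.status) for job in client.fine_tuning.jobs.list(limit=limit)]
```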



Retrieve a Specific Job

client.fine_tuning.jobs.retrieve("job_id")



List Events for a Job

client.fine_tuning.jobs.list_events(
    fine_tuning_job_id="xxxx",
    limit=5
)
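
Since fine-tuning runs asynchronously, retrieval is most useful in a polling loop. A sketch that waits until the job reaches a terminal status (the poll interval is an arbitrary choice):

```python
import time

def wait_for_job(client, job_id, poll_seconds=30):
    """Poll a fine-tuning job until it succeeds, fails, or is cancelled."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in ("succeeded", "failed", "cancelled"):
            return job
        time.sleep(poll_seconds)
```

On success, the returned job object carries the name of the fine-tuned model, which the next part of this series uses for inference.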


Summary

  • Model Selection: Choose a suitable GPT model to fine-tune.

  • Data Preparation: Upload JSONL files and note their IDs.

  • Hyperparameters: Tune batch size, learning rate, and epochs for optimal performance.

  • Monitoring: Use validation files, job retrieval, and event logging to ensure your model trains effectively.

  • Reproducibility: Set a seed if consistent results are important for your workflow.

By following these steps, you’ll have a clear path to submitting and managing your fine-tuning jobs in OpenAI, ensuring your model is trained precisely on your custom data.

