Run Gemma on Google Colab Free tier

What is Gemma?

Gemma is a family of four new LLMs from Google, based on Gemini. It comes in two sizes, 2B and 7B parameters, each with base (pretrained) and instruction-tuned versions. All the variants can be run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens.

https://huggingface.co/blog/gemma

In this post, we will try to run Gemma on the Google Colab free tier. To do that, we need to load a quantized model, because gemma-7b requires about 18 GB of GPU RAM.
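
To see roughly where the 18 GB figure comes from and why quantization helps, here is a small back-of-the-envelope sketch. It counts model weights only (no activations or KV cache). The parameter count of about 8.5 billion for gemma-7b is an assumption that does not appear in this post, and the function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: gemma-7b has roughly 8.5e9 parameters once embeddings are counted.
GEMMA_7B_PARAMS = 8.5e9

for bits in (16, 8, 4):
    size = weight_memory_gb(GEMMA_7B_PARAMS, bits)
    print(f"{bits}-bit weights: ~{size:.1f} GB")
```

Under that assumption, 16-bit weights alone come close to the 18 GB mentioned above, while 4-bit weights need about a quarter of that, which is why the quantized model is a realistic candidate for a free-tier GPU.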

Requirements

HuggingFace account
Google account

Step 1. Get access to Gemma

Gemma works with Transformers 4.38, but first we need to request access to the model.

https://huggingface.co/google/gemma-7b

Once access has been granted, the page above shows a confirmation.

Step 2. Add HF_TOKEN to Google Colab

We need to add HF_TOKEN to Google Colab to access Gemma via Transformers.

First, get an access token from Hugging Face.
https://huggingface.co/settings/tokens

Then click the key icon in the Google Colab sidebar and add the token as a secret named HF_TOKEN.
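
To confirm the secret is visible to the notebook, a minimal sketch like the one below can help. It assumes the secret is named HF_TOKEN and that notebook access to it has been enabled. The helper name `get_hf_token` is made up for illustration, and the fallback to an environment variable is an addition for running outside Colab.

```python
import os


def get_hf_token(secret_name: str = "HF_TOKEN"):
    """Return the token from Colab secrets, falling back to an environment variable."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
        return userdata.get(secret_name)
    except Exception:
        # Not in Colab, or the secret is missing / access not granted
        return os.environ.get(secret_name)


token = get_hf_token()
# Never print the token itself; only report whether it was found
print("HF_TOKEN found" if token else "HF_TOKEN missing")
```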

Step 3. Install packages


!pip install -U "transformers==4.38.1"
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes
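
After installing, it can be worth checking which versions actually ended up in the runtime. The following is a small sketch using only the standard library; the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["transformers", "accelerate", "bitsandbytes"]))
```

If any entry is None, or transformers is not at the pinned version, rerun the install cell and restart the runtime before continuing.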


Step 4. Write Python code to run Gemma

We can now load the instruction-tuned gemma-7b-it model through the Transformers pipeline API.


from transformers import pipeline
import torch

model_id = "google/gemma-7b-it"

# Load the model with 4-bit quantization so it fits in the free-tier GPU memory
# (named pipe so it does not shadow the imported pipeline function)
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": {"load_in_4bit": True}
    },
)

messages = [
    {"role": "user", "content": "Tell me about ChatGPT"},
]
# Render the chat messages into Gemma's prompt format as a plain string
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)
# The generated text starts with the prompt, so slice it off and print only the reply
print(outputs[0]["generated_text"][len(prompt):])
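
To make the `apply_chat_template` call above less of a black box, here is a rough sketch of the prompt string it is expected to produce for Gemma's instruction-tuned models. The turn markers reflect my understanding of Gemma's chat format and should be treated as an assumption. The function `format_gemma_prompt` is a hand-written illustration, not part of Transformers.

```python
def format_gemma_prompt(messages):
    """Build a Gemma-style chat prompt from a list of {"role", "content"} dicts."""
    parts = ["<bos>"]
    for message in messages:
        # Assumption: Gemma labels the assistant side of the conversation "model"
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    # Equivalent of add_generation_prompt=True: open a turn for the model to complete
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = format_gemma_prompt([{"role": "user", "content": "Tell me about ChatGPT"}])
print(prompt)

# The pipeline returns prompt + reply, so slicing by len(prompt) leaves only the reply
generated_text = prompt + "Example reply"
print(generated_text[len(prompt):])
```

In the notebook itself, keep using the tokenizer's own template. The sketch only shows why slicing by `len(prompt)` isolates the model's answer.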


Result

The following is the output of the code above.
Unfortunately, the answer is wrong: ChatGPT was developed by OpenAI, not Google, and it is not open source. So at this moment, Gemma is either missing the latest data or is simply not a good model. 🥲

ChatGPT is a large language model (LLM) developed by Google. It is a conversational AI model that can engage in a wide range of topics and tasks, including:

Key Features:

Natural Language Processing (NLP): ChatGPT is able to understand and generate human-like text, including code, scripts, poems, articles, and more.
Information Retrieval: It can provide information on a vast number of topics, from history to science to technology.
Conversation: It can engage in natural language conversation, answer questions, and provide information.
Code Generation: It can generate code in multiple programming languages, including Python, Java, C++, and more.
Task Completion: It can complete a variety of tasks, such as writing stories, summarizing text, and translating languages.

Additional Information:

Large Language Model: ChatGPT is a large language model, trained on a massive amount of text data, making it able to learn complex relationships and patterns.
Transformer-Based: ChatGPT uses a transformer-based architecture, which allows it to process language more efficiently than traditional language models.
Open-Source: ChatGPT is open-sourced, meaning that anyone can contribute to its development
