Run Gemma on Google Colab Free tier

What is Gemma?

Gemma is a family of 4 new LLM models by Google based on Gemini. It comes in two sizes: 2B and 7B parameters, each with base (pretrained) and instruction-tuned versions. All the variants can be run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens.

https://huggingface.co/blog/gemma

In this post, we will try to run Gemma on the Google Colab free tier. To do that, we will need to use a quantized model, since gemma-7b requires 18GB of GPU RAM.
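As a rough back-of-envelope check (my own estimate, not an official figure): 7B parameters × 2 bytes in float16 is about 14GB for the weights alone, and activations plus overhead push that toward the 18GB mark. Quantized to 4 bits, the weights shrink to roughly 7B × 0.5 bytes ≈ 3.5GB, which fits on the free tier's T4 GPU (about 15GB of VRAM).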

Requirements

  • Hugging Face account
  • Google account

Step 1. Get access to Gemma

We can use Gemma with Transformers 4.38, but first we need to be granted access to the model.

https://huggingface.co/google/gemma-7b

Once access is granted, you will see a confirmation notice on the model page above.

Step 2. Add HF_TOKEN to Google Colab

We need to add HF_TOKEN to Google Colab to access Gemma via Transformers.

First, we need to get an access token from Hugging Face:
https://huggingface.co/settings/tokens

Then click the key icon (Secrets) in the left sidebar of Google Colab and add the token as a secret named HF_TOKEN, with notebook access enabled.
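Transformers usually picks the secret up automatically, but if it does not, you can read it and log in yourself. A minimal sketch, assuming the secret is named HF_TOKEN and notebook access is enabled:

# Read the Colab secret and log in to Hugging Face manually.
from google.colab import userdata
from huggingface_hub import login

login(token=userdata.get('HF_TOKEN'))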

Step 3. Install packages

!pip install -U "transformers==4.38.1" --upgrade
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes

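Before loading the model, it can help to verify the environment. A quick sanity check (the exact version string and GPU may differ on your runtime):

# Optional sanity check: confirm the pinned version and a visible GPU.
import torch
import transformers

print(transformers.__version__)   # expect 4.38.1
print(torch.cuda.is_available())  # expect True on a GPU runtime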

Step 4. Write Python code to run Gemma

We can use the gemma-7b-it (instruction-tuned) model via Transformers.

from transformers import pipeline
import torch

model = "google/gemma-7b-it"

# Use the 4-bit quantized model so it fits in free-tier GPU memory.
pipe = pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": {"load_in_4bit": True}
    },
)

messages = [
    {"role": "user", "content": "Tell me about ChatGPT"},
]
# Format the chat messages with Gemma's chat template.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)
# The pipeline returns the prompt plus the completion; print only the completion.
print(outputs[0]["generated_text"][len(prompt):])

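As an aside, the same 4-bit setup can also be expressed with an explicit BitsAndBytesConfig and AutoModelForCausalLM instead of a raw dict. A minimal sketch, equivalent in spirit to the pipeline code above:

# Equivalent explicit setup: 4-bit quantization via BitsAndBytesConfig.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store weights in 4-bit, compute in fp16
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU automatically
)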

Result

The following is the result of the pipeline code above.
Unfortunately, the output is factually wrong: Gemma claims ChatGPT was developed by Google, when it was developed by OpenAI. So at this moment, Gemma either lacks up-to-date knowledge or simply hallucinates on this kind of question. 🥲

ChatGPT is a large language model (LLM) developed by Google. It is a conversational AI model that can engage in a wide range of topics and tasks, including:

Key Features:

  • Natural Language Processing (NLP): ChatGPT is able to understand and generate human-like text, including code, scripts, poems, articles, and more.
  • Information Retrieval: It can provide information on a vast number of topics, from history to science to technology.
  • Conversation: It can engage in natural language conversation, answer questions, and provide information.
  • Code Generation: It can generate code in multiple programming languages, including Python, Java, C++, and more.
  • Task Completion: It can complete a variety of tasks, such as writing stories, summarizing text, and translating languages.

Additional Information:

  • Large Language Model: ChatGPT is a large language model, trained on a massive amount of text data, making it able to learn complex relationships and patterns.
  • Transformer-Based: ChatGPT uses a transformer-based architecture, which allows it to process language more efficiently than traditional language models.
  • Open-Source: ChatGPT is open-sourced, meaning that anyone can contribute to its development
