What is Gemma?
Gemma is a family of four new LLMs from Google, based on Gemini. It comes in two sizes, 2B and 7B parameters, each with a base (pretrained) and an instruction-tuned version. All the variants can be run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens.
https://huggingface.co/blog/gemma
In this post, we will try to run Gemma on the Google Colab free tier. To do that, we need to use a quantized model, since gemma-7b requires 18GB of GPU RAM.
Requirements
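As a rough sanity check on the 18GB figure, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is only a back-of-the-envelope calculation: the parameter count of about 8.5 billion for gemma-7b is an assumption taken from memory of the model card, the helper name `weight_memory_gb` is made up for this post, and the estimate ignores activations, the KV cache, and framework overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: gemma-7b has roughly 8.5 billion parameters.
GEMMA_7B_PARAMS = 8.5e9

# Compare unquantized half precision with 8-bit and 4-bit quantization.
for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(GEMMA_7B_PARAMS, bits)
    print(f"{label:>8}: about {gb:.1f} GB for weights")
```

Under these assumptions, half-precision weights alone land close to the 18GB quoted above, while 4-bit weights need roughly a quarter of that, which is why quantization is the route taken here.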
- HuggingFace account
- Google account
Step 1. Get access to Gemma
We can use Gemma with Transformers 4.38, but first we need to be granted access to the model.
https://huggingface.co/google/gemma-7b
Once access has been granted, the model page linked above will show that you have access.
Step 2. Add HF_TOKEN to Google Colab
We need to add HF_TOKEN to Google Colab to access Gemma via Transformers.
First, we need to get a token from Hugging Face.
https://huggingface.co/settings/tokens
Then click the key icon in the Google Colab sidebar and add the token as a secret named HF_TOKEN.
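To confirm that the notebook can actually see the secret, something like the following may help. It is a minimal sketch: it assumes the secret is named HF_TOKEN and that notebook access has been enabled for it, it reads the value through Colab's `google.colab.userdata` module, and it falls back to an environment variable when run outside Colab. The helper name `get_hf_token` is made up for this post.

```python
import os


def get_hf_token(name: str = "HF_TOKEN"):
    """Return the token from Colab secrets if possible, else from the environment."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata

        token = userdata.get(name)
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing or not shared
        # with this notebook: fall through to the environment variable.
        pass
    return os.environ.get(name)


token = get_hf_token()
# Never print the token itself, only whether it was found.
print("HF_TOKEN found" if token else "HF_TOKEN missing")
```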
Step 3. Install packages
!pip install -U "transformers==4.38.1"
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes
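After installing, it can be worth checking that the runtime really picked up the expected versions, since Colab ships with preinstalled packages. The sketch below uses only the standard library. The helper names are made up for this post, the version parsing is deliberately simplified (it looks only at the first three numeric components), and the minimum of 4.38.0 reflects the Transformers version mentioned in Step 1.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '4.38.1' or '4.38.0.dev0' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_installed_at_least(package: str, minimum: str) -> bool:
    """True if the package is installed and its version is >= minimum."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


# Only transformers has a minimum taken from this post; the others just need to exist.
for package, minimum in [("transformers", "4.38.0"), ("accelerate", "0"), ("bitsandbytes", "0")]:
    status = "OK" if is_installed_at_least(package, minimum) else "missing or too old"
    print(f"{package}: {status}")
```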
Step 4. Write Python code to run Gemma
We can use the instruction-tuned gemma-7b-it model via Transformers.
import torch
from transformers import pipeline

model = "google/gemma-7b-it"

# use quantized model (4-bit via bitsandbytes)
# the result is named pipe so it does not shadow the imported pipeline function
pipe = pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": {"load_in_4bit": True},
    },
)

messages = [
    {"role": "user", "content": "Tell me about ChatGPT"},
]

# format the conversation with the model's chat template
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)

# the generated text starts with the prompt, so slice it off
print(outputs[0]["generated_text"][len(prompt):])
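To make the `apply_chat_template` step less opaque, here is a hand-rolled approximation of the prompt string it builds for Gemma, along with the prompt-stripping done in the final `print`. This is a sketch, not the library's code: the turn markers (`<bos>`, `<start_of_turn>`, `<end_of_turn>`) and the mapping of the assistant role to `model` are assumptions based on Gemma's published chat format, and both helper names are made up for this post. In real code, keep using the tokenizer's own template.

```python
def build_gemma_prompt(messages, add_generation_prompt=True, bos_token="<bos>"):
    """Approximate Gemma's chat template by hand (assumed format, for illustration)."""
    parts = [bos_token]
    for message in messages:
        # Assumption: Gemma calls the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the model's reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


def strip_prompt(generated_text, prompt):
    """Remove the echoed prompt from the front of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


messages = [{"role": "user", "content": "Tell me about ChatGPT"}]
prompt = build_gemma_prompt(messages)
print(repr(prompt))

# Stand-in for real model output, which begins with the prompt itself.
fake_generated_text = prompt + "ChatGPT is ..."
print(strip_prompt(fake_generated_text, prompt))
```

Checking `startswith` before slicing is a small safety net: slicing by `len(prompt)` alone would silently cut off real output if the pipeline were configured not to echo the prompt.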
Result
The following is the result of the above code.
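As you can see, the output is unfortunately wrong: it claims that ChatGPT was developed by Google and is open source, when it was actually developed by OpenAI. So at the moment, Gemma is either missing recent data or is simply not a good model. 🥲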
ChatGPT is a large language model (LLM) developed by Google. It is a conversational AI model that can engage in a wide range of topics and tasks, including:
Key Features:
- Natural Language Processing (NLP): ChatGPT is able to understand and generate human-like text, including code, scripts, poems, articles, and more.
- Information Retrieval: It can provide information on a vast number of topics, from history to science to technology.
- Conversation: It can engage in natural language conversation, answer questions, and provide information.
- Code Generation: It can generate code in multiple programming languages, including Python, Java, C++, and more.
- Task Completion: It can complete a variety of tasks, such as writing stories, summarizing text, and translating languages.
Additional Information:
- Large Language Model: ChatGPT is a large language model, trained on a massive amount of text data, making it able to learn complex relationships and patterns.
- Transformer-Based: ChatGPT uses a transformer-based architecture, which allows it to process language more efficiently than traditional language models.
- Open-Source: ChatGPT is open-sourced, meaning that anyone can contribute to its development