Cannot use llama.cpp for quantization

Hello, I’m not very good with technology, but I’ve been experimenting with LLMs recently. Until now I’ve been using Google Colab notebooks, but they are no longer sufficient, so I want to switch to a service like Lightning Pro. Before switching, though, I wanted to try one of my Python notebooks written for Colab on Lightning AI, and I ran into the following problem. I would be very grateful if you could help.

The issue

I get the error “No such file or directory” when I try to quantize a model. This is probably very easy for technologically literate people, but not for me. I’ve been trying to fix it for nearly an hour (I’ve tried different shell commands), but I haven’t succeeded yet :frowning: .

I extracted the code from my Jupyter Notebook:

# Install llama.cpp
!git clone https://github.com/ggerganov/llama.cpp
!cd llama.cpp && git pull && make clean && LLAMA_CUBLAS=1 make
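# (make builds the compiled binaries - main, quantize, etc. - inside the llama.cpp directory itself)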
!pip install -r llama.cpp/requirements.txt

# Install Hugging Face Hub
!pip3 install huggingface-hub

# Install Hugging Face Transfer
!pip install hf-transfer
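# (hf-transfer is an optional Rust-based backend that speeds up large downloads from the Hub)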

# Enable HF_TRANSFER
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# Set model id
MODEL_ID = "malhajar/Mixtral-8x7B-v0.1-turkish"

# Download model from Hugging Face
from huggingface_hub import snapshot_download
snapshot_download(repo_id=MODEL_ID, ignore_patterns=["*.bin", "*.h5"], local_dir='Mixtral-8x7B-v0.1-turkish')
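# (*.bin and *.h5 are skipped so only the safetensors weights are downloaded, which convert.py can read)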

# Extract the model name from the model id
MODEL_NAME = MODEL_ID.split('/')[-1]

# Convert to fp16
fp16 = f"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin"
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}
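# (despite the .bin extension, convert.py writes a GGUF-format file here, which quantize takes as input)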

# Choose the quantization methods
QUANTIZATION_METHODS = ["q2_k", "q3_k_m", "q4_k_m", "q5_k_m", "q6_k", "q8_0"]
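# (roughly ordered from smallest/most lossy, q2_k, to largest/most faithful, q8_0)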

# Quantize the model - Here is where the error occurs!!!
for method in QUANTIZATION_METHODS:
    qtype = f"{MODEL_NAME}/{MODEL_NAME.lower()}.{method.upper()}.gguf"
    !./quantize {fp16} {qtype} {method}
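
Could the problem simply be a path issue? As far as I can tell, make builds the quantize binary inside the llama.cpp folder rather than in the notebook’s working directory, so ./quantize would not exist where I am calling it. This is only my guess (untested, and assuming the working directory is the parent of llama.cpp); would something like the following be the correct call?

# Check where the binary actually ended up
!ls llama.cpp | grep -i quant

# My guess at the fix: call quantize through its path inside the cloned repo
for method in QUANTIZATION_METHODS:
    qtype = f"{MODEL_NAME}/{MODEL_NAME.lower()}.{method.upper()}.gguf"
    !./llama.cpp/quantize {fp16} {qtype} {method}

If that is not the issue, any pointers on how to debug this on Lightning AI would be much appreciated.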