
Ok so this is the rundown on how to install and run llama.cpp on Ubuntu 22.04. (This works for my officially unsupported RX 6750 XT GPU running on my AMD Ryzen 5 system.)
First off you need to run the usual:
sudo apt-get update
sudo apt-get upgrade
Then you need to install all the ROCm libraries etc. that will be used by llama.cpp.
Start by adding the official Radeon repositories to apt, as described here:
Quick Start (Linux): rocm.docs.amd.com
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
    gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

# Kernel driver repository for jammy
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.7.1/ubuntu jammy main
EOF

# ROCm repository for jammy
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian jammy main
EOF

# Prefer packages from the rocm repository over system packages
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
Ok, all that mess just sets up the Radeon repos for Jammy Jellyfish (Ubuntu 22.04) on your system.
sudo apt-get update
And update so your system knows where it all is.
Install AMD's purpose-made driver for all your ROCm business:
sudo apt-get install amdgpu-dkms
Put the libraries on there too:
sudo apt-get install rocm-hip-libraries
Have a little rest and reboot the system
sudo reboot
Now install the remaining development stuff needed to compile llama.cpp:
sudo apt-get install rocm-dev
sudo apt-get install rocm-hip-runtime-dev rocm-hip-sdk
sudo apt-get install rocm-libs
sudo apt-get install rocminfo
Run rocminfo and you should get output similar to this:
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name: AMD Ryzen 5 2600X Six-Core Processor
  Uuid: CPU-XX
  Marketing Name: AMD Ryzen 5 2600X Six-Core Processor
  Vendor Name: CPU
  Feature: None specified
  Profile: FULL_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 0(0x0)
  Queue Min Size: 0(0x0)
  Queue Max Size: 0(0x0)
  Queue Type: MULTI
  Node: 0
  Device Type: CPU
  Cache Info:
    L1: 32768(0x8000) KB
  Chip ID: 0(0x0)
  ASIC Revision: 0(0x0)
  Cacheline Size: 64(0x40)
  Max Clock Freq. (MHz): 3600
  BDFID: 0
  Internal Node ID: 0
  Compute Unit: 12
  SIMDs per CU: 0
  Shader Engines: 0
  Shader Arrs. per Eng.: 0
  WatchPts on Addr. Ranges: 1
  Features: None
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: FINE GRAINED
      Size: 32792028(0x1f45ddc) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 2
      Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size: 32792028(0x1f45ddc) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 3
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 32792028(0x1f45ddc) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
  ISA Info:
*******
Agent 2
*******
  Name: gfx1031
  Uuid: GPU-XX
  Marketing Name: AMD Radeon RX 6750 XT
  Vendor Name: AMD
  Feature: KERNEL_DISPATCH
  Profile: BASE_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 128(0x80)
  Queue Min Size: 64(0x40)
  Queue Max Size: 131072(0x20000)
  Queue Type: MULTI
  Node: 1
  Device Type: GPU
  Cache Info:
    L1: 16(0x10) KB
    L2: 3072(0xc00) KB
    L3: 98304(0x18000) KB
  Chip ID: 29663(0x73df)
  ASIC Revision: 0(0x0)
  Cacheline Size: 64(0x40)
  Max Clock Freq. (MHz): 2880
  BDFID: 1536
  Internal Node ID: 1
  Compute Unit: 40
  SIMDs per CU: 2
  Shader Engines: 2
  Shader Arrs. per Eng.: 2
  WatchPts on Addr. Ranges: 4
  Features: KERNEL_DISPATCH
  Fast F16 Operation: TRUE
  Wavefront Size: 32(0x20)
  Workgroup Max Size: 1024(0x400)
  Workgroup Max Size per Dimension:
    x 1024(0x400)
    y 1024(0x400)
    z 1024(0x400)
  Max Waves Per CU: 32(0x20)
  Max Work-item Per CU: 1024(0x400)
  Grid Max Size: 4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x 4294967295(0xffffffff)
    y 4294967295(0xffffffff)
    z 4294967295(0xffffffff)
  Max fbarriers/Workgrp: 32
  Packet Processor uCode:: 115
  SDMA engine uCode:: 80
  IOMMU Support:: None
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 12566528(0xbfc000) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Alignment: 4KB
      Accessible by all: FALSE
    Pool 2
      Segment: GLOBAL; FLAGS:
      Size: 12566528(0xbfc000) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Alignment: 4KB
      Accessible by all: FALSE
    Pool 3
      Segment: GROUP
      Size: 64(0x40) KB
      Allocatable: FALSE
      Alloc Granule: 0KB
      Alloc Alignment: 0KB
      Accessible by all: FALSE
  ISA Info:
    ISA 1
      Name: amdgcn-amd-amdhsa--gfx1031
      Machine Models: HSA_MACHINE_MODEL_LARGE
      Profiles: HSA_PROFILE_BASE
      Default Rounding Mode: NEAR
      Default Rounding Mode: NEAR
      Fast f16: TRUE
      Workgroup Max Size: 1024(0x400)
      Workgroup Max Size per Dimension:
        x 1024(0x400)
        y 1024(0x400)
        z 1024(0x400)
      Grid Max Size: 4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x 4294967295(0xffffffff)
        y 4294967295(0xffffffff)
        z 4294967295(0xffffffff)
      FBarrier Max Size: 32
*** Done ***
That's real juicy!
Make a note of the node number for your GPU device.
You can see mine is ‘1’.
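If you don't fancy scrolling through all of that, a quick filter like this (just a convenience, using the field labels from the output above) pulls out the device names and node numbers:
rocminfo | grep -E 'Marketing Name|Node:'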
Now you should have all the necessary stuff for compiling (assuming you have already installed a compiler).
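If you haven't got a compiler yet, the stock Ubuntu toolchain is enough for the Makefile build; build-essential pulls in gcc, g++ and make:
sudo apt-get install build-essential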
When using the GPU to do ROCm stuff you need to be a member of the render group:
sudo usermod -a -G render yourusername
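The new group membership only applies to sessions started after the change, so log out and back in (or reboot), then confirm it with:
groups yourusername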
Now clone llama.cpp with git as follows:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Enter the llama.cpp directory and compile using the following, setting HIP_VISIBLE_DEVICES=1 (the node value you took from rocminfo):
make clean && LLAMA_HIPBLAS=1 HIP_VISIBLE_DEVICES=1 make -j
Now, after the compile is finished, you need to do a little bit of tinkering to get this to work with your unsupported card.
ROCm will kick up an error saying it cannot find your device, gfx1031, so you need to override the GFX version number as follows:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
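If you don't want to remember to export that every time, you can optionally make it permanent by adding it to your shell profile (just a convenience, not something llama.cpp requires):
echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc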
Ok, next you need to select a model.
Make sure you download a usable model; I have used this one from Hugging Face:
https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF
Store the model in the models directory.
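If you prefer the command line, a plain wget into the models directory does the job. The file below is the Q2_K quant I used; TheBloke's repos follow the usual Hugging Face resolve/main URL pattern, so adjust the file name if you pick a different quant:
wget -P models/ https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q2_K.gguf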
Now all you need to do is specify a prompt to use with the llama.cpp executable you created:
export HSA_OVERRIDE_GFX_VERSION=10.3.0 && export HIP_VISIBLE_DEVICES=1 && sudo ./main -ngl 50 -m models/zephyr-7b-beta.Q2_K.gguf -p "How far does your knowledge of hyperplastic engineering go?"
llama.cpp is now compiled to use the GPU you specified earlier. Have fun guys.
llama output:
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0

How far does your knowledge of hyperplastic engineering go? Do you know how the properties of materials change with plastic deformation? How many of you have encountered the problem of material anisotropy when it comes to working with metal or polymeric components in their production technology?
Using the Zephyr model from above, llama.cpp replied to my question with two more questions!! Now go to Part 2 to use Facebook's official Llama 2 model.
Here is how I finally got it working for my unsupported card:
Ok so basically I ended up installing two versions of ROCm. First I installed 5.2, following this guide:
https://askubuntu.com/questions/1429376/how-can-i-install-amd-rocm-5-on-ubuntu-22-04
That worked fine for Stable Diffusion, but not for llama.cpp, which gave this error:
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
After this I installed ROCm 5.7 using the Debian repo and the following commands:
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian jammy main
EOF
sudo amdgpu-install --rocmrelease=5.7.0 --usecase=rocm,hip --no-dkms
I think --no-dkms makes a big difference, and to be fair, if you just installed 5.7 in the first place it would work on all fronts.
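If you want to double-check which ROCm packages and release actually ended up on the system, a quick dpkg query does the trick (just a sanity check):
dpkg -l | grep -i rocm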
Oh yes, and one more thing: the Tensile library. I made a copy of the gfx1030 file:
/opt/rocm/lib/rocblas/library/TensileLibrary_Type_HH_Contraction_l_Ailk_Bjlk_Cijk_Dijk_gfx1030.co
and renamed the copy so it ends in gfx1031 instead of gfx1030, to match my GPU.
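For reference, the copy looked roughly like this on my setup (the exact rocBLAS Tensile file names can vary between ROCm releases, so check what is actually in /opt/rocm/lib/rocblas/library/ first):
cd /opt/rocm/lib/rocblas/library/
# copy the gfx1030 code object so rocBLAS can find one named for gfx1031
sudo cp TensileLibrary_Type_HH_Contraction_l_Ailk_Bjlk_Cijk_Dijk_gfx1030.co \
  TensileLibrary_Type_HH_Contraction_l_Ailk_Bjlk_Cijk_Dijk_gfx1031.co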