LLM Quantization in a nutshell

TL;DR This blog post summarizes the nuts and bolts of LLM quantization with llama.cpp. Quantization, in the context of machine learning, refers to the process of reducing the precision of a model’s parameters, typically converting floating-point numbers to lower-bit representations. This has profound implications for model deployment, particularly in making sizable LLMs more accessible. Quantization works by mapping the continuous range of floating-point values to a discrete set of levels....
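
To make that mapping concrete, here is a minimal sketch of symmetric 8-bit quantization in Python. It is an illustrative toy under simple assumptions, not llama.cpp's actual quantization scheme, and the function names are made up for this example.

```python
# Toy symmetric quantization: map float32 weights to int8 plus one scale factor.
# Illustrative only -- llama.cpp uses more elaborate block-wise schemes.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """float32 -> (int8 levels, scale). The largest magnitude maps to 127."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max reconstruction error:", np.abs(w - dequantize(q, s)).max())
```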

January 28, 2024 · 4 min

What is Parameter Efficient Finetuning?

TL;DR Parameter Efficient Fine-Tuning is a technique that aims to reduce computational and storage resources during the fine-tuning of Large Language Models. Why should I care? Fine-tuning is a common technique used to enhance the performance of large language models. Essentially, fine-tuning involves training a pre-trained model on a new, similar task. It has become a crucial step in model optimization. However, when these models consist of billions of parameters, fine-tuning becomes computationally and storage intensive, leading to the development of Parameter Efficient Fine-Tuning methods....
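
To give a feel for why this matters, here is a toy low-rank adapter in the spirit of LoRA, one popular PEFT method. The layer size, rank, and variable names are assumptions made up for this illustration, not code from the post.

```python
# Low-rank adapter sketch: freeze the large pretrained matrix W and train only
# two small matrices A and B (LoRA-style). Numbers are illustrative.
import numpy as np

d, r = 4096, 8                      # layer width, adapter rank
W = np.random.randn(d, d)           # pretrained weight, kept frozen
A = np.random.randn(d, r) * 0.01    # small trainable matrix
B = np.zeros((r, d))                # small trainable matrix, initialised to zero

x = np.random.randn(d)
y = x @ W + x @ A @ B               # adapted forward pass; W is never updated

full, peft = W.size, A.size + B.size
print(f"trainable parameters: {peft:,} instead of {full:,} ({peft / full:.2%})")
```

Training only A and B is what keeps the compute and storage footprint small: the checkpoint saved per task is the adapter, not the full model.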

November 1, 2023 · 11 min

Hands on with Retrieval Augmented Generation

TL;DR This blog post shows an example of a chatbot that uses Retrieval Augmented Generation to retrieve domain-specific knowledge before querying a Large Language Model. For a primer on Retrieval Augmented Generation, please read my other post What is Retrieval Augmented Generation?. Retrieval Augmented Generation can be a powerful architecture for easily building knowledge retrieval applications which (based on a recent study) even outperform LLMs with long context windows....
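
The teaser already names the core loop: retrieve domain-specific knowledge first, then query the model. Below is a minimal, self-contained sketch of that flow; the document list, helper names, and the toy bag-of-words retriever are stand-ins for a real vector store and embedding model.

```python
# Minimal RAG flow: retrieve the most relevant document, build a prompt,
# then hand that prompt to an LLM. Toy retriever, illustrative data.
from collections import Counter
import math

documents = [
    "Our support hotline is available Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 14 days of receiving the returned item.",
    "Premium customers get free shipping on all orders.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to the Large Language Model
```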

October 7, 2023 · 5 min

What is Retrieval Augmented Generation?

TL;DR This blog post explains Retrieval Augmented Generation, an architecture for NLP tasks that makes it easy to productionize LLMs for enterprise use. Why should I care? The intuitive approach would be to train a Large Language Model on domain-specific data, in other words, to fine-tune the model weights with custom data. But fine-tuning large language models (LLMs) is a complex and resource-intensive process due to several key factors:...

September 29, 2023 · 8 min

What are (Large) Language Models?

TL;DR A short summary of (Large) Language Models: what are the ideas and concepts behind Language Models? Large Language Models are rapidly transforming natural language processing tasks and have created a huge hype around Artificial Intelligence and associated tasks. Especially since the introduction of GPT-3.5 (ChatGPT) in 2022, many people have become interested in the capabilities of ML. But what is the foundation for the products that could impact our future world significantly?...

May 2, 2023 · 5 min