I built my own chatbot
Using my chatbot I can ask questions on the material contained on a selected list of web pages of an arbitrary number
In November 2023, I started this new section, AI STUFF. The AI STUFF posts will alternate with my regular posts. The AI STUFF posts will be in English and Turkish. They are written in English and translated to Turkish. Both will be on the same page.
Kasım 2023 den itibaren yeni bir bölüme başlıyorum: AI STUFF. AI STUFF gönderileri normal gönderilerimle dönüşümlü olarak yayınlanacak. Sadece İngilizce olacak çünkü bu konu ile ilgileniyorsanız, İngilizce bilmeniz zaten şart. Aksi halde verdiğim referansları takip etmeniz ve yaptığımı tekrar etme şansınız az. Ayrıca teknik bir metini Türkçeye çevirmek kolay olmuyor. Ona ayıracağım vakti, İngilizcesini daha iyi yazmaya harcamaya daha olur diye düşündüm.
A month ago I tried to build a chatbot but it did not work. It was able to answer a single question but could not maintain a conversation because it forgot the history of the chat. To understand where I was going wrong, I had a deep dive into the OpenAI Playground in the last post. This was very useful and if you are seriously interested in this topic, I strongly recommend that your review and repeat my playground exercise referred to in that post. Based on what learned, I was able to modify my qa.ipynb
to create the new notebook qathread.ipynb
.
I can now have a long conversation along a single theme. The chatbot remembers the past questions and also does not hallucinate. It does not invent answers if the files I attach do not contain the information that is required to answer the query.
An Example Chat with my Chatbot
I copy the questions and answers during a chat I had with my new qathread
chatbot. But first, understand the following steps:
How to start the notebook — Explained here in my QandA post.
How to read my blog pages into separate text files and upload all of these text files to my OpenAI account — Explained here in the QandA post.
How to generate embeddings (these are stored on my local disk) — Explained here.
Starting the chat
After the above three steps are done, we are ready to start the chat. I am running the notebook in VS Code, so the prompt appears in a pop-up box at the top of the window. Before entering the question though, I enter answer:
This puts the chatbot into answering
mode. Otherwise, it would simply retrieve and list files but would do nothing else (which is the check
mode, which is the starting mode). After the chatbot goes to answering mode, I start asking.
I ask the first question:
Q1 — What is the local government debt in China?
The notebook locates three pages as the most relevant texts:
Actually, the answer to my question is in the post on the first row. But the notebook will ask OpenAI Assistant to answer the question using the information in these three files. The answers to the follow-on questions will also be sought from these files. The number of three is what I set in the code. It is an arbitrary number and you may try different numbers.
Here is the chatbot answer to Q1:
Q2 — The local government debt in China in the year 2020 approached 90 trillion yuan, which is approximately 12.49 trillion U.S. dollars, and this equated to 88% of the country's GDP at the time【13†source】.
This is the correct answer. The rest of the Q and A session, I list below without any commentary:
Query : Who did this study?
ANSWER : The study on China's local government debt was conducted by Professor Li and his PhD student【13†source】.
Query : What is Professor Li's university?
ANSWER : I don't know.
Query : Professor Li is employed by a Chinese university. What is this university?
ANSWER : Professor David Daokui Li is affiliated with the School of Economics and Management of Tsinghua University, where he holds the position of the Mansfield Freeman Chair of the Department of Finance【43†source】.
Query : What is the per capita debt and how does it compare against salaries?
ANSWER : The per capita debt in China, based on a total debt of 90 trillion yuan, is 75,000 yuan per person for the 1.2 billion Chinese living in the Republic of China. To give context to this amount, it was mentioned that this figure of 75,000 yuan was the annual salary of a typical young mechanical engineer in Luoyang in 2019【53†source】.
Query : WHat is the result of the last Taiwanese election?
ANSWER : During the last Taiwanese election held on Saturday, 13 January, the current president was re-elected. Lai, presumably the opposing candidate, received only 40% of the votes in Taiwan's first-past-the-post system, and his party, the Democratic Progressive Party (DPP), which is pro-independence, lost its parliamentary majority【0†source】.
Query : Is Lai pro-independence?
ANSWER : Yes, Lai and his party, the Democratic Progressive Party (DPP), are pro-independence【0†source】.
Query : What book did win the 2023 Pulitzer Prize for Fiction?
ANSWER : The 2023 Pulitzer Prize for Fiction was shared by Hernan Diaz's "Trust" and Barbara Kingsolver’s "Demon Copperhead"【0†source】.
Query : What is the book "Trust" about?
ANSWER : The book "Trust" has a complex plot that isn't fully revealed until past its midpoint, transcending a mere critique of capitalism. It's structured as separate books, each with their own plot. The first section, titled "Bonds," centers on Benjamin Rask, a wealthy American financier with a reputation for being a ruthless stock market wizard. He is married to a woman who is interesting and talented in her own right. The theme seems to delve into the relationship between Mr. and Mrs. Rask, with the narrative written in the third person, exploring the disparities between American and European attitudes and sensibilities. The richness of detail and the focus on wealth and the lives of the wealthy also suggest a significant emphasis on social and personal dynamics within the context of financial power【1†source】.
Query : What color are the flowers of the Jacaranda trees?
ANSWER : I don't know.
Query : How is hydrogen produced?
ANSWER : I don't know.
Query : WHat is a typical electrolyser efficiency?
ANSWER : A typical electrolyzer efficiency is around 55% under the assumption that the cost of electricity is the only production cost. This calculation may underestimate the efficiency if there are significant costs in addition to the cost of electricity【120†source】.
Why is this different from simply running a ChatGPT session?
Some of you may say, “what is the big deal?”, I can do this now with ChatGPT. I do not need a separate program to go in between.
What I do in qathread.ipynb
is different because:
The chatbot answers are extracted from my blog posts (uploaded as text files), not from arbitrary unknown places on the internet
If the information is not on my blog, the chatbot simply says “I don’t know”.
As a result, the answers are unlikely to include hallucinations
I use my blog posts for demonstration but you can repeat this with any collection of web pages, PDF files or any other text files. Try it and ask me if you have difficulties.
The following section tells you how to copy the code of the notebook to your local computer so that you can repeat what I do. You would need an OpenAI account to run the chatbot.
Download the files and run them on your computer
I am using Python and the OpenAI API. In addition, I created an X (formerly known as Twitter) Community Group 'Building AI Tutors'. If you are interested, please join the group. I will be posting my progress there as well.
You do not need to be an experienced Python programmer but some knowledge of Python will be helpful. Below I list the steps you need to take to join me and explain how to go about it:
Install VS Code from https://code.visualstudio.com/download. This is a free code editor and development environment. It is the tool I am using therefore should be able to help you if you have problems.
Install Python from https://www.python.org/downloads/. At the time I started this notebook,
Download my
probot
files from my account on github. I do not think you need to have a github account to do this. The name of the project is probot. You can download the wholeprobot
folder as a zip pack and unzip to your local disk
Python ve OpenAI API kullanıyorum. Ek olarak, bir X (eski adıyla Twitter) Topluluk Grubu 'Yapı Yapay Zeka Eğitmenleri' oluşturdum. Eğer ilgileniyorsanız lütfen gruba katılın. Gelişmelerimi orada da paylaşacağım.
Deneyimli bir Python programcısı olmanıza gerek yok ancak biraz Python bilgisi faydalı olacaktır. Aşağıda bana katılmak için atmanız gereken adımları listeliyorum ve bunu nasıl yapacağınızı açıklıyorum:
VS Code'u https://code.visualstudio.com/download adresinden yükleyin. Bu ücretsiz bir program ve benim kullandığım araçtır, dolayısıyla sorun yaşamanız durumunda size yardımcı olabilirim.
Python'u https://www.python.org/downloads/ adresinden yükleyin.
Tüm program ve detaylı anlatımları github'daki hesabımdan indirin. Bunu yapmak için bir github hesabınızın olması gerektiğini sanmıyorum. Projenin adı probot. Probot klasörünün tamamını zip paketi olarak indirebilir ve zip dosyasını yerel diskinize açabilirsiniz