OpenAI astounds us again
One year after the release of ChatGPT, OpenAI comes up with substantial improvements to their product.
Summary
On November 7, 2023, OpenAI announced new models and developer products:
GPT-4 Turbo with a 128K context window.
New Assistants API for creating domain-specific AI products.
Multimodal capabilities
Using the Assistants API:
I can now create a custom ChatGPT that answers questions about specific content.
The process involves preparing and uploading a text file
To populate the file, I:
Developed a Python script with Co-Pilot in VS Code that reads a web page, extracts the text, and saves the part between two string markers to a file.
I uploaded the text file and tested my custom ChatGPT:
The AI successfully answered questions based on the post's content, without inventing information not present in the text.
I am interested in using LLMs for creating specialized software tutors:
The concept includes an indexed database of blog posts to generate relevant responses to user queries using embedding vectors.
I will write about embeddings in a future post.
OpenAI developments and leadership changes:
The CEO of OpenAI, Sam Altman, was dismissed by the board.
The OpenAI board, unlike typical corporate boards, consisted of co-founders and a few other individuals from diverse backgrounds.
Altman's sacking, followed by Greg Brockman's resignation, may reflect a major disagreement that the board settled with its decision.
-+-+-+-+
On 7 November 2023, OpenAI announced new models and developer products. These include:
The new and improved GPT-4 Turbo with a 128K context window
New Assistants API, which I demonstrate below
Multimodality, including vision, image creation (DALL·E 3), and text-to-speech (TTS), which I have not tried yet.
Using Assistants API
This is a phenomenal product, which makes it very easy to develop Generative AI products using your own domain knowledge. Let me explain what this means.
I tried the Assistants API to build my own ChatGPT that only answers questions about the contents of my last post, Why did Elon Musk buy Twitter? I know this is not very useful, but I wanted to try it as a demonstration. The steps are:
Put the contents of my post into a file
Upload the file to the Assistants API
Instruct the Assistant that
it should answer questions based on the contents of the uploaded file.
If the file does not include information relating to the question, then it should say that there is no information.
As you can see in the image below, the Assistants API has three tools available (Functions, Code Interpreter and Retrieval). This task uses the “Retrieval” tool.
-+-+-+-+
Read the post into a text file
Reading the post into a file was easier than I thought. I could, of course, copy and paste it manually, but I wanted to automate it. I opened a Co-Pilot window in VS Code and gave the following instructions:
Write a python script, rdweb(url, s1, s2), that will read the web page in `url`; remove all html tags and other non-text characters; return the section between the strings s1 and s2 as a text string
Co-Pilot generated a function that worked without errors on the first attempt. I then asked it to modify the function as follows:
Add a filename argument and modify the function to save the `result` string in the specified file.
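Co-Pilot modified the function accordingly. Below is the final version, with indentation restored and a few safety checks added (an HTTP error check, a search for s2 only after s1, a clear error if either marker is missing, and UTF-8 output so the non-ASCII characters in the post are written correctly):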
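```python
import requests
from bs4 import BeautifulSoup

def rdweb(url, s1, s2, filename):
    # get the webpage content (raise an exception on HTTP errors)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')
    # strip the html tags, keeping only the text content
    text = soup.get_text()
    # find the section between the strings s1 and s2
    start = text.find(s1)
    if start == -1:
        raise ValueError(f"start marker {s1!r} not found")
    start_index = start + len(s1)
    # look for s2 only after s1, so an earlier s2 is ignored
    end_index = text.find(s2, start_index)
    if end_index == -1:
        raise ValueError(f"end marker {s2!r} not found")
    result = text[start_index:end_index]
    # save the result in the specified file
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(result)
    return result
```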
I called this function as follows:
```python
url = "https://halimgur.substack.com/p/why-did-elon-musk-buy-twitter"
s1 = "-+-+-+-+"
s2 = "._._._._"
filename = "elon.txt"
rdweb(url, s1, s2, filename)
```
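The marker logic can be checked offline, without fetching the page. The sketch below is a stdlib-only variant: it swaps BeautifulSoup for Python's built-in `html.parser.HTMLParser`, and the names `TextExtractor` and `extract_between` as well as the sample HTML are hypothetical, made up for illustration. It shows how the `-+-+-+-+` and `._._._._` markers typed into the post delimit the text that ends up in the file.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_between(html, s1, s2):
    """Hypothetical helper: return the text between markers s1 and s2."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    start = text.find(s1)
    if start == -1:
        raise ValueError(f"start marker {s1!r} not found")
    start += len(s1)
    # search for the end marker only after the start marker
    end = text.find(s2, start)
    if end == -1:
        raise ValueError(f"end marker {s2!r} not found")
    return text[start:end].strip()


# Hypothetical page: a header, the marked post body, and a footer.
sample = (
    "<html><body><h1>Blog</h1>"
    "<p>-+-+-+-+</p><p>Musk bought <b>Twitter</b>.</p><p>._._._._</p>"
    "<footer>Subscribe</footer></body></html>"
)
print(extract_between(sample, "-+-+-+-+", "._._._._"))
```

Everything outside the two markers (the page header and footer here) is discarded, which is why the markers need to be strings that never occur naturally in the text.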
Upload the file
Upload the file elon.txt thus created to the Assistants Playground. You can see the file uploaded at the bottom of the image above.
-+-+-+-+
Ask Questions
My first question was relatively easy but still non-trivial:
In my post I have several references to “Elon Musk”, but nowhere do I explicitly say that “Elon” is his first name. By the way, “Coder” is what I named my chatbot the first time. Later I changed that name to “Elon_Post”.
My second question was trickier:
I was pleased that it was able to answer this question too:
Here is more of my interaction with the chatbot I just created. Note that this is after I changed the chatbot’s name to Elon_Post.
and even more:
Note the answer to the last question. In an earlier post I explained why Musk renamed Twitter to X, but there was no mention of it in this post. The chatbot did not make up an imaginary answer and truthfully responded that “Information cannot be found” in the uploaded file.
-+-+-+-+
I asked three more questions:
I asked the first question to check whether it would do the arithmetic and answer 6000. It chose to provide both numbers, 8000 and 2000.
You can compare the second answer with the paragraph I had in the post. The answer is clearly from that paragraph but paraphrased and slightly shortened.
The last question was another test to see if I could get the chatbot to hallucinate, i.e. invent an answer when the information is not included in the file. It did not hallucinate.
-+-+-+-+
The Future
My interest in LLMs (Large Language Models) is to use them to produce software tutors in specific areas. Tutors that teach engineering students or medical students would be a step towards LLM Engineers and LLM Doctors. When ChatGPT was first released, I wrote that all of these were coming. Today the tools are here, if not the final products.
In the demo above, I uploaded the text of one blog post and asked questions about that post. It is possible to expand this into a chatbot for all my blog posts. This would work as follows:
An indexed database of all my posts is generated.
This is done by creating a separate “embedding vector” for each post and storing these vectors. This is done only once.
The user asks a question
The question text is converted to an embedding vector using the same embedding model used in the previous step
This embedding vector is compared against the array of embeddings in the database.
This is done by calculating the distance from the question embedding to each of the post embeddings.
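There are different ways to measure how close two embeddings are. OpenAI recommends cosine similarity; because OpenAI embeddings are normalised to length 1, this is the same as the dot product.
The highest similarity (equivalently, the smallest distance) gives us the post most relevant to the question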
The text of the identified post is sent to ChatGPT Assistant along with the question as I did in the previous section.
The ChatGPT answer is displayed to the user
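The retrieval part of these steps (building the index once, then comparing a question against it) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the post names `post_a`, `post_b`, `post_c` and all the vectors are hypothetical toy values with three dimensions (real embedding models return vectors with hundreds or thousands of dimensions, computed by an embedding API that is not shown here), and each vector is normalised to length 1 so that the dot product equals the cosine similarity. With this measure the most relevant post is the one with the largest dot product.

```python
import math


def normalize(v):
    """Scale a vector to length 1 so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_index(post_embeddings):
    """Done once: store a normalised embedding for each post."""
    return {name: normalize(vec) for name, vec in post_embeddings.items()}


def most_relevant(index, question_embedding):
    """Compare the question embedding with every post embedding and
    return the name of the most similar post together with its score."""
    q = normalize(question_embedding)
    scores = {name: dot(q, vec) for name, vec in index.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy embeddings; in practice they come from an embedding model.
index = build_index({
    "post_a": [0.9, 0.1, 0.0],
    "post_b": [0.7, 0.6, 0.1],
    "post_c": [0.0, 0.2, 0.9],
})
question = [0.8, 0.2, 0.1]  # hypothetical embedding of the user's question
best, score = most_relevant(index, question)
print(best, round(score, 3))
```

The text of the post returned by `most_relevant` would then be passed to the Assistant together with the question, as in the demo above.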
The term “embedding vector” is used in a few places above, and I know it may not mean much to most readers. As you may have noticed, I added a new tag to Halim’in Günlüğü: AI STUFF. I will later write a post under that tag dedicated to embeddings.
Short Takes
-+-+-+-+
OpenAI astounds us again … and again
While I was finalising the post, OpenAI surprised us again: the Board of Directors lost confidence in the CEO, Sam Altman, and sacked him. The OpenAI Board is not like the boards of other companies of similar value (OpenAI is valued at about $80b). Any other company worth close to $100b would have a board full of lawyers, accountants, and people representing the company’s investors and partners. The OpenAI Board had only the following members:
Greg Brockman (Chair)
Ilya Sutskever, also OpenAI chief scientist
Adam D'Angelo, CEO of the question-and-answer site Quora
Helen Toner, an academic and director at Georgetown University's Center for Security and Emerging Technology
Tasha McCauley, a technology entrepreneur but better known as the actor Joseph Gordon-Levitt's wife
Helen Toner is a young academic. I read her China diary notes from five years ago.
Brockman, Sutskever and Altman were co-founders of OpenAI. Greg Brockman resigned shortly after Altman’s sacking. Rather than a palace coup by Sutskever, I think this was an extreme form of conflict resolution. Obviously, there was a difference of opinion between Ilya and Sam, and Adam, Helen and Tasha sided with Ilya. Now that the conflict has been resolved and a decision made on whatever it was about, it is even possible that Altman will return.
-+-+-+-+
YouTube
This week, I recommend the Lex Fridman chat with Professor John Mearsheimer:
Mearsheimer is one of the few sane voices coming out of US academia these days. Another is Professor Jeffrey Sachs. All the others seem to be either feeding the military-industrial narrative or spruiking incoherent drivel. I am sure there are others who say the right thing too, but Mearsheimer and Sachs are the ones whom everyone seems to respect, even if they do not necessarily agree with them. I think my views are closer to those of Sachs, who is not as pessimistic as Mearsheimer.
-+-+-+-+
Diary
I am going to have an inguinal hernia operation in three months. For some time now I have felt two small swellings near my groin that protrude when I stand up. They disappear when I lie down. An inguinal hernia means that part of the intestine is pushing through a weak spot in the abdominal muscles near the groin, creating a bulge.
It is not uncommon. You may have had this operation or at least know someone who did. I am told that, if left untreated, it may lead to serious complications. The surgeon my GP referred me to advises that surgery is the best course of action, and the sooner the better (better to have it before I get older).
The surgery will be laparoscopic. This means that instead of making a large incision, the surgeon will make a few small incisions through which a laparoscope and surgical instruments will be inserted. The laparoscope is equipped with a camera, allowing the surgeon to view the internal structures on a monitor and perform the repair with precision. This minimally invasive approach hopefully means less postoperative pain, shorter recovery time, and minimal scarring compared to traditional open surgery.
Although the operation is minimally invasive, I will still need to stay in hospital for one night. I suppose this is because I will be under general anesthesia during the operation, and at my age, close to 70, they would like to monitor me for a few hours afterwards just in case. Apparently, older patients are more susceptible to complications from anesthesia and surgery.
-+-+-+-+
I was in the City last week to buy Lapsang Souchong tea.
I walked into a protest march by the construction union CFMEU over safety standards and site conditions, following two recent incidents:
the worker falling at Cross River Rail
the young lad (17 years old) who was fatally injured in a fall on a West End site.
I do not know the details of these incidents so I will not comment. It is very sad that people continue dying at construction sites.
In addition to Lapsang Souchong, I wanted to try another kind of black tea. I was thinking of Darjeeling, but the shopkeeper recommended Yunnan FOP. She thought I would like it better since I liked Lapsang Souchong.
It indeed has a nice fresh taste. Because it is not bitter, I can drink it any time of the day on its own.
-+-+-+-+
What I read
Last week I read this book:
I learned a lot about the recent history of Algeria even though this is crime fiction.
A British youth in Paris in the early sixties accidentally witnesses the 17 October 1961 massacre. This was an extraordinary criminal act by the French Government. Wikipedia describes it as follows:
The Paris massacre of 1961 (also called the 17 October 1961 massacre in France) was the mass killing of Algerians who were living in Paris by the French National Police. It occurred on 17 October 1961, during the Algerian War (1954–62). Under orders from the head of the Parisian police, Maurice Papon, the National Police attacked a demonstration by 30,000 pro-National Liberation Front (FLN) Algerians. After 37 years of denial and censorship of the press, in 1998 the government finally acknowledged 40 deaths, while some historians estimate that between 200 and 300 Algerians died. Death was due to heavy-handed beating by the police, as well as mass drownings, as police officers threw demonstrators into the river Seine.
The youth then gets mixed up with a couple of Algerian militants, who rise to influential positions after Ben Bella is overthrown and Boumediene takes over. I will not give away the plot by telling any more, but there is also a crusty old Algiers police superintendent, Taleb, who lived through the revolution, the civil war between the military and the Islamists, and what came after. He is now close to retirement, but his boss gives him an assignment that unravels into an amazing story.
From the book:
‘How old were you at independence, Taleb?’ Bouras asks, as if genuinely interested in the answer. ‘Seven, Director.’ ‘Do you remember the day?’ ‘I remember my father smiling. It was not a common sight.’ ‘Did he smile more often after independence?’ ‘Less, if anything.’ ‘So, there we have it. The history of the republic encapsulated in the history of your father’s smile.’ Bouras broods on this thought for a moment, then says, ‘Do you believe in the existence of hizb fransa, Taleb?’ ‘No, Director, I don’t.’ Unlike many of his countrymen, Taleb gives no credence to the conspiracy theory that the French left a fifth column of saboteurs behind them when they left in 1962 – hizb fransa, the ‘party of France’ – dedicated to undermining the new republic in any way they could. He believes that if de Gaulle had been able to pull that off, the old fox would surely have found a way to prevent independence altogether.
Goddard, Robert. This is the Night They Come For You: A TIMES THRILLER OF THE YEAR (pp. 4-5). Transworld. Kindle Edition.
These are the other Goddard books on my Kindle.
“In Pale Battalions” was the book I read by Goddard years ago. I liked it so much I bought a Kindle edition to read it again.
-+-+-+-+
Pascal and Hagi
I tried something new today. I videotaped myself listening to Eşkiya Dünyaya Hükümdar Olmaz (the version used as the title song of the TV series of a few years ago) and posted it on YouTube. Here it is:
-+-+-+-+
Zika Statistics
Continuing to monitor the height and the weight of the sourdough rye I buy from the brilliant Zika Pastries.
Hello Halim,
After so many years, it was lovely to hear your voice singing a türkü (Turkish folk song), even if only partly...
I am getting addicted to reading your posts every week, with loads of interesting and useful information. Please keep it up!
Best wishes for quick recovery for your coming operation. (I did not know the common "fıtık" in Turkish had such a fanciful name!)
Best regards,
Selçuk