Expanding the Boundaries of ChatGPT with LangChain - Part Two
In huge ChatGPT news this week, they rolled out web browsing capability and plugins for ChatGPT Pro subscribers. That will be what I blog about next week, but for now, I wanted to post part two of looking at LangChain.
In part one of this series, we explained that LangChain was a framework of connectors that allows us to use our code to tie together data and large language models (LLMs) like ChatGPT. The new ChatGPT browsing capabilities add a lot of functionality, but there is still a need for LangChain to allow us to automate these processes. We’ll look at increased functionality and use cases in future posts, but for now, I wanted to share the code I used in the previous post.
First things first are installing the Python dependencies:
pip install openai
pip install langchain
pip install chromadb
pip install tiktoken
I installed these successfully on both a Windows system and an M1 Mac OS system. On the Windows system, I needed to download some Microsoft C++ tools for ChromaDB and on Mac OS I got a dependency error, but it was easily fixed by installing a specific version urllib3 as shown below.
With those dependencies in place, I placed a text file in the directory called ws_odds.txt which has the current odds to win the baseball world series according to MGM.
I then created a file named openai_key.py which contains my OpenAI API key:
openai_key = "INSERT_YOUR_KEY_HERE"
With that in place, here is the code that performed the analysis, along with comments. The entire file can be viewed or downloaded from: https://github.com/azmatt/LangChainPart2
import os from typing import List from langchain.vectorstores import Chroma from langchain.embeddings import OpenAIEmbeddings from langchain.text_splitter import RecursiveCharacterTextSplitter from langchain.llms import OpenAI from langchain.chains import RetrievalQA from langchain.document_loaders import TextLoader from langchain.schema import Document from openai_key import openai_key # Set your OpenAI API key here os.environ['OPENAI_API_KEY'] = openai_key class ChatGPT: def __init__(self, file_path: str): # Initialize with the path to the file to be processed self.file_path = file_path # Load the text from the file using TextLoader. # Use utf8 so it doesn't break on non ascii characters self.loader = TextLoader(self.file_path, encoding='utf8') self.documents = self.loader.load() # Split the text into chunks self.texts = self._text_split(self.documents) # Embed the chunks of text self.vectordb = self._embed_texts(self.texts) # Initialize the GPT model with the embedded text # Retriever is generic interface that allows you to combine documents with large language models (LLMs) # Chain_type of "stuff" uses all of the text from the document. Other options include "map_reduce" to seperate texts into batches self.chatgpt = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=self.vectordb.as_retriever()) @staticmethod def _text_split(documents: List[Document]) -> List[Document]: # Splits the document into chunks of a specific size and overlap text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0) return text_splitter.split_documents(documents) @staticmethod def _embed_texts(texts: List[Document]) -> Chroma: # Embeds the text chunks using OpenAIEmbeddings embeddings = OpenAIEmbeddings() return Chroma.from_documents(texts, embeddings) def ask(self, query: str) -> str: # Send a query to ChatGPT and returns the response return self.chatgpt.run(query) if __name__ == "__main__": # Create a new instance of the ChatGPT class and ask it a question chatgpt = ChatGPT("ws_odds.txt") print(chatgpt.ask("can you show me the odds for five random teams and explain what the numbers mean"))