Getting Started Guide to LangChain (2024)

In day-to-day work, we mainly focus on building end-to-end applications, and many automated machine learning platforms and CI/CD pipelines are available to automate our machine learning processes.

Nowadays, with the emergence of large models, we may want to use OpenAI or Hugging Face models to create an LLM application and deploy and use it locally. However, installing large models manually is cumbersome and raises too many environment, compilation, and interface-integration issues.

🚀 In contrast, LangChain simplifies the integration and development of LLM applications, offering higher development efficiency and ease of use while remaining scalable and flexible. It lets developers focus on the application’s business logic and helps us build end-to-end LLM applications and workflows. Let’s learn more about LangChain.

What is LangChain?

LangChain is a framework for developing applications powered by language models. Broadly speaking, LangChain is an intermediate layer between user-facing programs and LLMs.

LangChain makes it easy to manage interactions with language models, link multiple components together, and integrate additional resources such as APIs and databases. Its components include models (various types of LLMs), prompt templates, indexes, agents, memory, and more.

Calling LangChain an intermediate layer between user-facing programs and LLMs means that it bridges user-written applications and the underlying large language model, allowing users to interact with the LLM more conveniently. Normally, when users write applications, they have to work out how to communicate with the underlying LLM, how to call it, and how to process its results.

This can involve the LLM’s interfaces, data transmission, request and response handling, and so on. LangChain’s goal is to simplify this process by providing a unified interface and functionality that make it easy to build, manage, and integrate LLM applications. With LangChain, users organize and manage the LLM calling process through a series of modules.

This is similar to frameworks like PyTorch and TensorFlow, where users run low-level operators by calling the framework’s functions and interfaces rather than implementing those operators themselves.

LangChain consists of several modules. The modules are connected into a chain, and the chain is then used to call all of them in a single run.

These modules include the following:

Model
Prompt
Memory
Chain
Agents and tools
Document Loaders
Indexes

Installing LangChain for Python

pip install langchain

Model

On the model side, LangChain mainly covers large language models (LLMs). An LLM is a neural network with a very large number of parameters, trained on large amounts of unlabeled text.

There are LLMs developed by various technology giants, such as:

Google’s BERT
OpenAI’s GPT-3
Google’s LaMDA
Google’s PaLM
Meta AI’s LLaMA
OpenAI’s GPT-4

LangChain provides a common interface to many different LLMs.

Most of these models are accessed through their providers’ APIs, but you can also run local models.
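To make the idea of a common interface concrete, here is a minimal plain-Python sketch. It is not LangChain’s actual class hierarchy: `LLM`, `EchoModel`, `UppercaseModel`, and `ask` are hypothetical names, and the two toy models stand in for real providers so the example runs offline.

```python
from typing import Protocol


class LLM(Protocol):
    """Hypothetical minimal interface shared by every model: text in, text out."""

    def generate(self, prompt: str) -> str: ...


class EchoModel:
    """Toy stand-in for a local model (illustration only)."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UppercaseModel:
    """Toy stand-in for a second provider; a real one would call a remote API."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


def ask(llm: LLM, question: str) -> str:
    # Application code depends only on the shared interface,
    # so swapping providers requires no changes here.
    return llm.generate(question)


if __name__ == "__main__":
    for model in (EchoModel(), UppercaseModel()):
        print(ask(model, "hello"))
```

Because `ask` relies only on the shared `generate` method, switching providers (or moving to a local model) means changing a single constructor call, which is the kind of convenience LangChain’s model wrappers aim for.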

OpenAI

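pip install openai

import os

# Replace the placeholder with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_TOKEN"

from langchain.llms import OpenAI

# A higher temperature gives more varied, creative completions.
# text-davinci-003 has been retired; to pick a model explicitly, pass e.g. model_name="gpt-3.5-turbo-instruct"
llm = OpenAI(temperature=0.9)

text = "What would be a good company name for a company that makes colorful socks?"

print(llm(text))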

Hugging Face

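pip install huggingface_hub

import os

# Replace the placeholder with your own Hugging Face access token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "YOUR_HF_TOKEN"

from langchain.llms import HuggingFaceHub

# temperature=0 asks for the most deterministic output; max_length caps the length of the generated text
llm = HuggingFaceHub(repo_id="google/flan-t5-xl", model_kwargs={"temperature": 0, "max_length": 64})

print(llm("translate English to Arabic: How old are you?"))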

Prompts

Prompts are the inputs we give the model; by refining them, we make its answers more accurate or more specific to our use case.

Often you need more structured input than plain text. Many recent object detection and classification models based on contrastive pre-training and zero-shot learning accept prompts as input; for example, OpenAI’s CLIP and Grounding DINO both use text prompts to make predictions.

In LangChain, we can turn the prompts we write into a standard prompt format, which makes them easier to reuse and optimize.

LangChain also provides predesigned prompt templates that generate prompts for different types of tasks. When the preset templates do not meet your requirements, you can define a custom prompt template instead.

For example:

from langchain.prompts import PromptTemplate

# This template acts as a blueprint for the prompt
template = """
I want you to act as a naming consultant for new companies.
What is a good name for a company that makes {product}?
"""

prompt = PromptTemplate(
    input_variables=["product"],
    template=template,
)
print(prompt.format(product="colorful socks"))
# -> I want you to act as a naming consultant for new companies.
# -> What is a good name for a company that makes colorful socks?
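Conceptually, a prompt template is a format string plus the list of variables it expects. The sketch below, built only on Python’s standard `string` module, illustrates that idea. `SimplePromptTemplate` is a hypothetical name, and LangChain’s real `PromptTemplate` does more (for example, partial variables and alternative template formats).

```python
import string


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template: a format string
    plus the names of the variables it expects."""

    def __init__(self, template: str, input_variables: list[str]) -> None:
        # Collect the placeholder names that actually appear in the template.
        found = {name for _, name, _, _ in string.Formatter().parse(template) if name}
        if found != set(input_variables):
            raise ValueError(
                f"template uses {sorted(found)}, but input_variables is {sorted(input_variables)}"
            )
        self.template = template
        self.input_variables = input_variables

    def format(self, **kwargs: str) -> str:
        # Fail early with a clear message if a variable was not supplied.
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"missing values for: {missing}")
        return self.template.format(**kwargs)


if __name__ == "__main__":
    prompt = SimplePromptTemplate(
        "What is a good name for a company that makes {product}?", ["product"]
    )
    print(prompt.format(product="colorful socks"))
```

Checking the placeholders when the template is created catches typos such as `{prodcut}` before any model call is made.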

Memory

In LangChain, Chains and Agents run in stateless mode by default, which means they handle each incoming query independently. However, in some applications, such as chatbots, it is very important to retain previous interactions, both in the short and long term. This is where the concept of “memory” comes into play.

Keeping a record of historical queries
LangChain provides memory components in two forms. First, it provides helper utilities for managing and manipulating previous chat messages; these are modular and useful regardless of the use case. Second, it provides an easy way to integrate these utilities into chains. This makes memory flexible and adaptable to almost any situation.

For example:

from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()

history.add_user_message("hi!")

history.add_ai_message("whats up?")

history.messages

Output :

[HumanMessage(content='hi!', additional_kwargs={}), AIMessage(content='whats up?', additional_kwargs={})]
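A chat history like this is essentially an ordered list of messages that gets turned back into text for the next prompt. The plain-Python sketch below illustrates that idea with invented names (`SimpleChatHistory`, `as_transcript`); keeping only the last few messages is one simple way to bound short-term memory, not a description of LangChain’s internals.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    role: str  # "human" or "ai"
    content: str


@dataclass
class SimpleChatHistory:
    """Hypothetical minimal chat memory: an ordered list of messages."""

    messages: list[Message] = field(default_factory=list)

    def add_user_message(self, content: str) -> None:
        self.messages.append(Message("human", content))

    def add_ai_message(self, content: str) -> None:
        self.messages.append(Message("ai", content))

    def as_transcript(self, last_k: Optional[int] = None) -> str:
        # Render the history as text that can be prepended to the next prompt.
        # last_k keeps only the most recent messages (None or 0 keeps them all).
        window = self.messages[-last_k:] if last_k else self.messages
        return "\n".join(f"{m.role}: {m.content}" for m in window)


if __name__ == "__main__":
    history = SimpleChatHistory()
    history.add_user_message("hi!")
    history.add_ai_message("whats up?")
    print(history.as_transcript())
```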

Chains

Chains provide a way to combine various components into a single application. For example, a chain could receive user input, format it with a PromptTemplate, and pass the formatted prompt to the LLM. More complex chains can be built by combining multiple chains with other components.

This is similar in principle to nn.Sequential() in PyTorch, which runs a sequence of neural-network layers one after another.

LLMChain is one of the most widely used ways to query an LLM. It formats the provided input key values and memory key values (if present) according to the prompt template, sends the formatted string to the LLM, and returns the LLM’s output.
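Stripped to its essentials, that flow fits in a few lines. The plain-Python sketch below is not LangChain’s implementation: `SimpleLLMChain` is a hypothetical class, and `fake_llm` is a canned stand-in for a real model call so the example runs offline.

```python
from typing import Callable, Optional


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: returns a canned reply."""
    return f"[model reply to: {prompt!r}]"


class SimpleLLMChain:
    """Hypothetical sketch of an LLM chain: fill the prompt template from the
    inputs (plus memory values, if any), call the LLM, return its output."""

    def __init__(self, llm: Callable[[str], str], template: str,
                 memory: Optional[dict] = None) -> None:
        self.llm = llm
        self.template = template
        self.memory = memory or {}

    def run(self, **inputs: str) -> str:
        # Memory values and user inputs are merged; inputs win on name clashes.
        variables = {**self.memory, **inputs}
        prompt = self.template.format(**variables)
        return self.llm(prompt)


if __name__ == "__main__":
    chain = SimpleLLMChain(fake_llm, "What is a good name for a company that makes {product}?")
    print(chain.run(product="colorful socks"))
```

Swapping `fake_llm` for a real model call changes nothing else in the chain, much as replacing one layer in nn.Sequential() leaves the rest of the pipeline intact.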

# Here we are chaining everything
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="What is a good name for a company that makes {product}?",
        input_variables=["product"],
    )
)

chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])

# Temperature controls randomness: the higher the temperature, the more random the answer
chat = ChatOpenAI(temperature=0.9)

# Final chain: prompt template -> chat model
chain = LLMChain(llm=chat, prompt=chat_prompt_template)
print(chain.run("colorful socks"))
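The temperature parameter in these examples rescales the model’s next-token scores before they are turned into probabilities. The minimal sketch below uses made-up scores for three candidate tokens to show why higher values make answers more random: dividing by a large temperature flattens the distribution, while a small one concentrates probability on the top-scoring token. The function name is hypothetical, and real models apply this over their whole vocabulary. Providers usually treat a temperature of 0 as always picking the most likely token, which is why the sketch accepts only positive values.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```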

Agents and Tools

In LangChain, an agent is a mechanism that makes decisions and takes actions based on user input. The agent acts as a middle layer that interacts with users and processes their requests: it decides whether to call specific tools or modules and what input to pass to them. After completing an action, the agent observes the result and decides on its next action accordingly.

Agents make LangChain more flexible and adaptable. They let it choose which tools to call, and adjust the processing flow, based on the situation at hand. As a result, LangChain can handle complex tasks and diverse user needs more effectively and give more accurate, personalized responses.

In other words, the agent itself only tells the model what type of task it is dealing with.

Tools are functions that an agent can use to interact with external libraries and services. They can be general-purpose utilities (such as search). For example, you can load tools with the following code snippet:

pip install wikipedia numexpr

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI

# temperature=0 keeps the agent's reasoning as deterministic as possible
llm = OpenAI(temperature=0)
# "wikipedia" searches Wikipedia; "llm-math" uses the LLM to write a math expression and then evaluates it
tools = load_tools(["wikipedia", "llm-math"], llm=llm)

agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

agent.run("In what year was the film The Departed with Leonardo DiCaprio released? What is this year raised to the 0.43 power?")
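The decide, act, observe loop described at the start of this section can be sketched without an LLM. In this hypothetical version, `toy_policy` is a hard-coded stand-in for the model’s reasoning, and the two tools are offline substitutes for the Wikipedia and llm-math tools. The lookup table holds a single fact (The Departed was released in 2006), and the calculator deliberately avoids `eval()`.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Offline stand-ins for the "wikipedia" and "llm-math" tools (hypothetical).
FILM_YEARS = {"the departed": 2006}


def lookup_release_year(title: str) -> str:
    year = FILM_YEARS.get(title.lower())
    return str(year) if year is not None else "unknown"


def calculator(expression: str) -> str:
    # Only "<number> ** <number>" is supported, so nothing is passed to eval().
    match = re.fullmatch(r"\s*([\d.]+)\s*\*\*\s*([\d.]+)\s*", expression)
    if not match:
        raise ValueError(f"unsupported expression: {expression!r}")
    base, exponent = map(float, match.groups())
    return f"{base ** exponent:.4f}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_release_year": lookup_release_year,
    "calculator": calculator,
}


def toy_policy(observations: List[str]) -> Optional[Tuple[str, str]]:
    """Stand-in for the LLM's reasoning step: choose the next tool and its input,
    or return None when the task is finished. Hard-coded for one question."""
    if not observations:
        return ("lookup_release_year", "The Departed")
    if len(observations) == 1:
        return ("calculator", f"{observations[0]} ** 0.43")
    return None


def run_agent() -> List[str]:
    observations: List[str] = []
    # Decide -> act -> observe, until the policy says it is done.
    while (step := toy_policy(observations)) is not None:
        tool_name, tool_input = step
        result = TOOLS[tool_name](tool_input)
        print(f"{tool_name}({tool_input!r}) -> {result}")
        observations.append(result)
    return observations


if __name__ == "__main__":
    run_agent()
```

In a real agent, the LLM replaces `toy_policy`: it reads the question and the observations so far, then writes out which tool to call next and with what input.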

Document Loaders

To use language models with your own text data, the first step is to load that data into documents (that is, pieces of text together with their metadata).

from langchain.document_loaders import NotionDirectoryLoader

# "Notion_DB" is a folder containing a Notion workspace exported as Markdown
loader = NotionDirectoryLoader("Notion_DB")

docs = loader.load()
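When Notion content is exported as Markdown, the result is a folder of `.md` files, so a loader’s job is mostly to read each file into a document holding its text and metadata. The plain-Python sketch below illustrates a directory loader; `Document` and `load_text_directory` are hypothetical stand-ins for LangChain’s classes, and recording the file path under a `source` metadata key is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Minimal document: the text plus metadata such as where it came from."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_directory(directory: str, pattern: str = "**/*.md") -> list[Document]:
    """Hypothetical loader: read every file matching `pattern` under `directory`
    (including subfolders) into one Document each."""
    docs = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            docs.append(Document(page_content=text, metadata={"source": str(path)}))
    return docs


if __name__ == "__main__":
    for doc in load_text_directory("Notion_DB"):
        print(doc.metadata["source"], len(doc.page_content))
```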

Indexes

Indexing means structuring documents so that LLMs can interact with them as effectively as possible. This module contains utility functions for processing documents.

Embeddings: Embeddings are numerical representations of information such as text, documents, images, or audio. Converting information into vector form lets computers compare and process it more effectively.
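To illustrate what "vector form" means, here is a deliberately simple sketch: each text becomes a vector of word counts over a small hypothetical vocabulary, and cosine similarity measures how closely two vectors point in the same direction. Real embedding models (such as the sentence-transformers model used later) learn dense vectors that capture meaning rather than exact words, so treat this only as an illustration of comparing texts as vectors.

```python
import math
import re
from collections import Counter


def bag_of_words_vector(text: str, vocabulary: list[str]) -> list[float]:
    """Toy 'embedding': how often each vocabulary word occurs in the text.
    Real embedding models produce dense, learned vectors instead."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Ranges from -1 to 1; higher means the vectors point in a more similar
    direction. For non-negative count vectors, 0.0 means no words in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vocab = ["python", "langchain", "cooking", "recipe"]
    a = bag_of_words_vector("LangChain is a Python framework", vocab)
    b = bag_of_words_vector("Python code for LangChain agents", vocab)
    c = bag_of_words_vector("A simple cooking recipe", vocab)
    print(cosine_similarity(a, b), cosine_similarity(a, c))
```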

Text Splitters: Models can only take in a limited amount of text at once, so longer texts must be split into multiple chunks before processing. A text splitter is a tool that splits long text into smaller pieces.
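The core idea of chunking with overlap can be sketched in a few lines. This hypothetical `split_text` cuts fixed-size character windows, and each chunk repeats the last `chunk_overlap` characters of the previous one so that context at a chunk boundary is not lost entirely. LangChain’s `CharacterTextSplitter` (used below with `chunk_size=1000` and `chunk_overlap=0`) differs in detail: it first splits on a separator, a blank line by default, and then merges the pieces into chunks of up to `chunk_size` characters.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters, where each
    chunk repeats the last `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```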

Vectorstores: Vector stores store and index the vector embeddings produced by natural language processing models. Because these embeddings capture the meaning and context of text strings, sentences, and entire documents, searching over them leads to more accurate and relevant search results.
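At its simplest, a vector store keeps (vector, text) pairs and, given a query vector, returns the closest stored texts. The sketch below does brute-force search with Euclidean (L2) distance, which is also the default distance for LangChain’s FAISS wrapper. `FlatVectorStore` is a hypothetical name, and libraries like FAISS add optimized and approximate indexes on top of this idea for large collections.

```python
import math


class FlatVectorStore:
    """Hypothetical minimal vector store: stores (vector, text) pairs and
    answers queries by brute-force nearest-neighbour search."""

    def __init__(self) -> None:
        self._items = []  # list of (vector, text) pairs

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((vector, text))

    def similarity_search(self, query: list[float], k: int = 4) -> list[str]:
        # Score every stored vector by Euclidean (L2) distance; smaller is closer.
        ranked = sorted(self._items, key=lambda item: math.dist(query, item[0]))
        return [text for _, text in ranked[:k]]


if __name__ == "__main__":
    store = FlatVectorStore()
    store.add([1.0, 0.0], "about python")
    store.add([0.0, 1.0], "about cooking")
    store.add([0.9, 0.1], "about langchain")
    print(store.similarity_search([1.0, 0.0], k=2))
```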

import requests

url = "https://yourskillsPathURL/mbenhassineskills.txt"
res = requests.get(url)
res.raise_for_status()  # stop early if the download failed
with open("mbenhassineskills.txt", "w", encoding="utf-8") as f:
    f.write(res.text)

# Document Loader
from langchain.document_loaders import TextLoader
loader = TextLoader('./mbenhassineskills.txt')
documents = loader.load()

# Text Splitter
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

pip install sentence_transformers

# Embeddings
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()

#text = "This is a test document."
#query_result = embeddings.embed_query(text)
#doc_result = embeddings.embed_documents([text])


pip install faiss-cpu

from langchain.vectorstores import FAISS

db = FAISS.from_documents(docs, embeddings)

query = "Does Mohamed have any soft skills?"
results = db.similarity_search(query)  # the most similar chunks come first

print(results[0].page_content)