In this post, we build a Streamlit application that extracts text from an image using Tesseract OCR and processes it with the LLaMA 3.2 Vision model (via Ollama) to produce Markdown-formatted text.
Before running the program, ensure you have the following Python modules installed:
pip install streamlit
pip install pytesseract
pip install Pillow
pip install requests
Note that pytesseract is only a wrapper: the Tesseract OCR engine itself must also be installed on your system (for example via your package manager). Also note that base64, io, and subprocess are part of the Python standard library and need no installation.
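Since the app shells out to two external tools, a quick standard-library check can confirm they are on your PATH before launching Streamlit. The helper below is just a sketch of such a preflight check, not part of the app itself:

```python
import shutil

def binary_available(name: str) -> bool:
    """Return True if an executable with this name is on the PATH."""
    return shutil.which(name) is not None

# Warn early instead of failing mid-request; both external
# dependencies of the app are checked the same way.
for tool in ("tesseract", "ollama"):
    if not binary_available(tool):
        print(f"Warning: '{tool}' not found on PATH")
```

Running this before `streamlit run` saves you from discovering a missing binary only after uploading an image.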
import streamlit as st
import subprocess
from PIL import Image
import pytesseract
import base64
This section imports the required modules for our application. We use Streamlit for the web interface, Pillow for image processing, pytesseract for OCR, subprocess to call the external Ollama command, and base64 for generating a download link.
st.title("OCR and Markdown Processing with LLaMA 3.2 Vision")
st.write("Upload an image to extract text via Tesseract OCR and process it with LLaMA 3.2 Vision for Markdown conversion.")
Here we define the title and a brief description of the app, which are displayed at the top of the page.
uploaded_image = st.file_uploader("Upload Image", type=["jpg", "jpeg", "png"])
This widget allows users to upload an image file (JPG, JPEG, PNG) for processing.
if uploaded_image is not None:
    image = Image.open(uploaded_image)
    st.image(image, caption="Uploaded Image", use_container_width=True)
    st.subheader("Extracted Text:")
    extracted_text = pytesseract.image_to_string(image, lang="eng")
    st.write(extracted_text)
After uploading, the image is displayed on the page. Then Tesseract OCR is used to extract text, which is shown as plain text below the image.
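Raw Tesseract output often contains stray trailing whitespace and runs of blank lines. Before handing it to the model, you may want to normalize it; the helper below is a sketch of one such cleanup step (the function name is our own, not part of the app above):

```python
def clean_ocr_text(text: str) -> str:
    """Collapse runs of blank lines and strip trailing whitespace
    from each line of raw OCR output."""
    lines = [line.rstrip() for line in text.splitlines()]
    cleaned = []
    for line in lines:
        # Keep a blank line only if the previously kept line was not blank
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned).strip()

print(clean_ocr_text("Hello  \n\n\n\nworld \n"))  # → "Hello\n\nworld"
```

Cleaner input tends to give the language model less noise to reproduce in its Markdown output.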
if st.button("Process Text with LLaMA 3.2"):
    with st.spinner("Processing text with LLaMA 3.2..."):
        try:
            # Ask the model explicitly for Markdown output
            prompt = f"Convert the following text to well-formatted Markdown:\n\n{extracted_text}"
            command = ["ollama", "run", "llama3.2-vision:11b", prompt]
            result = subprocess.run(command, capture_output=True, text=True)
            if result.returncode != 0:
                st.error(f"Ollama Error: {result.stderr}")
            else:
                markdown_text = result.stdout.strip()
                st.subheader("Processed Markdown Text:")
                st.write(markdown_text)
                # Provide a download link for the Markdown output
                b64 = base64.b64encode(markdown_text.encode()).decode()
                download_link = f'<a href="data:text/markdown;base64,{b64}" download="output.md">Download Markdown File</a>'
                st.markdown(download_link, unsafe_allow_html=True)
        except Exception as e:
            st.error(f"Error during processing: {e}")
When the user clicks the button, the app sends the extracted text to the Ollama LLaMA 3.2 model for processing. The resulting Markdown text is displayed, and a download link is generated so the user can save the output.
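The call to Ollama is just a subprocess invocation, so it can be factored into a small helper that also guards against a hung model with a timeout. This is a sketch under the same assumption the app makes, namely that `ollama run <model> <prompt>` prints the response to stdout; the function names and the timeout value are our own:

```python
import subprocess

MODEL = "llama3.2-vision:11b"

def build_command(prompt: str, model: str = MODEL) -> list:
    """Build the argv list for a one-shot Ollama run."""
    return ["ollama", "run", model, prompt]

def run_model(prompt: str, model: str = MODEL, timeout: int = 120) -> str:
    """Run the model and return its stdout, raising on failure or timeout."""
    result = subprocess.run(
        build_command(prompt, model),
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(f"Ollama Error: {result.stderr}")
    return result.stdout.strip()
```

Passing the prompt as a single argv element (rather than building a shell string) also avoids any quoting issues with the OCR text.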
import streamlit as st
import subprocess
from PIL import Image
import pytesseract
import base64

# Title and description
st.title("OCR and Markdown Processing with LLaMA 3.2 Vision")
st.write("Upload an image to extract text via Tesseract OCR and process it with LLaMA 3.2 Vision for Markdown conversion.")

# File uploader for image input
uploaded_image = st.file_uploader("Upload Image", type=["jpg", "jpeg", "png"])

if uploaded_image is not None:
    # Open and display the uploaded image
    image = Image.open(uploaded_image)
    st.image(image, caption="Uploaded Image", use_container_width=True)

    # Step 1: Perform OCR on the image using Tesseract
    st.subheader("Extracted Text:")
    extracted_text = pytesseract.image_to_string(image, lang="eng")
    st.write(extracted_text)

    # Step 2: Process the extracted text with Ollama LLaMA 3.2 for Markdown formatting
    if st.button("Process Text with LLaMA 3.2"):
        with st.spinner("Processing text with LLaMA 3.2..."):
            try:
                # Ask the model explicitly for Markdown output
                prompt = f"Convert the following text to well-formatted Markdown:\n\n{extracted_text}"
                command = ["ollama", "run", "llama3.2-vision:11b", prompt]
                result = subprocess.run(command, capture_output=True, text=True)
                if result.returncode != 0:
                    st.error(f"Ollama Error: {result.stderr}")
                else:
                    markdown_text = result.stdout.strip()
                    st.subheader("Processed Markdown Text:")
                    st.write(markdown_text)

                    # Provide a download link for the Markdown output
                    b64 = base64.b64encode(markdown_text.encode()).decode()
                    download_link = f'<a href="data:text/markdown;base64,{b64}" download="output.md">Download Markdown File</a>'
                    st.markdown(download_link, unsafe_allow_html=True)
            except Exception as e:
                st.error(f"Error during processing: {e}")
This is the complete code for the Streamlit application.
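The base64 data-URI download link in the code above can be isolated into a small helper, which makes the encoding easy to verify on its own. The helper name and default filename below are our own choices; note also that Streamlit provides st.download_button, which achieves the same result without raw HTML:

```python
import base64

def make_download_link(text: str, filename: str = "output.md") -> str:
    """Build an HTML anchor that downloads `text` as a Markdown file
    via a base64-encoded data URI."""
    b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:text/markdown;base64,{b64}" '
        f'download="{filename}">Download Markdown File</a>'
    )

print(make_download_link("# Title"))
```

Embedding the file contents in the href means no temporary file is written on the server; the browser decodes the data URI when the user clicks the link.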
streamlit run image2markdown.py
Open your terminal, navigate to the project directory, and run the command above.
Once the app starts, view it in your browser:
Local URL: http://localhost:8501
Network URL: http://192.168.1.11:8501
This application demonstrates how to integrate Tesseract OCR with the Ollama LLaMA 3.2 Vision model using Streamlit. By uploading an image, extracting text from it, and processing that text into Markdown, the app showcases a powerful combination of computer vision and natural language processing techniques in an easy-to-use web interface.