Save PDF as Text (txt) File in ASP.Net using C# and VB.Net

Last Reply 6 months ago By pandeyism

Posted 6 months ago

Dear All,

I know we can save a PDF as text manually. I am trying to save all the PDFs in a directory as text (Notepad) files.

Is there a way to automate this process?

Posted 6 months ago, modified 6 months ago

Hey Vikash21,

Please refer to the code below. It uses the iTextSharp library.

Namespaces

C#

using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

VB.Net

Imports System.IO
Imports System.Text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

Code

C#

protected void Page_Load(object sender, EventArgs e)
{
    ReadPdfFile();
}

public void ReadPdfFile()
{
    // All extracted text is combined into this single output file.
    string path = @"D:\File\File.txt";
    StringBuilder sb = new StringBuilder();
    foreach (string file in Directory.GetFiles(@"D:\Folder\", "*.pdf"))
    {
        PdfReader pdfReader = new PdfReader(file);
        try
        {
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so earlier pages are not repeated.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                // Re-emit each line so the output uses the platform's line endings.
                foreach (string theLine in currentText.Split('\n'))
                {
                    sb.AppendLine(theLine);
                }
            }
        }
        finally
        {
            // Release the handle on the PDF file.
            pdfReader.Close();
        }
    }
    File.WriteAllText(path, sb.ToString());
}
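
The code above combines every PDF into a single File.txt. If you would rather have one .txt file per PDF, a minimal sketch could look like the following. Note that PdfFolderExporter, GetTextPath, ExportAll and the extractText parameter are names I made up for illustration, not part of iTextSharp; extractText stands for whatever function returns the text of one PDF.

```csharp
using System;
using System.IO;

public static class PdfFolderExporter
{
    // Builds the output path: same file name as the PDF, .txt extension,
    // placed inside txtFolder.
    public static string GetTextPath(string pdfPath, string txtFolder)
    {
        string name = Path.GetFileNameWithoutExtension(pdfPath) + ".txt";
        return Path.Combine(txtFolder, name);
    }

    // Writes one .txt per .pdf found in pdfFolder and returns the number
    // of files written. extractText is a hypothetical hook: it receives
    // the PDF path and returns that document's text.
    public static int ExportAll(string pdfFolder, string txtFolder,
                                Func<string, string> extractText)
    {
        // No-op if the output folder already exists.
        Directory.CreateDirectory(txtFolder);

        int count = 0;
        foreach (string pdfPath in Directory.GetFiles(pdfFolder, "*.pdf"))
        {
            File.WriteAllText(GetTextPath(pdfPath, txtFolder), extractText(pdfPath));
            count++;
        }
        return count;
    }
}
```

To use it with iTextSharp, you would pass a function that runs the same per-page PdfTextExtractor.GetTextFromPage loop as in ReadPdfFile for a single file and returns the resulting string. Keeping the extraction behind a delegate is only a design choice so the folder handling can be tried out without any PDFs.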

VB.Net

Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
    ReadPdfFile()
End Sub

Public Sub ReadPdfFile()
    ' All extracted text is combined into this single output file.
    Dim path As String = "D:\File\File.txt"
    Dim sb As StringBuilder = New StringBuilder()
    For Each file As String In Directory.GetFiles("D:\Folder\", "*.pdf")
        Dim pdfReader As PdfReader = New PdfReader(file)
        Try
            For page As Integer = 1 To pdfReader.NumberOfPages
                ' Use a fresh strategy per page so earlier pages are not repeated.
                Dim strategy As ITextExtractionStrategy = New SimpleTextExtractionStrategy()
                Dim currentText As String = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy)
                ' ControlChars.Lf is a Char, so this also compiles with Option Strict On.
                Dim theLines As String() = currentText.Split(ControlChars.Lf)

                For Each theLine As String In theLines
                    sb.AppendLine(theLine)
                Next
            Next
        Finally
            ' Release the handle on the PDF file.
            pdfReader.Close()
        End Try
    Next
    File.WriteAllText(path, sb.ToString())
End Sub

Screenshot

Pdf

Notepad