Export PDF Tables to Excel in ASP.Net using C# and VB.Net

Last Reply 4 months ago By pandeyism

Posted 4 months ago

Dear All,

I have tried extracting tables from a PDF with iTextSharp. The challenge is that iTextSharp requires us to define a Rectangle and supply the co-ordinates for each cell.

I want a dynamic solution that can extract tables from any PDF to Excel without specifying any co-ordinates.

Please suggest an example or an open source library.

Thanks and Regards,

Vikash

You are viewing reply posted by: pandeyism 4 months ago.

Hi Vikash21,

You can download the library from the link below.

https://www.sautin.com/download/pdf_focus_net.zip

Also refer to the link below.

https://www.sautin.com/products/components/pdffocus/pdf-to-excel-programmatically-c-sharp.php

First, add a reference to SautinSoft.PdfFocus.dll in your project, and then import the namespace.

Please refer to the sample below.

Namespaces

C#

using SautinSoft;

VB.Net

Imports SautinSoft

Code

C#

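protected void Page_Load(object sender, EventArgs e)
{
    // Input PDF; the Excel file is written next to it with the same name.
    string pdfFile = @"C:\Users\anand\Desktop\Id.pdf";
    string excelFile = System.IO.Path.ChangeExtension(pdfFile, ".xls");
    PdfFocus f = new PdfFocus();
    // Export only tabular data and keep the page layout of the PDF.
    f.ExcelOptions.ConvertNonTabularDataToSpreadsheet = false;
    f.ExcelOptions.PreservePageLayout = true;
    f.OpenPdf(pdfFile);

    // Convert only if the PDF was opened and contains at least one page.
    if (f.PageCount > 0)
    {
        f.ToExcel(excelFile);
    }
}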

VB.Net

Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
    ' Input PDF; the Excel file is written next to it with the same name.
    Dim pdfFile As String = "C:\Users\anand\Desktop\Id.pdf"
    Dim excelFile As String = System.IO.Path.ChangeExtension(pdfFile, ".xls")
    Dim f As PdfFocus = New PdfFocus()
    ' Export only tabular data and keep the page layout of the PDF.
    f.ExcelOptions.ConvertNonTabularDataToSpreadsheet = False
    f.ExcelOptions.PreservePageLayout = True
    f.OpenPdf(pdfFile)

    ' Convert only if the PDF was opened and contains at least one page.
    If f.PageCount > 0 Then
        f.ToExcel(excelFile)
    End If
End Sub
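
Optional helper

The sample above takes a hard-coded path and derives the Excel file name from it. If the PDF path comes from elsewhere, it may help to check the input before calling the converter. Below is a minimal sketch using only System.IO. The class PdfExportPaths and its methods are hypothetical names made up for this example; they are not part of SautinSoft.PdfFocus.

C#

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of SautinSoft.PdfFocus.
public static class PdfExportPaths
{
    // Derives the Excel output path the same way the sample does:
    // same folder and file name, with the extension replaced by ".xls".
    public static string GetExcelPath(string pdfFile)
    {
        if (string.IsNullOrWhiteSpace(pdfFile))
        {
            throw new ArgumentException("A PDF path is required.", nameof(pdfFile));
        }

        // Path.ChangeExtension only rewrites the string; it does not touch the disk.
        return Path.ChangeExtension(pdfFile, ".xls");
    }

    // Returns true only when the file exists and has a .pdf extension.
    public static bool IsConvertiblePdf(string pdfFile)
    {
        return File.Exists(pdfFile)
            && string.Equals(Path.GetExtension(pdfFile), ".pdf",
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

In Page_Load you could then call PdfExportPaths.IsConvertiblePdf(pdfFile) before f.OpenPdf(pdfFile), and pass PdfExportPaths.GetExcelPath(pdfFile) to f.ToExcel.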
