Equation Extraction from PDF

About: This project extracts equations from digitally generated PDFs. We use a parser to pull information out of the PDF and then run several analyses on it, including Unicode correction, actual bounding box extraction, and syntactic analysis.

Students: Pratibha (MTP 2019), Chrystle Myrna Lobo (MiniP 2018), Abhilash Ramteke (COP315 2018)
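
As a rough illustration of two of the analyses named above, the sketch below normalizes parser output with Unicode compatibility normalization and computes the bounding box of an equation as the union of its glyph boxes. It is not the project's code: the `Glyph` record, the `normalize_text` and `union_bbox` helpers, and the choice of NFKC are all assumptions made for this example.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record for one character reported by a PDF parser."""
    text: str
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top


def normalize_text(text: str) -> str:
    # NFKC folds compatibility characters such as the "fi" ligature (U+FB01)
    # into plain letters. Assumption: this is only a stand-in for the
    # project's Unicode correction step.
    return unicodedata.normalize("NFKC", text)


def union_bbox(glyphs):
    # Smallest axis-aligned rectangle that contains every glyph box.
    if not glyphs:
        raise ValueError("need at least one glyph")
    return (
        min(g.x0 for g in glyphs),
        min(g.y0 for g in glyphs),
        max(g.x1 for g in glyphs),
        max(g.y1 for g in glyphs),
    )


glyphs = [Glyph("x", 0.0, 0.0, 5.0, 10.0), Glyph("y", 5.0, -2.0, 9.0, 8.0)]
cleaned = normalize_text("\ufb01nite")
box = union_bbox(glyphs)
```

Note that NFKC also flattens superscript and subscript characters, which can change the meaning of a formula, so a real pipeline would likely need a more selective mapping.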

Text/Math Classifier

About: This project builds a classifier that assigns each sequence of characters in a digitally generated PDF document to one of three categories: text, mathematics, or text inside mathematics. The classifier is needed to tag the different parts of the document's content appropriately.

Student: Deepak Bhatt (MTP 2019)
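
To make the three categories concrete, the sketch below shows one way per-character labels could be turned into tagged spans of content. It does not implement the classifier itself; the labels are supplied by hand, and the tag names and the `group_runs` helper are hypothetical.

```python
from itertools import groupby

# Hypothetical tag names for the three categories described above.
TEXT = "text"
MATH = "math"
TEXT_IN_MATH = "text-in-math"


def group_runs(chars, tags):
    """Merge consecutive characters that share a tag into (tag, string) spans."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    runs = []
    for tag, group in groupby(zip(chars, tags), key=lambda pair: pair[1]):
        runs.append((tag, "".join(ch for ch, _ in group)))
    return runs


# Hand-labelled example: the word "if" sits inside a mathematical expression.
chars = list("a=b if c")
tags = [MATH] * 3 + [TEXT_IN_MATH] * 4 + [MATH]
spans = group_runs(chars, tags)
```

Here `spans` holds three entries: a mathematics span `a=b`, a text-inside-mathematics span containing `if` with its surrounding spaces, and a final mathematics span `c`.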

Adapting Tesseract for Mathematical Content

About: This project adapts Tesseract (an open-source OCR engine from Google Inc.) to recognize mathematical equations. It assumes that the input image contains only a mathematical equation, with no other textual or non-textual content.

Student: Saurabh Sharma (MTP 2017)