Monday, 25 July 2011

PDF Text Extraction Using PDFBox and iText

PDFBox
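String path = "/Users/daniel/Temp/mypdf.pdf";
PDFTextStripper stripper = new PDFTextStripper();
PDDocument pdDoc = PDDocument.load(path);
try {
    // Write the text of every page into an in-memory buffer
    StringWriter writer = new StringWriter();
    stripper.writeText(pdDoc, writer);
    System.out.println(writer.toString());
} finally {
    // Always release the document's resources
    pdDoc.close();
}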

iText
String path = "/Users/daniel/Temp/mypdf.pdf";
PdfReader reader = new PdfReader(path);
int numberOfPages = reader.getNumberOfPages();
PdfTextExtractor extractor = new PdfTextExtractor(reader);
// iText page numbers start at 1, hence i + 1
for (int i = 0; i < numberOfPages; i++) {
    System.out.println(extractor.getTextFromPage(i + 1));
}
reader.close();
