Answer: Using PDFbox to determine the coordinates of words in a document

I’m working on extracting data from PDF files. This post helped me determine the coordinates of a word by searching for it.

Apr 12 ’15

Take a look at this; I think it’s what you need.

https://jackson-brain.com/using-pdfbox-to-locate-text-coordinates-within-a-pdf-in-java/

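Here is the code (written against the PDFBox 1.8.x API; in PDFBox 2.x, PDFTextStripper and TextPosition live in org.apache.pdfbox.text instead of org.apache.pdfbox.util):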

import java.io.File;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;

public class PrintTextLocations extends PDFTextStripper {

public static StringBuilder tWord


(The listing is cut off here; the complete code is in the post linked above.)
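
To illustrate the general idea of locating a word by its coordinates, here is a minimal sketch in plain Java. It is not the code from the linked post and it does not call PDFBox at all: the Glyph class is a hypothetical stand-in for the per-character data (character, x, y) that a text stripper hands you, and layout() just fabricates evenly spaced glyphs for the demo. The sketch assumes words are separated by explicit whitespace glyphs, which real PDFs do not always provide, so treat it as a starting point only.

```java
import java.util.ArrayList;
import java.util.List;

class WordCoordinateSketch {

    /** Hypothetical stand-in for one character and its position on the page. */
    static final class Glyph {
        final char ch;
        final float x;
        final float y;

        Glyph(char ch, float x, float y) {
            this.ch = ch;
            this.x = x;
            this.y = y;
        }
    }

    /** A matched word together with the position of its first glyph. */
    static final class WordHit {
        final String text;
        final float x;
        final float y;

        WordHit(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }

        @Override
        public String toString() {
            return text + " @ (" + x + ", " + y + ")";
        }
    }

    /** Demo helper (an assumption, not a PDFBox call): one glyph per character, fixed advance. */
    static List<Glyph> layout(String text, float x0, float y, float advance) {
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            glyphs.add(new Glyph(text.charAt(i), x0 + i * advance, y));
        }
        return glyphs;
    }

    /** Groups glyphs into whitespace-separated words and returns every word equal to target. */
    static List<WordHit> findWord(List<Glyph> glyphs, String target) {
        List<WordHit> hits = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        float startX = 0f;
        float startY = 0f;

        for (Glyph g : glyphs) {
            if (Character.isWhitespace(g.ch)) {
                // A whitespace glyph ends the current word, if there is one.
                addIfMatch(hits, word, target, startX, startY);
                word.setLength(0);
            } else {
                if (word.length() == 0) {
                    // First glyph of a new word: remember where the word starts.
                    startX = g.x;
                    startY = g.y;
                }
                word.append(g.ch);
            }
        }
        // The last word has no trailing whitespace, so check it separately.
        addIfMatch(hits, word, target, startX, startY);
        return hits;
    }

    private static void addIfMatch(List<WordHit> hits, StringBuilder word,
                                   String target, float x, float y) {
        if (word.length() > 0 && word.toString().equals(target)) {
            hits.add(new WordHit(target, x, y));
        }
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = layout("find me here", 10f, 700f, 6f);
        for (WordHit hit : findWord(glyphs, "me")) {
            System.out.println(hit);
        }
    }
}
```

In a real PDFBox subclass you would fill the glyph list from the TextPosition objects the stripper passes to you, rather than from layout(). Keep in mind that the coordinates come back in whatever space the stripper reports, so check the origin and direction of the y axis before drawing or cropping with them.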
