Apache Tika and the character limit when parsing documents

Can anyone help me sort this out? The limit can be raised like this:
   Tika tika = new Tika();
   tika.setMaxStringLength(10*1024*1024);
But if you don't use the Tika facade directly, for example:
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();

ParseContext ps = new ParseContext();
for (InputStream is : getInputStreams()) {
    parser.parse(is, textHandler, metadata, ps);
    is.close();
    System.out.println("Title: " + metadata.get("title"));
    System.out.println("Author: " + metadata.get("Author"));
}
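then there is no way to set it, because you never interact with WriteOutContentHandler yourself. Incidentally, its write limit defaults to -1, which means no limit, yet the resulting text is still capped at 100,000 characters.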
/**
 * The maximum number of characters to write to the character stream.
 * Set to -1 for no limit.
 */
private final int writeLimit;

/**
 * Number of characters written so far.
 */
private int writeCount = 0;

private WriteOutContentHandler(Writer writer, int writeLimit) {
    this.writer = writer;
    this.writeLimit = writeLimit;
}

/**
 * Creates a content handler that writes character events to
 * the given writer.
 *
 * @param writer writer
 */
public WriteOutContentHandler(Writer writer) {
    this(writer, -1);
}
    
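You seem to have overlooked that BodyContentHandler has a constructor that takes a write limit. The no-argument constructor caps the extracted text at 100,000 characters, which is where the limit you are seeing comes from. Pass the limit you want, or -1 to disable it:
ContentHandler textHandler = new BodyContentHandler(-1); // -1 disables the write limit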
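To make the effect of the write limit concrete, here is a minimal sketch of the idea using only the JDK's SAX classes. LimitedTextHandler and its feed helper are hypothetical names made up for illustration. They are not Tika classes, and the real handler may differ in detail; the sketch only mirrors the writeLimit/writeCount bookkeeping shown in the question.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical, simplified stand-in for a write-limited content handler.
 * Not part of Tika; it only imitates the writeLimit/writeCount idea.
 */
class LimitedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit; // -1 means no limit
    private int writeCount = 0;   // characters accepted so far

    LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit == -1 || writeCount + length <= writeLimit) {
            // Still within the limit (or no limit at all): keep everything.
            text.append(ch, start, length);
            writeCount += length;
        } else {
            // Keep only what still fits, then signal that the limit was hit.
            text.append(ch, start, writeLimit - writeCount);
            writeCount = writeLimit;
            throw new SAXException("Write limit of " + writeLimit + " characters reached");
        }
    }

    @Override
    public String toString() {
        return text.toString();
    }

    /** Feeds one string to a fresh handler and returns whatever text was kept. */
    static String feed(int writeLimit, String input) {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit);
        char[] chars = input.toCharArray();
        try {
            handler.characters(chars, 0, chars.length);
        } catch (SAXException e) {
            // Limit reached: the text collected up to the limit is still available.
        }
        return handler.toString();
    }
}
```

In this sketch, LimitedTextHandler.feed(5, "hello world") keeps only the first five characters, while a limit of -1 keeps the whole string. That is the same distinction as between a capped handler and one constructed with -1.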
    
