I just extracted the text from a PDF document, and the result looks like this:
Progressive Result Generation for Multi-Criteria Decision Support Queries -> manual line break entered here
Venkatesh Raghavan (Worcester Polytechnic Institute), Elke Rundensteiner (Worcester -> automatic line break here because the line was full
Polytechnic Institute)Personalized Web Search with Location Preferences -> manual line break entered here
Kenneth Wai-Ting Leung (HKUST), Dik Lun Lee (HKUST), Wang-Chien Lee (Pennsylvania
State University) -> manual line break entered hereQ-Cop: Avoiding Bad Query Mixes to Minimize Client Timeouts Under Heavy Loads -> manual line break entered here
Sean Tozer (University of Waterloo), Tim Brecht (University of Waterloo), Ashraf Aboulnaga
(University of Waterloo) -> manual line break entered hereExplaining Structured Queries in Natural Language -> manual line break entered here
Georgia Koutrika (Stanford University), Alkis Simitsis (HP Labs), Yannis Ioannidis (University of
Athens)
-> For example, there is a page break (form feed) here. How can I find it and test for it? I have already split the text into individual strings, but what I want are the page breaks marked in red and the hard line breaks that the author entered manually in the text. Could anyone help with a suggestion? An example would be ideal.
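One way to look for page breaks and line breaks in the extracted text is to scan for the control characters themselves. The sketch below assumes the extractor writes a form feed (U+000C, '\f' in Java) at each page boundary, which is not guaranteed. As far as I know, PDFBox's PDFTextStripper has a setPageEnd(String) setter that can be used to emit such a marker, but please verify that against the version you use. The class and method names here (BreakScanner, offsetsOf) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BreakScanner {

    /** Returns every offset in text at which the character target occurs. */
    static List<Integer> offsetsOf(String text, char target) {
        List<Integer> offsets = new ArrayList<>();
        // indexOf returns -1 when there are no further occurrences
        for (int i = text.indexOf(target); i >= 0; i = text.indexOf(target, i + 1)) {
            offsets.add(i);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical extracted text: two lines, a page break, one more line
        String sample = "Title\nAuthors\fNext page title\n";
        System.out.println("line feeds at " + offsetsOf(sample, '\n'));
        System.out.println("form feeds at " + offsetsOf(sample, '\f'));
    }
}
```

This only tells you where the breaks are. It cannot tell a manual line break from an automatic one, because both come out as the same '\n' character in the extracted text.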
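import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

/**
 * Reads my.txt one byte at a time and reports every line feed it finds.
 */
public class Main {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream inputs = new FileInputStream("my.txt")) {
            int ch = inputs.read();
            while (ch != -1) {
                if ((char) ch == '\n') {
                    // print the numeric code of the line feed (10)
                    System.out.println("\n" + ch);
                }
                System.out.println((char) ch);
                ch = inputs.read();
            }
        } catch (FileNotFoundException e) {
            System.out.print(e);
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
Use a conditional test. When I run this, the value of ch at a line break is 10. Read the characters one at a time and test each one.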
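That works too. I should mention that my document is plain text only, with no images or tables.
Also, following your idea: if I can tell that the previous line is full, how do I tell that a whole page is full?
I am using pdfbox. I parse the PDF document first and then do the checks, but I don't know how to determine the capacity of each line and of each page. Please advise, thanks.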
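For the "is the page full" part, one possible approach is to count the lines on each page and compare against the largest count seen. This is only a sketch: it assumes pages in the extracted text are separated by a form feed ('\f'), which depends on how the extractor is configured, and the names PageFill and linesPerPage are hypothetical.

```java
import java.util.Arrays;

class PageFill {

    /** Splits text into pages on form feed and returns the line count of each page. */
    static int[] linesPerPage(String text) {
        // limit -1 keeps a trailing empty page if the text ends with a form feed
        String[] pages = text.split("\f", -1);
        int[] counts = new int[pages.length];
        for (int i = 0; i < pages.length; i++) {
            int n = 0;
            for (String line : pages[i].split("\r?\n")) {
                // count non-empty lines only, so blank lines are ignored
                if (!line.isEmpty()) {
                    n++;
                }
            }
            counts[i] = n;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical text: two pages of three lines, then a page with one line
        String text = "a\nb\nc\n\fd\ne\nf\n\fg\n";
        System.out.println(Arrays.toString(linesPerPage(text)));
    }
}
```

A page whose count equals the maximum is probably full, and a page with fewer lines probably ends early. Headings, blank lines and different font sizes will all throw this off, so treat the result as a hint and not as proof.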
int length = aLine.getBytes("ISO-8859-1").length;
The length above is the width of the line, measured in units of one English character. This works when a monospaced font is used.
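The problem is that what you get above is the width of one particular line of text (the documents I parse contain no Chinese characters). Also, pdfbox parses the whole document at once, not line by line.
1. I need the width of a given line, a. 2. I need the maximum line width of the document, b.
With these two values I can decide whether a line break was added manually by the author (if a < b) or inserted automatically (if a = b, or a is approximately equal to b).
The problem is that I don't know how to get the document's maximum line width, which was set by the author. Otherwise I have to parse and read through the whole document to find the maximum line width myself, which is really too slow, especially when the PDF is large.
Thanks for your help.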
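The a < b rule above can be sketched as follows. It is a rough heuristic and not a definitive implementation. It assumes a monospaced font, so that character count stands in for width, and it assumes the whole extracted text is already in memory as one String. The names LineBreakClassifier and classify, and the tolerance parameter, are introduced here for illustration only.

```java
class LineBreakClassifier {

    /**
     * For each line, returns true if the break after it looks manual (hard),
     * i.e. the line is clearly shorter than the widest line in the text.
     */
    static boolean[] classify(String text, int tolerance) {
        String[] lines = text.split("\r?\n", -1);

        // First pass: b, the maximum line width in characters
        int maxWidth = 0;
        for (String line : lines) {
            maxWidth = Math.max(maxWidth, line.length());
        }

        // Second pass: a < b - tolerance means the author broke the line early
        boolean[] hard = new boolean[lines.length];
        for (int i = 0; i < lines.length; i++) {
            hard[i] = lines[i].length() < maxWidth - tolerance;
        }
        return hard;
    }

    public static void main(String[] args) {
        // Hypothetical three-line sample: title, full author line, remainder
        String text = "Short title\n"
                + "A long author line that fills the row\n"
                + "rest of authors";
        boolean[] hard = classify(text, 5);
        for (int i = 0; i < hard.length; i++) {
            System.out.println("line " + i + ": " + (hard[i] ? "manual" : "automatic"));
        }
    }
}
```

Both passes run over lines that have already been extracted, so the extra cost is linear in the length of the text and is likely small compared with parsing the PDF itself. The tolerance is needed because a wrapped line usually ends a few characters short of the limit, wherever the last whole word fits.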
Keynotes
1
Enabling Real Time Data Analysis.
Divesh Srivastava (AT&T Labs Research,
United States of America), Lukasz Golab
(AT&T Labs Research, United States of America),
Rick Greer (AT&T Labs Research, United
States of America), Theodore Johnson (AT&T
Labs Research, United States of America),
Joseph Seidel (AT&T Labs Research, United
States of America), Vladislav Shkapenyuk
(AT&T Labs Research, United States of America),
Oliver Spatscheck (AT&T Labs Research,
United States of America), Jennifer Yates
(AT&T Labs Research, United States of America).
3
High-End Biological Imaging Generates Very
Large 3D+ and Dynamic Datasets
Paul Matsudaira (National University of Singapore,
Republic of Singapore).
10-Year Best Paper Awards
Keynote Sessions
4
Dealing with Web Data: History and Look ahead
Junghoo Cho (University of California Los Angeles,
United States of America), Hector Garcia-
Molina (Stanford University, United States of
America).
5
Database Replication: a Tale of Research across
Communities
Bettima Kemme (McGill University, Canada),
Gustavo Alonso (Eidgenössische Technische
Hochschule Zürich, Switzerland).
Research Sessions
Database Security
13
Building Disclosure Risk Aware Query Optimizers
for Relational Databases
Mustafa Canim (University of Texas at Dallas,
United States of America), Murat Kantarcioglu
(University of Texas at Dallas, United
States of America), Bijit Hore (University
of California Irvine, United States of America),
Sharad Mehrotra (University of California
Irvine, United States of America).
25
Secure Personal Data Servers: a Vision Paper
Tristan Allard (University of Versailles,
France), Nicolas Anciaux (Institut National
de Recherche en Informatique et Automatique,
France), Luc Bouganim (Institut National
de Recherche en Informatique et Automatique,
France), Yanli Guo (Institut National de
Recherche en Informatique et Automatique,
France), Lionel Le Folgoc (Institut National
de Recherche en Informatique et Automatique,
France), Benjamin Nguyen (Institut National
de Recherche en Informatique et Automatique,
France), Philippe Pucheral (Institut National
xxix
de Recherche en Informatique et Automatique,
France), Indrajit Ray (Colorado State University,
United States of America), Indrakshi Ray
(Colorado State University, United States of
America), Shaoyi Yin (Institut National de
Recherche en Informatique et Automatique,
France).
36
PolicyReplay: Misconfiguration-Response
Queries for Data Breach Reporting
Daniel Fabbri (University of Michigan, United
States of America), Kristen LeFevre (University
of Michigan, United States of America), Qiang
Zhu (University of Michigan, United States of
America).
Parallel and Distributed Databases
48
Schism: a Workload-Driven Approach to
Database Replication and Partitioning
Carlo Curino (Massachusetts Institute of Technology,
United States of America), Yang Zhang
(Massachusetts Institute of Technology, United
States of America), Evan Jones (Massachusetts
Institute of Technology, United States of America),
Samuel Madden (Massachusetts Institute
of Technology, United States of America).
58
Ten Thousand SQLs: Parallel Keyword Queries
Computing
Lu Qin (The Chinese University of Hong Kong,
People’s Republic of China), Jefferey Yu (The
Chinese University of Hong Kong, People’s Republic
of China), Lijun Chang (The Chinese
University of Hong Kong, People’s Republic of
China).
70
The Case for Determinism in Database Systems
Alexander Thomson (Yale University, United
States of America), Daniel Abadi (Yale University,
United States of America).
Data Exchange
81
MapMerge: Correlating Independent Schema
Mappings
Bogdan Alexe (University of California Santa
Cruz, United States of America), Mauricio
Hernández (IBM Research, United States of
America), Lucian Popa (IBM Almaden Research
Center, United States of America),
Wang-Chiew Tan (University of California
Santa Cruz, United States of America).
93
Chase Termination: A Constraints Rewriting
Approach
Francesca Spezzano (Università della Calabria,
Italy), Sergio Greco (Università della Calabria,
Italy).
105
Scalable Data Exchange with Functional Dependencies
Bruno Marnette (University of Oxford, United
Kingdom), Giansalvatore Mecca (Università
della Basilicata, Italy), Paolo Papotti (Università
Roma Tre, Italy).
Database Services and Applications
117
Interactive Route Search in the Presence of Order
Constraints
Roy Levin (Technion-Israel Institute of Technology,
Israel), Yaron Kanza (Technion-
Israel Institute of Technology, Israel), Eliyahu
Safra, Yehoshua Sagiv (Hebrew University of
Jerusalem, Israel).
129
Energy Management for MapReduce Clusters
Willis Lang (University of Wisconsin-Madison,
xxx
Still, thanks everyone, and special thanks to magong. I'll start handing out the points now.