How do I extract paragraph text from HTML using Jsoup?
Published: 2020-05-24 01:32:05 | Category: Java | Source: Internet
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JavaApplication14 {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("tanmoy_mahathir.makes.org/thimble/146").get();
            String html = "<html><head></head>" + "<body><p>Parsed HTML into a doc."
                    + "</p></body></html>";
            Elements paragraphs = doc.select("p");
            for (Element p : paragraphs)
                System.out.println(p.text());
        } catch (IOException ex) {
            Logger.getLogger(JavaApplication14.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}

Can anyone help me with how jsoup code can parse just the part containing the paragraphs, so that it prints only: Hello,World! Nothing is impossible

Solution

For this small piece of HTML, you only need to do:

String html = "<html><head></head>" + "<body><p>Parsed HTML into a doc."
        + "</p></body></html>";
Document doc = Jsoup.parse(html);
Elements paragraphs = doc.select("p");
for (Element p : paragraphs)
    System.out.println(p.text());

As far as I can see, your link contains almost the same HTML, so you can also replace the definition of doc with:

Document doc = Jsoup.connect("https://tanmoy_mahathir.makes.org/thimble/146").get();
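
One difference between the question's code and the line above is the `https://` prefix: `Jsoup.connect` expects an absolute http or https URL, and the address in the question has no scheme. A minimal sketch of guarding against that is shown below. The class `UrlSchemeSketch` and the method `ensureScheme` are hypothetical names introduced here for illustration, not part of jsoup, and defaulting to `https://` is an assumption about the target site.

```java
import java.util.Locale;

// Hypothetical helper, not part of jsoup: makes sure a URL string carries
// an http(s) scheme before it is handed to Jsoup.connect(...).
class UrlSchemeSketch {

    // Returns the (trimmed) URL unchanged if it already starts with http:// or
    // https://; otherwise prepends https:// (an assumption about the site).
    static String ensureScheme(String url) {
        String trimmed = url.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return trimmed;
        }
        return "https://" + trimmed;
    }

    public static void main(String[] args) {
        // The scheme-less address from the question.
        String raw = "tanmoy_mahathir.makes.org/thimble/146";
        System.out.println(ensureScheme(raw));
    }
}
```

The returned string could then be passed on, for example as `Jsoup.connect(UrlSchemeSketch.ensureScheme(raw)).get()`.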

UPDATE

Here is the complete code; it compiles and runs fine.

import java.io.IOException;
import java.util.logging.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class JavaApplication14 {
    public static void main(String[] args) {
        try {
            String url = "https://tanmoy_mahathir.makes.org/thimble/146";
            // Fetch the page and parse it into a Document.
            Document doc = Jsoup.connect(url).get();
            // Select every <p> element and print its text content.
            Elements paragraphs = doc.select("p");
            for (Element p : paragraphs)
                System.out.println(p.text());
        }
        catch (IOException ex) {
            // log(Level, String, Throwable): no message, exception attached.
            Logger.getLogger(JavaApplication14.class.getName())
                    .log(Level.SEVERE, null, ex);
        }
    }
}
