<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Showdown &#8211; Java HTML Parsing Comparison</title>
	<atom:link href="http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/</link>
	<description>The software development weblog of Benjamin McCann.</description>
	<lastBuildDate>Mon, 06 Feb 2012 21:26:21 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Victor</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-47466</link>
		<dc:creator>Victor</dc:creator>
		<pubDate>Sat, 25 Jun 2011 08:13:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-47466</guid>
		<description>Thanks for this excellent article. Three years after the initial comparison, the results are still useful for choosing the appropriate HTML parser for our needs. I will use HtmlCleaner, by the way...</description>
		<content:encoded><![CDATA[<p>Thanks for this excellent article. Three years after the initial comparison, the results are still useful for choosing the appropriate HTML parser for our needs. I will use HtmlCleaner, by the way&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: marcelo camanho</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-44500</link>
		<dc:creator>marcelo camanho</dc:creator>
		<pubDate>Sat, 23 Apr 2011 00:34:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-44500</guid>
		<description>utkarsh, you can use the following:

		CleanerProperties props = new CleanerProperties();
		// Not sure if you will need this, but I needed it.
		props.setNamespacesAware(false);

		// clean is assumed to be the TagNode returned by HtmlCleaner.clean()
		DomSerializer dom = new DomSerializer(props);
		Document doc = dom.createDOM(clean);</description>
		<content:encoded><![CDATA[<p>utkarsh, you can use the following:</p>
<p>		CleanerProperties props = new CleanerProperties();<br />
		// Not sure if you will need this, but I needed it.<br />
		props.setNamespacesAware(false);</p>
<p>		// clean is assumed to be the TagNode returned by HtmlCleaner.clean()<br />
		DomSerializer dom = new DomSerializer(props);<br />
		Document doc = dom.createDOM(clean);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: utkarsh</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-43722</link>
		<dc:creator>utkarsh</dc:creator>
		<pubDate>Thu, 07 Apr 2011 05:05:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-43722</guid>
		<description>Hi Ben, in your code for HtmlCleaner you are using
document=cleaner.createDOM();
However, this method is not present in the HtmlCleaner class.
So, please help me.
Thanks</description>
		<content:encoded><![CDATA[<p>Hi Ben, in your code for HtmlCleaner you are using<br />
document=cleaner.createDOM();<br />
However, this method is not present in the HtmlCleaner class.<br />
So, please help me.<br />
Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: utkarsh</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-42936</link>
		<dc:creator>utkarsh</dc:creator>
		<pubDate>Tue, 22 Mar 2011 06:59:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-42936</guid>
		<description>thanks Ben</description>
		<content:encoded><![CDATA[<p>thanks Ben</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-42891</link>
		<dc:creator>Ben</dc:creator>
		<pubDate>Mon, 21 Mar 2011 17:52:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-42891</guid>
		<description>Hi Utkarsh,
urlIs is a java.io.InputStream.  One way of getting an InputStream is to call URL.openStream().  If you&#039;re reading from files on disk you&#039;ll probably want to use a java.io.FileInputStream.</description>
		<content:encoded><![CDATA[<p>Hi Utkarsh,<br />
urlIs is a java.io.InputStream.  One way of getting an InputStream is to call URL.openStream().  If you&#8217;re reading from files on disk you&#8217;ll probably want to use a java.io.FileInputStream.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: utkarsh</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-42883</link>
		<dc:creator>utkarsh</dc:creator>
		<pubDate>Mon, 21 Mar 2011 11:59:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-42883</guid>
		<description>What is urlIS here? Which class is it an object of?
Actually, I am trying to parse an HTML page stored on disk.
How do I supply a FileReader object to the HtmlCleaner constructor?</description>
		<content:encoded><![CDATA[<p>What is urlIS here? Which class is it an object of?<br />
Actually, I am trying to parse an HTML page stored on disk.<br />
How do I supply a FileReader object to the HtmlCleaner constructor?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mario Gaitán</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-41413</link>
		<dc:creator>Mario Gaitán</dc:creator>
		<pubDate>Wed, 02 Mar 2011 04:39:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-41413</guid>
		<description>Thanks for the post, great comparison!!!</description>
		<content:encoded><![CDATA[<p>Thanks for the post, great comparison!!!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-37021</link>
		<dc:creator>David</dc:creator>
		<pubDate>Mon, 17 Jan 2011 06:52:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-37021</guid>
		<description>Thanks to all for your posts and your time!!!
I really appreciate it!</description>
		<content:encoded><![CDATA[<p>Thanks to all for your posts and your time!!!<br />
I really appreciate it!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ankur</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-34027</link>
		<dc:creator>Ankur</dc:creator>
		<pubDate>Fri, 10 Dec 2010 20:37:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-34027</guid>
		<description>Hi,

Can someone please help me with the following problem that I&#039;m facing?

I am parsing certain HTML pages. As a best effort, I&#039;m using Neko, and if Neko fails, my code switches to JTidy. After parsing, I use XPath to extract some information from the page. The problem is that Neko prefixes every element in the parsed DOM with &quot;xhtml&quot;, so I have to specify this prefix in the XPath as well (e.g., //xhtml:a/@href). Because of this, I can&#039;t use a common XPath for extraction regardless of which parser created the DOM (Neko or JTidy). Please help.

Thanks,
Ankur.</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>Can someone please help me with the following problem that I&#8217;m facing?</p>
<p>I am parsing certain HTML pages. As a best effort, I&#8217;m using Neko, and if Neko fails, my code switches to JTidy. After parsing, I use XPath to extract some information from the page. The problem is that Neko prefixes every element in the parsed DOM with &#8220;xhtml&#8221;, so I have to specify this prefix in the XPath as well (e.g., //xhtml:a/@href). Because of this, I can&#8217;t use a common XPath for extraction regardless of which parser created the DOM (Neko or JTidy). Please help.</p>
<p>Thanks,<br />
Ankur.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Compare Excel Sheets</title>
		<link>http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/comment-page-1/#comment-28603</link>
		<dc:creator>Compare Excel Sheets</dc:creator>
		<pubDate>Fri, 13 Aug 2010 08:58:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/java-html-parsing-library-comparison/#comment-28603</guid>
		<description>Very nice resource for web developers. Thanks, dude, for the valuable information.</description>
		<content:encoded><![CDATA[<p>Very nice resource for web developers. Thanks, dude, for the valuable information.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

