<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: HTML Parsing using the Firefox DLLs</title>
	<atom:link href="http://www.benmccann.com/dev-blog/html-parsing-with-java-mozilla-html-parser/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.benmccann.com/dev-blog/html-parsing-with-java-mozilla-html-parser/</link>
	<description>The software development weblog of Benjamin McCann.</description>
	<lastBuildDate>Mon, 06 Feb 2012 21:26:21 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: prabakaran</title>
		<link>http://www.benmccann.com/dev-blog/html-parsing-with-java-mozilla-html-parser/comment-page-1/#comment-32533</link>
		<dc:creator>prabakaran</dc:creator>
		<pubDate>Thu, 11 Nov 2010 08:11:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/html-parsing-with-java-mozilla-html-parser/#comment-32533</guid>
		<description>Yep Antony, I had a chat with cnt regarding the parsing. Now I am able to parse the malformed HTML tags without any problems. Thanks...</description>
		<content:encoded><![CDATA[<p>Yep Antony, I had a chat with cnt regarding the parsing. Now I am able to parse the malformed HTML tags without any problems. Thanks&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anton</title>
		<link>http://www.benmccann.com/dev-blog/html-parsing-with-java-mozilla-html-parser/comment-page-1/#comment-19846</link>
		<dc:creator>Anton</dc:creator>
		<pubDate>Wed, 27 Jan 2010 20:46:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/html-parsing-with-java-mozilla-html-parser/#comment-19846</guid>
		<description>Also, I tried parsing your test document with org.htmlparser, and it seems to have parsed it okay, even with the weird  tags.</description>
		<content:encoded><![CDATA[<p>Also, I tried parsing your test document with org.htmlparser, and it seems to have parsed it okay, even with the weird  tags.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Johan</title>
		<link>http://www.benmccann.com/dev-blog/html-parsing-with-java-mozilla-html-parser/comment-page-1/#comment-13101</link>
		<dc:creator>Johan</dc:creator>
		<pubDate>Fri, 31 Jul 2009 08:56:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/html-parsing-with-java-mozilla-html-parser/#comment-13101</guid>
		<description>For the record, cobra (http://lobobrowser.org/cobra/java-html-parser.jsp) seems very promising. It offers a very helpful feature to extract all links from a page: given an HTML page, cobra downloads includes, stylesheets and external JavaScript automagically, and after the parsing is done, a simple routine returns all links that were found. Unfortunately, it had a serious flaw: it could not parse www.google.com. Somehow, when parsing JavaScript, it fell into an infinite loop. This severely reduced the attractiveness of the parser.</description>
		<content:encoded><![CDATA[<p>For the record, cobra (<a href="http://lobobrowser.org/cobra/java-html-parser.jsp" rel="nofollow">http://lobobrowser.org/cobra/java-html-parser.jsp</a>) seems very promising. It offers a very helpful feature to extract all links from a page: given an HTML page, cobra downloads includes, stylesheets and external JavaScript automagically, and after the parsing is done, a simple routine returns all links that were found. Unfortunately, it had a serious flaw: it could not parse <a href="http://www.google.com" rel="nofollow">http://www.google.com</a>. Somehow, when parsing JavaScript, it fell into an infinite loop. This severely reduced the attractiveness of the parser.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Johan</title>
		<link>http://www.benmccann.com/dev-blog/html-parsing-with-java-mozilla-html-parser/comment-page-1/#comment-13099</link>
		<dc:creator>Johan</dc:creator>
		<pubDate>Fri, 31 Jul 2009 07:59:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/html-parsing-with-java-mozilla-html-parser/#comment-13099</guid>
		<description>I&#039;ve had it working with the following setup (Windows only)

Append the following two directories to the PATH variable (properly prefixed, e.g., C:/)
MozillaParser-v-0-3-0\dist\windows\mozilla\components
MozillaParser-v-0-3-0\dist\windows\mozilla

Set the following two variables: 
// From archive: http://sourceforge.net/projects/mozillaparser/files/mozillaparser/MozillaParser-v-0-3-0/MozillaParser-v-0-3-0.zip/download
String parserLibrary = &quot;C:\\MozillaParser-v-0-3-0\\dist\\windows\\MozillaParser.dll&quot;;

// From archive: http://sourceforge.net/projects/mozillaparser/files/mozillaparser/Mozilla%20Components%20base%20v.0.1/mozilla-dist-bin-windows.zip/download
String mozillaBin = &quot;C:\\bin&quot;;

Finally:
MozillaParser.init(parserLibrary, mozillaBin);



As a side note, the author wraps the above strings in File objects and then takes the path from each File object. This is probably a better approach, but it is not necessary to get everything working.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve had it working with the following setup (Windows only)</p>
<p>Append the following two directories to the PATH variable (properly prefixed, e.g., C:/)<br />
MozillaParser-v-0-3-0\dist\windows\mozilla\components<br />
MozillaParser-v-0-3-0\dist\windows\mozilla</p>
<p>Set the following two variables:<br />
// From archive: <a href="http://sourceforge.net/projects/mozillaparser/files/mozillaparser/MozillaParser-v-0-3-0/MozillaParser-v-0-3-0.zip/download" rel="nofollow">http://sourceforge.net/projects/mozillaparser/files/mozillaparser/MozillaParser-v-0-3-0/MozillaParser-v-0-3-0.zip/download</a><br />
String parserLibrary = &quot;C:\\MozillaParser-v-0-3-0\\dist\\windows\\MozillaParser.dll&quot;;</p>
<p>// From archive: <a href="http://sourceforge.net/projects/mozillaparser/files/mozillaparser/Mozilla%20Components%20base%20v.0.1/mozilla-dist-bin-windows.zip/download" rel="nofollow">http://sourceforge.net/projects/mozillaparser/files/mozillaparser/Mozilla%20Components%20base%20v.0.1/mozilla-dist-bin-windows.zip/download</a><br />
String mozillaBin = &quot;C:\\bin&quot;;</p>
<p>Finally:<br />
MozillaParser.init(parserLibrary, mozillaBin);</p>
<p>As a side note, the author wraps the above strings in File objects and then takes the path from each File object. This is probably a better approach, but it is not necessary to get everything working.</p>
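<p>The PATH step above is easy to get subtly wrong, so here is a minimal sketch for checking that both directories really are on PATH before calling init. It is not part of MozillaParser: the class name PathCheck, the methods normalize and missingFromPath, and the C:\ install prefix are all assumptions made for illustration. It only compares strings, so it does not prove that the DLLs themselves are present.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of MozillaParser.
public class PathCheck {

    // Normalizes a Windows-style directory for comparison: trims whitespace,
    // unifies slashes, lower-cases, and drops trailing backslashes.
    static String normalize(String dir) {
        String s = dir.trim().replace('/', '\\').toLowerCase(Locale.ROOT);
        while (s.length() > 3 && s.endsWith("\\")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    // Returns the required directories that do not appear in pathValue.
    static List<String> missingFromPath(String pathValue, String separator,
                                        List<String> requiredDirs) {
        List<String> entries = new ArrayList<>();
        if (pathValue != null) {
            for (String entry : pathValue.split(Pattern.quote(separator))) {
                entries.add(normalize(entry));
            }
        }
        List<String> missing = new ArrayList<>();
        for (String dir : requiredDirs) {
            if (!entries.contains(normalize(dir))) {
                missing.add(dir);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed install location; adjust the C:\ prefix to your setup.
        List<String> required = Arrays.asList(
                "C:\\MozillaParser-v-0-3-0\\dist\\windows\\mozilla\\components",
                "C:\\MozillaParser-v-0-3-0\\dist\\windows\\mozilla");
        // ";" is the PATH separator on Windows.
        List<String> missing = missingFromPath(System.getenv("PATH"), ";", required);
        if (missing.isEmpty()) {
            System.out.println("Both directories are on PATH.");
        } else {
            System.out.println("Missing from PATH: " + missing);
        }
    }
}
```

<p>Run it from the same environment that launches your application (for example the same IDE run configuration), since that is the PATH the JVM will actually see.</p>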
]]></content:encoded>
	</item>
	<item>
		<title>By: A. Shiraz</title>
		<link>http://www.benmccann.com/dev-blog/html-parsing-with-java-mozilla-html-parser/comment-page-1/#comment-8751</link>
		<dc:creator>A. Shiraz</dc:creator>
		<pubDate>Mon, 20 Apr 2009 01:34:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/html-parsing-with-java-mozilla-html-parser/#comment-8751</guid>
		<description>I tried this out and got the same error as the others (dependencies). I put the following directories on the PATH:

C:\Documents and Settings\Shiraz&gt;path
PATH=C:\Temp\set\MozillaParser-v-0-3-0\MozillaParser-v-0-3-0\dist\windows;C:\
Temp\set\MozillaParser-v-0-3-0\MozillaParser-v-0-3-0\dist\windows\components

I then tried the following:
                File parserLibraryFile = new File(&quot;C:/SET/lib/mparser/MozillaParser-v-0-3-0/dist/windows/MozillaParser&quot;
                        + EnviromentController.getSharedLibraryExtension());
                String parserLibrary = parserLibraryFile.getAbsolutePath();
                System.out.println(&quot;Loading Parser Library &quot; + parserLibrary);
                //	mozilla.dist.bin directory 
                final File mozillaDistBinDirectory = new File(
                        &quot;C:/SET/lib/mparser/MozillaParser-v-0-3-0/dist/&quot;
                                + &quot;windows&quot;);
                String absPath =mozillaDistBinDirectory.getAbsolutePath();
                MozillaParser.init(parserLibrary, absPath);
I still get the following error:

Operating system : Windows XP
Loading Parser Library C:\SET\lib\mparser\MozillaParser-v-0-3-0\dist\windows\MozillaParser.dll
com.dappit.Dapper.parser.ParserInitializationException
	at com.dappit.Dapper.parser.MozillaParser.init(Unknown Source)
	at first.ParserExample.main(ParserExample.java:30)
Caused by: java.lang.UnsatisfiedLinkError: C:\SET\lib\mparser\MozillaParser-v-0-3-0\dist\windows\MozillaParser.dll: Can&#039;t find dependent libraries
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary0(Unknown Source)
	at java.lang.ClassLoader.loadLibrary(Unknown Source)
	at java.lang.Runtime.load0(Unknown Source)
	at java.lang.System.load(Unknown Source)
	... 2 more</description>
		<content:encoded><![CDATA[<p>I tried this out and got the same error as the others (dependencies). I put the following directories on the PATH:</p>
<p>C:\Documents and Settings\Shiraz&gt;path<br />
PATH=C:\Temp\set\MozillaParser-v-0-3-0\MozillaParser-v-0-3-0\dist\windows;C:\<br />
Temp\set\MozillaParser-v-0-3-0\MozillaParser-v-0-3-0\dist\windows\components</p>
<p>I then tried the following:<br />
                File parserLibraryFile = new File(&quot;C:/SET/lib/mparser/MozillaParser-v-0-3-0/dist/windows/MozillaParser&quot;<br />
                        + EnviromentController.getSharedLibraryExtension());<br />
                String parserLibrary = parserLibraryFile.getAbsolutePath();<br />
                System.out.println(&quot;Loading Parser Library &quot; + parserLibrary);<br />
                //	mozilla.dist.bin directory<br />
                final File mozillaDistBinDirectory = new File(<br />
                        &quot;C:/SET/lib/mparser/MozillaParser-v-0-3-0/dist/&quot;<br />
                                + &quot;windows&quot;);<br />
                String absPath =mozillaDistBinDirectory.getAbsolutePath();<br />
                MozillaParser.init(parserLibrary, absPath);<br />
I still get the following error:</p>
<p>Operating system : Windows XP<br />
Loading Parser Library C:\SET\lib\mparser\MozillaParser-v-0-3-0\dist\windows\MozillaParser.dll<br />
com.dappit.Dapper.parser.ParserInitializationException<br />
	at com.dappit.Dapper.parser.MozillaParser.init(Unknown Source)<br />
	at first.ParserExample.main(ParserExample.java:30)<br />
Caused by: java.lang.UnsatisfiedLinkError: C:\SET\lib\mparser\MozillaParser-v-0-3-0\dist\windows\MozillaParser.dll: Can&#8217;t find dependent libraries<br />
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)<br />
	at java.lang.ClassLoader.loadLibrary0(Unknown Source)<br />
	at java.lang.ClassLoader.loadLibrary(Unknown Source)<br />
	at java.lang.Runtime.load0(Unknown Source)<br />
	at java.lang.System.load(Unknown Source)<br />
	&#8230; 2 more</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Audrey</title>
		<link>http://www.benmccann.com/dev-blog/html-parsing-with-java-mozilla-html-parser/comment-page-1/#comment-5062</link>
		<dc:creator>Audrey</dc:creator>
		<pubDate>Thu, 12 Feb 2009 15:23:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/html-parsing-with-java-mozilla-html-parser/#comment-5062</guid>
		<description>Thanks for the comparison report. Looks like I am working on something similar a year later. I am thinking of trying Cobra HTML Parser http://lobobrowser.org/cobra.jsp because it is pure Java, is CSS- and JavaScript-aware, and looks like it is more actively maintained than HTMLParser.</description>
		<content:encoded><![CDATA[<p>Thanks for the comparison report. Looks like I am working on something similar a year later. I am thinking of trying Cobra HTML Parser <a href="http://lobobrowser.org/cobra.jsp" rel="nofollow">http://lobobrowser.org/cobra.jsp</a> because it is pure Java, is CSS- and JavaScript-aware, and looks like it is more actively maintained than HTMLParser.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Towell</title>
		<link>http://www.benmccann.com/dev-blog/html-parsing-with-java-mozilla-html-parser/comment-page-1/#comment-749</link>
		<dc:creator>John Towell</dc:creator>
		<pubDate>Sat, 13 Sep 2008 07:49:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.lumidant.com/blog/html-parsing-with-java-mozilla-html-parser/#comment-749</guid>
		<description>I can&#039;t get past the following exception. I have checked and double-checked my path many times. Any ideas?

com.dappit.Dapper.parser.ParserInitializationException
	at com.dappit.Dapper.parser.MozillaParser.init(Unknown Source)
	at com.fantasytruth.accuracy.ParserTest.testParser(ParserTest.java:21)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at junit.framework.TestCase.runTest(TestCase.java:154)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
Caused by: java.lang.UnsatisfiedLinkError: C:\dev-tools\MozillaHtmlParser\native\bin\MozillaParser.dll: Can&#039;t find dependent libraries
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary0(Unknown Source)
	at java.lang.ClassLoader.loadLibrary(Unknown Source)
	at java.lang.Runtime.load0(Unknown Source)
	at java.lang.System.load(Unknown Source)
	... 18 more</description>
		<content:encoded><![CDATA[<p>I can&#8217;t get past the following exception. I have checked and double-checked my path many times. Any ideas?</p>
<p>com.dappit.Dapper.parser.ParserInitializationException<br />
	at com.dappit.Dapper.parser.MozillaParser.init(Unknown Source)<br />
	at com.fantasytruth.accuracy.ParserTest.testParser(ParserTest.java:21)<br />
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)<br />
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)<br />
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)<br />
	at java.lang.reflect.Method.invoke(Unknown Source)<br />
	at junit.framework.TestCase.runTest(TestCase.java:154)<br />
	at junit.framework.TestCase.runBare(TestCase.java:127)<br />
	at junit.framework.TestResult$1.protect(TestResult.java:106)<br />
	at junit.framework.TestResult.runProtected(TestResult.java:124)<br />
	at junit.framework.TestResult.run(TestResult.java:109)<br />
	at junit.framework.TestCase.run(TestCase.java:118)<br />
	at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)<br />
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)<br />
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)<br />
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)<br />
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)<br />
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)<br />
Caused by: java.lang.UnsatisfiedLinkError: C:\dev-tools\MozillaHtmlParser\native\bin\MozillaParser.dll: Can&#8217;t find dependent libraries<br />
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)<br />
	at java.lang.ClassLoader.loadLibrary0(Unknown Source)<br />
	at java.lang.ClassLoader.loadLibrary(Unknown Source)<br />
	at java.lang.Runtime.load0(Unknown Source)<br />
	at java.lang.System.load(Unknown Source)<br />
	&#8230; 18 more</p>
]]></content:encoded>
	</item>
</channel>
</rss>

