March 21, 2008 at 3:51 pm
One of my first posts was a comparison of HTML parsers. Today I found a particularly challenging document to parse. None of the parsers I had compared earlier were able to handle the malformed HTML in this table, where the td elements were prematurely ended. The behavior of Neko and HtmlCleaner made the most sense (while still failing to clean the document), while the output from TagSoup and jTidy was somewhat stranger.
However, I noticed that Firebug parsed the document correctly. So I did a bit of research into how I’d be able to use Firefox’s HTML parsing and found a project called Mozilla Parser that had been put together to do just that. Its setup is not quite as simple as that of the other parsers, but it is well documented. Follow the quick start to begin with. Then, when you get to the portion where you write actual Java code, you may want to follow the example below, as it appears the API has been updated since the documentation was posted.
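// Root of the MozillaHtmlParser checkout; adjust this for your own machine.
final String BASE_PATH = "C:\\Documents and Settings\\bjm733\\My Documents\\workspace\\MozillaHtmlParser\\";
try {
    // Locate the native parser library under native/bin; the file extension depends on the OS.
    File parserLibraryFile = new File(BASE_PATH + "native" + File.separator + "bin" + File.separator + "MozillaParser" + EnviromentController.getSharedLibraryExtension());
    String parseLibrary = parserLibraryFile.getAbsolutePath();
    // Initialize the parser with the native library and the OS-specific Mozilla distribution directory.
    MozillaParser.init(parseLibrary, BASE_PATH + "mozilla.dist.bin." + EnviromentController.getOperatingSystemName());
    MozillaParser parser = new MozillaParser();
    // 'document' is declared outside this snippet.
    document = parser.parse("<html><body>hello world</body></html>");
} catch (Exception e) {
    e.printStackTrace();
}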
The most unfortunate thing about this approach is that it is not pure Java, which can be a deal breaker in many situations. Also, the project does not appear to be well maintained, and its developers are not very responsive.
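The hard-coded BASE_PATH in the snippet above only works on one machine. A minimal sketch of one way around that, using only the standard library: ParserPaths, the mozillaparser.home system property, and the extension mapping are all hypothetical stand-ins of mine, not part of Mozilla Parser, and the project's own EnviromentController may pick different extensions.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper, not part of Mozilla Parser: builds the native library
// path from a configurable base directory instead of a hard-coded string.
class ParserPaths {

    // Assumed mapping from OS name to shared-library extension; the real
    // EnviromentController may choose differently (for example on Mac OS X).
    static String sharedLibraryExtension(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            return ".dylib";
        }
        if (os.contains("win")) {
            return ".dll";
        }
        return ".so";
    }

    // Resolves <baseDir>/native/bin/MozillaParser<extension>.
    static File parserLibrary(File baseDir, String osName) {
        File binDir = new File(new File(baseDir, "native"), "bin");
        return new File(binDir, "MozillaParser" + sharedLibraryExtension(osName));
    }

    public static void main(String[] args) {
        // The property name is an assumption; falls back to the working directory.
        File baseDir = new File(System.getProperty("mozillaparser.home", "."));
        File library = parserLibrary(baseDir, System.getProperty("os.name"));
        System.out.println(library.getAbsolutePath() + " exists: " + library.isFile());
    }
}
```

Run it with -Dmozillaparser.home=<dir> to check that the native library is where you expect before calling MozillaParser.init.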
March 15, 2008 at 10:17 am
Wuala is a P2P backup and remote storage service. It gives you 1GB of backup space for free and more when you share disk space with others. If you have a large amount of data or multimedia to back up, then this is a much cheaper alternative to services such as Mozy. All your data is encrypted and replicated multiple times. There is a very interesting Google tech talk on the subject which shares some insights into the workings of Wuala. I found the portion dealing with erasure codes to be particularly interesting. Contact me if you’d like an invite to Wuala.
Also, you may have noticed that the pace of my blogging has slowed. I’ve been extremely busy recently and expect this to continue for the next month or two. Nonetheless, I am going to make time to post a series of Struts 2 tutorials based on a presentation I gave a few months ago.
March 10, 2008 at 9:33 am
The list of third-party libraries that Hibernate depends on should be in the Hibernate documentation, but I don’t believe it is. Instead you have to download the binary distribution and open the readme in the lib directory. Since I frequently find myself downloading the entire archive just to view the readme, I am reposting the relevant sections here.
ehcache-1.2.3.jar (1.2.3)
- EHCache cache
- runtime, optional (required if no other cache provider is set)
jta.jar (unknown)
- Standard JTA API
- runtime, required for standalone operation (outside application server)
xml-apis.jar (unknown)
- Standard JAXP API
- runtime, some SAX parser is required
commons-logging-1.0.4.jar (1.0.4)
- Commons Logging
- runtime, required
asm-attrs.jar (unknown)
- ASM bytecode library
- runtime, required if using ‘cglib’ bytecode provider
dom4j-1.6.1.jar (1.6.1)
- XML configuration & mapping parser
- runtime, required
antlr-2.7.6.jar (2.7.6)
- ANother Tool for Language Recognition
- runtime, required
cglib-2.1.3.jar (2.1.3)
- CGLIB bytecode generator
- runtime, required if using ‘cglib’ bytecode provider
asm.jar (unknown)
- ASM bytecode library
- runtime, required if using ‘cglib’ bytecode provider
commons-collections-2.1.1.jar (2.1.1)
- Commons Collections
- runtime, required
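To confirm that the jars marked as required are actually on the classpath, a quick check like the one below can help. This is only a sketch: RequiredJarCheck is a hypothetical helper, and the class chosen to represent each jar is my own assumption rather than something taken from the readme.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which probe classes cannot be loaded.
class RequiredJarCheck {

    // Returns the labels of all probes whose class is not visible to the loader.
    static List<String> missing(Map<String, String> probes, ClassLoader loader) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, String> probe : probes.entrySet()) {
            try {
                // initialize=false: only check presence, do not run static initializers.
                Class.forName(probe.getValue(), false, loader);
            } catch (ClassNotFoundException e) {
                result.add(probe.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Jar label -> one class assumed to live in that jar.
        Map<String, String> probes = new LinkedHashMap<String, String>();
        probes.put("commons-logging", "org.apache.commons.logging.LogFactory");
        probes.put("dom4j", "org.dom4j.Document");
        probes.put("antlr", "antlr.Tool");
        probes.put("commons-collections", "org.apache.commons.collections.SequencedHashMap");
        probes.put("jta", "javax.transaction.Transaction");

        List<String> absent = missing(probes, RequiredJarCheck.class.getClassLoader());
        System.out.println(absent.isEmpty() ? "All probed jars found" : "Missing: " + absent);
    }
}
```

Run it with the same classpath as your application; anything reported as missing points at a jar from the list above that still needs to be added.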
March 4, 2008 at 7:59 pm
A client sent me some code today to update. He was using NetBeans, so I downloaded the IDE and fired it up to open the project he’d sent me. Unfortunately, the project wouldn’t compile because he’d written the code in Java 6 while NetBeans was using Java 5. I couldn’t find a NetBeans menu to update the setting, but rather found that the fix is to add the following to NetBeans’ etc/netbeans.conf file:
# Default location of JDK, can be overridden by using --jdkhome <dir>:
netbeans_jdkhome="C:\Program Files\Java\jdk1.6.0_05"
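After restarting NetBeans, one way to check which JDK a project actually runs on is to print the standard java.version and java.home system properties from inside the IDE. This is just a sketch; JdkCheck is a hypothetical class name, and it assumes the project is set to use the IDE's default Java platform.

```java
// Hypothetical helper: reports the Java runtime the current process is using.
class JdkCheck {

    // Builds a one-line summary from standard system properties.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the output still shows a 1.5 runtime, the project is probably pinned to a different Java platform than the one configured in netbeans.conf.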