Quantcast

Ben McCann

Co-founder at Connectifier.
ex-Googler. CMU alum.

AngelList Twitter LinkedIn Google+

Scraping

HTML Parsing using the Firefox DLLs

03/21/2008

One of my first posts was a comparison of HTML parsers. Today I found a particularly challenging document to parse. None of the parsers I had compared earlier were able to handle the malformed HTML in this table where the td elements were prematurely ended. The behavior of Neko and HtmlCleaner made the most sense Read More