Best way to screen-scrape a web site.

1. Use wget to get the HTML file. See examples of using wget.

2. Convert the HTML to well-formed XML (XHTML) using "tidy":

tidy -asxhtml -numeric < oldpage.html > newpage.xml

3. Use XPath / XSLT to interpret the content of the XML, and recursively invoke wget again as needed.
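
The three steps above can be sketched as a small Python script. This is only an illustration under stated assumptions, not a definitive implementation: it assumes wget and tidy are installed and on the PATH, and the helper names (extract_links, fetch_as_xhtml, crawl) and the max_pages limit are hypothetical, not part of the original recipe. The XPath step uses the standard library's ElementTree, which supports only a subset of XPath; the -numeric flag matters here because named entities such as &nbsp; would otherwise fail to parse as XML.

```python
import subprocess
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace that tidy -asxhtml places on the <html> element.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_links(xhtml, base_url):
    """Step 3: return absolute URLs for every <a href="..."> in an XHTML document."""
    root = ET.fromstring(xhtml)
    return [urljoin(base_url, a.get("href"))
            for a in root.findall(".//x:a[@href]", XHTML_NS)]


def fetch_as_xhtml(url):
    """Steps 1 and 2: download with wget, then convert to XHTML with tidy."""
    html = subprocess.run(["wget", "-q", "-O", "-", url],
                          check=True, capture_output=True).stdout
    # tidy exits with status 1 when it only has warnings, so check=True is not used.
    tidied = subprocess.run(["tidy", "-q", "-asxhtml", "-numeric"],
                            input=html, capture_output=True)
    return tidied.stdout


def crawl(start_url, max_pages=10):
    """Follow links breadth-first, visiting at most max_pages pages (assumed limit)."""
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            links = extract_links(fetch_as_xhtml(url), url)
        except (subprocess.CalledProcessError, ET.ParseError):
            continue  # skip pages that cannot be fetched or parsed
        # Only queue web links; skip mailto:, javascript:, and similar schemes.
        queue.extend(l for l in links
                     if l.startswith(("http://", "https://")) and l not in seen)
    return seen
```

Calling crawl with a start URL returns the set of pages visited. In a real scraper the XPath expression in extract_links would be replaced by one that picks out the data of interest, and the "depending on need" decision would go where the links are filtered.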
