Reencoding MediaWiki pages
Revision as of 05:31, 9 November 2009 by imported>Cov
The following script is a work in progress. The end goal is to recover the Wikitext source of a MediaWiki page from the rendered HTML of a downloaded copy of that page.
Script
sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
  sed -r -e '/<!--/d' \
    -e 's|</?p>||g' \
    -e 's|<br>|<br />|g' \
    -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'
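To see what each stage of the pipeline does, here is a sketch that runs the same two sed stages on a small hypothetical page fragment instead of page.html. It assumes GNU sed (for -r) and applies the <br> substitution with the g flag so every break on a line is converted. The first stage prints from the start marker through the next line containing "<!--", so both comment lines survive it; the second stage deletes them before converting the markup.

```shell
#!/bin/sh
# Hypothetical fragment of a saved MediaWiki page; real pages carry much more
# markup around the content markers.
sample='<div id="header">
<!-- start content -->
<p>See <a href="http://example.org/" class="external">Example</a>.<br>
Next line</p>
<!-- end content -->
<div id="footer">'

# Stage 1: print from the start marker through the next line containing "<!--".
# sed only tests the end regex from the line after the start match, so the
# start marker does not close the range on its own line.
stage1=$(printf '%s\n' "$sample" | sed -rn -e '/<!-- start content -->/,/<!--/p')

# Stage 2: drop the marker comment lines, strip <p> tags, close <br> tags,
# and turn anchors into [url text] external links.
result=$(printf '%s\n' "$stage1" | sed -r -e '/<!--/d' \
  -e 's|</?p>||g' \
  -e 's|<br>|<br />|g' \
  -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g')

printf '%s\n' "$result"
```

For this fragment the final output should be the two lines "See [http://example.org/ Example].<br />" and "Next line"; the header and footer divs fall outside the selected range.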
Todo
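- Convert header tags (<h2> and so on) to equal-sign headings (== Heading ==)
- Convert local links (links to other pages on the same wiki, which the script currently turns into external [url text] links) into internal article links ([[Page|text]])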
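Both Todo items could be approached with further sed expressions. The sketch below is only one possible approach, under two hypothetical assumptions not taken from the original: each heading sits on a single line, and local article links use a Wikipedia-style /wiki/ path prefix (other installations may use /index.php/ or a query string instead). The function name todo_filters is likewise made up for illustration. Each HTML heading level maps to the matching number of equal signs, and an anchor pointing at /wiki/Page becomes [[Page|text]].

```shell
#!/bin/sh
# Hypothetical filters for the two Todo items. Assumptions: one heading per
# line, and local links of the form <a href="/wiki/Page" ...>text</a>.
# "@" is used as the s delimiter so "|" can appear literally in [[Page|text]].
todo_filters() {
  sed -r \
    -e 's@<h1[^>]*>(.*)</h1>@= \1 =@' \
    -e 's@<h2[^>]*>(.*)</h2>@== \1 ==@' \
    -e 's@<h3[^>]*>(.*)</h3>@=== \1 ===@' \
    -e 's@<h4[^>]*>(.*)</h4>@==== \1 ====@' \
    -e 's@<a href="/wiki/([^"]*)"[^>]*>([^<]*)</a>@[[\1|\2]]@g'
}

# Hypothetical input lines resembling rendered headings and a local link.
sample='<h2>Script</h2>
<h3>Todo</h3>
See <a href="/wiki/Main_Page" title="Main Page">the front page</a>.'

result=$(printf '%s\n' "$sample" | todo_filters)
printf '%s\n' "$result"
```

The local-link expression would need to run before the script's generic <a href=...> substitution; otherwise that rule converts the anchor to [/wiki/Main_Page the front page] first. Underscores in the page name are left alone, since wikitext treats them as spaces in link targets, but percent-encoded characters in the URL are not decoded by this sketch, and real skin output often wraps heading text in extra spans and edit-section links that would have to be stripped first.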