Reencoding MediaWiki pages
Revision as of 05:32, 9 November 2009 by imported>Cov
The following script is a work in progress. The end goal is to produce Wikitext from the HTML of a downloaded MediaWiki page, which will help with importing the old wiki's contents. The MediaWiki database tables appear to have been overlooked in the backup made before the server switch, but thanks to the Google cache, the HTML output of every page of the old site could still be saved.
Script
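# Print the article body (from the "start content" marker to the next HTML comment),
# drop the comment lines, then convert <p>, <br> and <a> tags to Wikitext.
sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
  sed -r -e '/<!--/d' \
         -e 's|</?p>||g' \
         -e 's|<br>|<br />|g' \
         -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'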
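A quick way to check the substitutions without a saved page is to wrap the same pipeline in a shell function that reads standard input instead of page.html. This is only a sketch: the function name html2wiki and the sample markup are made up for illustration, and real pages from the Google cache will contain much more markup. Like the script, it assumes GNU sed for -r.

```shell
#!/bin/sh
# Hypothetical wrapper: the script's sed pipeline, reading stdin instead of page.html.
# Assumes GNU sed (-r enables extended regular expressions).
html2wiki() {
  sed -rn -e '/<!-- start content -->/,/<!--/p' | \
    sed -r -e '/<!--/d' \
           -e 's|</?p>||g' \
           -e 's|<br>|<br />|g' \
           -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'
}

# Made-up sample page: only the lines between the two comments survive.
# Prints: See [http://example.org/ the example].<br />Bye
html2wiki <<'EOF'
<div>navigation</div>
<!-- start content -->
<p>See <a href="http://example.org/" class="external">the example</a>.<br>Bye</p>
<!-- end content -->
<div>footer</div>
EOF
```

The navigation and footer lines fall outside the sed address range, and the two comment lines are removed by the second sed, so only the converted paragraph is printed.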
Todo
- Convert header tags (<h2>, <h3>, ...) to equals-sign headings (== Heading ==)
- Convert local links into internal [[article]] links
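The two Todo items could be approached with more sed substitutions. The sketch below is hypothetical: it assumes headings come out either as <h2><span class="mw-headline" ...>Title</span></h2> or as plain <h2>Title</h2> on a single line, that only levels 2 to 4 matter, and that local links use a /wiki/Page_name path. The old site's real article path may differ, and edit-section links inside headings are not handled. The function name todo_rules is made up.

```shell
#!/bin/sh
# Hypothetical rules for the Todo items; assumes GNU sed (-r) and one-line headings.
# 1. Unwrap <span class="mw-headline"> so only the heading text remains.
# 2-4. Turn <h2>/<h3>/<h4> into ==, === and ==== headings.
# 5. Turn links to the (assumed) /wiki/ path into [[Page_name|text]] links.
todo_rules() {
  sed -r \
      -e 's#<span class="mw-headline"[^>]*>([^<]*)</span>#\1#g' \
      -e 's#<h2[^>]*>(.*)</h2>#== \1 ==#' \
      -e 's#<h3[^>]*>(.*)</h3>#=== \1 ===#' \
      -e 's#<h4[^>]*>(.*)</h4>#==== \1 ====#' \
      -e 's#<a href="/wiki/([^"]*)"[^>]*>([^<]*)</a>#[[\1|\2]]#g'
}

# Prints "== Todo ==" and "See [[Main_Page|home]]."
printf '%s\n' '<h2><span class="mw-headline" id="Todo">Todo</span></h2>' \
              'See <a href="/wiki/Main_Page" title="Main Page">home</a>.' | todo_rules
```

Since the script's last substitution rewrites every remaining <a href=...> as an external link, these rules would have to run before it, for example between the two sed commands: sed -rn ... page.html | todo_rules | sed -r ...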