Reencoding MediaWiki pages

From the Linux and Unix Users Group at Virginia Tech Wiki
Revision as of 05:31, 9 November 2009 by imported>Cov

The following script is a work in progress. The end goal is to produce wikitext from the HTML of a downloaded MediaWiki page.

Script

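# Keep only the lines from the "start content" marker through the next HTML
# comment, then drop the comment lines, strip <p> tags, normalize <br> to
# <br />, and turn <a href="URL">text</a> into [URL text].
sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
sed -r -e '/<!--/d' \
	-e 's|</?p>||g' \
	-e 's|<br>|<br />|g' \
	-e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'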

Todo

  • Header tags to equal signs
  • Local links to article links
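
A possible starting point for both items, offered as a minimal sketch rather than a finished solution. It assumes that each header and each link sits on a single line, that local article URLs begin with /wiki/ (this depends on the wiki's configuration), and that local links carry a title attribute naming the article. The function name convert_headers_and_local_links is hypothetical.

```shell
# Sketch only: reads HTML on stdin, writes partially converted text on stdout.
# Assumptions: single-line headers and links, /wiki/ as the article path,
# and a title="..." attribute directly after href on local links.
convert_headers_and_local_links() {
	sed -r \
		-e '/<h[1-6]/s#</?span[^>]*>##g' \
		-e 's#<h1[^>]*> *([^<]*[^< ]) *</h1>#= \1 =#' \
		-e 's#<h2[^>]*> *([^<]*[^< ]) *</h2>#== \1 ==#' \
		-e 's#<h3[^>]*> *([^<]*[^< ]) *</h3>#=== \1 ===#' \
		-e 's#<h4[^>]*> *([^<]*[^< ]) *</h4>#==== \1 ====#' \
		-e 's#<h5[^>]*> *([^<]*[^< ]) *</h5>#===== \1 =====#' \
		-e 's#<h6[^>]*> *([^<]*[^< ]) *</h6>#====== \1 ======#' \
		-e 's#<a href="/wiki/[^"]*" title="([^"]*)"[^>]*>([^<]*)</a>#[[\1|\2]]#g'
}
```

The first expression strips span tags on header lines so that only the heading text remains. Headers that still contain other markup, such as section edit links, are not converted. The local-link expression would need to run before the generic link rule in the script above, which otherwise rewrites local links as external ones.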