Reencoding MediaWiki pages

From the Linux and Unix Users Group at Virginia Tech Wiki
Revision as of 03:08, 9 November 2009 by Cov

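The following script is a work in progress. Its end goal is to recover the wikitext source from the HTML of a downloaded MediaWiki page. So far it extracts the article body, strips paragraph tags, normalizes line breaks, and converts external links back to wikitext.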

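# Keep only the article body: from the "start content" marker to the next HTML comment.
# Then delete the comment lines, strip <p> tags, rewrite <br> as <br />, and turn
# external links back into [url text] wikitext (g: every match on a line, not just the first).
sed -rn -e '/<!-- start content -->/,/<!--/p' page.html |
sed -r -e '/<!--/d' -e 's%</?p>%%g' \
	-e 's%<br>%<br />%g' \
	-e 's|<a href="([^"]*)" class="external text" title="[^"]*" rel="nofollow">([^<]*)</a>|[\1 \2]|g'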
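To try the pipeline without downloading a page, it can be wrapped in a shell function that reads HTML on standard input and fed a small hand-written sample. This is a sketch under assumptions: the function name `html2wiki` and the sample markup are hypothetical, and the sample only mimics the `<!-- start content -->` marker and the external-link markup the script looks for. The last two expressions, which map `<b>`/`<i>` to `'''`/`''`, are also a hypothetical extension that assumes bold and italic text is rendered as bare `<b>` and `<i>` tags. The `-r` flag is GNU sed's extended-regex option.

```shell
#!/bin/sh
# Hypothetical helper: the sed pipeline, reading HTML on stdin instead of page.html.
html2wiki() {
	sed -rn -e '/<!-- start content -->/,/<!--/p' |
	sed -r -e '/<!--/d' -e 's%</?p>%%g' \
		-e 's%<br>%<br />%g' \
		-e 's|<a href="([^"]*)" class="external text" title="[^"]*" rel="nofollow">([^<]*)</a>|[\1 \2]|g' \
		-e "s%</?b>%'''%g" \
		-e "s%</?i>%''%g"
	# The last two rules are a hypothetical extension: <b> -> ''' and <i> -> ''.
}

# Hypothetical sample page: only the pieces the pipeline looks for.
html2wiki <<'EOF'
<html><body>
<!-- start content -->
<p>See <a href="http://www.example.org/" class="external text" title="http://www.example.org/" rel="nofollow">the example site</a> for <b>more</b> <i>details</i>.<br>
Second line.
</p>
<!-- end content -->
</body></html>
EOF
```

With GNU sed this prints the two lines below, followed by an empty line left behind by the closing `</p>` tag, which a later version of the script may want to squeeze out:

See [http://www.example.org/ the example site] for '''more''' ''details''.<br />
Second line.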