Linux and Unix Users Group at Virginia Tech Wiki

Reencoding MediaWiki pages

Revision as of 03:07, 9 November 2009 by imported>Cov

The following script is a work in progress. The end goal is to produce wiki markup from the HTML of a downloaded MediaWiki page.

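# Step 1: keep only the lines from the "start content" marker through the next HTML comment.
# Step 2: drop comment lines, strip <p> tags, normalize <br>, and turn external links into [URL text].
sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
sed -r -e '/<!--/d' -e 's%</?p>%%g' \
	-e 's%<br>%<br />%g' \
	-e 's|<a href="([^"]*)" class="external text" title="[^"]*" rel="nofollow">([^<]*)</a>|[\1 \2]|g'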
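
To see what the pipeline does, it can be tried on a small hand-made input. The sketch below is only an illustration. The function name html_to_wikitext and the file sample.html are hypothetical, the sample HTML is made up to imitate the structure the script expects, and GNU sed is assumed (for the -r option). The substitutions carry a g flag so that several tags on one line are all converted.

```shell
#!/bin/sh
# Hypothetical wrapper around the pipeline so it can be run on any saved page.
# Assumes GNU sed, which accepts -r for extended regular expressions.
html_to_wikitext() {
    # Keep the region from the "start content" marker through the next comment.
    sed -rn -e '/<!-- start content -->/,/<!--/p' "$1" |
    # Drop comment lines, strip <p> tags, normalize <br>, convert external links.
    sed -r -e '/<!--/d' -e 's%</?p>%%g' \
        -e 's%<br>%<br />%g' \
        -e 's|<a href="([^"]*)" class="external text" title="[^"]*" rel="nofollow">([^<]*)</a>|[\1 \2]|g'
}

# Made-up sample input; a real downloaded page will contain much more markup.
cat > sample.html <<'EOF'
<html><body>
<!-- start content -->
<p>See the <a href="http://example.org/" class="external text" title="http://example.org/" rel="nofollow">example site</a>.<br>Second line.</p>
<!-- end content -->
</body></html>
EOF

html_to_wikitext sample.html
# prints: See the [http://example.org/ example site].<br />Second line.
```

Only external links with exactly this attribute order are matched. Internal links, headings, and lists pass through unchanged, which is consistent with the script being a work in progress.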