Difference between revisions of "Reencoding MediaWiki pages"

From the Linux and Unix Users Group at Virginia Tech Wiki
imported>Cov

Revision as of 05:32, 9 November 2009

The following script is a work in progress. The end goal is to produce wikitext from the HTML of a downloaded MediaWiki page, which will help with importing the old wiki's contents. Thanks to the Google cache, the HTML output of every page of the old site could be saved, but it looks like the MediaWiki database tables were overlooked when backing up before the server switch.

Script

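# Print only the article body (from the start-content marker to the next HTML comment),
# then drop the comment lines, strip <p> tags, normalize <br>, and turn anchors into [url text] links.
sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
sed -r -e '/<!--/d' \
	-e 's|</?p>||g' \
	-e 's|<br>|<br />|g' \
	-e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'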

Todo

  • Header tags to equal signs
  • Local links to article links
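
The two Todo items might be approached with a few more sed expressions. What follows is only a rough sketch, not a tested part of the script. It assumes GNU sed (for -r), that each heading and each link sits on a single line, that headings look like <h2><span class="mw-headline">Title</span></h2> or a bare <h2>Title</h2> with no edit-section links, and that local links start with /wiki/ (the old site may well have used a different prefix). The function name todo_sketch is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper sketching the two Todo items; reads HTML on stdin.
# Assumptions: GNU sed, one heading or link per line, local links under /wiki/.
todo_sketch() {
	sed -r \
		-e 's|<span class="mw-headline"[^>]*> *([^<]*[^< ]) *</span>|\1|g' \
		-e 's|<h1[^>]*> *([^<]*[^< ]) *</h1>|=\1=|g' \
		-e 's|<h2[^>]*> *([^<]*[^< ]) *</h2>|==\1==|g' \
		-e 's|<h3[^>]*> *([^<]*[^< ]) *</h3>|===\1===|g' \
		-e 's|<h4[^>]*> *([^<]*[^< ]) *</h4>|====\1====|g' \
		-e 's|<h5[^>]*> *([^<]*[^< ]) *</h5>|=====\1=====|g' \
		-e 's|<h6[^>]*> *([^<]*[^< ]) *</h6>|======\1======|g' \
		-e 's#<a href="/wiki/([^"]*)"[^>]*>([^<]*)</a>#[[\1|\2]]#g'
}

# Example: printf '%s\n' '<h2>Script</h2>' | todo_sketch
```

The first expression unwraps the mw-headline span so the heading rules only have to deal with plain text. The last expression uses # as the delimiter so the | in the article link does not need escaping. If these were merged into the main script, the local-link rule would have to come before the generic anchor rule, otherwise local links would already have been rewritten as external ones.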