Difference between revisions of "Reencoding MediaWiki pages"
Revision as of 05:32, 9 November 2009 by imported>Cov
The following script is a work in progress. The end goal is to produce w:Wikitext from the HTML of a downloaded w:MediaWiki page, which will help import the old wiki's contents. The MediaWiki database tables appear to have been overlooked in the backup made before the server switch, but the HTML output of every page on the old site could still be saved from the Google cache.
Script
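# Print the article body (from the "start content" marker to the next HTML comment),
# then drop the comment lines and convert paragraphs, line breaks and links to wikitext.
sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
  sed -r -e '/<!--/d' \
    -e 's|</?p>||g' \
    -e 's|<br>|<br />|g' \
    -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'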
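To try the script without a saved page, it can be wrapped in a shell function and fed a small fragment. This is a minimal sketch assuming GNU sed (for -r); the function name html2wiki and the sample HTML are hypothetical, and the <br> substitution carries the g flag so that every line break on a line is converted, not just the first.

```shell
#!/bin/sh
# Minimal sketch, assuming GNU sed (for -r).
# html2wiki is a hypothetical wrapper around the script: it reads the
# named files, or stdin if no file is given.
html2wiki() {
  sed -rn -e '/<!-- start content -->/,/<!--/p' "$@" |  # keep only the article body
    sed -r -e '/<!--/d' \
      -e 's|</?p>||g' \
      -e 's|<br>|<br />|g' \
      -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'
}

# Hypothetical fragment shaped like a saved MediaWiki page.
html2wiki <<'EOF'
<div id="bodyContent">
<!-- start content -->
<p>See <a href="http://example.org/" class="external">Example</a>.<br>Bye.</p>
<!-- end content -->
</div>
EOF
# prints: See [http://example.org/ Example].<br />Bye.
```

Every link, local or not, comes out in external-link form here, which is what the second Todo item is about.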
Todo
- Convert HTML header tags to equal-sign headings
- Convert local links to internal article links
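The two Todo items could be handled by a few more sed expressions. A minimal sketch, assuming GNU sed and MediaWiki 1.x default HTML: local links of the form <a href="/wiki/Page_name" title="Page name">, headlines wrapped in <span class="mw-headline">, and section edit links in <span class="editsection">. Those markup details, the /wiki/ path, the function name html2wiki_full and the sample page are all assumptions to check against the saved pages. The local-link rule must run before the generic link rule, otherwise local links become [/wiki/Page text] external links.

```shell
#!/bin/sh
# Minimal sketch, assuming GNU sed (-r, back-references in ERE) and
# MediaWiki 1.x-style HTML. html2wiki_full is a hypothetical name.
# Rule order: drop comments, strip section edit links, strip spans on
# header lines, turn <hN> into N equal signs, turn /wiki/ links (assumed
# path) into [[Title|text]], shorten [[X|X]] to [[X]], then apply the
# original paragraph, line-break and generic link rules.
html2wiki_full() {
  sed -rn -e '/<!-- start content -->/,/<!--/p' "$@" |  # keep only the article body
    sed -r \
      -e '/<!--/d' \
      -e 's#<span class="editsection">\[<a [^>]*>[^<]*</a>]</span> *##g' \
      -e '/<h[1-6]/s#</?span[^>]*>##g' \
      -e 's#<h1[^>]*>#= #g;      s#</h1># =#g' \
      -e 's#<h2[^>]*>#== #g;     s#</h2># ==#g' \
      -e 's#<h3[^>]*>#=== #g;    s#</h3># ===#g' \
      -e 's#<h4[^>]*>#==== #g;   s#</h4># ====#g' \
      -e 's#<h5[^>]*>#===== #g;  s#</h5># =====#g' \
      -e 's#<h6[^>]*>#====== #g; s#</h6># ======#g' \
      -e 's#<a href="/wiki/[^"]*" title="([^"]*)"[^>]*>([^<]*)</a>#[[\1|\2]]#g' \
      -e 's#\[\[([^]|]*)[|]\1]]#[[\1]]#g' \
      -e 's|</?p>||g' \
      -e 's|<br>|<br />|g' \
      -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'
}

# Hypothetical page fragment.
html2wiki_full <<'EOF'
<!-- start content -->
<h2><span class="editsection">[<a href="/index.php?title=Example&amp;action=edit&amp;section=1" title="Edit section: Script">edit</a>]</span> <span class="mw-headline" id="Script">Script</span></h2>
<p>See <a href="/wiki/Main_Page" title="Main Page">Main Page</a> and <a href="/wiki/Help:Contents" title="Help:Contents">help</a>.</p>
<!-- end content -->
EOF
# prints:
# == Script ==
# See [[Main Page]] and [[Help:Contents|help]].
```

Links to pages that did not exist yet, or a site that served articles under a different path than /wiki/, would not match the local-link rule and may need rules of their own.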