Difference between revisions of "Reencoding MediaWiki pages"
imported>Cov |
imported>Cov |
Line 1:
  The following script is a work in progress. The end goal is to produce [[w:Wikitext]] from the HTML of a downloaded [[w:MediaWiki]] page.
+ =Script=
  <pre>
  sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
− sed -r -e '/<!--/d' -e 's
+ sed -r -e '/<!--/d' \
− -e 's
+ -e 's|</?p>||g' \
− -e 's|<a href="([^"]*)
+ -e 's|<br>|<br />|' \
+ -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'
  </pre>
+
+ =Todo=
+ * Header tags to equal signs
+ * Local links to article links
Revision as of 05:31, 9 November 2009
The following script is a work in progress. The end goal is to produce wikitext from the HTML of a downloaded MediaWiki page.
Script
sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
sed -r -e '/<!--/d' \
-e 's|</?p>||g' \
-e 's|<br>|<br />|' \
-e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'
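To make the behaviour of the pipeline easier to check, here is the same script wrapped in a shell function so it can be run on any file. The function name `reencode_page` and its file argument are assumptions made for illustration, not part of the original script, and the `-r` flag assumes GNU sed.

```shell
# Hypothetical wrapper around the script above; "$1" is the downloaded HTML file.
reencode_page() {
  # First sed: print only the lines from the "start content" marker
  # up to and including the next line that contains an HTML comment.
  sed -rn -e '/<!-- start content -->/,/<!--/p' "$1" | \
  # Second sed: drop the comment lines, strip <p> and </p>, rewrite the
  # first <br> on each line as <br />, and turn <a href="URL">text</a>
  # into the external-link form [URL text].
  sed -r -e '/<!--/d' \
    -e 's|</?p>||g' \
    -e 's|<br>|<br />|' \
    -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'
}
```

For example, given a file whose content area holds the two lines `<p>See <a href="http://example.org/" class="external">Example</a>.</p>` and `<p>Line one<br>Line two</p>`, `reencode_page` should print `See [http://example.org/ Example].` followed by `Line one<br />Line two`. Anything outside the comment markers is discarded.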
Todo
- Header tags to equal signs
- Local links to article links
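The two todo items could be approached with further sed expressions. The following is only a rough sketch under several assumptions: the function name `convert_headers_and_links` is hypothetical, headers are assumed to contain plain text with no nested tags (real MediaWiki output usually wraps the title in extra spans, which this does not handle), and local articles are assumed to live under the `/wiki/` path. It uses `@` as the delimiter so that the `|` in the wikitext link needs no escaping, and again assumes GNU sed for `-r`.

```shell
# Hypothetical filter: reads HTML on stdin, writes partially converted text.
convert_headers_and_links() {
  sed -r \
    -e 's@<h1[^>]*>([^<]*)</h1>@=\1=@g' \
    -e 's@<h2[^>]*>([^<]*)</h2>@==\1==@g' \
    -e 's@<h3[^>]*>([^<]*)</h3>@===\1===@g' \
    -e 's@<h4[^>]*>([^<]*)</h4>@====\1====@g' \
    -e 's@<h5[^>]*>([^<]*)</h5>@=====\1=====@g' \
    -e 's@<h6[^>]*>([^<]*)</h6>@======\1======@g' \
    -e 's@<a href="/wiki/([^"]*)"[^>]*>([^<]*)</a>@[[\1|\2]]@g'
  # The last rule assumes article URLs of the form /wiki/Title and
  # emits [[Title|link text]].
}
```

If these rules were combined with the main script, the local-link rule would need to run before the generic `<a href=...>` rule, since otherwise local links would already have been rewritten as external links.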