Difference between revisions of "Reencoding MediaWiki pages"

From the Linux and Unix Users Group at Virginia Tech Wiki
imported>Cov
Line 1:
  The following script is a work in progress. The end goal is to produce [[w:Wikitext]] from the HTML of a downloaded [[w:MediaWiki]] page.
+ =Script=
  <pre>
  sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
- sed -r -e '/<!--/d' -e 's%</?p>%%g' \
+ sed -r -e '/<!--/d' \
- -e 's%<br>%<br />%' \
+ -e 's|</?p>||g' \
- -e 's|<a href="([^"]*)" class="external text" title="[^"]*" rel="nofollow">([^<]*)</a>|[\1 \2]|'
+ -e 's|<br>|<br />|' \
+ -e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'
  </pre>
+
+ =Todo=
+ * Header tags to equal signs
+ * Local links to article links
Revision as of 05:31, 9 November 2009

The following script is a work in progress. The end goal is to produce w:Wikitext from the HTML of a downloaded w:MediaWiki page.

Script

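# First sed: print from the "start content" marker through the next line containing "<!--".
# Second sed: drop comment lines, strip <p> tags, normalize <br>, turn anchors into [url text].
sed -rn -e '/<!-- start content -->/,/<!--/p' page.html | \
sed -r -e '/<!--/d' \
	-e 's|</?p>||g' \
	-e 's|<br>|<br />|g' \
	-e 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'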

Todo

  • Header tags to equal signs
  • Local links to article links
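
One possible starting point for the two Todo items is sketched below. It is not part of the script above. It assumes headings arrive on a single line as `<hN><span class="mw-headline" ...>Title</span></hN>` and that local links look like `<a href="/wiki/Page_name" title="Page name">text</a>`; both shapes depend on the MediaWiki version and the wiki's URL configuration, so treat them as guesses. The function name `todo_sketch` is made up for illustration, and `-E` is used as the equivalent of the `-r` flag in GNU sed.

```shell
# Hypothetical helper (not from the original script): reads HTML on stdin,
# writes the same text with headings and local links converted to wikitext.
todo_sketch() {
    sed -E \
        -e 's%<span class="mw-headline"[^>]*>([^<]*)</span>%\1%g' \
        -e 's%<h1[^>]*> *([^<]*[^< ]) *</h1>%=\1=%g' \
        -e 's%<h2[^>]*> *([^<]*[^< ]) *</h2>%==\1==%g' \
        -e 's%<h3[^>]*> *([^<]*[^< ]) *</h3>%===\1===%g' \
        -e 's%<h4[^>]*> *([^<]*[^< ]) *</h4>%====\1====%g' \
        -e 's%<a href="/wiki/[^"]*" title="([^"]*)"[^>]*>([^<]*)</a>%[[\1|\2]]%g'
    # Assumption: the /wiki/ prefix marks a local link; adjust it for other wikis.
}

# Example: convert a heading and a local link read from stdin.
printf '%s\n' '<h2><span class="mw-headline" id="Script">Script</span></h2>' | todo_sketch
printf '%s\n' '<a href="/wiki/Main_Page" title="Main Page">home</a>' | todo_sketch
```

If these rules were merged into the main pipeline, the local-link rule would need to come before the generic `<a href=...>` rule, since that rule would otherwise turn local links into external-link syntax first.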