Reencoding MediaWiki pages
Latest revision as of 08:27, 3 January 2019
The following sed script converts the rendered HTML of a MediaWiki page to wikitext. It was written to facilitate the 2009 VTLUUG server migration.
Script
## CLEANUP ##
# Comments
/<!--/d
# Table of contents
/<table id="toc/,/<\/table>/ d
# Paragraph tags
s|</?p>||g
# Anchor tags
s|<a name="[^"]*"></a>||g
# Make breaks XHTML
s|<br>|<br />|g
# Quotation marks
s|’|'|g
s|“|"|g
s|”|"|g

## WIKIFY ##
# Italics and bold
s|</?i>|''|g
s|</?b>|'''|g
# Headings
s|<h1>.*>(.*)</span></h1>|=\1=|g
s|<h2>.*>(.*)</span></h2>|==\1==|g
s|<h3>.*>(.*)</span></h3>|===\1===|g
s|<h4>.*>(.*)</span></h4>|====\1====|g
# Internal links
s|<a href="http://vtluug.org/wiki/[^>]*>([^<]*)</a>|[[\1]]|g
# External links
s|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g
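To see what the rules do, the following sketch runs a four-rule excerpt of the script over a made-up HTML fragment. The temporary file, the variable names and the sample markup are assumptions for illustration only, and GNU sed is assumed for the -r option.

```shell
# Write an excerpt of the script to a temporary file (hypothetical location).
rules=$(mktemp)
cat > "$rules" <<'EOF'
s|</?i>|''|g
s|</?b>|'''|g
s|<h2>.*>(.*)</span></h2>|==\1==|g
s|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g
EOF

# Sample input modelled on MediaWiki's rendered HTML (invented for this demo).
sample='<h2><span class="mw-headline">Usage</span></h2>
See the <b>manual</b> at <a href="http://example.org/" class="external text">Example</a>.'

# Apply the rules the same way the full script is applied.
result=$(printf '%s\n' "$sample" | sed -r -f "$rules")
printf '%s\n' "$result"
rm -f "$rules"
```

The heading rule keeps only the text after the last > before the closing span, so the surrounding span markup is dropped along with the h2 tags.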
Running
One-Shot
Replace input.html and out.wikitext with appropriate filenames.
sed -rn -e '/<!-- start content -->/,/<!--/p' input.html | sed -r -f script > out.wikitext
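The first sed in that pipeline prints only the lines from the start-content marker up to the next line containing an HTML comment. A small sketch of that extraction step follows, using an invented page; the file contents and variable names are assumptions, and GNU sed is assumed for -r.

```shell
# Build a tiny stand-in for a saved MediaWiki page (contents are made up).
page=$(mktemp)
cat > "$page" <<'EOF'
<div id="header">navigation</div>
<!-- start content -->
<p>Body text</p>
<!-- end content -->
<div id="footer">footer</div>
EOF

# -n suppresses default output; the range address prints marker to marker.
extracted=$(sed -rn -e '/<!-- start content -->/,/<!--/p' "$page")

# Two cleanup rules from the script then drop the markers and paragraph tags.
body=$(printf '%s\n' "$extracted" | sed -r -e '/<!--/d' -e 's|</?p>||g')
printf '%s\n' "$body"
rm -f "$page"
```

Because the range ends at the first comment after the start marker, a page whose body contains its own HTML comment would be cut off at that point.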
Batch
The following command creates a .wikitext file for each HTML file in the current directory (page.html becomes page.html.wikitext), ready for copying and pasting.
for f in *.html ; do sed -rn -e '/<!-- start content -->/,/<!--/p' "$f" | sed -r -f script > "$f.wikitext" ; done
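A possible variant of the batch loop, sketched as a shell function: it skips the literal *.html that a POSIX shell leaves behind when nothing matches, and it names the output page.wikitext rather than page.html.wikitext. The function name convert_all and its argument are hypothetical, not part of the original page.

```shell
# convert_all SCRIPT: convert every .html file in the current directory.
# SCRIPT is the path to the sed script shown earlier on this page.
convert_all() {
    for f in *.html; do
        # With no matches the glob stays literal; skip that non-existent name.
        [ -e "$f" ] || continue
        # Same two-stage pipeline as the one-shot command.
        sed -rn -e '/<!-- start content -->/,/<!--/p' "$f" |
            sed -r -f "$1" > "${f%.html}.wikitext"
    done
}
```

It would be called as convert_all script from the directory holding the saved pages. The ${f%.html} expansion strips the extension before .wikitext is appended.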
Copying
Once the .wikitext files are generated, open them, edit them by hand if necessary, and copy and paste them into MediaWiki. It is recommended to note in the summary box that the page is an import.
gedit *.wikitext
Effectiveness
The script was effective enough for our purposes when it was written, but it has some shortcomings: images and local article links are handled poorly, and it does not attempt to produce wikitext table markup ({|, |-, |}).
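As one guess at the local-link shortcoming, root-relative links such as /wiki/Main_Page are not matched by the internal-link rule and fall through to the external-link rule. The sketch below adds a hypothetical rule ahead of the external one; the rule, the variable names and the sample HTML are assumptions, not part of the original script.

```shell
# Hypothetical extra rule: treat root-relative /wiki/ links as internal,
# reusing the link text as the target like the existing internal rule does.
local_rule='s|<a href="/wiki/[^"]*"[^>]*>([^<]*)</a>|[[\1]]|g'
# The external-link rule, copied from the script, must run after it.
external_rule='s|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g'

# Invented sample line with one local and one external link.
html='See <a href="/wiki/Main_Page" title="Main Page">Main Page</a> and <a href="http://example.org/">Example</a>.'

linked=$(printf '%s\n' "$html" | sed -r -e "$local_rule" -e "$external_rule")
printf '%s\n' "$linked"
```

Like the existing rule, this uses the visible link text as the page name, so piped links whose text differs from the target would still need fixing by hand.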