Linux and Unix Users Group at Virginia Teck Wiki

Reencoding MediaWiki pages

1,131 bytes added, 08:27, 3 January 2019
The following script converts the HTML of a downloaded [[w:MediaWiki|MediaWiki]] page to [[w:Wikitext|Wikitext]]. It was written to facilitate the 2009 VTLUUG server migration: thanks to the Google cache, the HTML output of every page of the old site could be saved, but the MediaWiki database tables were overlooked when backing up before the server switch.
=Script=
<pre>
## CLEANUP ##
# Comments
/<!--/d
# Table of contents
/<table id="toc/,/<\/table>/ d
# Paragraph tags
s|</?p>||g
# Anchor tags
s|<a name="[^"]*"></a>||g
# Make breaks XHTML
s|<br>|<br />|g
# Quotation marks
s|’|'|g
s|“|"|g
s|”|"|g

## WIKIFY ##
# Italics and bold
s|</?i>|''|g
s|</?b>|'''|g
# Headings
s|<h1>.*>(.*)</span></h1>|=\1=|g
s|<h2>.*>(.*)</span></h2>|==\1==|g
s|<h3>.*>(.*)</span></h3>|===\1===|g
s|<h4>.*>(.*)</span></h4>|====\1====|g
# Internal links
s|<a href="http://vtluug.org/wiki/[^>]*>([^<]*)</a>|[[\1]]|g
# External links
s|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g
</pre>

=Running=
==One-Shot==
Save the rules above to a file named <code>script</code>, then replace <code>input.html</code> and <code>out.wikitext</code> with appropriate filenames.
<pre>
sed -rn -e '/<!-- start content -->/,/<!--/p' input.html | sed -r -f script > out.wikitext
</pre>
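To get a feel for what the substitution rules do, here is a minimal sketch that applies a handful of them to a two-line HTML fragment. The sample HTML, the <code>example.org</code> URL, and the temporary file names are made up for illustration; only the <code>s|…|…|g</code> rules are taken from the script above, and GNU sed is assumed (for <code>-r</code>).

```shell
# Hypothetical demo: a few rules from the script, run on made-up input.
tmp=$(mktemp -d)

# A subset of the rules from the script (paragraphs, italics, bold,
# level-2 headings, external links).
cat > "$tmp/rules.sed" <<'EOF'
s|</?p>||g
s|</?i>|''|g
s|</?b>|'''|g
s|<h2>.*>(.*)</span></h2>|==\1==|g
s|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g
EOF

# Made-up sample of the kind of HTML MediaWiki renders.
cat > "$tmp/sample.html" <<'EOF'
<h2><span class="mw-headline">Meetings</span></h2>
<p>We meet <b>weekly</b>; see <a href="http://example.org/" class="external text">the schedule</a>.</p>
EOF

# Apply the rules and show the resulting wikitext.
result=$(sed -r -f "$tmp/rules.sed" "$tmp/sample.html")
printf '%s\n' "$result"

rm -r "$tmp"
```

On this input the heading should come out as <code>==Meetings==</code>, the bold tags as triple apostrophes, and the anchor as a bracketed external link.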
==Batch==
The following command creates a .wikitext file for every HTML file in the current directory, for your cut-and-paste convenience.
<pre>
for f in *.html ; do sed -rn -e '/<!-- start content -->/,/<!--/p' "$f" | sed -r -f script > "$f.wikitext" ; done
</pre>

=Copying=
Once the .wikitext files are generated, you can simply open them up, edit them by hand if necessary, and copy and paste them into MediaWiki. Noting in the summary box that this is an import is recommended.
<pre>
gedit *.wikitext
</pre>

=Effectiveness=
The script was effective enough for our purposes when written, but it has some shortcomings. Images and local article links are handled poorly, and it does not attempt to produce the brace-bar-dash table markup.

[[Category:Scripts]]