Linux and Unix Users Group at Virginia Tech Wiki

Reencoding MediaWiki pages

1,415 bytes added, 08:27, 3 January 2019
The following script converts the HTML of a downloaded [[w:MediaWiki|MediaWiki]] page to [[w:Wikitext|Wikitext]]. The script was written to facilitate the 2009 VTLUUG server migration.
=Script=
<pre>
## CLEANUP ##
# Comments
/<!--/d
# Table of contents
/<table id="toc/,/<\/table>/ d
# Paragraph tags
s|</?p>||g
# Anchor tags
s|<a name="[^"]*"></a>||g
# Make breaks XHTML
s|<br>|<br />|g
# Quotation marks
s|’|'|g
s|“|"|g
s|”|"|g

## WIKIFY ##
# Italics and bold
s|</?i>|''|g
s|</?b>|'''|g
# Headings
s|<h1>.*>(.*)</span></h1>|=\1=|g
s|<h2>.*>(.*)</span></h2>|==\1==|g
s|<h3>.*>(.*)</span></h3>|===\1===|g
s|<h4>.*>(.*)</span></h4>|====\1====|g
# Internal links
s|<a href="http://vtluug.org/wiki/[^>]*>([^<]*)</a>|[[\1]]|g
# External links
s|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g
</pre>

=Running=
==One-Shot==
Save the script above to a file named <code>script</code>, then replace <code>input.html</code> and <code>out.wikitext</code> with appropriate filenames.
<pre>
sed -rn -e '/<!-- start content -->/,/<!--/p' input.html | sed -r -f script > out.wikitext
</pre>

==Batch==
The following command creates a .wikitext file for each HTML file in the current directory, for your cut and paste convenience.
<pre>
for f in *.html ; do sed -rn -e '/<!-- start content -->/,/<!--/p' "$f" | sed -r -f script > "$f.wikitext" ; done
</pre>

=Copying=
Once the .wikitext files are generated, you can simply open them up, edit them by hand if necessary, and copy and paste them into MediaWiki. Noting in the summary box that this is an import is recommended.
<pre>
gedit *.wikitext
</pre>
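Before pasting anything into the wiki, it can help to sanity-check a few of the conversion rules on a small fragment. The following is only a sketch: it assumes GNU sed (for <code>-r</code>), and the rules file created with <code>mktemp</code>, the sample HTML line, and the <code>example.org</code> URL are made up for illustration and are not part of the original script or migration.

```shell
# Hypothetical sanity check: a handful of substitution rules written to a
# temporary file and applied to a made-up HTML fragment (assumes GNU sed).
rules=$(mktemp)
cat > "$rules" <<'EOF'
# Paragraph tags
s|</?p>||g
# Italics and bold
s|</?i>|''|g
s|</?b>|'''|g
# External links
s|<a href="([^"]*)"[^>]*>([^<]*)</a>|[\1 \2]|g
EOF

# Illustrative input only; not taken from a real page.
sample='<p>Join <b>VTLUUG</b> via <a href="http://example.org/" rel="nofollow">this link</a>.</p>'

out=$(printf '%s\n' "$sample" | sed -r -f "$rules")
printf '%s\n' "$out"
rm -f "$rules"
```

With the sample above, the paragraph tags are dropped, the bold text is wrapped in three apostrophes, and the anchor becomes a bracketed external link: <code>Join <nowiki>'''VTLUUG'''</nowiki> via [http://example.org/ this link].</code>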
 
=Effectiveness=
The script was effective enough for our purposes when written, but it has some shortcomings: images and local article links are handled poorly, and it does not attempt to produce the brace-bar-dash table markup.
 
[[Category:Scripts]]