We decided to move from Atlassian's Confluence wiki software to MediaWiki, in the hope that a more familiar wiki system will encourage participation. To start, I exported the content from Confluence and got a zip file containing entities.xml. Here is a script that creates one text file per page.
#!/usr/bin/env python
from cElementTree import iterparse
from cStringIO import StringIO
import codecs

for event, elem in iterparse(file("entities.xml")):
    if elem.tag == "object" and elem.get('class') == 'Page':
        save = True
        title = None
        content = None
        children = elem.getchildren()
        id = elem.find('id')
        for child in children:
            if child.tag == "property" and child.get('name') == "title":
                title = child.text
            if child.tag == "property" and child.get('name') == "content":
                content = child.text
            if child.tag == "property" and child.get('name') == "originalVersion":
                save = False
                orig_id = child.getchildren()[0]
        if not save:
            continue
        print "Will save page with title '%s'" % (title,)
        if not content:
            print "... but has no contents"
            continue
        f = codecs.open('pages/%s' % (title,), 'w', 'utf-8')
        f.write(content)
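For anyone on a newer interpreter, here is a rough Python 3 sketch of the same idea. It assumes the export has the structure the script above relies on: `object` elements with `class="Page"`, `property` children named `title` and `content`, and an `originalVersion` property on historical versions. The function names (`extract_pages`, `safe_filename`, `save_pages`) are my own, and replacing path separators in titles is an addition the original script does not make.

```python
import io
import os
import xml.etree.ElementTree as ET


def extract_pages(xml_source):
    """Yield (title, content) for each current page that has content.

    Assumption: pages are <object class="Page"> elements whose <property>
    children carry the title and content, and old versions carry an
    "originalVersion" property.
    """
    for _event, elem in ET.iterparse(xml_source):
        if elem.tag != "object" or elem.get("class") != "Page":
            continue
        title = None
        content = None
        is_old_version = False
        for child in elem:
            if child.tag != "property":
                continue
            name = child.get("name")
            if name == "title":
                title = child.text
            elif name == "content":
                content = child.text
            elif name == "originalVersion":
                is_old_version = True
        # Skip historical versions and pages without a title or content.
        if not is_old_version and title and content:
            yield title, content
        elem.clear()  # free memory held by the finished element


def safe_filename(title):
    # Hypothetical helper: a "/" or "\" in a title would otherwise be
    # treated as a directory separator.
    return title.replace("/", "_").replace("\\", "_")


def save_pages(xml_source, out_dir="pages"):
    """Write one UTF-8 text file per page and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for title, content in extract_pages(xml_source):
        path = os.path.join(out_dir, safe_filename(title))
        with io.open(path, "w", encoding="utf-8") as f:
            f.write(content)
        written.append(path)
    return written
```

Calling `save_pages("entities.xml")` from the directory holding the unzipped export should then fill a `pages` directory, creating it if needed.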
Oh MediaWiki, how I love thee. (Note: The above remark may contain trace quantities of sarcasm.)
How come you did not opt for something Pythonic like Moin? MediaWiki is nice for some setups, but not for everything...
MediaWiki was chosen over Moin in the hope that users would already be familiar with it and, in theory, because it has more plugins for things like anti-spam measures, code highlighting, RSS embedding, and so forth.
In Python 2.5 you need
from xml.etree.cElementTree import iterparse
instead of
from cElementTree import iterparse
See effbot.org for more information.
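A small sketch of an import that should work regardless of where the module lives: it tries the standard-library locations first and only then falls back to the standalone cElementTree package. The fallback order is my own choice, not something from the original script.

```python
# Prefer the C-accelerated module bundled with Python 2.5 and later.
try:
    from xml.etree.cElementTree import iterparse
except ImportError:
    try:
        # Recent Python 3 releases only ship xml.etree.ElementTree.
        from xml.etree.ElementTree import iterparse
    except ImportError:
        # Older installs with the standalone cElementTree package.
        from cElementTree import iterparse
```

The rest of the script can keep calling `iterparse` unchanged.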
Using your script above, if there is no content, files are not created?
Does that mean the export from Confluence is incomplete or corrupted somehow?
I am trying to move our work content across to MediaWiki. I just need to be able to get the exports (I have both XML and HTML) to work in MediaWiki :\