
wikiwho: authorship attribution and more.


introduction to the functionality of wikiwho.

wikiwho source code.

the python code of the original wikiwho publication plus some extensions we made since then.

wikiwho api.

we are currently working on an api for offering authorship information on live Wikipedia data. a first working alpha version is available.

wikiwho core algorithm description.

the research paper about the wikiwho algorithm for mining authorship (plus evaluation material used in the paper).


the core functionality of wikiwho is to parse the complete set of historical revisions (versions) of a wikipedia article in order to find out who wrote and/or removed which exact text at what point in time. this means that, given a specific revision of an article (e.g., the current one), wikiwho can determine for each word and special character which user first introduced it and whether and how it was deleted or reintroduced afterwards. wikipedia itself does not offer this functionality. wikiwho was shown to perform this task with very high accuracy (~95%) and very efficiently, and it is the only tool that has been shown in a scientific evaluation to perform this task that well (cf. the paper).
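to make the idea of token-level authorship concrete, here is a deliberately naive sketch in python. it is not the wikiwho algorithm: it only diffs each revision against its direct predecessor using the standard library's difflib, so text that is deleted and later restored would be credited to whoever restored it, which is exactly the kind of case wikiwho is designed to handle. the function name, the whitespace tokenization and the (editor, text) input format are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def attribute_tokens(revisions):
    """Naive token attribution (illustrative only, NOT the WikiWho algorithm).

    revisions: list of (editor, text) tuples in chronological order.
    Returns a list of (token, editor) pairs for the last revision.
    """
    previous = []  # (token, original_author) pairs of the preceding revision
    for editor, text in revisions:
        tokens = text.split()  # assumption: plain whitespace tokenization
        matcher = SequenceMatcher(
            None, [tok for tok, _ in previous], tokens, autojunk=False
        )
        current = [None] * len(tokens)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged tokens keep the author they already had.
                for offset in range(i2 - i1):
                    current[j1 + offset] = previous[i1 + offset]
            else:
                # "insert" and "replace" add tokens credited to this editor;
                # for "delete" the range j1..j2 is empty, so nothing happens.
                for j in range(j1, j2):
                    current[j] = (tokens[j], editor)
        previous = current
    return previous


history = [
    ("alice", "the cat sat"),
    ("bob", "the black cat sat down"),
]
for token, author in attribute_tokens(history):
    print(token, author)
```

in this toy history, "black" and "down" are credited to bob and the remaining tokens to alice.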

on top of the generated authorship and change data, other data can be mined and other tools can be built. we have extended the original model to also provide relationships between editors in an article, such as "delete" or "reintroduce", based on the words they delete or add. we are currently working on a visualization of these networks, as well as other visualizations of metrics and word authorship that are useful for end-users interested in exploring the collaborative writing dynamics of wikipedia.
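as a rough illustration of how editor relationships could be derived from per-word change data, the following python sketch counts "delete" and "reintroduce" edges between editors. the input format (one chronological list of (editor, action) events per token) and the rule that an action targets the editor of the preceding event on the same token are assumptions for this example; the relation extraction in the extended wikiwho code may be defined differently.

```python
from collections import Counter


def editor_relations(token_histories):
    """Count editor-to-editor relations (hypothetical data model).

    token_histories: one list per token, each a chronological list of
    (editor, action) events with action in {"add", "delete", "reintroduce"}.
    Returns a Counter keyed by (acting_editor, relation, affected_editor).
    """
    edges = Counter()
    for events in token_histories:
        last_editor = None
        for editor, action in events:
            # A delete targets whoever last added or reintroduced the token;
            # a reintroduce targets whoever last deleted it.
            # Actions on one's own previous change are ignored here.
            if action in ("delete", "reintroduce") and last_editor not in (None, editor):
                edges[(editor, action, last_editor)] += 1
            last_editor = editor
    return edges


histories = [
    [("alice", "add"), ("bob", "delete"), ("alice", "reintroduce")],
    [("alice", "add"), ("bob", "delete")],
]
for (source, relation, target), count in sorted(editor_relations(histories).items()):
    print(source, relation, target, count)
```

the resulting weighted edges could then serve as input for a network visualization.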

wikiwho api.

We offer a first version of an API for word provenance/authorship:
You can get word/token-wise information on which revision each piece of content originated from (and thereby which editor originally authored the word) at


(@ARTICLENAME@ -> name of the article in namespace 0 (ns:0) of the English Wikipedia; @REV_ID@ -> rev_id of the article revision for which you want the authorship information; the format is currently JSON only)

Example: http://wikiwho.net/wikiwho/wikiwho_api_api.py?revid=649876382&name=Laura_Bush&format=json&params=author

Output format is currently:

{"tokens": [
    {"token": "@FIRST TOKEN IN THE WIKI MARKUP TEXT@", "author_name": "@NAME OF AUTHOR OF THE TOKEN@", "rev_id": "@REV_ID WHEN TOKEN WAS FIRST ADDED@"},
    {"token": "@SECOND TOKEN IN THE WIKI MARKUP TEXT@", "author_name": "@NAME OF AUTHOR OF THE TOKEN@", "rev_id": "@REV_ID WHEN TOKEN WAS FIRST ADDED@"},
    {"token": "@THIRD TOKEN … …
 ],
 "message": null,
 "success": "true",
 "revision": {"article": "@NAME OF REQUESTED ARTICLE@", "time": "@TIMESTAMP OF REQUESTED REV_ID@", "reviid": @REQUESTED REV_ID@, "author": "@AUTHOR OF REQUESTED REV_ID@"}}
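The following Python sketch shows one way a client might work with this API. It builds a request URL following the pattern of the example request above and counts tokens per author in a response that follows the output format above. The function names are made up for illustration, the sample response contains invented placeholder values, and treating any "success" value other than "true" as a failure is an assumption, since only the success case is documented here. No network request is made in the example.

```python
import json
from collections import Counter
from urllib.parse import urlencode

# Endpoint taken from the example request above; it may change or go offline.
BASE_URL = "http://wikiwho.net/wikiwho/wikiwho_api_api.py"


def build_request_url(article_name, rev_id):
    """Build a request URL in the same shape as the documented example."""
    query = urlencode(
        {"revid": rev_id, "name": article_name, "format": "json", "params": "author"}
    )
    return f"{BASE_URL}?{query}"


def tokens_per_author(response_text):
    """Count how many tokens of the requested revision each author introduced."""
    payload = json.loads(response_text)
    # Assumption: a failed request does not report "true" and may set "message".
    if str(payload.get("success")).lower() != "true":
        raise ValueError(payload.get("message") or "request failed")
    return Counter(token["author_name"] for token in payload["tokens"])


# Invented sample response in the documented output format.
sample = json.dumps({
    "tokens": [
        {"token": "laura", "author_name": "ExampleUserA", "rev_id": "101"},
        {"token": "bush", "author_name": "ExampleUserA", "rev_id": "101"},
        {"token": "born", "author_name": "ExampleUserB", "rev_id": "205"},
    ],
    "message": None,
    "success": "true",
    "revision": {"article": "Laura_Bush", "time": "2015-03-04T00:00:00Z",
                 "reviid": 649876382, "author": "ExampleUserB"},
})

print(build_request_url("Laura_Bush", 649876382))
print(tokens_per_author(sample).most_common())
```

To query the live service, the URL returned by build_request_url could be fetched with any HTTP client and the response body passed to tokens_per_author.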

IF YOU CAN: Let me know if you use it / like it / don't like it / find any specific errors / want any specific features. Email: f.floeck-youknowwhat-gmail.com

DISCLAIMER: We are working on speed and on providing more precomputed articles (currently most are computed on request, although we save intermediary results). Still, for most articles it works fine and the output has been tested for accuracy. Occasionally there are problems with fetching or processing the XML for larger articles, so don't be surprised if a request sometimes returns an error.

CREDIT: Philipp Singer implemented most of the current version of the API and Pavan Kumar Pandappa built the first prototype version.

wikiwho source code.

the original code plus some variants that contain extensions, especially a new function for extracting relations between editors. note that the extended versions might include additional computational steps that can lead to higher runtimes than the original. all are available under the MIT license at:


wikiwho paper: detecting authorship of revisioned content.