Mainly Tech projects on Python and Electronic Design Automation.

Saturday, February 28, 2009

Extended Vanity Search on Rosetta Code

This is my first vanity search (http://paddy3118.blogspot.com/2009/02/vanity-search-on-rosetta-code.html),
extended so that I can also see how far up the league table of page views
(http://www.rosettacode.org/w/index.php?title=Special:PopularPages) my pages are.



Since that first blog entry I have created three new pages:
'First-class functions' (http://www.rosettacode.org/wiki/First-class_functions);
'Octal' (http://www.rosettacode.org/wiki/Octal), which I created after making a
large edit to the Hex page (http://www.rosettacode.org/wiki/Hexadecimal); and
'Y combinator', which is my current 'stretch goal'
(http://paddy3118.blogspot.com/2009/02/y-combinator-in-python.html).



Here is the code:


'''
Rosetta Code Vanity search:
  How many new pages has someone created,
  and where do they rank by page views?
'''

import re
import urllib.request

user = 'Paddy3118'

site = 'http://www.rosettacode.org'
nextpage = site + '/wiki/Special:Contributions/' + user
nextpage_re = re.compile(
    r'<a href="([^"]+)" title="[^"]+" rel="next">older ')

newpages = []
pagecount = 0
while nextpage:
    page = urllib.request.urlopen(nextpage)
    pagecount += 1
    nextpage = ''
    for line in page:
        line = line.decode('utf-8', 'replace')
        if not nextpage:
            # Search for URL to next page of results for download
            nextpage_match = nextpage_re.search(line)
            if nextpage_match:
                nextpage = (site + nextpage_match.group(1)).replace('&amp;', '&')
        if '<span class="newpage">' in line:
            # Page creations carry a "newpage" (N) span; take the name from the title attribute
            newpages.append(line.partition(' title="')[2].partition('"')[0])
    page.close()

# Drop Talk: and other namespaced pages (their titles contain a colon)
nontalk = [p for p in newpages if ':' not in p]

# Each contributions page lists 50 edits, hence the approximate edit count
print("User: %s has created %i new pages of which %i were not Talk: pages, from approx %i edits" % (
    user, len(newpages), len(nontalk), pagecount * 50))
# Contributions are listed newest first, so reverse for creation order
print("New pages created, in order, are:")
print(" " + "\n ".join(nontalk[::-1]))

nextpage = site + '/w/index.php?title=Special:PopularPages'
nextpage_re = re.compile(
    r'<a href="([^"]+)" class="mw-nextlink">next ')

data_re = re.compile(
    r'^<li><a href="[^"]+" title="([^"]+)".*</a>.*\(([0-9,]+) views\)')

title2rankviews = {}
rank = 1
pagecount = 0
while nextpage:
    page = urllib.request.urlopen(nextpage)
    pagecount += 1
    nextpage = ''
    for line in page:
        line = line.decode('utf-8', 'replace')
        if not nextpage:
            # Search for URL to next page of results for download
            nextpage_match = nextpage_re.search(line)
            if nextpage_match:
                nextpage = (site + nextpage_match.group(1)).replace('&amp;', '&')
        datamatch = data_re.search(line)
        if datamatch:
            # Popular pages are listed most-viewed first, so list position is the rank
            title, views = datamatch.groups()
            views = int(views.replace(',', ''))
            title2rankviews[title] = [rank, views]
            rank += 1
    page.close()

print("\n\nHighest page Ranks for user pages:")
fmt = " %-4s %-6s %s"  # rank, views, title
print(fmt % ('RANK', 'VIEWS', 'TITLE'))
# Pages absent from the popular-pages list sort last with a sentinel rank of 99999
highrank = [title2rankviews.get(t, [99999, 0]) + [t] for t in nontalk]
highrank.sort()
for x in highrank:
    print(fmt % tuple(x))




(I promise to restructure it if I go back to it again :-)
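One possible restructure, offered as a sketch rather than as the way the script must be written: both `while nextpage:` loops do the same job of fetching a page, spotting the link to the next page of results and walking every line, so that part can be pulled out into a generator. The name `follow_pages` and its `opener` parameter are hypothetical; `opener` defaults to `urllib.request.urlopen` and exists only so the loop can be exercised offline with a stand-in. Pages are assumed to be UTF-8.

```python
import re
import urllib.request


def follow_pages(start_url, site, next_pattern, opener=urllib.request.urlopen):
    """Yield (page_number, line) for every line of start_url and its successors.

    Hypothetical helper: next_pattern (a string or compiled regex) must
    capture the site-relative href of the "next page" link in group 1.
    Only the first match on each page is followed, as in the original loops.
    """
    next_re = re.compile(next_pattern)  # re.compile returns compiled patterns unchanged
    nextpage = start_url
    page_number = 0
    while nextpage:
        page = opener(nextpage)
        page_number += 1
        nextpage = ''
        try:
            for raw in page:
                line = raw.decode('utf-8', 'replace')
                if not nextpage:
                    match = next_re.search(line)
                    if match:
                        # Links inside HTML escape '&' as '&amp;'
                        nextpage = (site + match.group(1)).replace('&amp;', '&')
                yield page_number, line
        finally:
            page.close()
```

With a helper like this, the contributions scan becomes a single `for pagecount, line in follow_pages(...)` loop holding only the `newpage` test, and the popular-pages scan keeps only its `data_re` match; `pagecount` should still be set to 0 beforehand in case no lines come back.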



Here is the output:


User: Paddy3118 has created 33 new pages of which 20 were not Talk: pages, from approx 300 edits
New pages created, in order, are:
Spiral
Monty Hall simulation
Web Scraping
Sequence of Non-squares
Anagrams
Max Licenses In Use
One dimensional cellular automata
Conway's Game of Life
Data Munging
Data Munging 2
Column Aligner
Probabilistic Choice
Knapsack Problem
Yuletide Holiday
Common number base conversions
Octal
Integer literals
Command Line Interpreter
First-class functions
Y combinator


Highest page Ranks for user pages:
RANK VIEWS  TITLE
107  1767   Monty Hall simulation
127  1409   Conway's Game of Life
171  1109   Anagrams
183  1037   Knapsack Problem
224  812    Max Licenses In Use
232  789    Web Scraping
239  717    Spiral
242  712    One dimensional cellular automata
289  536    Sequence of Non-squares
321  442    Yuletide Holiday
329  422    Column Aligner
333  418    Probabilistic Choice
347  389    Data Munging
351  382    Data Munging 2
427  175    Integer literals
448  128    Common number base conversions
454  110    Command Line Interpreter
469  61     First-class functions
480  42     Octal
634  2      Y combinator




A quick perusal of the results shows that although Conway's Game of
Life was only the eighth page I created, it has the second-highest view
count of my pages; conversely Spiral, my first page, is only seventh.
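That comparison can be put on a slightly firmer footing using nothing but the titles and view counts copied from the output above. The sketch below (the helper names `rank_shifts` and `spearman_rho` are hypothetical) gives each page its creation position and its view rank among these twenty pages (not the site-wide RANK column), lists pages where the two differ by five or more places, and computes Spearman's rank correlation between them. Spearman's rho is a standard statistic added here for illustration; it is not part of the original script, and the simple formula used assumes no tied view counts, which holds for this data.

```python
# (title, views) in creation order, copied from the output above.
PAGES = [
    ("Spiral", 717), ("Monty Hall simulation", 1767), ("Web Scraping", 789),
    ("Sequence of Non-squares", 536), ("Anagrams", 1109),
    ("Max Licenses In Use", 812), ("One dimensional cellular automata", 712),
    ("Conway's Game of Life", 1409), ("Data Munging", 389),
    ("Data Munging 2", 382), ("Column Aligner", 422),
    ("Probabilistic Choice", 418), ("Knapsack Problem", 1037),
    ("Yuletide Holiday", 442), ("Common number base conversions", 128),
    ("Octal", 42), ("Integer literals", 175),
    ("Command Line Interpreter", 110), ("First-class functions", 61),
    ("Y combinator", 2),
]


def rank_shifts(pages):
    """Return {title: (creation_position, view_rank)}, both 1-based.

    view_rank 1 is the most viewed page; assumes no tied view counts.
    """
    by_views = sorted(pages, key=lambda p: p[1], reverse=True)
    view_rank = {title: r for r, (title, _) in enumerate(by_views, 1)}
    return {title: (c, view_rank[title]) for c, (title, _) in enumerate(pages, 1)}


def spearman_rho(positions):
    """Spearman rank correlation for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(positions)
    sum_d2 = sum((c - v) ** 2 for c, v in positions.values())
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


if __name__ == "__main__":
    positions = rank_shifts(PAGES)
    for title, (c, v) in positions.items():
        if abs(c - v) >= 5:  # creation position and view rank far apart
            print("%-25s created #%-2d viewed #%d" % (title, c, v))
    print("Spearman rho: %.3f" % spearman_rho(positions))
```

On these figures rho comes out at about 0.81, a fairly strong agreement between creation order and view rank, so Conway's Game of Life and Spiral (along with Knapsack Problem and Sequence of Non-squares) read as exceptions to a general trend rather than the rule.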





- Paddy.
