pythonnotes's posterous: most recent posts
http://pythonnotes.posterous.com

Grabbing from a particular index onwards
Thu, 19 Jul 2012 02:46:03 -0700
http://pythonnotes.posterous.com/grabbing-from-a-particular-index-onwards

From the ScraperWiki mailing list:
    You can open .xls files and then navigate around them with xlrd.

I did one here:
https://scraperwiki.com/scrapers/testing_with_the_intro_tutorial_atlanta_schedule/

I had, like, a zillion spreadsheets, all formatted exactly the same, and I
needed rows[40:] (every row from index 40 to the end) out of all of them. So this
scraper opened each one, scraped out all the rows from 40 to the end, closed the
.xls, went back to the main page and moved on to the next .xls to open.
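Here is a minimal sketch of the rows[40:] idea. It assumes the spreadsheets are old-style .xls files already saved to disk and that the third-party xlrd package is installed (pip install xlrd). The names rows_from, xls_rows_from and rows_from_many are hypothetical helpers, not taken from the scraper linked above, which may work differently (for example by downloading each file from the main page first). Python counts from 0, so rows[40:] starts at the 41st row of the sheet.

```python
def rows_from(rows, start=40):
    """Return every row from index `start` to the end, i.e. rows[start:]."""
    # list() lets this accept any iterable of rows, not just a list
    return list(rows)[start:]


def xls_rows_from(path, start=40, sheet_index=0):
    """Open one .xls file with xlrd and return its rows from `start` onwards."""
    import xlrd  # third-party; imported here so rows_from() works without it

    book = xlrd.open_workbook(path)
    sheet = book.sheet_by_index(sheet_index)
    # row_values(i) gives the cell values of row i as a Python list
    all_rows = [sheet.row_values(i) for i in range(sheet.nrows)]
    book.release_resources()  # let go of the file before moving on to the next one
    return rows_from(all_rows, start)


def rows_from_many(paths, start=40):
    """Collect the same slice from many identically formatted .xls files."""
    collected = []
    for path in paths:
        collected.extend(xls_rows_from(path, start))
    return collected
```

For example, rows_from_many(['fixtures1.xls', 'fixtures2.xls']) would return rows 40 onwards from both files, one after the other. The filenames are placeholders.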

http://files.posterous.com/user_profile_pics/1080526/paulb_bigger.jpg http://posterous.com/users/ZyH4FHK2wVz Paul Bradshaw paulbradshaw Paul Bradshaw
Using .split to convert a bunch of items to an array (list)
Mon, 16 Jan 2012 02:46:00 -0800
http://pythonnotes.posterous.com/using-split-to-convert-a-bunch-of-items-to-an

Thanks to Zarino Zappia for this tip, which followed an email to the ScraperWiki mailing list. The particular problem it was one possible solution to was having a list of codes that you need a scraper to loop through. (In the end I solved this by using the =JOIN formula in the spreadsheet containing the codes, and adding an extra inverted comma at the start and finish.)

But if that list of numbers is on a page, here's a useful way to convert it into an array:

"Get your scraper to loop over them as an array. You'd probably need to do some formatting, but it depends on how your spreadsheet program handles copying cells. If it's generous and just gives you a string with the cell values separated by spaces or line breaks, you can use Python to split the string into an array (a list, in Python terms), over which you can loop."

import scraperwiki

pastedtext = "5237521 4398721 5293752 5967124"
lovelyarray = pastedtext.split()
# we now have a lovely array like ['5237521', '4398721', … '5967124']
for school_id in lovelyarray:
    print(scraperwiki.scrape('http://yoururl.org.uk/page?iSchoolID=' + school_id))
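Your spreadsheet might paste the codes with line breaks or commas instead of spaces. The same idea still works in either case. split() with no argument splits on any run of whitespace, including line breaks. split(',') splits on commas only. This is a minimal sketch that builds the URLs without fetching them. The URL is the placeholder from the example above.

```python
base_url = 'http://yoururl.org.uk/page?iSchoolID='

# codes pasted one per line: split() with no argument treats any whitespace,
# including line breaks, as a separator and ignores leading/trailing whitespace
pasted_lines = """5237521
4398721
5293752
"""
ids_from_lines = pasted_lines.split()

# codes pasted with commas: split(',') splits on commas only,
# so strip() removes any spaces left around each code
pasted_commas = "5237521, 4398721,5293752"
ids_from_commas = [code.strip() for code in pasted_commas.split(',')]

# build one URL per code, ready to pass to a scrape function
urls = [base_url + school_id for school_id in ids_from_lines]
```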

Extracting text by looking for classes - using lxml
Tue, 09 Aug 2011 11:02:33 -0700
http://pythonnotes.posterous.com/extracting-text-by-looking-for-classes-using

Using ScraperWiki to extract text from a webpage:

This uses the lxml module - documentation here: http://lxml.de/lxmlhtml.html#parsing-html
 

# import a module (library) that helps us do scraping
import scraperwiki
# import another that helps us extract things from the scraped data
import lxml.html
# use that module's scrape function to grab the contents of a URL and put it in the variable html
html = scraperwiki.scrape("http://www.nhs.uk/Services/Trusts/GPs/DefaultView.aspx?id=5PG")
# use lxml.html's fromstring function to turn that text into structured data, put in a variable called gplist
gplist = lxml.html.fromstring(html)
# get every <a> tag inside any element with class="child-org-name" (the period before the name indicates a class)
# cssselect returns a list of matching elements, put in gpname (it needs the separate cssselect package installed)
gpname = gplist.cssselect(".child-org-name a")
# loop through the list of items and...
for gp in gpname:
    # create a column called "gp" and store the text of each link (text_content() also picks up text in nested tags)
    record = {"gp": gp.text_content()}
    # save the records one by one, with "gp" as the unique key so re-running updates rows rather than duplicating them
    scraperwiki.sqlite.save(unique_keys=["gp"], data=record)

Python functions
Sat, 04 Dec 2010 07:46:27 -0800
http://pythonnotes.posterous.com/python-functions
# Functions are one of the most powerful parts of programming - they can perform all sorts of - well - functions, basically.
# Functions can look for particular things within content, perform calculations, create new objects, alter lists and dictionaries, and lots else besides.
# Essentially they are pieces of script within the script, and as such can be called as many times as you like, for example inside a loop that keeps running until a particular condition is met (e.g. something is found or a number is reached)
# They can also be left unused unless a particular condition is met
# Like so many things in programming, the main advantage of a function is that it saves you time by letting you run one piece of code over and over again without rewriting it

#Functions are created with the word def as follows:

def functionname():
    pass  # the function's indented body goes here; pass is a placeholder that does nothing

#the parentheses contain any 'arguments' that are passed to the function - that is, any information the function might use, such as a starting number, a string (text), a variable or a state (such as True or False).
#the colon is needed to begin building the function in the lines that follow, which should be indented

def multiply_me(first_number, second_number):
    third_number = first_number * second_number
    print(third_number)

# In this example the function 'multiply_me' expects to be supplied with 2 arguments (in the brackets)
# these arguments are given variable names at the same time (in the brackets) - first_number and second_number
# a new variable is created - third_number. In the same line it is given a value: the result of multiplying first_number by second_number
# finally, that new variable is printed (displayed to the user)

# However, for a function to actually run it has to be called. This is done in a script as follows:

multiply_me(200, 300)

# That line sets the computer looking for the multiply_me function within the script
# It also passes the two numbers within the brackets to the function
# The function then runs, taking in those numbers and using them to make the calculation, and print it.
# If you were calling a function which didn't need any information to work, you would simply leave the brackets empty:

multiply_me()

# In this case, because the function does need two arguments to work, you would get an error saying how many arguments it takes
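# Here is a minimal runnable sketch of both cases, assuming Python 3. say_hello is a hypothetical function that takes no arguments, so it is called with empty brackets.
# In this version multiply_me returns its result instead of printing it, so the value can be stored and reused.
# Calling multiply_me() with no arguments raises a TypeError. The sketch catches it so the message can be read.

```python
def say_hello():
    # no arguments, so it is called with empty brackets: say_hello()
    return 'hello'


def multiply_me(first_number, second_number):
    # returning the result (rather than printing it) lets the caller reuse it
    return first_number * second_number


greeting = say_hello()
result = multiply_me(200, 300)

try:
    multiply_me()  # missing both arguments
except TypeError as error:
    error_message = str(error)
```

# In Python 3 the message reads along the lines of: multiply_me() missing 2 required positional arguments: 'first_number' and 'second_number'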

#Python has many built-in functions that you do not need to create from scratch - for example bin(), which converts a whole number into binary. These are listed in Python's documentation.
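# Here are a few of those built-in functions in action. This is a small sample, not a full list.
# bin() turns a whole number into a binary string starting with '0b'. int() with a base of 2 turns such a string back into a number.

```python
# bin() converts an integer into a binary string, prefixed with '0b'
as_binary = bin(10)
# int() with a base of 2 converts a binary string back into a number
back_again = int('1010', 2)
# two other built-ins that come up often when handling scraped data
how_many = len(['5237521', '4398721'])
biggest = max([3, 17, 5])
```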

Python dicts
Sat, 04 Dec 2010 00:41:53 -0800
http://pythonnotes.posterous.com/python-dicts
#Dictionaries are a useful way to store information about a range of items that you can then interrogate in various ways

#Dictionaries let you store items as pairs - each key (an item name) is linked to a value - which makes them more like a database than a list

#Whereas lists are accessed by items' positions, dictionaries are accessed by keys (item names)

#Here's a line of code that creates a dictionary

myfridge = {'topshelf': 'ham', 'bottom_shelf': 'cheese', 'inner_door': 'milk'}

#The curly brackets tell us that this is a dict
#the colon links each key to its value
#the comma separates each key/value pairing

#values can then be looked up by key as follows:

myfridge['inner_door']

#this will return whatever value is associated with 'inner_door'

#new values can be added as follows:

myfridge['middle_shelf'] = 'beer'

#values can also be deleted with the del keyword:

del myfridge['middle_shelf']
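# Here is the fridge example above as one runnable sketch. It assumes Python 3.7 or later, where dictionaries keep the order items were added.
# It also shows two extras not covered above. The in keyword checks whether a key exists. .get() returns a fallback value instead of raising a KeyError when the key is missing.
# 'freezer' is a made-up key, used only to show that fallback.

```python
myfridge = {'topshelf': 'ham', 'bottom_shelf': 'cheese', 'inner_door': 'milk'}

door_item = myfridge['inner_door']            # look up a value by its key

myfridge['middle_shelf'] = 'beer'             # add a new key/value pair
has_beer_shelf = 'middle_shelf' in myfridge   # 'in' checks the keys

del myfridge['middle_shelf']                  # remove that pair again

# myfridge['freezer'] would raise KeyError; .get() returns the fallback instead
freezer_item = myfridge.get('freezer', 'nothing')

# .items() gives each key together with its value
contents = [shelf + ': ' + item for shelf, item in myfridge.items()]
```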

#functions can also be stored as values in a dictionary, and run by looking them up by key and adding brackets - more detail in Learn Python the Hard Way.
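# Here is a minimal sketch of storing functions in a dictionary. double and halve are hypothetical functions made up for this example.
# The dictionary holds the functions themselves (no brackets). A function only runs when it is looked up and then called with brackets.

```python
def double(number):
    return number * 2


def halve(number):
    return number / 2


# the values are the functions themselves, so nothing runs yet
calculations = {'double': double, 'halve': halve}

# look the function up by key, then add brackets to call it
doubled = calculations['double'](21)
halved = calculations['halve'](9)
```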

Python classes
Sat, 04 Dec 2010 00:34:30 -0800
http://pythonnotes.posterous.com/python-classes
# Classes save you time when creating new objects by giving them pre-set qualities
# So, for example, instead of having to write script for every MP that says
#'this MP gets a certain number of votes and represents this party etc. etc.'
# you can create an MP 'class' which says *all* MPs have votes, parties, gender, age, etc.
# Then when you create a new MP you only have to fill in the values of those qualities.
# You can save further time by having default values, e.g. saying your MPs represent Labour unless otherwise specified.
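# Here is a minimal sketch of the MP idea, using the class and __init__ syntax explained below. The MP class, its qualities and the names passed in are all hypothetical.
# The point is the default value: any MP created without a party is recorded as Labour.

```python
class MP(object):
    # party defaults to 'Labour' unless another party is passed in
    def __init__(self, name, votes, party='Labour'):
        self.name = name
        self.votes = votes
        self.party = party


first_mp = MP('A. Member', 20000)                         # party left out, so 'Labour'
second_mp = MP('B. Member', 18500, party='Conservative')  # default overridden
```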

# Classes are created with the class keyword:

class Food:
    pass  # the indented section defining the class goes here; pass is a placeholder

#the colon starts the indented section that will define this class
#that section will include the variables, functions and constructor that define the class:

class Food(object):
    def __init__(self, calories=0, weight=0):
        self.calories = calories
        self.weight = weight

#__init__ is a 'class constructor' that 'initialises' (sets) particular qualities of each new object made from the class
#like any function definition, its first line needs to end with a colon
#calories=0 and weight=0 give default values for those qualities, and the indented lines store whatever values are passed in
#the 'self' bit refers to the individual instance (object) of this class that is later created - self.calories means 'this particular object's calories'
# - Python fills in self automatically when you create an instance, so you never pass it in yourself.

#when you create a new instance of that class, e.g. a new item of food, it will have those qualities at those default values
#or at values defined in the parameters you pass in creating that new instance, e.g.

cheese = Food(34, 100)

# This code creates a new object 'cheese' - an instance of the 'Food' class
# Essentially this line finds the Food class, runs its __init__ constructor, and gives 'cheese' its own calories and weight
# The 2 numbers in brackets - 34 and 100 - are assigned ('passed') to those qualities in turn, so cheese has 34 calories and a weight of 100.
#If no numbers were given - i.e. if the brackets were empty, as in Food() - then the new object would have the default values of the class 'Food' - in this case, 0 and 0.

#Like any variable, once created you can do various things with cheese - ask how many calories it has, change its weight, add its weight to the weight of other objects, and so on.
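# Here is a short runnable sketch of those operations. It uses a Food class whose __init__ stores the values passed in and falls back to 0 when none are given.
# bread and empty_plate are extra example objects made up for illustration.

```python
class Food(object):
    def __init__(self, calories=0, weight=0):
        self.calories = calories
        self.weight = weight


cheese = Food(34, 100)
bread = Food(80, 250)
empty_plate = Food()                          # no values given, so both default to 0

cheese_calories = cheese.calories             # ask how many calories it has
cheese.weight = 150                           # change its weight
total_weight = cheese.weight + bread.weight   # add its weight to another object's
```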

Python lists
Mon, 29 Nov 2010 09:56:58 -0800
http://pythonnotes.posterous.com/python-lists

#Lists are useful as a way of assigning a series of values to a particular list name
#Those values can then be called by their position (index) in the series

myfridge = ['cheese', 'milk', 'beer']

#the square brackets indicate that this is a list
#each item in the list is separated by a comma
#items can then be called as follows:

myfridge[0]

#this would call the first item in the list (index position 0)

#you can also change the contents of a list with various functions as follows

myfridge.append('ham')

#this appends - .append - the string 'ham' to the list myfridge. 
#Note the full stop before .append and the parentheses containing the argument to be passed
#the parentheses could also contain a variable instead of a string, a function call that returns a value, or another list (which is added as one single item, creating a list within a list)

myfridge.pop(-1)

#this returns the last item in the list (at index position -1) and removes it from the list at the same time (pops)

#other functions include 
#counting how many times something occurs in a list - .count('ham')
#returning the index of a particular item in a list - .index('ham')
#removing the first instance in a list of a particular item - .remove('ham')
#inserting an item in a list at a particular position - .insert(3, 'ham')
#sorting items - .sort()
#reversing the order of items - .reverse()
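# Here are the methods listed above, run one after another on the myfridge list so you can see what each does. This is a sketch for Python 3.
# 'butter' and the shopping list are extra items made up for the example.
# Note that .sort() and .reverse() change the list in place rather than returning a new one.

```python
myfridge = ['cheese', 'milk', 'beer']

first_item = myfridge[0]              # 'cheese'
myfridge.append('ham')                # ['cheese', 'milk', 'beer', 'ham']
last_item = myfridge.pop(-1)          # returns 'ham' and removes it

myfridge.append('ham')
myfridge.append('ham')                # ['cheese', 'milk', 'beer', 'ham', 'ham']
ham_count = myfridge.count('ham')     # 2
first_ham = myfridge.index('ham')     # 3 (the position of the first 'ham')
myfridge.remove('ham')                # removes only the first 'ham'
myfridge.insert(1, 'butter')          # ['cheese', 'butter', 'milk', 'beer', 'ham']
myfridge.sort()                       # ['beer', 'butter', 'cheese', 'ham', 'milk']
myfridge.reverse()                    # ['milk', 'ham', 'cheese', 'butter', 'beer']

# appending a list adds it as one single item, creating a list within a list
shopping = ['bread']
shopping.append(['eggs', 'jam'])      # ['bread', ['eggs', 'jam']]
```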

#other list functions can be found at http://docs.python.org/tutorial/datastructures.html

test
Sat, 27 Nov 2010 04:24:28 -0800
http://pythonnotes.posterous.com/test
