Scraping property listings from Rightmove
A tutorial on how to collect data on the London property market
A couple of weeks ago, a friend asked me whether it was possible to help him find the perfect property to buy. I thought about it and realised that it’s impossible to run any analysis without data. Since I am based in London, I turned to Rightmove (the UK’s largest property listing website) and I built my own dataset! If you are looking for a tutorial on web scraping and how to use BeautifulSoup, or you’re just looking to run your own analysis on the London property market, you’ve come to the right place.
In this article, I aim to demonstrate how to build a dataset of property listings in London using information from Rightmove.
Navigating Rightmove’s (complicated) website
I wanted to build a complete dataset of all the properties in London, so I decided to scrape the listings borough by borough. Thankfully, Rightmove lets us search for properties by London borough, and it assigns each borough a unique code.
The unique code that Rightmove uses for each London borough appears in the address bar when you search for that borough. The code that we will be using for this tutorial is the one for Islington, which is 5E93965.
Unfortunately for us, Rightmove only displays up to 42 pages of search results, regardless of how many listings match the search.
Therefore, to get the most relevant results, I decided to sort by “Newest listed”, which gives the most up-to-date picture of the current London housing market. Luckily for us, Rightmove makes this easy: we just add “sortType=6” to the URL in the address bar.
Putting all of it together, we can start to write our code!
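As a rough sketch of how these pieces might fit together, the helper below builds a search URL from the borough code, the sort order and a result index. The endpoint path and the `locationIdentifier` and `index` parameter names are assumptions based on what the address bar showed at the time of writing, and `build_search_url` is a hypothetical helper name; only `sortType=6` and the Islington code come from the text above.

```python
# Assumed search endpoint; check the address bar on Rightmove before relying on it.
BASE_URL = "https://www.rightmove.co.uk/property-for-sale/find.html"


def build_search_url(borough_code, index=0):
    """Build a search URL for one borough, sorted by newest listed.

    borough_code: the code from the address bar, e.g. "5E93965" for Islington.
    index: offset of the first listing on the page (assumed parameter name).
    """
    return (
        f"{BASE_URL}?locationIdentifier=REGION%{borough_code}"
        f"&sortType=6"          # 6 = "Newest listed"
        f"&index={index}"
    )


islington_url = build_search_url("5E93965")
print(islington_url)
```

Keeping the URL construction in one function means the same code can later be reused for every borough by swapping the code that is passed in.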
The code to scrape the data we want!
The first step is to install our packages. The libraries named in this tutorial, BeautifulSoup (published on PyPI as beautifulsoup4) and pandas, can be installed by running pip install in your terminal.
With the packages installed, we can move on to the code! We start by importing them.
The next step is to create a bunch of lists that will store our data. The data that I want to collect is:
- Property Link
- Address of property
- Number of bedrooms
- Price
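One way to set up these containers is sketched below. The list names and the `add_listing` helper are illustrative choices rather than the original code, and the values passed in at the end are placeholders, not real Rightmove data.

```python
# One list per field; entries at the same position describe the same property.
links = []
addresses = []
bedrooms = []
prices = []


def add_listing(link, address, bedroom_text, price):
    """Append one property's fields so that all four lists stay the same length."""
    links.append(link)
    addresses.append(address)
    bedrooms.append(bedroom_text)
    prices.append(price)


# Placeholder values for illustration only.
add_listing(
    "https://www.rightmove.co.uk/properties/000000001",
    "Example Road, Islington, London",
    "2 bedroom flat for sale",
    "£500,000",
)
```

Appending to all four lists in one place avoids the lists drifting out of step when a field is missing for one property.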
Rightmove displays at most 42 pages of results, so the loop runs for at most 42 pages. To move from one page to the next, I keep an index that starts at 0 and increases by 24 for each page that we scrape. Once the index reaches or passes the number of results, the loop breaks.
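The paging logic can be sketched on its own, without any network calls. In this sketch `page_indices` is a hypothetical helper, and it stops as soon as the index reaches the total number of results, since an offset equal to the total would point past the last listing.

```python
RESULTS_PER_PAGE = 24  # the index grows by 24 for each page
MAX_PAGES = 42         # Rightmove shows no more than 42 pages


def page_indices(total_results):
    """Return the index values to request for a search with total_results listings."""
    indices = []
    index = 0
    for _ in range(MAX_PAGES):
        if index >= total_results:
            break  # no listings left at this offset
        indices.append(index)
        index += RESULTS_PER_PAGE
    return indices


print(page_indices(50))  # [0, 24, 48]
```

With the 42-page cap, at most 42 × 24 = 1,008 listings can be collected per search, which is why sorting by newest listed matters.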
Once we have our webpage, let’s begin scraping the data!
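Getting the webpage itself could look something like the sketch below. It uses `urllib.request` from the standard library, which is an assumption on my part (the original code may well use a different HTTP library), and the `fetch_page` name and the User-Agent string are placeholders.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Download a page and return its HTML as text."""
    request = urllib.request.Request(
        url,
        # Placeholder User-Agent; some sites reject requests that send none.
        headers={"User-Agent": "Mozilla/5.0 (property-tutorial-sketch)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_page` once per index value returns the HTML that BeautifulSoup will parse. It is worth pausing between requests and checking the site's terms of use before scraping at scale.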
The next part of the code uses BeautifulSoup to parse the page. The trick to using BeautifulSoup is to understand where in the HTML code the data that we want is stored.
Each property card sits in its own element, but the identifiers on these elements are unique to each property card! This is not very helpful for us, because we cannot use them to select every card at once, so we need to find another way to sort through the data.
Thankfully, within each property listing, the information can easily be found. For example, the number of bedrooms and the address can be found inside the element with the class “propertyCard-link”.
With this, we can scrape all the data. We also find the total number of listings for each borough by looking at the class=“searchHeader-resultCount” element of the HTML code. This, along with the index, tells us what page we are on and whether we have exceeded the total number of pages (42).
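The parsing step might look roughly like the sketch below, run here against a small stand-in for a results page. Only the “propertyCard-link” and “searchHeader-resultCount” class names come from the text above; the other class names (`l-searchResult`, `propertyCard-title`, `propertyCard-address`, `propertyCard-priceValue`), the sample HTML and the `parse_results` helper are assumptions, and Rightmove's real markup may differ or change over time.

```python
from bs4 import BeautifulSoup

BASE = "https://www.rightmove.co.uk"

# Simplified stand-in for a results page, with placeholder values.
SAMPLE_HTML = """
<span class="searchHeader-resultCount">1,234</span>
<div class="l-searchResult">
  <a class="propertyCard-link" href="/properties/000000001">
    <h2 class="propertyCard-title">2 bedroom flat for sale</h2>
    <address class="propertyCard-address">Example Road, Islington, London</address>
  </a>
  <div class="propertyCard-priceValue">£500,000</div>
</div>
"""


def parse_results(html):
    """Return (total_listings, rows), one dict per property card."""
    soup = BeautifulSoup(html, "html.parser")

    # Total number of listings, e.g. "1,234" becomes 1234.
    count_tag = soup.find(class_="searchHeader-resultCount")
    total = int(count_tag.get_text(strip=True).replace(",", "")) if count_tag else 0

    rows = []
    for card in soup.find_all("div", class_="l-searchResult"):  # assumed wrapper class
        link_tag = card.find("a", class_="propertyCard-link")
        if link_tag is None:
            continue  # skip cards without the expected link element
        title = link_tag.find("h2", class_="propertyCard-title")           # assumed
        address = link_tag.find("address", class_="propertyCard-address")  # assumed
        price = card.find("div", class_="propertyCard-priceValue")         # assumed
        rows.append({
            "link": BASE + link_tag.get("href", ""),
            "address": address.get_text(strip=True) if address else None,
            "bedrooms": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return total, rows


total, rows = parse_results(SAMPLE_HTML)
print(total, rows)
```

Selecting by class rather than by each card's unique identifier is what lets one loop handle every listing on the page.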
The last part of the code exports the collected data to a CSV file, which can be done with the pandas module.
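A minimal version of the export step, assuming the data sits in four equal-length lists, could look like this. The column names and the `export_listings` helper are illustrative choices, not necessarily those of the original code.

```python
import pandas as pd


def export_listings(links, addresses, bedrooms, prices, path):
    """Combine the four lists into a DataFrame and write it to a CSV file."""
    df = pd.DataFrame(
        {
            "Link": links,
            "Address": addresses,
            "Bedrooms": bedrooms,
            "Price": prices,
        }
    )
    # index=False leaves out the DataFrame's row numbers.
    df.to_csv(path, index=False, encoding="utf-8")
    return df
```

For example, `export_listings(links, addresses, bedrooms, prices, "islington.csv")` would write one row per property, where the file name is a placeholder.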
Ultimately the dataset that we obtain when we scrape all the London boroughs looks like this:
Conclusion
If you are looking to analyse the property market, Rightmove’s website allows us to obtain quite a lot of information. Additionally, if you need more detailed information (such as distance to the nearest tube station, key features of property and so on), you can scrape the data within the links themselves. Have fun and happy scraping!
The code for this tutorial (for all London boroughs) is available in this GitHub repository:
The Google Colab notebook can be accessed here: