I mostly write about programming in Python and Julia for Data Visualization and Data Science but occasionally I stray into other areas of science, technology and even a bit of philosophy.

If you’d like to be notified when I publish new articles, please click the link below, or on my profile page, or subscribe to my free, occasional newsletter here.

Data Visualization and Data Science


Using SQLite to store your Pandas dataframes gives you a persistent store and a way of easily selecting and filtering your data

Photo by Markus Winkler on Unsplash

SQLite support is built into Python, through the sqlite3 module in the standard library, and it is a very useful feature. It is not a complete implementation of SQL, but it has all the features that you need for a personal database or even a backend for a data-driven website.

Using it with Pandas is simple and really useful. You can permanently store your dataframes in a table and read them directly into a new dataframe as you need them.

But it isn’t just the storage aspect that is so useful. …
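The store-and-read-back idea described above can be sketched as follows. This is a minimal illustration, not the article's own code: the table name, column names and values are invented, and an in-memory database is used only to keep the example self-contained.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame(
    {
        "city": ["London", "Paris", "Berlin"],
        "temp_c": [17.5, 21.0, 19.2],
    }
)

# ":memory:" keeps the sketch self-contained; pass a file name
# (for example "weather.db") to get a persistent store instead
conn = sqlite3.connect(":memory:")

# Store the dataframe in a table, replacing the table if it already exists
df.to_sql("weather", conn, if_exists="replace", index=False)

# Read back only the rows we want, letting SQL do the selecting and filtering
warm = pd.read_sql_query(
    "SELECT city, temp_c FROM weather WHERE temp_c > 18 ORDER BY city",
    conn,
)
conn.close()

print(warm)
```

The query string is where the selecting and filtering happens, so the new dataframe contains only the rows and columns that were asked for.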


Data Visualization

Google Charts lets you create pleasing interactive plots but cannot read a Pandas dataframe directly. Here we create a simple Python Web App to combine the two technologies.

Image by author

Google Charts has been around for more years than I care to remember, so I think we can safely assume that it is a mature and stable technology.

It can also be used to create attractive charts and graphs.


How to analyse the sentiment of tweets using the VADER Python library

Photo by Chris Liverani on Unsplash

Sentiment Analysis, or Opinion Mining, is often used by marketing departments to monitor customer satisfaction with a service, product or brand when a large volume of feedback is obtained through social media.

Gone are the days of reading individual letters sent by post. Today’s customers produce vast numbers of comments on Twitter or other social media.

Such a large amount of data cannot be reasonably analysed individually, so what is produced electronically has to be analysed electronically.

There are two fundamental approaches to Sentiment Analysis. First, there are rule-based systems that use a lexicon of words and a set of rules to classify a particular piece of text. Second, there are machine learning systems that analyse a set of texts already labelled with a particular classification (typically, positive or negative) and use them to predict the classification of a new text. …
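To make the rule-based idea concrete, here is a toy sketch of a lexicon-plus-rules classifier. It is not VADER and does not use the VADER library: the word scores, the negation list and the function name are all invented for illustration, and a real lexicon is far larger and more carefully weighted.

```python
# Hypothetical mini-lexicon: each word carries a sentiment score
LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "love": 2.0,
    "bad": -1.0,
    "terrible": -2.0,
    "hate": -2.0,
}

# Hypothetical rule: a negation word flips the next scored word
NEGATIONS = {"not", "never", "no"}


def classify(text):
    """Return 'positive', 'negative' or 'neutral' for a piece of text."""
    score = 0.0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if word in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(word, 0.0)
        if value:
            score += -value if negate else value
            negate = False  # the negation has been used up
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("I love this great service!"))
print(classify("This is not good."))
print(classify("The parcel arrived on Tuesday"))
```

A machine learning system would replace the hand-written lexicon and rules with a model trained on labelled examples.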


The official Twitter statistics are great for seeing how well you are engaging with your audience, but you can use Python to see how others are doing, too


If you use Twitter cards or adverts, you can get a very good idea of how people are engaging with your tweets from the official Twitter statistics. But what about your friends, your colleagues… your competitors? Just a little bit of Python code can help.

Imagine that you are a global news giant and are wondering just how well you are regarded by your audience compared with, say, CNN, or the BBC. One thing you might do is measure the level of engagement of their tweets and compare it with your own.

You can’t see the same engagement statistics as for your own account (they are, of course, private) but there are some simple stats that you are able to see. It just takes a little programming and a Twitter Developer account.

I’m going to show you how it is really quite straightforward to monitor the number of retweets and likes that other users have and compare them to your own or others by using the Twitter developer’s API. We’ll also produce some simple statistics and graphs using Pandas. …


Using Python and Pandas I converted a text document meant for human readers into a machine readable dataframe — but differently this time.

Image by author

I wrote an article recently about how I converted a text file that was only partly structured into a useful dataframe in pandas. It demonstrated a few useful techniques. However, I also came up with a slightly more generic technique which also produces a good result.

Here is a slightly slimmer version of the original article that demonstrates the new method and code.

These days much of the data you find on the internet are nicely formatted as JSON, Excel files or CSV. But some aren’t.

I needed a simple dataset to illustrate my articles on data visualisation in Python and Julia and decided upon weather data (for London, UK) that was publicly available from the UK Met Office. …


Beautiful Soup is a great tool for extracting data from web pages but it works with the source code of the page. Dynamic sites need to be rendered as the web page that would be displayed in the browser — that’s where Selenium comes in.

Image by Author

Beautiful Soup is an excellent library for scraping data from the web but it doesn’t deal with dynamically created content. That’s not in any way a criticism — Beautiful Soup does precisely the job it is supposed to do and that does not include rendering the webpage as a browser would.

To get that dynamic content, the web page must be interpreted by a browser so that the JavaScript that creates the dynamic content can do its work. But how do we get at the HTML code that is rendered by the browser? One answer is by using a headless browser and the Selenium Python library. …


Using Python and Pandas, I converted a text document meant for human readers into a machine readable dataframe

Semi-structured data on the left, Pandas dataframe and graph on the right — image by author

These days much of the data you find on the internet are nicely formatted as JSON, Excel files or CSV. But some aren’t.

I needed a simple dataset to illustrate my articles on data visualisation in Python and Julia and decided upon weather data (for London, UK) that was publicly available from the UK Met Office.

The problem was that it was a text file that looked like a CSV file but was really formatted for a human reader. …


With Python 3 and Python 3.6 there are new and much better ways of formatting strings

Image by Author

I don’t use Python string formatting very much and when I do, I normally just use the C-style formatting that has been with Python since the year dot.

But the other day I wanted to format some SQL queries that used the same string a number of times and I was slightly offended by the code that I had to write. …
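The repeated-string problem mentioned above can be illustrated with a short sketch. The table and column names are invented, and this is not the article's code; it simply shows why C-style formatting becomes awkward when one value appears several times, and how str.format and f-strings (Python 3.6 and later) avoid the repetition.

```python
table = "weather"  # hypothetical table name, for illustration only

# C-style: the same value has to be supplied once per placeholder
c_style = "SELECT %s.city, %s.temp FROM %s" % (table, table, table)

# str.format: a named field is supplied once and reused
with_format = "SELECT {t}.city, {t}.temp FROM {t}".format(t=table)

# f-string (Python 3.6+): the variable is referenced directly in the string
f_string = f"SELECT {table}.city, {table}.temp FROM {table}"

print(c_style)
print(with_format)
print(f_string)
```

All three produce the same query text. Formatting like this suits identifiers that you control; values that come from users are better passed as query parameters than formatted into the string.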


Whether you are sharing your Jupyter Notebooks with friends and colleagues or publishing them more widely, they will be better appreciated if they are well laid out and formatted.

Image by Author

You can put comments in your Jupyter Notebook code to help the reader to understand what you are up to. But longer commentary is better in text cells separate from the code.

Text cells in Jupyter support the Markdown language and we are going to take a look at the facilities that it offers. Markdown is a set of simple markup codes that are easily transformed into HTML for rendering in a browser. …

About

Alan Jones

Technology, science and programming. Subscribe to my newsletter here: https://technofile.substack.com/ . Web page: https://alanjones.pythonanywhere.com/
