Thursday, 15 October 2015

More on Planning and Designing

Before identifying which analysis processes the visualisation will support, I would need first to clarify the scope and purpose of the visualisation and the tasks for the audience.

After some more research on my initial plan of visualising the global oil trade, I think I should pivot the topic for the following reasons.
I can find oil production data by nation and trade data (imports and exports), but not flow data. This means that I cannot visualise the flow from one nation to another as I had anticipated.
First, that would make the visualisation much less visually interesting.
Second, my hypothesis was that by looking at the trade data together with the flows, the user could find a connection between the diplomatic relations between countries and their oil trade volume. This won’t be possible without the flow data.

So my plan now is to merge the import/export data with production data and oil prices. Oil prices tend to affect the economic growth of importing nations. Also, the visualisation might provide a data-driven answer for the collapse in oil production in recent years. The user tasks are to find out:
How do world oil supply and price relate to the growth of the world economy (particularly around the 2008 crisis, the changes right after it, and the period after 2011, when post-crisis stimulus actions had ended)?
What do a nation’s oil production, imports and exports tell us about its economy?
Can oil data be used to predict the world economy?

Now I will do a quick recap of the eight visual analysis processes described by Isenberg et al.: browse, parse, discuss collaboration style, establish task strategy, clarify, select, operate and validate. This summary helps me better understand the processes and apply them to my visualisation correctly.

During the Browse process, the team scans through the available data to form their first impressions. The Parse process involves reading the task description to create a common understanding of the problem and how to solve it. During the Discuss Collaboration Style process, the team discusses the overall task division strategy. In the Establish Task Strategy process, the team figures out the best ways to perform the tasks with the available data and tools. The Clarify process involves understanding information artefacts. The Select process is about finding and selecting relevant information for a particular task. The Operate process, as I understand it, involves higher-level cognitive work on specific views of the data to extract the information needed for the task. In the Validate process, the team confirms the solution of a task. Except for the Discuss Collaboration Style process, I think that the remaining seven processes also apply to individual visual analysis.

Using the above framework in designing the World Oil Trade and Supply visualization, I anticipate that the visualization will support:

1. Browse/Overview: Given the large amount of data that this visualisation encapsulates, and the fact that people tend to be overwhelmed by too much information, I think it is best not to display all available data at the beginning. Instead, it will first display data for selected countries over the years (top oil producers, top oil importers, and selected countries of interest such as Finland and neighbouring countries). This enables the audience to scan through the important data. It is, however, possible to get an overview of all data (through a “view all data” action).

2. Parse: The questions that suggest the tasks for the audience are displayed at the top of the visualisation. These questions steer them towards the insights I want them to look for.

3. Clarify: The visualisation will include annotations that explain different visual encodings and the actions users can perform.

4. Select: Users can toggle (select/deselect) different data layers. They can also filter information for a specific country.


5. Validate: There will be an animation of the changes over time and of the relationships between oil supply and the world economy, which enables the audience to validate their findings. This animation can be played once the visualisation is loaded, before the audience performs any actions.

Thursday, 1 October 2015

Planning and Foreseeing Challenges

1. Discovery

Where to find my data? 
One good publicly available data source for my project is the UN Comtrade Database: http://comtrade.un.org/data/doc/api/
Using this API I can query the trade data of specified commodities; in this case I am interested in Petroleum and Oil Products (cc=27). When I asked for data from all years for all countries, the result was too large to be queried. So I query the data for all countries one year at a time; for example, for 2011 the query looks like this:
Data can be queried in JSON or CSV format. I chose JSON. I collected 10 years of data, from 2004 to 2014.
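A minimal sketch of the one-request-per-year approach could look like the following. It only builds the query URLs and does not send them. The endpoint path and the parameter names `r`, `ps` and `fmt` are assumptions for illustration and should be checked against the API documentation linked above; only the commodity code `cc=27` and the JSON format come from my notes.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify it against the API documentation before use.
BASE_URL = "http://comtrade.un.org/api/get"


def build_query(year, commodity_code="27", fmt="json"):
    """Build a query URL for all reporting countries in a single year."""
    params = {
        "r": "all",            # reporting countries (assumed parameter name)
        "ps": year,            # period: one year per request (assumed name)
        "cc": commodity_code,  # commodity code, 27 in this project
        "fmt": fmt,            # "json" or "csv" (assumed parameter name)
    }
    return BASE_URL + "?" + urlencode(params)


def yearly_queries(first_year, last_year):
    """Return one query URL per year, keyed by year (both ends inclusive)."""
    return {year: build_query(year) for year in range(first_year, last_year + 1)}


queries = yearly_queries(2004, 2014)
```

Splitting the download by year keeps each response small, and a failed request only needs to be repeated for that one year.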

Sources for Trade flow
The UN Comtrade Database is detailed and of good quality. However, it doesn’t include the trade flow, i.e. the flow from the exporting country to the importing country and the trade volume between them. I still need to find other sources for this information.

How about trade value in relation to population and GDP per capita?
I collected GDP data from the UN database: http://data.un.org/Search.aspx?q=GDP+per+capita
and population data from the World Bank database: http://data.worldbank.org/indicator/SP.POP.TOTL

Do developed countries use more renewable energy?
To answer this, I also need data on renewable energy consumption.


2. Wrangling

When checking the data, I realised that data is sometimes available for one year but not for others, e.g. data for the UAE is available only prior to 2008. This will require wrangling the data so that it includes only countries for which the data is comprehensive.
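One way to do this filtering is sketched below, assuming each record is a dictionary with `country` and `year` keys. That record layout and the sample rows are made up for illustration and are not the actual Comtrade response format.

```python
def countries_with_full_coverage(records, years):
    """Return the countries that have at least one record for every year."""
    seen = {}
    for record in records:
        seen.setdefault(record["country"], set()).add(record["year"])
    required = set(years)
    # Keep a country only if its set of years covers all required years.
    return sorted(c for c, covered in seen.items() if required <= covered)


# Made-up sample rows: one country is missing the later years.
records = [
    {"country": "FIN", "year": 2007},
    {"country": "FIN", "year": 2008},
    {"country": "FIN", "year": 2009},
    {"country": "ARE", "year": 2007},
]

complete = countries_with_full_coverage(records, range(2007, 2010))
```

With these sample rows, `complete` contains only `"FIN"`, because the other country has no records for 2008 and 2009.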

Since I need data from several sources, data integration is necessary and will give rise to integration issues. One challenge that I can foresee is the mismatch between country encodings. For example, “CHN”, “People’s Republic of China” and “P.R.C.” all refer to China. To resolve this issue, I am planning to keep a list of ISO 3166 codes and map each code to all of its possible names.
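The mapping could be sketched as follows. The alias lists are illustrative and far from complete; a real table would have to be built from the names that actually occur in each data source.

```python
# Illustrative alias table: ISO 3166-1 alpha-3 code -> known spellings.
ALIASES = {
    "CHN": ["China", "People's Republic of China", "P.R.C.", "PRC"],
    "FIN": ["Finland", "Republic of Finland"],
}


def normalise(name):
    """Lower-case a name and drop spaces and punctuation before lookup."""
    return "".join(ch for ch in name.casefold() if ch.isalnum())


# Reverse lookup: every normalised alias (and the code itself) -> code.
LOOKUP = {
    normalise(alias): code
    for code, aliases in ALIASES.items()
    for alias in aliases + [code]
}


def to_iso3(name):
    """Return the alpha-3 code for a country name, or None if unknown."""
    return LOOKUP.get(normalise(name))
```

Returning `None` for unknown names makes it easy to list every name that still needs an entry in the alias table before the datasets are merged.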

Another possible problem across databases is the use of different units. In one database trade may be reported in US Dollars, while in another it may be reported in volume (and the volume units may also differ: gallons, barrels or liters).
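For the volume units, a small conversion helper could bring everything to barrels. This sketch assumes US liquid gallons and the standard oil barrel of 42 US gallons; if a source uses imperial gallons, the factor would have to change.

```python
# Liters per unit. Assumes US liquid gallons and 42-gallon oil barrels.
LITERS_PER_UNIT = {
    "liter": 1.0,
    "gallon": 3.785411784,
    "barrel": 158.987294928,
}


def to_barrels(quantity, unit):
    """Convert a volume given in liters, gallons or barrels to barrels."""
    if unit not in LITERS_PER_UNIT:
        raise ValueError(f"unknown volume unit: {unit!r}")
    liters = quantity * LITERS_PER_UNIT[unit]
    return liters / LITERS_PER_UNIT["barrel"]
```

Converting between Dollars and volume is a different problem, because it depends on the price in each year and cannot be done with a fixed factor.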


3. Profiling

Some assumptions might be made during this phase, for example, to disregard monetary inflation when representing trade value in US Dollars across different years, or to assume a constant inflation rate for simplicity.
I need to consider carefully when to use trade values in Dollars and when to use trade volume, and how to merge these units.
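Under the constant-inflation assumption, a nominal value from year t can be expressed in base-year Dollars by dividing it by (1 + r)^(t - t0), where r is the assumed annual rate and t0 the base year. A minimal sketch, in which the function name and the rate passed in are placeholders rather than measured figures:

```python
def to_base_year_dollars(nominal_value, year, base_year, annual_inflation):
    """Deflate a nominal Dollar value to base-year Dollars.

    Assumes one constant annual inflation rate for the whole period,
    which is a simplification and not a measured price index.
    """
    return nominal_value / (1.0 + annual_inflation) ** (year - base_year)
```

For example, with an assumed rate of 10% a value of 110 Dollars in 2005 corresponds to about 100 Dollars in 2004 terms. Values from the base year itself are left unchanged.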


4. Modeling

Scale will definitely be an issue here, because the number of countries is large and there are both extremely large and extremely small countries, which will most likely affect import values. One possible solution is to let users choose a set of countries at a time, e.g. a minimum of 5 and a maximum of 15 countries for the visuals. The scale will be recalculated each time based on the selected countries.
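The recalculation could be sketched as a linear scale whose maximum is taken from the selected countries only. The function names, the 0 to 100 output range and the data layout (a dictionary from country to import value) are assumptions for illustration.

```python
MIN_COUNTRIES, MAX_COUNTRIES = 5, 15


def make_linear_scale(values, range_max=100.0):
    """Return a function mapping [0, max(values)] onto [0, range_max]."""
    top = max(values)
    if top <= 0:
        return lambda value: 0.0
    return lambda value: value / top * range_max


def scale_for_selection(import_values, selected):
    """Build a scale from the selected countries only."""
    if not MIN_COUNTRIES <= len(selected) <= MAX_COUNTRIES:
        raise ValueError(
            f"select between {MIN_COUNTRIES} and {MAX_COUNTRIES} countries"
        )
    return make_linear_scale([import_values[c] for c in selected])
```

Because the maximum comes from the selection, one very large importer outside the selection no longer squeezes all the smaller countries into the bottom of the chart.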


5. Reporting
Rickshaw, RAW, Tableau, etc. are high-level visualisation tools that allow us to plug in data and generate graphs efficiently. However, these tools are not very flexible. D3.js is more flexible, but it has a steep learning curve for beginners. I plan to use D3.js if time permits and if I make good progress with the previous phases. I might switch to RAW if I spend more time on explorative analysis.
  

  

Motivation to Visualize Global Oil Products Trade

This blog describes the progress of my student project for the course Explorative Information Visualization at Aalto University, Autumn Semester, AY 2015-2016. 

An effective visualization conveys insights to the audience by letting them connect things on their own, based on the evidence provided. Patterns, trends, and outliers that we might miss when looking at data tables can be easily spotted using visuals. This is why information visualization is vital, especially when we have to deal with large data sets. It is built on art and design, so besides coding and mining data, I will also learn to design visuals and craft a compelling story to communicate with the audience.

I chose to visually explore the data flows of global oil products trading. Effective visualizations of fuel trade flows can provide a comprehensive view of international demand and consumption of energy resources and enhance our understanding of underlying patterns and trends. They can also help the audience determine the relationships between countries based on their trades.

A common way of representing data flow is to use geographic maps which show country-to-country flows as stroked lines, see figure below:


The lines have arrowheads to represent flow direction; however, it is still difficult for users to distinguish in-flows from out-flows. The lines are all of the same thickness, so they don’t represent the trade volumes between countries well.

My goal is to create an interactive visualization with filters that allows users to zoom in on one country in a specific year for either import or export. A timeline slider will be provided so that users can move along the timeline to easily discover the changes in the flows over the years. 

The visualisation will help users predict near-future demand for oil products and their sources, and recognize strategic trade relationships between countries.