Chapter 2 Data sources
We are going to work with files generated by the game itself. They are called ‘demo’ files and were created by developers to replay matches from any point of view.
2.1 Which matches to analyse?
There are thousands of pro-level matches, but only the top teams are able to play in the tournaments called Majors. We chose to analyse Major tournaments because the strategies and team play of the top 20 teams are the most copied by other players. Additionally, the consistent level of performance of higher-level teams and the controlled environment of offline tournaments mean that it will be easier to get meaningful data, as teams are less likely to be influenced by external factors (internet connection, different time zones and schedules…).
2.2 Getting the demo files
Each match is recorded in a demo file generated by the game. A demo file contains all the raw data of a match, and is intended to be read by the game to replay the match. Demo files of Majors can be downloaded from the game, or from the HLTV website.
We used HLTV to download them, as its pages can conveniently be parsed. HLTV is a site that compiles all pro-level matches. It collaborates with event organizers to display real-time statistics and scores, and also shares recorded demo files. The code we used to automatically parse the HTML pages is included in the GitHub repository. The downloaded demo files amount to 82.1 GB of data and, as such, have not been uploaded.
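To illustrate the kind of HTML parsing involved, here is a minimal sketch that extracts demo download links from a page. It is not the code from the repository. The path fragment `/download/demo/`, the helper names `DemoLinkParser` and `extract_demo_links`, and the base URL used in the usage note are assumptions made for illustration; the real page structure may differ. The sketch works on an HTML string that has already been fetched, so it makes no network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: demo links are anchors whose href contains this fragment.
DEMO_PATH_FRAGMENT = "/download/demo/"


class DemoLinkParser(HTMLParser):
    """Collect absolute URLs of anchors that look like demo downloads."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags can carry a download link.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and DEMO_PATH_FRAGMENT in href:
            # Resolve relative links against the page URL and skip duplicates.
            url = urljoin(self.base_url, href)
            if url not in self.links:
                self.links.append(url)


def extract_demo_links(html, base_url):
    """Return the demo download URLs found in an HTML string, in page order."""
    parser = DemoLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Calling `extract_demo_links(page_html, "https://example.org")` on the HTML of a match page returns the list of candidate download URLs, which can then be fetched one by one. Only the standard library is used here to keep the sketch self-contained; a dedicated HTML parsing library would work equally well.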