Tracking you through DNS traffic: Linking user sessions by clustering with Dirichlet mixture model

Mingxuan Sun, Junjie Zhang, Guangyue Xu, Dae Wook Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Domain Name System (DNS) does not encrypt domain names such as "bank.us" and "dentalcare.com", and these names often accurately reflect the specific network services being accessed. DNS-based behavioral analysis is therefore attractive for many applications, such as forensic investigation and online advertising. Traditionally, a user can be trivially and uniquely identified by the device’s IP address if it is static (e.g., a desktop or a laptop). As wireless and mobile devices become deeply ingrained in our lives and dynamic IP address assignment such as DHCP is widely deployed, it becomes almost impossible to use one IP address to identify a unique user. In this paper, we propose a new tracking method that identifies individual users by the way they query DNS, regardless of dynamically changing IP addresses and device types. The method rests on two observations. First, even though users' IP addresses may change from one session to the next, their query patterns can remain stable across these sessions. Second, the domain names looked up in sessions differ from user to user according to personal behavior. Specifically, we propose the constrained Dirichlet multinomial mixture (CDMM) clustering model to cluster DNS queries from different sessions into groups, each of which is considered to be generated by a unique user. Compared with traditional supervised and unsupervised models, our model requires neither labeled user information, which is very hard to obtain in real networks, nor a specification of the number of clusters; it also enforces a maximum number of sessions in each cluster, which fits the DNS tracking problem well. Experimental results on DNS queries collected from real networks demonstrate that our method achieves high clustering accuracy and outperforms existing methods.
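To make the clustering idea concrete, the following is a minimal sketch of collapsed Gibbs sampling for a Dirichlet multinomial mixture over DNS sessions, with a cap on the number of sessions per cluster. It is an illustration under stated assumptions, not the authors' CDMM: the function name `cluster_sessions`, the hyperparameters `alpha` and `beta`, the sample domain names, and the choice to enforce the cap by simply excluding full clusters are all hypothetical. The number of clusters is not specified in advance; the sketch starts with one cluster per session and lets sessions merge.

```python
import math
import random
from collections import Counter

def cluster_sessions(sessions, max_size, alpha=0.1, beta=0.1, iters=30, seed=0):
    """Group sessions (lists of queried domains) with a size-capped
    Dirichlet multinomial mixture, sampled by collapsed Gibbs."""
    rng = random.Random(seed)
    docs = [Counter(s) for s in sessions]            # domain counts per session
    vocab_size = len({d for s in sessions for d in s})
    K = len(docs)            # upper bound: every session alone in a cluster
    m = [0] * K              # sessions per cluster
    n = [0] * K              # total queries per cluster
    nw = [Counter() for _ in range(K)]  # per-domain query counts per cluster

    def move(i, z, sign):
        """Add (sign=+1) or remove (sign=-1) session i from cluster z."""
        m[z] += sign
        n[z] += sign * sum(docs[i].values())
        for w, c in docs[i].items():
            nw[z][w] += sign * c

    # Start with each session in its own cluster (valid for any cap >= 1).
    labels = list(range(K))
    for i, z in enumerate(labels):
        move(i, z, +1)

    def log_weight(i, z):
        """Unnormalised log probability of assigning session i to cluster z."""
        lw = math.log(m[z] + alpha)
        for w, c in docs[i].items():
            for j in range(c):
                lw += math.log(nw[z][w] + beta + j)
        for j in range(sum(docs[i].values())):
            lw -= math.log(n[z] + vocab_size * beta + j)
        return lw

    for _ in range(iters):
        for i in range(K):
            move(i, labels[i], -1)
            # Assumed constraint handling: full clusters are not candidates.
            allowed = [z for z in range(K) if m[z] < max_size]
            logs = [log_weight(i, z) for z in allowed]
            top = max(logs)
            weights = [math.exp(l - top) for l in logs]
            labels[i] = rng.choices(allowed, weights=weights)[0]
            move(i, labels[i], +1)
    return labels

# Hypothetical sessions: two browsing profiles observed under changing IPs.
sessions = [
    ["bank.us", "news.example", "bank.us"],
    ["bank.us", "news.example"],
    ["dentalcare.com", "games.example"],
    ["games.example", "dentalcare.com", "games.example"],
    ["bank.us", "mail.example"],
    ["dentalcare.com", "video.example"],
]
labels = cluster_sessions(sessions, max_size=3)
print(labels)
```

Sessions that share a label are treated as coming from the same user. Because the sampler is stochastic, the grouping depends on the seed and the hyperparameters; only the size cap is guaranteed by construction.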

Original language: English
Title of host publication: MSWiM 2017 - Proceedings of the 20th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems
Publisher: Association for Computing Machinery, Inc
Pages: 303-310
Number of pages: 8
ISBN (Electronic): 9781450351645
State: Published - Nov 21 2017
Event: 20th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems - Miami, United States
Duration: Nov 21 2017 to Nov 25 2017
Conference number: 20

Conference

Conference: 20th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems
Abbreviated title: MSWiM 2017
Country/Territory: United States
City: Miami
Period: 11/21/17 to 11/25/17

ASJC Scopus Subject Areas

  • Computer Networks and Communications
  • Modeling and Simulation

Keywords

  • Clustering
  • Dirichlet mixture model
  • DNS behavior tracking

Disciplines

  • Computer Sciences
  • Engineering
