TY - JOUR
T1 - Tracking You Through DNS Traffic: Linking User Sessions by Clustering With Dirichlet Mixture Model
AU - Sun, Mingxuan
AU - Zhang, Junjie
AU - Xu, Guangyue
AU - Kim, Dae Wook
PY - 2017/11/21
Y1 - 2017/11/21
N2 - The Domain Name System (DNS) does not encrypt domain names such as "bank.us" and "dentalcare.com", and these names commonly reflect the specific network services being accessed. DNS-based behavioral analysis is therefore extremely attractive for applications such as forensic investigation and online advertising. Traditionally, a user can be trivially and uniquely identified by the device’s IP address if it is static (e.g., for a desktop or a laptop). As wireless and mobile devices become deeply ingrained in our lives and dynamic IP address assignment such as DHCP is widely deployed, it becomes almost impossible to use one IP address to identify a unique user. In this paper, we propose a new tracking method that identifies individual users by the way they query DNS, regardless of dynamically changing IP addresses and device types. The method rests on two observations. First, even though a user’s IP address may change from one session to the next, the user’s query patterns can remain stable across these sessions. Second, the domain names looked up in sessions differ from user to user according to personal behavior. Specifically, we propose the constrained Dirichlet multinomial mixture (CDMM) clustering model to cluster the DNS queries of different sessions into groups, each of which is considered to be generated by a unique user. Compared with traditional supervised and unsupervised models, our model requires neither labeled user information, which is very hard to obtain in real networks, nor a specification of the number of clusters, while enforcing a maximum number of sessions in each cluster, which fits the DNS tracking problem well. Experimental results on DNS queries collected from real networks demonstrate that our method achieves high clustering accuracy and outperforms existing methods.
AB - The Domain Name System (DNS) does not encrypt domain names such as "bank.us" and "dentalcare.com", and these names commonly reflect the specific network services being accessed. DNS-based behavioral analysis is therefore extremely attractive for applications such as forensic investigation and online advertising. Traditionally, a user can be trivially and uniquely identified by the device’s IP address if it is static (e.g., for a desktop or a laptop). As wireless and mobile devices become deeply ingrained in our lives and dynamic IP address assignment such as DHCP is widely deployed, it becomes almost impossible to use one IP address to identify a unique user. In this paper, we propose a new tracking method that identifies individual users by the way they query DNS, regardless of dynamically changing IP addresses and device types. The method rests on two observations. First, even though a user’s IP address may change from one session to the next, the user’s query patterns can remain stable across these sessions. Second, the domain names looked up in sessions differ from user to user according to personal behavior. Specifically, we propose the constrained Dirichlet multinomial mixture (CDMM) clustering model to cluster the DNS queries of different sessions into groups, each of which is considered to be generated by a unique user. Compared with traditional supervised and unsupervised models, our model requires neither labeled user information, which is very hard to obtain in real networks, nor a specification of the number of clusters, while enforcing a maximum number of sessions in each cluster, which fits the DNS tracking problem well. Experimental results on DNS queries collected from real networks demonstrate that our method achieves high clustering accuracy and outperforms existing methods.
KW - Clustering, Dirichlet mixture model, DNS behavior tracking
UR - https://corescholar.libraries.wright.edu/cse/530
U2 - 10.1145/3127540.3127567
DO - 10.1145/3127540.3127567
M3 - Article
JO - MSWiM 2017 - Proceedings of the 20th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems
JF - MSWiM 2017 - Proceedings of the 20th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems
ER -