High-Performance Data Stream Mining Workshop

co-located with IEEE International Conference on Data Mining ICDM 2017

New Orleans, USA, November 18 - November 21, 2017

Learning from data streams has emerged as one of the most vital topics in contemporary machine learning and data stream mining. Data streams pose several challenges for modern intelligent systems: a potentially unbounded volume of data, instances arriving at high speed and at varying intervals, a changing and evolving decision space, difficult access to ground truth, and the need to manage heterogeneous forms of information. Volume and velocity are difficult to handle on their own, yet they must also be considered from the perspective of non-stationary problems affected by the phenomenon known as concept drift. This problem has been thoroughly studied in the last decade, with a specific focus on classification tasks. However, the research community has started to address it in other contexts, such as data preprocessing, regression, multi-label classification, association rule mining, imbalanced learning, graph and XML mining, social and mobile networks, and novelty detection. It is now recognized that imbalanced domains are a broader and important problem that poses relevant challenges for both supervised and unsupervised learning tasks and that requires handling various embedded difficulties in an increasing number of real-world applications.

Tackling the issues raised by data stream mining is of high importance to people from both academia and industry. For researchers, these challenges offer an exciting opportunity to develop adaptive, evolving and efficient learning methods that can handle such difficult cases. For industry, many real problems arrive in the form of streams, so such methods are vital for tackling them. Industry requires methods that enable more preemptive, real-time action in an increasingly fast-paced world and that can constantly update and evolve knowledge and models in accordance with the current state of the data. Additionally, with the ever-increasing scale and complexity of these problems, we need high-performance computing environments (clusters, cloud computing, GPUs) and fast, incremental, ideally single-pass algorithms that offer the highest possible predictive power at the lowest time and computational cost.

Scope of HighStream’2017

The research topics of interest to the HighStream'2017 workshop include (but are not limited to) the following:

  • Foundations of learning from data streams
    • Probabilistic and statistical models
    • Understanding the nature of learning difficulties embedded in streaming and non-stationary data
    • Identifying and handling concept drift
    • High-performance computing environments for big data streams
    • Deep learning with streaming data
    • New approaches for data pre-processing (e.g. discretization strategies)
    • Post-processing approaches
    • Sampling approaches
    • Feature selection and feature transformation
    • Evaluation in streaming domains
    • Online model selection
    • Learning for heterogeneous and multiple data streams
    • Context-awareness for data stream mining
    • Resource-aware learning from data streams
  • Knowledge discovery and data mining in data streams
    • Classification
    • Regression
    • Clustering
    • Novelty detection and evolving class structures
    • Learning from imbalanced data streams
    • Active learning and label latency
    • Multi-label, multi-instance, sequence and association rule mining
    • Graph stream mining
    • On-line ensemble models
    • Smart data mining with compact models
  • Applications in solving real-life problems
    • Social network applications
    • Medical data streams
    • Ubiquitous and mobile stream mining
    • Engineering and industrial applications
    • Fraud and intrusion detection
    • Environmental applications

Submission and deadlines

Paper submissions should be limited to a maximum of ten (10) pages, in the IEEE 2-column format, including the bibliography and any appendices. Submissions longer than 10 pages will be rejected without review. All submissions will be single-blind reviewed by the Program Committee on the basis of technical quality, relevance to the scope of the workshop, originality, significance, and clarity.

Submissions should use the ICDM system for workshop papers.

August 7, 2017: Workshop paper submissions
September 4, 2017: Workshop paper notifications
Workshop final version
November 18-21, 2017

Workshop Chairs

Bartosz Krawczyk

Department of Computer Science, Virginia Commonwealth University, USA

Bartosz Krawczyk is an assistant professor in the Department of Computer Science, Virginia Commonwealth University, Richmond VA, USA, where he heads the Machine Learning and Stream Mining Lab. He obtained his MSc and PhD degrees from Wroclaw University of Science and Technology, Wroclaw, Poland, in 2012 and 2015 respectively. His research is focused on machine learning, data streams, class imbalance, ensemble learning, one-class classifiers, and interdisciplinary applications of these methods. He has authored 35+ international journal papers and 80+ contributions to conferences. Dr Krawczyk has received numerous prestigious awards for his scientific achievements, including the IEEE Richard Merwin Scholarship and the IEEE Outstanding Leadership Award. He has served as a Guest Editor for four journal special issues and as a chair of ten special sessions and workshops. He is a Program Committee member for over 40 international conferences and a reviewer for 30 journals.

Mohamed M. Gaber

School of Computing and Digital Technology, Birmingham City University, UK

Mohamed M. Gaber is a Professor in Data Analytics and the leader of the "Data Science & Big Data Analytics" research group at the School of Computing and Digital Technology, Birmingham City University. Mohamed received his PhD from Monash University, Australia. He has published over 150 papers, co-authored 2 monograph-style books, and edited/co-edited 6 books on data mining and knowledge discovery. His research interests include data stream mining, ensemble learning, and mobile data mining. Mohamed has served on the program committees of major conferences related to data mining, including ICDM, PAKDD, ECML/PKDD and ICML. He has also co-chaired numerous scientific events on various data mining topics and is a member of the International Panel of Expert Advisers for the Australasian Data Mining Conferences. In 2007, he was awarded the CSIRO teamwork award.

João Gama

Laboratory of Artificial Intelligence and Decision Support, University of Porto, Portugal

João Gama received his Ph.D. in Computer Science in 2000. He is a senior researcher at INESC TEC. He has worked on several national and European projects on Incremental and Adaptive Learning Systems, Ubiquitous Knowledge Discovery, Learning from Massive and Structured Data, etc. He has served as Program Chair at several Machine Learning and Data Mining conferences. He is the author of a monograph on Knowledge Discovery from Data Streams and more than 200 peer-reviewed papers in areas related to machine learning, data mining, and data streams.

Edwin Lughofer

Department of Knowledge-Based Mathematical Systems, Johannes Kepler University, Austria

Edwin Lughofer received his PhD degree from the Johannes Kepler University Linz (JKU) in 2005. He is currently Key Researcher with the Fuzzy Logic Laboratorium Linz / Department of Knowledge-Based Mathematical Systems (JKU) in the Softwarepark Hagenberg, see www.flll.jku.at/staff/edwin/. He has participated in several basic and applied research projects at the European and national levels, with a specific focus on topics of Industry 4.0 and FoF (Factories of the Future). He has published around 170 publications in the fields of evolving fuzzy systems, machine learning and vision, data stream mining, chemometrics, active learning, classification and clustering, fault detection and diagnosis, quality control, and predictive maintenance, including 60 journal papers in SCI-expanded impact journals, a monograph on ’Evolving Fuzzy Systems’ (Springer) and an edited book on ’Learning in Non-stationary Environments’ (Springer). In total, his publications have received 2900 citations, giving an h-index of 33. He is an associate editor of the international journals IEEE Transactions on Fuzzy Systems, Evolving Systems, Information Fusion, Soft Computing and Complex and Intelligent Systems. He was the general chair of the IEEE Conference on EAIS 2014 in Linz, the publication chair of IEEE EAIS 2015, 2016 and 2017, and the Area Chair of the FUZZ-IEEE 2015 conference in Istanbul. He has co-organized around 20 special issues and special sessions in international journals and conferences. In 2006 he received the best paper award at the International Symposium on Evolving Fuzzy Systems, in 2013 the best paper award at the IFAC conference in Manufacturing Modeling, Management and Control (800 participants) and in 2016 the best paper award at the IEEE Intelligent Systems Conference.

Program Committee