6th IEEE International Conference on Big Data Security on Cloud (BigDataSecurity 2020)
Affinity Propagation Initialisation Based Proximity Clustering For Labeling in Natural Language Based Big Data Systems
May 26, 2020
A key challenge for natural language based large text data is automatically extracting knowledge, in terms of entities and relations, embedded in it. State of the art relation extraction systems requires large amounts of labeled data, which is costly and very difficult, especially in industrial settings, due to time constraints of subject matter experts. Techniques like distant supervision require the availability of a related knowledge base, which is rarely possible. We have developed a novel model for automatically clustering textual Big Data, based on techniques inspired from Active Learning and Clustering, that can derive powerful insights and make the data ready for machine learning with minimal manual effort. Our approach differs from Active Learning as we operate under weak supervision, where all the instances provided for training are not manually labeled. Secondly, This differs from any prevailing clustering algorithms as we adopt a whole new approach of proximity clustering based on affinity propagation. Due to the extrapolation of the labeling efforts, our model makes it easier to adopt deep learning approaches with minimal manual effort. In this paper, we describe our algorithm in detail, along with the experimental results obtained for them.
Downloads: 190 downloads