Proceedings of

1st Int. Conf. on Recent Trends & Research in Engineering and Science

(ICRTRES-2015)

21-23 March, 2015

 

Organized By

Padm. Dr. V. B. Kolte College of Engineering & Polytechnic, Malkapur

 

as

A Special Issue of

International Journal of Computer Science and Applications

(ISSN:0974-1011)

 

Advisory Committee

Dr. G. R. Bamnote

(Dean, Faculty of Engg, SGBAU Amravati)

 

Dr. B. E. Narkhede

(Vice President IIIE, Mumbai)

 

Dr. Md. Mamun Habib

(Universiti Utara Malaysia (UUM), Malaysia)

 

Dr. C. R. Patil

(Prof. PRMIT&R Badnera, Amravati)

 

Dr. T. R. Deshmukh

(Prof. PRMIT&R Badnera, Amravati)

 

Dr. W. Z. Gandhare

(Principal, Govt. College of Engineering Amravati)

 

Dr. D. N. Kyatanavar

(Principal, SRESCOE Kopargaon)

 

Dr. U. Pendharkar

(Professor, Government Engineering College, Ujjain)

 

Dr. Ajit Thete

(Director, Centre for Development of Leadership in Education Pvt Ltd, Aurangabad)

 

Technical Committee

Dr. M. T. Datar

Dr. S. K. Garg

Dr. Shrikaant Kulkarni

Shri. D. N. Patil

Dr. A. W. Kolte

Prof. Ajitabh Pateriya

Prof. P. K. Patil

Prof. S.N. Khachane

Prof. Parag Chourey

Prof. B.K.Chaudhari

Prof. N.A. Kharche

Prof. R.M. Choudhari

Prof. R. B. Pandhare

Prof. A.P. Jadhao

Prof. S.B. Jadhav

Prof. Santosh Raikar

Prof. Y.P. Sushir

Prof. B. M. Tayde

 

Editor

Prof. K. H. Walse

 

Research Publications, India

 

 

   
 
 
 
IJCSA ISSN: 0974-1011 (Online)
Title:

Data Mining With Big Data Image De-Duplication In Social Networking Websites

Author:

Hashmi S.Taslim

 

Abstract

Big data is the term for a collection of data sets that are large and complex. Data comes from everywhere: sensors used to gather climate information, posts to social media sites, and video. Useful information can be extracted from this big data with the help of data mining. We propose an efficient approach based on the search for closed patterns. Moreover, we present a novel way to encode the bag-of-words image representation into data mining transactions. We validate our approach on a new dataset of one million Internet images obtained with random searches on Google image search. Using the proposed method, we find more than 80 thousand groups of duplicates among the one million images in less than three minutes while using only 150 megabytes of memory. Unlike other existing approaches, our method can scale gracefully to larger datasets, as it has linear time and space (memory) complexity.

We propose an efficient way of storing and de-duplicating images on the servers of online social networks. In this approach, the server maintains only one copy of an image and provides access to all users who have uploaded it. This is achieved through a flexible rule-based system that allows users to upload images to the server. In the background, before storing an image, the server checks whether a duplicate already exists; if it does, the server does not store the new upload and instead tags the user to the existing image.

We focus primarily on a method that requires only a small amount of data to be stored. We demonstrate our method on the Trec 2006 data set, which contains approximately 146k key frames. The proposed method uses a visual vocabulary of vector-quantized local feature descriptors (SURF) and, for retrieval, exploits enhanced min-hash techniques. The algorithm selects the min-hash algorithm
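As an illustration only, the upload-time duplicate check described in the abstract might be sketched as follows. This is not the author's implementation. It assumes each image has already been reduced to a non-empty set of integer visual-word IDs (the SURF extraction and vector quantization steps are outside the sketch), and it uses a plain min-hash sketch with a linear scan rather than the enhanced min-hash retrieval the abstract mentions. The names `DedupImageStore`, `minhash_sketch`, `estimated_similarity`, and the threshold of 0.8 are hypothetical.

```python
import random

PRIME = 2_147_483_647  # large prime modulus; assumes visual-word IDs are below it


def make_hash_funcs(num_hashes, seed=0):
    """Create (a, b) pairs for hash functions of the form h(w) = (a*w + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_sketch(visual_words, hash_funcs):
    """Return, for each hash function, the minimum hash value over the word set."""
    return tuple(min((a * w + b) % PRIME for w in visual_words)
                 for a, b in hash_funcs)


def estimated_similarity(sketch_a, sketch_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity of the sets."""
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


class DedupImageStore:
    """Keeps one sketch per distinct image plus the set of users tagged to it."""

    def __init__(self, num_hashes=64, threshold=0.8):
        self.hash_funcs = make_hash_funcs(num_hashes)
        self.threshold = threshold  # hypothetical cut-off for "duplicate"
        self.images = []            # list of (sketch, set of owner IDs)

    def upload(self, user_id, visual_words):
        """Return (image_id, stored_new); duplicates only tag the user to the old image."""
        sketch = minhash_sketch(visual_words, self.hash_funcs)
        # Linear scan for brevity; a scalable system would index sketches instead.
        for image_id, (stored, owners) in enumerate(self.images):
            if estimated_similarity(sketch, stored) >= self.threshold:
                owners.add(user_id)
                return image_id, False
        self.images.append((sketch, {user_id}))
        return len(self.images) - 1, True
```

In this sketch, a second user uploading the same set of visual words is tagged to the existing entry and no new copy is stored, while an image with a disjoint set of visual words is stored as a new entry.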



2015 International Journal of Computer Science and Applications 

Published by Research Publications, India