Privacy Preserving In Data Stream Using Sliding Window Method
1. INTRODUCTION:
Data mining is the extraction of information from very large data sets; in other words, it is the process of mining knowledge from data. The information industry holds a huge amount of data, which is of no use until it is analysed and converted into useful information. Extraction is not the only step involved: the overall process also includes data cleaning, data integration, data transformation, data mining, pattern evaluation and data presentation [12].
Information is today probably the
2.1 Anonymization based PPDM
In its basic form, the data in a table consists of the following four types of attributes:
(i) Explicit Identifiers are a set of attributes containing information that explicitly identifies a record owner, such as name or Social Security number.
(ii) Quasi Identifiers are a set of attributes that could potentially identify a record owner when combined with publicly available data.
(iii) Sensitive Attributes are a set of attributes that contain sensitive person-specific information, such as disease or salary.
(iv) Non-Sensitive Attributes are a set of attributes that create no problem if revealed, even to untrustworthy parties [4].
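The four attribute types can be made concrete with a small sketch. The column names, the schema and the toy record below are assumptions chosen purely for illustration; they do not come from any dataset used in this paper. The sketch tags each column with its type and shows the first, naive step of anonymization: dropping the explicit identifiers.

```python
from enum import Enum


class AttrType(Enum):
    EXPLICIT_IDENTIFIER = "explicit identifier"
    QUASI_IDENTIFIER = "quasi identifier"
    SENSITIVE = "sensitive"
    NON_SENSITIVE = "non-sensitive"


# Hypothetical schema: the column names are illustrative assumptions.
SCHEMA = {
    "name": AttrType.EXPLICIT_IDENTIFIER,
    "ssn": AttrType.EXPLICIT_IDENTIFIER,
    "dob": AttrType.QUASI_IDENTIFIER,
    "sex": AttrType.QUASI_IDENTIFIER,
    "zip": AttrType.QUASI_IDENTIFIER,
    "disease": AttrType.SENSITIVE,
    "visit_count": AttrType.NON_SENSITIVE,
}


def columns_of(attr_type, schema=SCHEMA):
    """List the columns that belong to one attribute type."""
    return [column for column, kind in schema.items() if kind is attr_type]


def drop_explicit_identifiers(record, schema=SCHEMA):
    """Return a copy of the record without explicit-identifier columns."""
    return {
        column: value
        for column, value in record.items()
        if schema[column] is not AttrType.EXPLICIT_IDENTIFIER
    }


# Made-up record used only to exercise the functions above.
record = {
    "name": "Alice Example",
    "ssn": "000-00-0000",
    "dob": "1980-02-14",
    "sex": "F",
    "zip": "12345",
    "disease": "flu",
    "visit_count": 3,
}
released = drop_explicit_identifiers(record)
```

Note that `released` still carries the quasi identifiers (`dob`, `sex`, `zip`), so removing explicit identifiers alone does not guarantee privacy.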
Anonymization refers to an approach in which the identity and/or the sensitive data of record owners are hidden, while still assuming that the sensitive data should be retained for analysis. Explicit identifiers must obviously be removed, but a danger of privacy intrusion remains when quasi identifiers are linked to publicly available data. Such attacks are called linking attacks. For example, attributes such as DOB, sex, race and zip code are available in public records such as a voter list.
Figure 1: Linking Attack
Such attributes also appear in medical records and, when the two sources are linked, can be used to infer
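A linking attack of the kind described above can be sketched as a join on the quasi identifiers. Both tables, every value in them, and the function name `linking_attack` are hypothetical and exist only to illustrate the idea; this is not a reproduction of any attack from the cited literature.

```python
QUASI_IDENTIFIERS = ("dob", "sex", "zip")

# Hypothetical public voter list: names are public here.
voter_list = [
    {"name": "Alice Example", "dob": "1980-02-14", "sex": "F", "zip": "12345"},
    {"name": "Bob Example", "dob": "1975-07-01", "sex": "M", "zip": "12345"},
]

# Hypothetical medical table released with explicit identifiers removed.
medical_records = [
    {"dob": "1980-02-14", "sex": "F", "zip": "12345", "disease": "diabetes"},
    {"dob": "1990-11-30", "sex": "M", "zip": "54321", "disease": "asthma"},
]


def linking_attack(public_rows, released_rows, keys=QUASI_IDENTIFIERS):
    """Join two tables on the quasi identifiers and return re-identified rows."""
    # Index the public table by its quasi-identifier combination.
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for row in released_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # A unique match ties a public name to a sensitive value.
        if len(candidates) == 1:
            matches.append(
                {"name": candidates[0]["name"], "disease": row["disease"]}
            )
    return matches


reidentified = linking_attack(voter_list, medical_records)
```

In this toy data one medical row matches exactly one voter on all three quasi identifiers, so that person's disease is disclosed even though the medical table contains no names.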
Access control mechanisms protect sensitive information from unauthorized users. However, when sensitive information is shared and a Privacy Protection Mechanism (PPM) is not in place, an authorized user can still compromise the privacy of a person, leading to identity disclosure. A PPM can use suppression and generalization of relational data to anonymize and satisfy privacy requirements, e.g., k-anonymity and l-diversity, against identity and attribute disclosure. However, privacy is achieved at the cost of precision of authorized information. We propose an accuracy-constrained privacy-preserving access control framework. The access control policies define selection predicates available to roles, while the privacy requirement is to satisfy k-anonymity or l-diversity. An additional constraint that needs to be satisfied by the PPM is the imprecision bound for each selection predicate. The techniques for workload-aware anonymization for selection predicates have been discussed in the literature. However, to the best of our knowledge, the problem of satisfying the accuracy constraints for multiple roles has not been studied before. In our formulation of the aforementioned problem, we propose heuristics for anonymization algorithms and show empirically that the proposed approach satisfies imprecision bounds for more permissions and has lower total imprecision than the current state of the art.
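To illustrate how generalization can be used to satisfy k-anonymity, the following is a minimal sketch under stated assumptions. It is not the heuristic proposed above: the generalization rule (keep only the birth year, mask the last two zip digits), the helper names and the toy rows are all hypothetical.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("dob", "sex", "zip")


def generalize(record):
    """Coarsen quasi identifiers: keep the birth year, mask two zip digits."""
    return {
        **record,
        "dob": record["dob"][:4],
        "zip": record["zip"][:3] + "**",
    }


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Made-up rows used only to exercise the functions above.
rows = [
    {"dob": "1980-02-14", "sex": "F", "zip": "12345", "disease": "diabetes"},
    {"dob": "1980-09-03", "sex": "F", "zip": "12399", "disease": "flu"},
    {"dob": "1975-07-01", "sex": "M", "zip": "12345", "disease": "asthma"},
    {"dob": "1975-01-20", "sex": "M", "zip": "12377", "disease": "flu"},
]

raw_ok = is_k_anonymous(rows, QUASI_IDENTIFIERS, k=2)
generalized_rows = [generalize(row) for row in rows]
generalized_ok = is_k_anonymous(generalized_rows, QUASI_IDENTIFIERS, k=2)
```

With these rows the raw table is not 2-anonymous, because every quasi-identifier combination is unique, while the generalized table is. The price is precision: a query on an exact zip code or birth date can no longer be answered exactly, which is the trade-off between privacy and imprecision noted above.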