
What Is Web Proxy Log Data And Preprocessing


of the Web is a prime target for librarians to tackle.

3. PROPOSED WORK
Existing approaches improve server performance by pre-fetching the pages most likely to be requested and caching them on the server. They cluster the data by user interest or by the time the server takes to respond to requests. The proposed work instead improves performance by clustering users into groups based on the location from which each request is sent. Clustering users by location improves the hit ratio. The web log file provides data about each user, such as user name, IP address, time stamp, access request, number of bytes

3.1 Web Proxy Log Data and Preprocessing

When every item in the data set has been assigned to one of the centroids, the first stage is complete and an initial set of clusters is obtained. The centroids are then recalculated, and the distances between the data items and the new centroids are computed again. This process repeats until the centroids become stable and no longer change. The K-Means algorithm is fast, robust, and easier to understand than other clustering algorithms. It also gives better results when the data items are well separated or distinct from each other.
In this study, the K-Means algorithm groups the web data into clusters based on the location of the web users, which is obtained from their IP addresses. The work aims to separate users according to the location from which each request originates. After the clusters are obtained, the algorithm that generates the association rules is applied.
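
The assign-and-recalculate loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study. It assumes each user has already been mapped from an IP address to a hypothetical (latitude, longitude) pair, it uses the first k points as initial centroids, and it measures plain Euclidean distance on the coordinates, which is only a rough stand-in for geographic distance.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Cluster 2-D points with basic K-Means; returns (labels, centroids)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once the centroids are stable.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (latitude, longitude) pairs, assumed to come from IP geolocation.
user_locations = [
    (40.7, -74.0), (51.5, -0.1), (40.8, -73.9),
    (51.6, -0.2), (40.6, -74.1), (51.4, 0.0),
]
labels, centroids = k_means(user_locations, k=2)
print(labels)
```

With these sample points, the users near the first location fall into one cluster and the users near the second location fall into the other. A real deployment would need a geolocation lookup for the IP addresses and a better initialization than taking the first k points.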

3.4 Pattern Discovery using FP-Growth Algorithm
Patterns that occur frequently in a data set are known as frequent patterns. For instance, a subset of items such as bread and butter that appears together frequently in the transactions is called a frequent item set. A web log file likewise provides a lot of information about web users and their behavior. Association rule mining is a widely used data mining technique that can also be applied to web data to discover frequent
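
To make the idea of a frequent item set concrete, the sketch below counts how often each combination of items appears across a handful of transactions and keeps the combinations that meet a minimum support count. It is not FP-Growth: it enumerates every subset by brute force, which is only practical for tiny examples, but it yields the same frequent item sets that FP-Growth would find on this data. The transactions and the `min_support` threshold are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for item sets seen in >= min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        # Sort so each item set has one canonical tuple representation.
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical transactions; for web logs, each one could be the set of
# pages requested in a single user session.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
]
frequent = frequent_itemsets(transactions, min_support=2)
print(frequent)
```

Here bread and butter appear together in two of the four transactions, so the pair qualifies as a frequent item set at a support threshold of 2, while jam, which appears only once, does not.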
