Saturday, January 11, 2020
New Hoarding Technique for Handling Disconnection in Mobile
Literature Survey On New Hoarding Technique for Handling Disconnection in Mobile Submitted by Mayur Rajesh Bajaj (IWC2011021) In Partial fulfilment for the award of the degree Of Master of Technology In INFORMATION TECHNOLOGY (Specialization: Wireless Communication and Computing) [pic] Under the Guidance of Dr. Manish Kumar INDIAN INSTITUTE OF INFORMATION TECHNOLOGY, ALLAHABAD (A University Established under sec. 3 of UGC Act, 1956 vide Notification no. F. 9-4/99-U. 3 Dated 04. 08. 2000 of the Govt. of India) (A Centre of Excellence in Information Technology Established by Govt. of India) Table of Contents [pic] 1.Introductionâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦. 3 2. Related Work and Motivation 1. Coda: The Pioneering System for Hoardingâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦. 4 2. Hoarding Based on Data Mining Techniquesâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦.. 5 3. Hoarding Techniques Based on Program Treesâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦.. 8 4. Hoarding in a Distributed Environmentâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦. 9 5.Hoarding content for mobile learningâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦ 10 6. Mobile Clients Through Cooperative Hoardingâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦.. 10 7. Comparative Discussion previous techniquesâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦. 11 3. Problem Definitionâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦. 11 4. New Approach Suggested 1. Zipfââ¬â¢s Law â⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦.. 2 2. Object Hotspot Prediction Modelâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢ ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦ 13 5. Schedule of Workâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦. 13 6. Conclusionâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦ 13 Referencesâ⬠¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦Ã¢â¬ ¦ 14 . Introduction Mobile devices are the computers which are having wireless communication capabilities to access global data services from any location while roaming. Now a dayââ¬â¢s mobile devices are supporting applications such as multimedia, World Wide Web and other high profile applications which demands continuous connections and Mobile devices are lacking here. However, mobile devices with wireless communication are frequently disconnected from the network due to the cost of wireless communication or the unavailability of the wireless network.Disconnection period of mobile device from its network is called as offline period. Such offline periods may appear for different reasons ââ¬â intentional (e. g. , the available connection is too expensive for the user) or unintentional (e. g. , lack of infrastructure at a given time and location). During offline periods the user can only access materials located on the deviceââ¬â¢s local memory. Mobile systems typically have a relatively small amount of memory, which is often not enough to store all the needed data for ongoing activities to continue.In such a case, a decision should be taken on which part of the data has to be cached. Often we cannot count on the userââ¬â¢s own judgement of what he/she will need and prefetch. Rather, in our opinion, some sort of automatic prefetching would be desirable. Uninterrupted operation in offline mode will be in high demand and the mobile computer systems should provide support for it. Seamless disconnection can be achieved by loading the files that a user will access in the future from the network to the local storage. This preparation process for disconnected operation is called hoarding.Few of the parameters which complicate the hoarding process are prediction of future access pattern of the user, handling of hoard miss, limited local hoard memory and unpredictable disconnections and reconnection, activities on hoarded object at other clients, the asymmetry of communications bandwidth in downstream and upstream. An important point is to measure the quality of the hoarding and to try to improve it continuously. An often used metric in the evaluation of caching proxies is the hit ratio. Hit ratio is calculated by dividing the number of by the total number of uploaded predictions.It is a good measure for hoarding systems, though a better measure is the miss ratio ââ¬â a percentage of accesses for which the cache is ineffective. In this work we have given brief overview of the techniques proposed in earlier days and also given the idea for the new hoarding technique. 2. Related Work and Motivation Before the early 1990ââ¬â¢s, there was little research on hoarding. Since then, however, interest has increased dramatically among research scientists and professors around the globe and many techniques have been developed. Here we have listed few of the techniques and also will discuss them in brief. Coda: The Pioneering System for Hoarding â⬠¢ Hoarding Based on Data Mining Techniques ? SEER Hoarding System (inspired by clustering technique) ? Association Rule-Based Techniques ? Hoarding Based on Hyper Graph ? Probability Graph Based Technique â⬠¢ Hoarding Techniques Based on Program Trees â⬠¢ Hoarding in a Distributed Environment â⬠¢ Hoarding content for mobile learning â⬠¢ Mobile Clients Through Cooperative Hoarding 2. 1 Coda Coda is a distributed file system based on clientââ¬âserver architecture, where there are many clients and a comparatively smaller number of servers.It is the first system that enabled users to work in disconnected mode. The concept of hoarding was introduced by the Coda group as a means of enabling disconnected operation. Disconnections in Coda are assumed to occur involuntarily due to network failures or voluntarily due to the detachment of a mobile client from the network. Voluntary and involuntary disconnections are handled the same way. The cache manager of Coda, called Venus, is designed to work in disconnected mode by serving client requests from the cache when the mobile client is detached from the network.Requests to the files that are not in the cache during disconnection are reflected to the client as failures. The hoarding system of Coda lets users select the files that they will hopefully need in the future. This information is used to decide what to load to the local storage. For disconnected operation, files are loaded to the client local storage, because the master copies are kept at stationary servers, there is the notion of replication and how to manage locks on the local copies. When the disconnection is voluntary, Coda handles this case by obtaining exclusive locks to files.However in case of involuntary disconnection, the system should defer the conflicting lock requests for an object to the reconnection time, which may not be predictable. The cache management system of Coda, called Venus, diff ers from the previous ones in that it incorporates user profiles in addition to the recent reference history. Each workstation maintains a list of pathnames, called the hoard database. These pathnames specify objects of interest to the user at the workstation that maintains the hoard database. Users can modify the hoard database via scripts, which are called hoard profiles.Multiple hoard profiles can be defined by the same user and a combination of these profiles can be used to modify the hoard database. Venus provides the user with an option to specify two time points during which all file references will be recorded. Due to the limitations of the mobile cache space, users can also specify priorities to provide the hoarding system with hints about the importance of file objects. Precedence is given to high priority objects during hoarding where the priority of an object is a combination of the user specified priority and a parameter indicating how recently it was accessed.Venus per forms a hierarchical cache management, which means that a directory is not purged unless all the subdirectories are already purged. In summary, the Coda hoarding mechanism is based on a least recently used (LRU) policy plus the user specified profiles to update the hoard data-base, which is used for cache management. It relies on user intervention to determine what to hoard in addition to the objects already maintained by the cache management system. In that respect, it can be classified as semi-automated.Researchers developed more advanced techniques with the aim of minimizing the user intervention in determining the set of objects to be hoarded. These techniques will be discussed in the following sections. 2. 2 Hoarding based on Data mining Techniques Knowing the interested pattern from the large collection of data is the basis of data mining. In the earlier history of hoarding related works researchers have applied many different data mining techniques in this arena of mobile hoa rding. Mainly clustering and association rule mining techniques were adopted from data mining domain. . 2. 1 SEER Hoarding System To automate the hoarding process, author developed a hoarding system called SEER that can make hoarding decisions without user intervention. The basic idea in SEER is to organize usersââ¬â¢ activities as projects in order to provide more accurate hoarding decisions. A distance measure needs to be defined in order to apply clustering algorithms to group related files. SEER uses the notion of semantic distance based on the file reference behaviour of the files for which semantic distance needs to be calculated.Once the semantic distance between pairs of files are calculated, a standard clustering algorithm is used to partition the files into clusters. The developers of SEER also employ some filters based on the file type and other conventions introduced by the specific file system they assumed. The basic architecture of the SEER predictive hoarding syste m is provided in figure 1. The observer monitors user behaviour (i. e. , which files are accessed at what time) and feeds the cleaned and formatted access paths to the correlator, which then generates the distances among files in terms of user access behaviour.The distances are called the semantic distance and they are fed to the cluster generator that groups the objects with respect to their distances. The aim of clustering is, given a set of objects and a similarity or distance matrix that describes the pairwise distances or similarities among a set of objects, to group the objects that are close to each other or similar to each other. Calculation of the distances between files is done by looking at the high-level file references, such as open or status inquiry, as opposed to individual reads and writes, which are claimed to obscure the process of distance calculation. pic] Figure 1. Architecture of the SEER Predictive Hoarding System The semantic distance between two file referen ces is based on the number of intervening references to other files in between these two file references. This definition is further enhanced by the notion of lifetime semantic distance. Lifetime semantic distance between an open file A and an open file B is the number of intervening file opens (including the open of B). If the file A is closed before B is opened, then the distance is defined to be zero.The lifetime semantic distance relates two references to different files; however it needs to be somehow converted to a distance measure between two files instead of file references. Geometric mean of the file references is calculated to obtain the distance between the two files. Keeping all pairwise distances takes a lot of space. Therefore, only the distances among the closest files are represented (closest is determined by a parameter K, K closest pairs for each file are considered). The developers of SEER used a variation of an agglomerative (i. e. bottom up) clustering algorithm called k nearest neighbour, which has a low time and space complexity. An agglomerative clustering algorithm first considers individual objects as clusters and tries to combine them to form larger clusters until all the objects are grouped into one single cluster. The algorithm they used is based on merging sub clusters into larger clusters if they share at least kn neighbours. If the two files share less than kn close files but more than kf, then the files in the clusters are replicated to form overlapping clusters instead of being merged.SEER works on top of a user level replication system such as Coda and leaves the hoarding process to the underlying file system after providing the hoard database. The files that are in the same project as the file that is currently in use are included to the set of files to be hoarded. During disconnected operation, hoard misses are calculated to give a feedback to the system. 2. 2. 2 Association Rule-Based Techniques Association rule overview: Let I=i1,i2â⬠¦.. im be a set of literals, called items and D be a set of transactions, such that ?T ? D; T? I. A transaction T contains a set of items X if X? T. An association rule is denoted by an implication of the form X ? Y, where X? I, Y ? I, and X ? Y = NULL. A rule X ? Y is said to hold in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. The rule X? Y has support sin the transaction set D if s% of transactions in D contains X? Y. The problem of mining association rules is to find all the association rules that have a support and a confidence greater than user-specified thresholds.The thresholds for confidence and support are called minconf and minsup respectively. In Association Rule Based Technique for hoarding, authors described an application independent and generic technique for determining what should be hoarded prior to disconnection. This method utilizes association rules that are extracted by data mining techni ques for determining the set of items that should be hoarded to a mobile computer prior to disconnection. The proposed method was implemented and tested on synthetic data to estimate its effectiveness.The process of automated hoarding via association rules can be summarized as follows: Step 1: Requests of the client in the current session are used through an inferencing mechanism to construct the candidate set prior to disconnection. Step 2: Candidate set is pruned to form the hoard set. Step 3: Hoard set is loaded to the client cache. The need to have separate steps for constructing the candidate set and the hoard set arises from the fact that users also move from one machine to another that may have lower resources.The construction of the hoard set must adapt to such potential changes. Construction of candidate set: An inferencing mechanism is used to construct the candidate set of data items that are of interest to the client to be disconnected. The candidate set of the client is constructed in two steps; 1. The inferencing mechanism finds the association rules whose heads (i. e. , left hand side) match with the clientââ¬â¢s requests in the current session, 2. The tails (i. e. , right hand side) of the matching rules are collected into the candidate set.Construction of Hoard set: The client that issued the hoard request has limited re-sources. The storage resource is of particular importance for hoarding since we have a limited space to load the candidate set. Therefore, the candidate set obtained in the first phase of the hoarding set should shrink to the hoard set so that it fits the client cache. Each data item in the candidate set is associated with a priority. These priorities together with various heuristics must be incorporated for determining the hoard set. The data items are used to sort the rules in descending order of priorities.The hoard set is constructed out of the data items with the highest priority in the candidate set just enough to fil l the cache. 3. Hoarding Based on Hyper Graph Hyper graph based approach presents a kind of low-cost automatic data hoarding technology based on rules and hyper graph model. It first uses data mining technology to extract sequence relevance rules of data from the broadcasting history, and then formulates hyper graph model, sorting the data into clusters through hyper graph partitioning methods and sorting them topologically.Finally, according to the data invalid window and the current visit record, data in corresponding clusters will be collected. Hyper graph model: Hyper graph model is defined as H = (V, E) where V={v1 ,v2 ,â⬠¦ ,vn } is the vertices collection of hyper graph, and E={e1 ,e2 ,â⬠¦ ,em } is super-edge collection of hyper graph (there supposed to be m super-edges in total). Hyper graph is an extension of graph, in which each super-edge can be connected with two or more vertices. Super-edge is the collection of a group of vertices in hyper graph, and superedge ei = {vi1, vi2, â⬠¦ inj} in which vi1,vi2 ,â⬠¦ ,vin ? V . In this model, vertices collection V corresponds to the history of broadcast data, in which each point corresponds to a broadcast data item, and each super-edge corresponds to a sequence model. Sequence model shows the orders of data items. A sequence model in size K can be expressed as p = . Use of hyper graph in hoarding are discussed in paper in details. 4. Probability Graph Based Technique This paper proposed a low-cost automated hoarding for mobile computing.Advantage of this approach is it does not explore application specific heuristics, such as the directory structure or file extension. The property of application independence makes this algorithm applicable to any predicative caching system to address data hoarding. The most distinguished feature of this algorithm is that it uses probability graph to represent data relationships and to update it at the same time when userââ¬â¢s request is processed. Before d isconnection, the cluster algorithm divides data into groups.Then, those groups with the highest priority are selected into hoard set until the cache is filled up. Analysis shows that the overhead of this algorithm is much lower than previous algorithms. Probability Graph: An important parameter used to construct probability graph is look-ahead period. It is a fixed number of file references that defines what it means for one file to be opened ââ¬Ësoonââ¬â¢ after another. In other words, for a specific file reference, only references within the look-ahead period are considered related. In fact, look-ahead period is an approximate method to avoid traversing the whole trace.Unlike constructing probability graph from local file systems, in the context of mobile data access, data set is dynamically collected from remote data requests. Thus, we implemented a variation of algorithm used to construct probability graph, as illustrated in Figure 2. [pic] Figure 2. Constructing the prob ability graph The basic idea is simple: If a reference to data object A follows the reference to data object B within the look-ahead period, then the weight of directed arc from B to A is added by one. The look-ahead period affects absolute weight of arcs.Larger look-ahead period produces more arcs and larger weight. A ââ¬â¢s dependency to B is represented by the ratio of weight of arc from B to A divided by the total weight of arcs leaving B. Clustering: Before constructing the final hoard set, data objects are clustered into groups based on dependency among data objects. The main objective of the clustering phase is to guarantee closely related data objects are partitioned into the same group. In the successive selecting phase, data objects are selected into hoard set at the unit of group. This design provides more continuity in user operation when disconnected.Selecting Groups: The following four kinds of heuristic information are applicable for calculating priority for a grou p: â⬠¢ Total access time of all data objects; â⬠¢ Average access time of data objects; â⬠¢ Access time of the start data object; â⬠¢ Average access time per byte. 2. Hoarding Techniques Based on Program Trees A hoarding tool based on program execution trees was developed by author running under OS/2 operating system. Their method is based on analyzing program executions to construct a profile for each program depending on the files the program accesses.They proposed a solution to the hoarding problem in case of informed disconnections: the user tells the mobile computer that there is an imminent disconnection to fill the cache intelligently so that the files that will be used in the future are already there in the cache when needed. [pic] Figure 3. Sample program Tree This hoarding mechanism lets the user make the hoarding decision. They present the hoarding options to the user through a graphical user interface and working sets of applications are captured automatic ally. The working sets are detected by logging the user file accesses at the background.During hoarding, this log is analyzed and trees that represent the program executions are constructed. A node denotes a file and a link from a parent to one of its child nodes tells us that either the child is opened by the parent or it is executed by the parent. Roots of the trees are the initial processes. Program trees are constructed for each execution of a program, which captures multiple contexts of executions of the same program. This has the advantage that the whole context is captured from different execution times of the program.Finally, hoarding is performed by taking the union of all the execution trees of a running program. A sample program tree is provided in Figure 3. Due to the storage limitations of mobile computers, the number of trees that can be stored for a program is limited to 15 LRU program trees. Hoarding through program trees can be thought of as a generalization of a pr o-gram execution by looking at the past behaviour. The hoarding mechanism is enhanced by letting the user rule out the data files. Data files are automatically detected using three complementary heuristics: 1.Looking at the filename extensions and observing the filename conventions in OS/2, files can be distinguished as executable, batch files, or data files. 2. Directory inferencing is used as a spatial locality heuristic. The files that differ in the top level directory in their pathnames from the running program are assumed to be data files, but the programs in the same top level directory are assumed to be part of the same program. 3. Modification times of the files are used as the final heuristic to deter-mine the type of a file. Data files are assumed to be modified more recently and frequently than the executables.They devised a parametric model for evaluation, which is based on recency and frequency. 3. Hoarding in a Distributed Environment Another hoarding mechanism, which was presented for specific application in distributed system, assumes a specific architecture, such as infostations where mobile users are connected to the network via wireless local area networks (LANs) that offer a high bandwidth, which is a cheaper option compared to wireless wide area networks (WANs). The hoarding process is handed over to the infostations in that model and it is assumed that what the user wants to access is location-dependent.Hoarding is proposed to fill the gap between the capacity and cost trade-off between wireless WANS and wireless LANs. The infestations do the hoarding and when a request is not found in the infostation, then WAN will be used to get the data item. The hoarding decision is based on the user access patterns coupled with that userââ¬â¢s location information. Items frequently accessed by mobile users are recorded together with spatial information (i. e. , where they were accessed). A region is divided into hoarding areas and each infostation is responsible with one hoarding area. 4. Hoarding content for mobile learningHoarding in the learning context is the process for automatically choosing what part of the overall learning content should be prepared and made available for the next offline period of a learner equipped with a mobile device. We can split the hoarding process into few steps that we will discuss further in more details: 1. Predict the entry point of the current user for his/her next offline learning session. We call it the ââ¬Ëstarting pointââ¬â¢. 2. Create a ââ¬Ëcandidate for cachingââ¬â¢ set. This set should contain related documents (objects) that the user might access from the starting point we have selected. 3.Prune the set ââ¬â the objects that probably will not be needed by the user should be excluded from the candidate set, thus making it smaller. This should be done based on user behaviour observations and domain knowledge. 4. Find the priority to all objects still in the hoarding set after pruning. Using all the knowledge available about the user and the current learning domain, every object left in the hoarding set should be assigned a priority value. The priority should mean how important the object is for the next user session and should be higher if we suppose that there is a higher probability that an object will be used sooner. . Sort the objects based on their priority, and produce an ordered list of objects. 6. Cache, starting from the beginning of the list (thus putting in the device cache those objects with higher priority) and continue with the ones with smaller weights until available memory is filled in. 5. Mobile Clients Through Cooperative Hoarding Recent research has shown that mobile users often move in groups. Cooperative hoarding takes advantage of the fact that even when disconnected from the network, clients may still be able to communicate with each other in ad-hoc mode.By performing hoarding cooperatively, clients can share their hoar d content during disconnections to achieve higher data accessibility and reduce the risk of critical cache misses. Two cooperative hoarding schemes, GGH and CAP, have been proposed. GGH improves hoard performance by al-lowing clients to take advantage of what their peers have hoarded when making their own hoarding decisions. On the other hand, CAP selects the best client in the group to Hoard each object to maximise the number of unique objects hoarded and minimise access cost. Simulation results show that compare to existing schemes.Details of GGH and CAP are given in paper. 2. 7 Comparative Discussion previous techniques The hoarding techniques discussed above vary depending on the target system and it is difficult to make an objective comparative evaluation of their effectiveness. We can classify the hoarding techniques as being auto-mated or not. In that respect, being the initial hoarding system, Coda is semiautomated and it needs human intervention for the hoarding decision. T he rest of the hoarding techniques discussed are fully automated; how-ever, user supervision is always desirable to give a final touch to the files to be hoarded.Among the automated hoarding techniques, SEER and program tree-based ones assume a specific operating system and use semantic information about the files, such as the naming conventions, or file reference types and so on to construct the hoard set. However, the ones based on association rule mining and infostation environment do not make any operating system specific assumptions. Therefore, they can be used in generic systems. Coda handles both voluntary and involuntary disconnections well.The infostation-based hoarding approach is also inherently designed for involuntary disconnections, because hoarding is done during the user passing in the range of the infostation area. However, the time of disconnection can be predicted with a certain error bound by considering the direction and the speed of the moving client predicting when the user will go out of range. The program tree-based methods are specifically designed for previously informed disconnections. The scenario assumed in the case of infostations is a distributed wire-less infrastructure, which makes it unique among the hoarding mechanisms.This case is especially important in todayââ¬â¢s world where peer-to-peer systems are becoming more and more popular. 3. Problem Definition The New Technique that we have planned to design for hoarding will be used on Mobile Network. Goals that we have set are a. Finding a solution having optimal hit ratio in the hoard at local node. b. Technique should not have greater time complexity because we donââ¬â¢t have much time for performing hoarding operation after the knowledge of disconnection. c. Optimal utilization of hoard memory. d. Support for both intentional and unintentional disconnection. e.Proper handling of conflicts in hoarded objects upon reconnection. However, our priority will be for hit rati o than the other goals that we have set. We will take certain assumptions about for other issues if we find any scope of improvement in hit ratio. 4. New Approach 4. 1 Zipfââ¬â¢s Law It is a mathematical tool to describe the relationship between words in a text and their frequencies. Considering a long text and assigning ranks to all words by the frequencies in this text, the occurrence probability P (i) of the word with rank i satisfies the formula below, which is known as Zipf first law, where C is a constant.P (i) = [pic] â⬠¦. (1) This formula is further extended into a more generalized form, known as Zipf-like law. P (i) = [pic]â⬠¦. (2) Obviously, [pic]â⬠¦. (3) Now According to (2) and (3), we have C[pic] [pic] Our work is to dynamically calculate for different streams and then according to above Formula (2) and (4), the hotspot can be predicted based on the ranking of an object. 4. 2 Object Hotspot Prediction Model 4. 2. 1 Hotspot Classification We classify hotsp ot into two categories: ââ¬Å"permanent hotspotâ⬠and ââ¬Å"stage hotspotâ⬠. Permanent hotspot is an object which is frequently accessed regularly.Stage hotspot can be further divided into two types: ââ¬Å"cyclical hotspotâ⬠and ââ¬Å"sudden hotspotâ⬠. Cyclical hotspot is an object which becomes popular periodically. If an object is considered as a focus suddenly, it is a sudden hotspot. 4. 2. 2. Hotspot Identification Hotspots in distributed stream-processing storage systems can be identified via a ranking policy (sorted by access frequencies of objects). In our design, the hotspot objects will be inserted into a hotspot queue. The maximum queue length is determined by the cache size and the average size of hotspot Objects.If an objectââ¬â¢s rank is smaller than the maximum hotspot queue length (in this case, the rank is high), it will be considered as ââ¬Å"hotspotâ⬠in our system. Otherwise it will be considered as ââ¬Å"non hotspotâ⬠. And t he objects in the queue will be handled by hotspot cache strategy. 4. 2. 3 Hotspot Prediction This is our main section of interest, here we will try to determine the prediction model for hoard content with optimal hoard hit ratio. 5. Schedule of Work |Work |Scheduled Period |Remarks | |Studying revious work on Hoarding |July ââ¬â Aug 2012 |Complete | |Identifying Problem |Sept 2012 |Complete | |Innovating New Approach |Oct 2012 |Ongoing | |Integrating with Mobile Arena as solution to Hoarding |Nov- Dec 2012 |- | |Simulation And Testing |Jan 2013 |- | |Optimization |Feb 2013 |- | |Simulation And Testing |Mar 2013 |- | |Writing Thesis Work / Journal Publication |Apr ââ¬âMay 2013 |- | 6. Conclusion In this literature survey we have discussed previous related work on hoarding. We have also given the requirements for the new technique that is planned to be design.Also we are suggesting a new approach that is coming under the category of Hoarding with Data Mining Techniques. Recen t studies have shown that the use of proposed technique i. e. Zipfs-Like law for caching over the web contents have improved the hit ratio to a greater extent. Here with this work we are expecting improvements in hit ratio of the local hoard. References [1]. James J. Kistler and Mahadev Satyanarayanan. Disconnected Operation in the Coda File System. ACM Transactions on Computer Systems, vol. 10, no. 1, pp. 3ââ¬â25, 1992. [2]. Mahadev Satyanarayanan. The Evolution of Coda. ACM Transactions on Computer Systems, vol. 20, no. 2, pp. 85ââ¬â124, 2002 [3]. Geoffrey H. Kuenning and Gerald J. Popek. Automated Hoarding for Mobile Computers.In Proceedings of the 16th ACM Symposium on Operating System Principles (SOSP 1997), October 5ââ¬â8, St. Malo, France, pp. 264ââ¬â275, 1997. [4]. Yucel Saygin, Ozgur Ulusoy, and Ahmed K. Elmagarmid. Association Rules for Supporting Hoarding in Mobile Computing Environments. In Proceedings of the 10th IEEE Workshop on Research Issues in Data Engineering (RIDE 2000), February 28ââ¬â29, San Diego, pp. 71ââ¬â78, 2000. [5]. Rakesh Agrawal and Ramakrishna Srikant, Fast Algorithms for Mining Association Rules. In Proceedings of the 20th International Conference on Very Large Databases, Chile, 1994. [6]. GUO Peng, Hu Hui, Liu Cheng. The Research of Automatic Data Hoarding Technique Based on Hyper Graph.Information Science and Engineering (ICISE), 1st International Conference, 2009. [7]. Huan Zhou, Yulin Feng, Jing Li. Probability graph based data hoarding for mobile environment. Presented at Information & Software Technology, pp. 35-41, 2003. [8]. Carl Tait, Hui Lei, Swarup Acharya, and Henry Chang. Intelligent File Hoarding for Mobile Computers. In Proceedings of the 1st Annual International Conference on Mobile Computing and Networking (MOBICOMââ¬â¢95), Berkeley, CA, 1995. [9]. Anna Trifonova and Marco Ronchetti. Hoarding content for mobile learning. Journal International Journal of Mobile Communications archive V olume 4 Issue 4, Pages 459-476, 2006. [10]. Kwong Yuen Lai, Zahir Tari, Peter Bertok.Improving Data Accessibility for Mobile Clients through Cooperative Hoarding. Data Engineering, ICDE proceedings 21st international Conference 2005. [11]. G. Zipf, Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949. [12]. Chentao Wu, Xubin He, Shenggang Wan, Qiang Cao and Changsheng Xie. Hotspot Prediction and Cache in Distributed Stream-processing Storage Systems. Performance Computing and Communications Conference (IPCCC) IEEE 28th International, 2009. [13]. Lei Shi, Zhimin Gu, Lin Wei and Yun Shi. An Applicative Study of Zipfââ¬â¢s Law on Web Cache International Journal of Information Technology Vol. 12 No. 4 2006. [14]. Web link: http://en. wikipedia. org/wiki/Zipf%27s_law
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.