4. Data source analysis
Data is one of the most important factors in forecasting studies because it constitutes the entire source from which the business purpose of the study is served.
There are several reasons why differences in data sources make it hard to compare the prediction accuracy of studies with one another.
First, the results of a prediction model may differ across data sources. In theory, the more data we test on, the more accurate the results; in practice, however, it is often hard to collect as much data as desired. A potentially difficult question for all data prediction studies is therefore how much data is enough.
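One way to make this question concrete is a learning curve: train a model on progressively larger subsets and watch how the held-out error responds. The sketch below is purely illustrative and not drawn from any of the studies discussed here; the data frame dat, its factor response y, and the choice of a random forest are all hypothetical stand-ins.

    # Learning-curve sketch, assuming a hypothetical data frame `dat`
    # with a factor response column `y`; the model is a placeholder.
    library(randomForest)

    set.seed(1)
    n        <- nrow(dat)
    test_idx <- sample(n, floor(0.25 * n))    # hold out 25% for testing
    test_set <- dat[test_idx, ]
    pool     <- dat[-test_idx, ]

    # Train on 10%, 20%, ..., 100% of the pool, recording test error.
    sizes <- floor(seq(0.1, 1.0, by = 0.1) * nrow(pool))
    err <- sapply(sizes, function(m) {
      fit <- randomForest(y ~ ., data = pool[sample(nrow(pool), m), ])
      mean(predict(fit, test_set) != test_set$y)
    })
    plot(sizes, err, type = "b",
         xlab = "training set size", ylab = "test error rate")

If the error curve flattens well before the full pool is used, collecting more data of the same kind is unlikely to improve accuracy much.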
Second, the quality of the data source is crucial for prediction studies. Obviously, false or noisy data is not useful for
With a similar target, Gordiievych and Shubin (2015) gave no description of their data.
4.1.2. Different range and size of data
For airline price prediction studies, it is common practice to use time series data such as airline ticket prices. The date ranges of the data used by different studies vary from several months to as long as 18 years.
For example, Chen et al. (2015) used 110 days of data in their study, while Zhang et al. (2010) used 18 years of data to perform their experiment. Some studies did not explicitly specify the date range of their data, such as Wohlfarth et al. (2011), Gordiievych and Shubin (2015), and Cao, Ding, He, and Zhang (2010). Other studies chose lengths in between: Laik, Choy, and Sen (2014) used one year of data, the same length as Liu, Tan, and Zhou (2016), while Yuan, Xu, and Yang (2014), Ghomi and Forghani (2016), and An et al. (2016) used three, six, and ten years respectively.
One likely reason these studies chose different ranges of data for analysis is that data collection over a long time range is difficult. Studies that used data ranges longer than a year mostly relied on historical data, either from proprietary
1. Though it is a color representation of the data, it can omit important data and thus introduce error, making it less specific.
Also, the data could be inadequate because the experimental group numbers do not add up to 32, the control group does not total 36, and the percentages do not add up to 100% due to missing data.
The authors reported multiple df values in Table VI. Why were different df values reported for this study?
These results are not useful because these amounts do not provide enough evidence of whether fraud is involved.
Instead, we use the original predictors to predict the response. The original dataset was split into a training set consisting of 75% of the total observations and a test set consisting of the remaining 25%, with observations chosen at random. Supervised learning methods were fitted on the training set to obtain a model, and the model was then applied to the test set to assess prediction performance. The values for “K” in KNN were tuned via cross-validation. Due to the volume of the data, the “cost” parameter in the SVM was chosen somewhat ad hoc, and the “mtry” parameter in the random forest was left at its default. The error rates are as
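As a rough illustration, the pipeline described above might look like the following in R. This is a sketch only: the data frame dat, its factor response y, the cost value of 1, and the tuning grid for K are all hypothetical stand-ins, not the study's actual data or parameter choices.

    # Sketch of the 75/25 split-and-evaluate pipeline, assuming a
    # hypothetical data frame `dat` with a factor response `y`.
    library(caret)         # cross-validation for tuning K in KNN
    library(e1071)         # SVM exposing a "cost" parameter
    library(randomForest)  # random forest exposing an "mtry" parameter

    set.seed(1)
    n         <- nrow(dat)
    train_idx <- sample(n, floor(0.75 * n))   # random 75/25 split
    train_set <- dat[train_idx, ]
    test_set  <- dat[-train_idx, ]

    # Tune K in KNN via 10-fold cross-validation on the training set.
    knn_fit <- train(y ~ ., data = train_set, method = "knn",
                     trControl = trainControl(method = "cv", number = 10),
                     tuneGrid  = data.frame(k = seq(1, 25, by = 2)))

    # Ad hoc cost for the SVM; mtry left at its default for the forest.
    svm_fit <- svm(y ~ ., data = train_set, cost = 1)
    rf_fit  <- randomForest(y ~ ., data = train_set)

    # Test-set error rates for each model.
    sapply(list(knn = knn_fit, svm = svm_fit, rf = rf_fit),
           function(fit) mean(predict(fit, test_set) != test_set$y))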
Data is defined as raw material intended to be useful both to the originator and to the intended receiver. It consists largely of facts and figures suited to communicating the intended meaning. This data can be interpreted and can be categorised as follows:
Data is a group of information items used for various purposes such as analysis, evaluation, and arriving at certain results or conclusions. Data reporting is a process in which data is extracted from one or more sources and then converted into a format that can be used for a purpose.
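As a loose illustration of that extract-and-convert step, a small data-reporting script in R might look like the following; the file names and columns are invented for the example.

    # Hypothetical data-reporting sketch: extract from two sources,
    # then convert into a single summary format.
    sales_a  <- read.csv("source_a.csv")   # assumed columns: region, amount
    sales_b  <- read.csv("source_b.csv")
    combined <- rbind(sales_a, sales_b)    # extraction and merge

    # Conversion: totals per region, written out as the report format.
    report <- aggregate(amount ~ region, data = combined, FUN = sum)
    write.csv(report, "report.csv", row.names = FALSE)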
In order to maintain consistency throughout the study, each of the six subjects will utilise the same source for data collection. Because sites vary in precision (number of decimal places), activity format, and number of trials, this measure will help ensure that the evidence used to address the claim is both accurate and reliable.
The challenges arise from the fact that, when using multiple sources, it is easy to miss connections between data points or to misjudge the significance of noise when reviewing the massive amount of data that must be crunched, cleansed, and turned into useful intelligence.
The quality of research found on the internet may vary depending on the reputation of the site it comes from; although the quantity available is very wide, it may not always be of the best quality.
Error is inevitable. How many times have you made a dish from a recipe without deviating from it in some way? Maybe you were low on flour and had to substitute cornstarch, or you did not have almond extract and had to use lemon. The result may have looked like the picture, but I’m sure the taste varied. In research we take limited samples of the population to paint a picture of the whole. A site could recruit 100 heart patients and use the information gathered to try to aid a population of 500,000 heart patients worldwide. Is the information true of the whole or just a
Data comprises factual information. Data are the facts from which information is derived. Data is not necessarily informative on its own; it needs to be structured, interpreted, analysed, and contextualised. Once data undergoes this process, it transforms into information. Information should be accessible and understood by the reader without needing to be interpreted or manipulated in any way.
They may lack the experience, the right practice, or even skill in presenting the information, or they may be unable to take advantage of the information they have and use it properly.
A major limitation of the project has been the unavailability of current data on the contributors to
Results tend to be exploratory and are not necessarily true or reliable, as the information analysed is mostly based on personal accounts, not fact.