Chapter 1 Introduction

The objective of this textbook is to give you the shortest path to exploring your data, visualizing it, forming hypotheses, and validating and defending them. Given a data set, you want to be able to make any plot you wish, find plots that show something actionable and interesting, explore the data by slicing and dicing it, and finally present your results in a statistically convincing manner, perhaps in a colorful and visually appealing way.

Two questions you will have to anticipate and answer are: How do you know that your findings are not random? And the most fundamental question of all: So what?

Even the most impressive-looking results may come up randomly. You will be asked about this, along with the question “What was your p-value, and how did you compute it?”

And even if you convince your audience that your results are not random, you will have to be ready to explain why your audience should care about the results you reported. In other words, is there any actionable value in your results? Or are they simply interesting and good to know, with no reason for anyone to care much about them otherwise? Hopefully it is the former, not the latter.

In the following sections we will address these questions and go through the process of data exploration, validation, and presentation.

  • We will start with making plots, then follow with free-style data exploration, which allows us to form leads, that is, hypotheses. Then we will follow with simple statistical tests, which will allow us to validate these hypotheses and defend our findings against claims of randomness.
  • We will learn how to calculate p-values and how to use them to defend our findings.
  • We will use as few R commands as possible and reach our goal by the shortest possible path. In fact, we will demonstrate how, using just 7 R commands, we can perform quite sophisticated data exploration.
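
To give a first feel for what "calculating a p-value" can mean, here is a minimal sketch of a permutation test in base R. It is an illustration only and not necessarily the method used later in this book. The two groups of scores are made-up numbers, and the names `group_a`, `group_b` and `n_perm` are hypothetical.

```r
# Made-up scores for two groups (for illustration only, not real data)
set.seed(101)  # make the random shuffling reproducible
group_a <- c(82, 75, 91, 68, 88, 79, 85, 93)
group_b <- c(70, 65, 80, 72, 60, 77, 69, 74)

# Observed difference between the two group means
observed_diff <- mean(group_a) - mean(group_b)

# Pool all scores, then repeatedly shuffle them and split them
# into two groups of the original sizes
scores <- c(group_a, group_b)
n_a <- length(group_a)
n_perm <- 10000  # number of random shuffles (an arbitrary choice)

perm_diffs <- replicate(n_perm, {
  shuffled <- sample(scores)
  mean(shuffled[1:n_a]) - mean(shuffled[-(1:n_a)])
})

# One-sided p-value: the share of shuffles whose difference is
# at least as large as the one actually observed
p_value <- mean(perm_diffs >= observed_diff)

observed_diff
p_value
```

If random shuffling rarely produces a difference as large as the observed one, the p-value is small, and "it is just random" becomes a harder claim to make against the finding.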

1.1 Setting Up R

  • Important Instructions

    • R must be installed before RStudio.
      • “R” is a programming language.
      • “RStudio” is an Integrated Development Environment (IDE), which gives you a platform for writing and running R code.
  • How to download and install R & RStudio?

    • Downloading and installing R.

      • For Windows Users.
        • Click on the link provided below, or copy and paste it into your favourite browser, to go to the website.
        • Click on the link at the top left that says “Download R 4.0.3 for Windows”, or the latest version available at the time of your installation.
        • Open the downloaded file and follow the installation instructions.
      • For Mac Users.
        • Click on the link provided below, or copy and paste it into your favourite browser, to go to the website.
        • Under “Latest release”, click on “R-4.0.3.pkg”, or the latest version available at the time of your installation.
        • Open the downloaded file and follow the installation instructions.
    • Downloading and installing RStudio.

      • For Windows Users.
        • Click on the link below, or copy and paste it into your favourite browser.
        • Scroll down almost to the end of the web page until you find a section named “All Installers”.
        • Click on the download link beside “Windows 10/8/7” to download the Windows version of RStudio.
        • Install RStudio by opening the downloaded file and following the installation instructions.
      • For Mac Users.
        • Click on the link below, or copy and paste it into your favourite browser.
        • Scroll down almost to the end of the web page until you find a section named “All Installers”.
        • Click on the link beside “macOS 10.13+” to download the Mac version of RStudio.
        • Install RStudio by opening the downloaded file and following the installation instructions.
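
Once both installers have finished, a quick check from the R console (for example, the Console pane in RStudio) can confirm that everything is in place. The sketch below uses only base R. The version "4.0.3" is simply the one named in the steps above, and your installed version may well be newer.

```r
# Show the version of R that is currently running
R.version.string

# Check whether this R is at least the version named in the steps above
getRversion() >= "4.0.3"

# Inside RStudio this environment variable is typically set to "1";
# in a plain R session it is usually empty
Sys.getenv("RSTUDIO")
```

If the first command prints a version string, R itself is installed and running.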

  • How to load a data set?

  • To load a data set stored in CSV format, the read.csv() and read.csv2() functions are frequently used. The two functions expect different separator symbols: read.csv() expects a comma, whereas read.csv2() expects a semicolon (and treats a comma as the decimal mark).

  • Let us look at an example.

    # Read in the data
    df <- read.csv("https://raw.githubusercontent.com/deeplokhande/data101demobook/main/files/dataset/moody2020b.csv")

    # Print out `df`
    head(df)
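
The separator difference between read.csv() and read.csv2() described above can be tried out without downloading any file. The sketch below passes made-up data inline through the `text` argument. The column names and values are hypothetical and are not taken from the book's data set. Note that read.csv2() also reads a comma as the decimal mark, so "3,5" in the semicolon-separated text becomes the number 3.5.

```r
# Made-up data supplied inline instead of from a file (illustration only)
comma_text     <- "student,score\nAna,3.5\nBen,4.0"
semicolon_text <- "student;score\nAna;3,5\nBen;4,0"

# read.csv() splits fields on commas and uses "." as the decimal mark
df_comma <- read.csv(text = comma_text)

# read.csv2() splits fields on semicolons and uses "," as the decimal mark
df_semicolon <- read.csv2(text = semicolon_text)

df_comma
df_semicolon

# Both calls should give a numeric score column
class(df_comma$score)
class(df_semicolon$score)
```

Using the wrong function for a file usually shows up right away: the whole row lands in a single column, which head() makes easy to spot.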