Comparing read_csv with spark_read_csv

Reading a CSV file into R with readr’s `read_csv()` function is simple. The syntax & parameters are fairly easy to remember once you’ve done it a few times.

read_csv(file, 
    col_names = TRUE, 
    col_types = NULL,
    locale = default_locale(),
    na = c("", "NA"), 
    quoted_na = TRUE,
    quote = "\"", 
    comment = "", 
    trim_ws = TRUE, 
    skip = 0, n_max = Inf,
    guess_max = min(1000, n_max), 
    progress = show_progress()
)

I’ve only just started working with big data sets, & began wondering whether what I know about the readr syntax carries over to sparklyr’s `spark_read_csv()` function.

They are not exactly the same, but if you know one, you can quite easily pick up the other. There’s an additional parameter `sc`, the Spark connection, that’s required.

spark_read_csv(
    sc, 
    name,
    path, 
    header = TRUE, # FALSE forces a "V_" prefix
    columns = NULL,
    infer_schema = TRUE, # to infer column data type
    delimiter = ",", 
    quote = "\"", 
    escape = "\\",
    charset = "UTF-8", 
    null_value = NULL,
    options = list(),
    repartition = 0, # number of partitions to distribute the generated table.
    memory = TRUE, 
    overwrite = TRUE, ...
)
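
To make the comparison concrete, here is a minimal sketch of the two calls side by side. The temporary file, the table name `example`, and the `arg_map` lookup are all made up for illustration, and the argument pairing is only my rough reading of the two signatures above, not an official mapping. The sparklyr part is left commented out because it assumes a local Spark installation.

```r
# Write a tiny CSV to a temporary file so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,a", "2,NA"), csv_path)

# Rough (assumed) correspondence: read_csv() argument -> spark_read_csv() argument.
arg_map <- c(
  file      = "path",        # where the CSV lives
  col_names = "header",      # is the first row a header?
  col_types = "columns",     # column names/types supplied by hand
  na        = "null_value",  # string(s) treated as missing
  quote     = "quote"        # quoting character
)

# readr: reads the file into a local tibble (only runs if readr is installed).
if (requireNamespace("readr", quietly = TRUE)) {
  local_df <- readr::read_csv(csv_path, col_names = TRUE, na = c("", "NA"))
}

# sparklyr: same idea, but a Spark connection and a table name come first.
# Commented out because it assumes Spark is installed locally.
# library(sparklyr)
# sc <- spark_connect(master = "local")
# spark_df <- spark_read_csv(sc, name = "example", path = csv_path,
#                            header = TRUE, infer_schema = TRUE,
#                            null_value = "NA")
# spark_disconnect(sc)
```

The main practical difference is where the data ends up: `read_csv()` returns a data frame in R’s memory, while `spark_read_csv()` returns a reference to a table held by Spark.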

Everyone does NOT need to learn to code [Article]

Programmer Chase Felker disagrees with the flavour of our times – the need for everyone to learn to code. He thinks that there is a different, and more pertinent need – the need to think.

 I wonder why people are comfortable with thinking of computers as a scary black box in the first place. Computers do only what people tell them to do, and yet it is absurdly common to hear, “Windows crashed again! Call over the IT guy—it’s so complicated!” So many users do not feel empowered to understand how to use computers well, and I think that the urgency to spread programming is a symptom of this feeling. Perhaps if everyone had some practice telling computers what to do, tech intimidation wouldn’t be so prevalent.

Learn to Code

No, this isn’t a resolution based on an arbitrary point in time.

It is the antipode of a resolution that was made many, many moons ago, when I was a student at school. It was a resolution firmly based in ignorance: that I’d never need to learn about or use computers, let alone code. It was 1992, & computers then probably cost the school a fortune. There were fewer than 2 places which had computers – those big machines which needed a home in a frigid room (a difficult venture considering power supply of about 8 hours a day, with arbitrary power cuts by the powers-that-be). I was one of two idiots (as I see the decision now) who refused to have anything to do with those beastly machines, or to get into the cold, hallowed chambers where those machines & their juice-supplying brethren were grandly housed!

For over 9 years after that, I had very little to do with computers (why would I when they looked something like this ugly beast??), save a boneheaded decision to type del* in DOS at the behest of some learned companion on a prized possession of a pompous-ass rich-dad brat [another story best forgotten, several lessons very well learnt].

Story made short, my re-vocation in accounting & finance made it absolutely imperative to learn to locate the power-on switch on the wretched x386 (no mean feat for me, made horribly worse that I would usually be the one to open up the office), get familiar with MS Excel (the glue that holds most firms together, despite claims by more sophisticated applications to be otherwise), patch up a network, connect & install printers, build & rebuild computers (Good heavens, how do I open this box???), install accounting applications, then other applications, train people less familiar than I was to use computers (found out that such a breed does exist! I was one too!!!) etc etc etc.

I looked on enviously as colleagues, friends etc did magic with their programming skills, finishing work quicker & having enough time to have a beer/ coffee/ smoke, while my longhand way of working meant I had no time for any of those vices – or a life outside of those long hours at work.

I’ve been teaching myself some VBA programming, getting help from some of the best in the Excel VBA business (MVPs all: Dick Kusleika, Chandoo, John Walkenbach, Debra Dalgleish, Bill Jelen, Mike Alexander, Jon Peltier, via their books/ websites/ blogs/ podcasts etc) for a couple of years now, hacking/ copy-pasting together enough code to allow me to hasten the delivery of the reports-that-no-one-needs-or-reads-but-must-be-on-my-desk-in-20-minutes & get out of the cell (pun intended) of work in time to see the sunset or walk on the beach or to play with my kids.

When Fred Wilson posted this on his site as his New Year’s resolution, I jumped at the opportunity to learn another language, just to keep the grey cells grey; registered for this course, & am on my way to confound the daylights out of me. But I’m making some progress, or so I think.

Program or be programmed.
I choose the former.
Good luck to me.