R High Performance Programming (English) Paperback – January 29, 2015
Product description
From the publisher
Overcome performance difficulties in R with a range of exciting techniques and solutions
About This Book
- Benchmark and profile R programs to solve performance bottlenecks
- Combine the ease of use and flexibility of R with the power of big data tools
- Filled with practical techniques and useful code examples to process large data sets more efficiently
Who This Book Is For
This book is for programmers and developers who want to improve the performance of their R programs by making them run faster with large data sets or who are trying to solve a pesky performance problem.
What You Will Learn
- Benchmark and profile R programs to solve performance bottlenecks
- Understand how CPU, memory, and disk input/output constraints can limit the performance of R programs
- Optimize R code to run faster and use less memory
- Use compiled code in R and other languages such as C to speed up computations
- Harness the power of GPUs for computational speed
- Process data sets that are larger than memory using disk-based memory and chunking
- Tap into the capacity of multiple CPUs using parallel computing
- Leverage the power of advanced database systems and Big Data tools from within R
With the increasing use of information in all areas of business and science, R provides an easy and powerful way to analyze and process the vast amounts of data involved. It is one of the most popular tools today for faster data exploration, statistical analysis, and statistical modeling and can generate useful insights and discoveries from large amounts of data.
Through this practical and varied guide, you will become equipped to solve a range of performance problems in R programming. You will learn how to profile and benchmark R programs, identify bottlenecks, assess and identify performance limitations from the CPU, identify memory or disk input/output constraints, and optimize the computational speed of your R programs using great tricks, such as vectorizing computations. You will then move on to more advanced techniques, such as compiling code and tapping into the computing power of GPUs, optimizing memory consumption, and handling larger-than-memory data sets using disk-based memory and chunking.
About the author
Aloysius Lim has a knack for translating complex data and models into easy-to-understand insights. As cofounder of About People, a data science and design consultancy, he loves solving problems and helping others to find practical solutions to business challenges using data. His breadth of experience―7 years in the government, education, and retail industries―equips him with unique perspectives to find creative solutions.
William Tjhi is a data scientist with years of experience working in academia, government, and industry. He began his data science journey as a PhD candidate researching new algorithms to improve the robustness of high-dimensional data clustering. Upon receiving his doctorate, he moved from basic to applied research, using machine learning to solve problems in molecular biology and epidemiology, among other fields. He published some of his research in peer-reviewed journals and conferences. With the rise of Big Data, William left academia for industry, where he started practicing data science in both business and public sector settings. William is passionate about R and has been using it as his primary analysis tool since his research days. He was once part of Revolution Analytics, where he contributed to making R more suitable for Big Data.
Most helpful customer reviews on Amazon.com
Has a slight Linux bias, which kind of sucks since R (especially RStudio) on Linux can be a pretty meh experience.
Although the book has some nice examples on how to improve performance in R, and it also did a pretty good job of explaining why R's implementation can lead to bottlenecks, I had the impression that it was in the middle of an identity crisis. There was a lot of discussion of general tips on how to optimize code (preallocating data structures, using types with small memory footprints, identifying speed bottlenecks to optimize effectively, principles of parallelization), but the sections that covered these issues were fairly shallow. I would expect that most readers are already familiar with the general principles outlined in the book, so they weren't the most helpful tips I've found. This becomes increasingly clear when you look at the examples the author shows you: if you're able to use the packages he introduces for using R with a GPU or Hadoop, you're already familiar with the tricks he describes.
In my opinion, the book would have been a lot more useful had the author decided either to write about optimizing code in dynamic languages in general and go into more detail, or to make a cookbook that goes into more depth on the packages he introduced. Your mileage may vary, but although the book was well written, I would have profited more from a cookbook or even just a short summary of available packages with curt descriptions.
The first chapter does a good job explaining the internals of R and analyzing its performance characteristics.
It does not dissect the complete language (for that I would recommend Advanced R by Hadley Wickham) but covers the important parts that affect performance.
The second chapter shows you several tools and techniques to measure and profile your code.
The simple tweaks chapters are quite illustrative and have very interesting examples.
For example, just replacing the BLAS library included with R with the one that comes with Mac OS X yields a large performance improvement.
It explains and gives you advice on how to choose the right data structures. Don't make everything a data.frame!
One of the most interesting chapters for me was Processing Large Datasets with Limited RAM.
With the size of currently available datasets, the amount of RAM available becomes an issue.
The author discusses several techniques and libraries to deal with this issue.
Another well explained section is Parallel Computing. It has very nice examples as well.
This topic is really important to fully utilize the CPUs we have paid for in our PCs.
I also liked the GPU chapter; I was not aware of the GPU support in R.
It is a very nice surprise and if you don't have a computer with GPUs you can try one of the Amazon AMIs the author suggests.
GPUs are the state of the art in Deep Learning, and it is great to see that R has libraries that support them.
The last chapter introduces the use of R and Hadoop to deal with large scale processing.
It gives you a nice step-by-step setup so you can try it on Amazon.
In summary, I recommend this book: although the content is broad, it gives you enough to start going deeper into any of these topics.
And I need to state frankly: since R left academic circles a long time ago and is now used more and more in Big Data calibre projects, a developer or R user needs to understand its limitations, and perhaps even be able to shrug off some of the misconceptions that surface now and then about R’s suitability for Big Data.
This book will prepare you to cope with those who question R’s capability to process petabytes of data. Besides, since the authors have a very broad outlook on the technologies and succeeded in covering very difficult topics in simple terms, this book is actually an asset to any software developer, using any language on any platform.
What you need for this book: preferably a *NIX-based 64-bit machine capable enough to run a virtual machine with an NVIDIA GPU. An Amazon AWS account. The Eclipse R add-on (RStudio was cited as storing object state). A Windows user will be able to learn as much, but a few of the libraries covered in the book had not been ported to Windows at the time of my reading.
Aloysius and William cover code execution benchmarking techniques very well at the beginning, and then take you on a wonderful journey exploring an array of CRAN packages, third-party tools, and frameworks. The book includes the use of Hadoop, PostgreSQL, MonetDB (vertical data store), Pivotal SciDB, and more, so you will not be limited to a narrow subset of tools under your belt. It will be something like drinking from the firehose!
I read this book in one breath; it was just that fascinating a journey. I now think I need to come back and reread several chapters of immense interest to me: code pre-compilation (just so easy to take advantage of), and the ff, dplyr, and bigmemory packages (just take advantage of somebody giving you a hand). I will experiment with at least one database, perhaps MonetDB, as it is within fingertip reach.
If I had one small complaint, it would be the absence of the statistical visualization code; I would just like to benchmark my own improvements.
All in all, it is a fantastic book. Thank you, Aloysius and William! A very timely release, Packt!
My verdict: it is a superb read!