This book is very rich in concrete examples, which let you get familiar with the Pandas library very quickly.
Moreover, since this library is a real liberation when dealing with data series and tables (especially if you are not much of a fan of Excel and Visual Basic), this book can become a true bible for the Python programmer. If you need to do data analysis and are torn between Python and R, this book could well tip the balance in favor of the former...
The only drawback: I am not sure it is well suited to a Python beginner (and certainly not to a programming beginner, since you get your hands dirty from the very first pages)...
The only book I have found on Python that covers the pandas library (probably the future of data processing in general, and of statistics in particular, if this promising library reaches maturity).
And I am comparing it with 8 other titles (few of them in French of a worthwhile level), not counting 4 additional titles on geospatial processing, text, PyQt, matplotlib and scientific visualization in general.
Most helpful customer reviews on Amazon.com
57 of 62 people found this review helpful
A book about tools that fills a need in scientific computing, October 29, 2012
- Published on Amazon.com
Python For Data Analysis is a book about tools. Python is an excellent general purpose language that has developed some niche applications, science being one of them due to some excellent libraries such as NumPy, SciPy, IPython, Matplotlib, and increasingly Pandas -- which Wes created. Collectively these tools form the basis of the "scientific computing stack" and are utilized by anyone who gets their hands dirty with data.
To steal from the book, Wes states, "This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is NOT (author's emphasis) an exposition on analytical methods using Python as the implementation language."
This is a book for any level of professional, researcher, or academic working with data. You could be a beginner who wants to get started, a professional coming from a discipline rooted in another language like Matlab, or even someone seasoned in data manipulation with Python who wants to get more work done in less time with greater ease.
While Pandas is the main focus of the book, sections dedicated to IPython (a shell for interactive execution) and NumPy (Matlab-like vectorized arrays) mean there is something for everyone. For example, you might already use IPython, but not to its fullest potential. Wes shows how to be more efficient using the interactive debugger.
Amazon limits its ratings to five stars, but if I gave a star for every time I learned something new that made my analysis easier, this book would be off the charts!
116 of 138 people found this review helpful
Dive into pandas and NumPy, October 23, 2012
R. Friesel Jr.
Format: Kindle Edition
Wes McKinney's "Python for Data Analysis" (O'Reilly, 2012) is a tour of pandas and NumPy (mostly pandas) for folks looking to crunch "big-ish" data with Python. The target audience is not Pythonistas, but rather scientists, educators, statisticians, financial analysts, and the rest of the "non-programmer" cohort that is finding more and more these days that it needs to do a little bit-sifting to get the rest of its job done.
First, two warnings:
1. **This book is not an introduction to Python.** While McKinney does not assume that you know *any* Python, he isn't exactly going to hold your hand on the language here. There is an appendix ("Python Language Essentials") that beginners will want to read before getting too far, but otherwise you're on your own. ("Lucky for you Python is executable pseudocode"?)
2. **This book is not about theories of data analysis.** What I mean by that is: if you're looking for a book that is going to tell you the *types* of analyses to do, this is not that book. McKinney assumes that you already know, through your "actual" training, what kinds of analyses you need to perform on your data, and how to go about the computations necessary for those analyses.
That being said: McKinney is the principal author of pandas, a Python package for doing data transformation and statistical analysis. The book is largely about pandas (and NumPy), offering overviews of the utilities in these packages and concrete examples of how to employ them to great effect. In examining these libraries, McKinney also covers general methodologies for munging data and performing analytical operations on it (e.g., normalizing messy data and turning it into graphs and tables). He also delves into some (semi-)esoteric details of how Python works at very low levels and ways to optimize data structures so that you can get maximum performance from your programs. McKinney is clearly knowledgeable about these libraries, about Python, and about using those tools effectively in analytical software.
So where do I land on "Python for Data Analysis"? If you're looking for a book that discusses data analysis in a broad sense, or one that pays special attention to the theory, this isn't that book. If you're looking for a generalist's book on Python, this is also not that book. However, if you've already selected Python as your analytical tool (and it sounds like it's more or less the de facto analytical tool in many circles), then this just might be the perfect book for you.
DISCLOSURE: I received an electronic copy of this book from the publisher in exchange for writing a review.
50 of 60 people found this review helpful
A tutorial in need of editorial work; not comprehensive; not a useful reference, March 30, 2013
Richard C. Yeh
I think this book is genuinely trying to be helpful, by giving an extended tutorial on the pandas library; but the tutorial covers only selected topics, and needs to be supplemented with a comprehensive function reference. The narrative also needs to be cut with the help of a strict editor.
If you are trying to decide whether to learn to use the pandas library, this book is for you. It starts with an example of how python and the pandas library can make it easy to do some basic analyses of data, and then develops more specialized chapters: summary statistics, data storage, data transformation (merging and joining), plotting, aggregation, time-series, special considerations for financial or economic data, advanced special topics.
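Since merging and joining come up among the chapters listed above, here is a minimal sketch of what that looks like in pandas. The tables, column names (`key`, `lval`, `rval`) and values are hypothetical, chosen only for illustration; the calls used are the standard `pd.merge()` and `DataFrame.join()`.

```python
import pandas as pd

# Two small hypothetical tables that share a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "rval": [10, 20, 40]})

# merge() joins on columns; how="inner" keeps only keys present in both tables.
inner = pd.merge(left, right, on="key", how="inner")

# how="outer" keeps the union of keys and fills the gaps with NaN.
outer = pd.merge(left, right, on="key", how="outer")

# DataFrame.join() aligns on the index by default, so "key" is moved into the index first.
joined = left.set_index("key").join(right.set_index("key"), how="left")

print(inner)
print(outer)
print(joined)
```

Roughly: `merge()` is the column-oriented, SQL-style entry point, while `join()` is a convenience for index-on-index combinations.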
Once I decided to use the pandas library, the book suddenly became less useful. The author has a verbose pedagogical style, and the book never departs from its tutorial perspective. Functions are introduced with examples but no definitions, and it's hard to find the rare summaries of functions, function arguments, or discussion suggesting when to use one method instead of another.
If you want to do something very close to what's done in an example, it's easy to follow along. Once you want to do something not emphasized or covered by an example, there is no guidance and no reference or dictionary section to hint at where you might search next; Google will probably direct you to stackoverflow.com or the official pandas documentation site.
For example, suppose you have loaded your data into a DataFrame, and you want to use another column as the index. The book has several pages on the useful reindex() method, but that method is for resampling the data. Instead, you want set_index() --- but the book only mentions set_index() in passing, without saying what it does, far from the section where the DataFrame index is covered.
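To make the reviewer's distinction concrete, here is a small sketch contrasting `set_index()` with `reindex()`. The `date`/`price` data is hypothetical. In short, `set_index()` promotes an existing column to be the row index, whereas `reindex()` conforms the frame to a supplied list of labels, inserting NaN for labels that were not there before.

```python
import pandas as pd

# Hypothetical data: a "date" column we would rather use as the row index.
df = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-02", "2013-01-03"],
    "price": [10.0, 10.5, 11.0],
})

# set_index() moves an existing column into the index (returns a new DataFrame).
by_date = df.set_index("date")

# reindex() conforms the frame to the given labels: existing labels keep their rows,
# new labels get NaN, and labels not listed are dropped.
conformed = by_date.reindex(["2013-01-01", "2013-01-03", "2013-01-04"])

# reset_index() undoes set_index(), turning the index back into a column.
restored = by_date.reset_index()

print(by_date)
print(conformed)
```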
There have been some attempts to remedy this, with "quick reference cards" for pandas --- but they are in general also not comprehensive.
Finally, there is little guidance on the kinds of problems where you would be better served using numpy or some other tool instead of pandas. (There are a few paragraphs on areas where you might not want to use python.)
[Update: by mid 2013, the API reference at the official pandas documentation has the comprehensive listings that I was looking for; see http://pandas.pydata.org/pandas-docs/stable/api.html . By version 0.12.0, all of the various function arguments seem to have been described with examples of acceptable settings. Also, the data analytical work (as opposed to cleaning and organization) has moved to the related statsmodels project, which requires pandas. So, to use that, it's important to be familiar with pandas.]
To the editor:
On many pages, there is some comment, phrasing, or trivial fact that I would have eliminated. Example:
"In some cases, a table might not have a fixed delimiter, using whitespace or some other pattern to separate fields. In these cases, ..."
"In part for legacy reasons (much earlier versions of pandas), DataFrame's join method ..."
"In my experience, having to align data by hand (and worse, having to verify that data is aligned) is a far too rigid and tedious way to work. It is also rife with potential for bugs due to combining misaligned data."
This is a technical publication, not a narrative!
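The last quoted passage refers to pandas' label-based data alignment. As a minimal sketch (the Series and their labels are hypothetical): arithmetic between two Series matches values by index label rather than by position, and labels present in only one operand produce NaN unless a `fill_value` is supplied.

```python
import pandas as pd

# Two Series with overlapping but differently ordered labels.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["c", "a", "d"])

# Addition aligns on labels: "a" -> 1 + 20, "c" -> 3 + 10; "b" and "d" become NaN.
total = s1 + s2

# add() with fill_value treats a label missing from one side as 0 instead.
total_filled = s1.add(s2, fill_value=0)

print(total)
print(total_filled)
```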
Many of the code examples break across physical and PDF pages, which creates small interruptions when reading. This may be hard to avoid when about half the text space is occupied by worked examples.
last line on page 129: a b c d a b c d e
first line on page 130: 0 0 1 2 3 0 0 1 2 3 4
39 of 46 people found this review helpful
Python and Pandas, October 21, 2012
Format: Kindle Edition
I'm a C++ programmer who discovered Python in August and have started a stats course where everyone else is using Stata or R, so I'm reading this book fast. I've tried to read other software books on my Kindle before, so I was a bit nervous, but the formatting here is excellent. There are times when Wes uses multiple columns and you have to figure out the flow from the Python line numbers, but it's generally good and readable. The indexing is also excellent; finally someone who was thinking about e-readers.
So I am not a Python, Numpy or Pandas expert.
I took the $5 upgrade at O'Reilly so I have downloaded a pdf for backup viewing and also get future enhancements to the book.
The material appears good and the coverage thorough. I've been working through the Language Essentials appendix as well, and it's clarified a couple of things I misunderstood from earlier Python books, so at this point I'll give it 5 stars. I'll re-review later if I come to a different conclusion.
7 of 7 people found this review helpful
Finally, I can ditch R and use a language I love, August 7, 2013
I've been using Python as my primary language for 10 years on and off, but have been shackled to R for any statistics or graphing for lack of knowledge of the scientific Python environment and no clear place to learn it all. I knew Python would be perfect for data analysis, but never knew where to begin. Because of this book, I can finally say that I am completely R free and loving it!
The book is incredibly well written by the guy who developed the pandas library. He brings his practical data analysis experience into this text and it shines through. Each chapter takes you through the core libraries and tools that you'll need to conduct real data analysis from beginning to end. He is especially sensitive to the realities of handling real-world data, which is often messy and needs to be massaged into a usable form, and which Python and its libraries are ridiculously good at handling. The introduction to IPython is perfect for anyone coming from MATLAB/R/etc. who has been missing a lot of the interactive features that those languages offer by default.
You should have at minimum an introductory understanding of Python and statistics, which you likely have if you're the kind of person that would think to pick this book up in the first place. Other than that, this book will teach you how to conduct data analysis in the best possible way with the best possible language.