For today's lesson, install the following packages into your virtual environment. You can reuse the environment from the NumPy lesson.
$ python -m pip install --upgrade pip
$ python -m pip install notebook pandas matplotlib
In case your version of pip does not support wheels, or PyPI has no matching wheel packages, it helps to have a C and Fortran compiler (e.g. gcc, gcc-gfortran) and the Python header files (e.g. python3-devel) installed on your system. If you don't have them, try the installation directly first, since wheels exist for most operating systems. Install the compilers and headers only if that fails.
While the installation runs, download the required files into the static directory: actors.csv and spouses.csv.
Once everything is installed, start a new Notebook. (See the lesson on the Notebook.)
Data analysis is one of the areas where Python's popularity keeps growing. What does the term mean?
We have some data; there is a lot of it and it is hard to read. A data analyst processes it, rearranges it, finds meaning in it, and produces a summary of the most important points or a colorful infographic.
From population statistics we can find out how income relates to the availability of schools. By processing measurements from a physics experiment we can check whether a hypothesis holds. From a web service's access logs we can determine what users read and where they leave the site.
For such tasks you can use languages designed specifically for data analysis, such as R, whose syntax and philosophy suit these tasks better. Python, as a general-purpose programming language, sometimes requires clumsier notation, but in return it lets you connect the data with other areas, from extracting information from web pages to building web or desktop interfaces.
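A data analyst's work usually follows these steps:
- Formulating the question we want to answer
- Identifying the data we can use
- Acquiring the data (downloading, converting to a usable format)
- Storing the data
- Exploring the data
- Publishing the results
(based on a diagram from the book Data Wrangling with Python by Jacqueline Kazil & Katharine Jarmul, p. 3)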
Python won't help much with the first two steps; I'll only note that "What interesting things can be read from this data?" is a valid question. For the next two steps the Python standard library does well (json, csv), and you can additionally install requests, lxml for XML, or xlwt/openpyxl for Excel files.
For exploring the data and preparing the results we will then use a specialized "data" library: Pandas.
Pandas is used to analyze data that can be represented as a 2D table. Data of this "shape" is found in SQL databases, CSV files, and spreadsheets. In short, whatever can be done in Excel can also be done in Pandas. (Pandas of course has extra features and, more importantly, lets you automate the analysis.)
As mentioned in the NumPy lesson, analysts, the target audience of this library, like abbreviations. Many materials on the web therefore use import pandas as pd, or simply use pd (without explanation) as shorthand for pandas. This tutorial uses the full name instead.
import pandas
The basic data type that Pandas offers is DataFrame, colloquially a "table". Individual records appear in it as rows, and the parts of those records are neatly lined up in columns.
The most common way to fill your first DataFrame is to load it from a file. Pandas has a set of functions for this whose names start with read_. (Some of them need additional libraries; see the documentation.)
One of the most convenient formats is CSV:
actors = pandas.read_csv('static/actors.csv', index_col=None)
actors
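To try read_csv without downloading anything, the same call can be pointed at an in-memory text buffer. This is only a sketch: the CSV text below is made up for illustration and the real actors.csv may differ.

```python
import io
import pandas

# Hypothetical CSV content; the real actors.csv may differ.
csv_text = """name,birth,alive
Terry,1942,True
Graham,1941,False
"""

# read_csv accepts a file path or any file-like object.
sample = pandas.read_csv(io.StringIO(csv_text))
print(sample)
print(sample.dtypes)   # one inferred data type per column
```

The first line of the text becomes the column names, and each column gets a data type inferred from its values.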
Alternatively, a table can be created from a list of lists:
items = pandas.DataFrame([
["Book", 123],
["Computer", 2185],
])
items
...or from a list of dictionaries:
items = pandas.DataFrame([
{"name": "Book", "price": 123},
{"name": "Computer", "price": 2185},
])
items
In Jupyter Notebook the table is rendered "graphically". In the console it is printed as text, but the data in it is the same:
print(actors)
Basic information about a table can be obtained with the info method:
actors.info()
We can see that it is a table (DataFrame) with 6 rows, indexed from 0 to 5 (by an automatically generated index), and 3 columns: one holding objects, one int64, and one bool.
These data types (dtypes) were filled in automatically based on the given values. Pandas uses them mainly to save memory: a Python object of type bool takes up tens of bytes in memory, but in a bool column each value needs only one byte.
Unlike in NumPy, the types are dynamic: when we write an "incompatible" value into a column, one that Pandas cannot convert to the column's type, the type of the column is automatically generalized.
Some automatic conversions may not be entirely intuitive, though, e.g. None to NaN.
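The type inference described above can be observed directly when a column is built from plain Python values. A minimal sketch with made-up values:

```python
import pandas

whole = pandas.Series([1, 2, 3])
with_gap = pandas.Series([1, 2, None])   # None becomes NaN, so the ints become floats
mixed = pandas.Series([1, 'two', 3.0])   # no common type: generic Python objects

print(whole.dtype)      # int64
print(with_gap.dtype)   # float64
print(mixed.dtype)      # object
```

The float64 result for the second column is the None-to-NaN conversion mentioned above: NaN is a floating-point value, so the whole column is stored as floats.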
A column, or Series, is the second basic data type in Pandas. It holds a series of values, like a list, but it also has a name, a data type, and an "index" that labels the individual values. Columns can be obtained by selecting them from a table:
birth_years = actors['birth']
birth_years
type(birth_years)
birth_years.name
birth_years.index
birth_years.dtype
You can compute with the information in columns. Basic arithmetic operations (such as addition or division) between a column and a scalar value (a number, a string, ...) perform the operation on every value in the column. The result is a new column:
ages = 2016 - birth_years
ages
century = birth_years // 100 + 1
century
This holds both for arithmetic operations (+, -, *, /, //, %, **) and for comparisons:
birth_years > 1940
birth_years == 1940
When we combine a column not with a scalar value (a number) but with a sequence, e.g. a list or another column, the operation is performed on corresponding elements. The column and the other sequence must have the same length.
actors['name'] + [' (1)', ' (2)', ' (3)', ' (4)', ' (5)', ' (6)']
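One detail is worth seeing in a small example: a plain list is paired with the column by position, while another Series is paired by index label. The names and values below are made up for illustration.

```python
import pandas

prices = pandas.Series([10, 20, 30], index=['a', 'b', 'c'])

# A plain list is matched by position.
print(prices + [1, 2, 3])

# Another Series is matched by index label, not by position.
discounts = pandas.Series([3, 2, 1], index=['c', 'b', 'a'])
print(prices - discounts)
```

In the second operation the value for 'a' is 10 - 1, not 10 - 3, because the labels decide which values meet.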
String operations on string columns are tucked away under the str namespace:
actors['name'].str.upper()
... and operations on dates and times (datetime) can be found under dt.
Elements or subsequences can be selected from columns much like from lists:
birth_years[2]
birth_years[2:-2]
In addition, they can be selected using a column of type bool, which picks the records whose corresponding value is True. This makes it quick to select values that satisfy a condition:
# Birth years after 1940
birth_years[birth_years > 1940]
Because Python does not allow redefining the behavior of the and and or operators, logical combinations are traditionally written with the bitwise operators & (and) and | (or). These bind more tightly than comparison operators, however, so each sub-expression should be wrapped in parentheses:
# Birth years in a given range
birth_years[(birth_years > 1940) & (birth_years < 1943)]
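A short sketch of the three logical operators on boolean columns, using a standalone Series with the same years as the birth column so that it runs on its own:

```python
import pandas

years = pandas.Series([1942, 1943, 1943, 1941, 1940, 1939])

in_range = (years > 1940) & (years < 1943)    # both conditions hold
outside = (years <= 1940) | (years >= 1943)   # at least one condition holds
not_1943 = ~(years == 1943)                   # negation

print(years[in_range].tolist())   # [1942, 1941]
print(years[outside].tolist())    # [1943, 1943, 1940, 1939]
print(years[not_1943].tolist())   # [1942, 1941, 1940, 1939]
```

Without the parentheses, an expression such as years > 1940 & years < 1943 would evaluate 1940 & years first, which is not what was meant.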
Columns have a whole range of built-in operations, from basic ones (e.g. column.sum(), which tends to be faster than the built-in sum() function) to exotic statistical specialties. Look up the complete list in the documentation. Awareness of the operations that columns offer is basic knowledge for a data analyst.
print('Sum: ', birth_years.sum())
print('Mean: ', birth_years.mean())
print('Median: ', birth_years.median())
print('Number of unique values: ', birth_years.nunique())
print('Kurtosis: ', birth_years.kurtosis())
The apply method is especially powerful; it lets us apply any function to every value in a column:
actors['name'].apply(lambda x: ''.join(reversed(x)))
actors['alive'].apply({True: 'alive', False: 'deceased'}.get)
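The function passed to apply does not have to be a lambda. A minimal sketch with a named helper (initial is a hypothetical function written for this example) and a built-in function:

```python
import pandas

names = pandas.Series(['Terry', 'Michael', 'Eric'])

def initial(name):
    """Return the first letter followed by a period."""
    return name[0] + '.'

print(names.apply(initial).tolist())   # ['T.', 'M.', 'E.']
print(names.apply(len).tolist())       # [5, 7, 4]
```

A named function is easier to test and document than a lambda once the transformation grows beyond a single expression.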
Elements of columns are selected just as with lists. Tables in Pandas, however, can be indexed in many different ways. Traditional square brackets serve several purposes at once, so it is sometimes not obvious at first glance what a given indexing expression means:
actors['name'] # Column name
actors[1:-1] # Range of rows
actors[['name', 'alive']] # List of columns
This is an example of ambiguous behavior that makes life easier for data analysts, for whom Pandas is primarily intended.
As programmers writing robust code, we will use plain indexing ([]) only to select columns by name.
For all other access we will use so-called indexers, such as loc and iloc.
loc
The loc indexer primarily provides access to rows, by index, i.e. by the table's headers. In our example the rows are numbered and the columns are named, but we will see later that both indexes can hold any values.
actors
actors.loc[2]
Note that loc is not a method: it is used with square brackets.
If we index with a tuple, the first element indexes rows and the second indexes columns, much like in NumPy:
actors.loc[2, 'birth']
Either position can hold a "range", but unlike in plain Python, both end values are included in the result. (With an index that need not be numeric, this makes sense.)
actors.loc[2:4, 'birth':'alive']
When we give just a single value, the dimensionality drops: from a table to a column (or a row, which is also a Series), and from a column to a scalar value. Compare:
actors.loc[2:4, 'name']
actors.loc[2:4, 'name':'name']
To select a column, put a colon in the rows position, i.e. the complete range.
actors.loc[:, 'alive']
Another option is indexing with a list of values. This lets you select, reorder, or even duplicate rows or columns:
actors.loc[:, ['name', 'alive']]
actors.loc[[3, 2, 4, 4], :]
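The inclusive ranges of loc are easiest to see with a text index. The following is a sketch on a small made-up table, not on the lesson's data:

```python
import pandas

table = pandas.DataFrame(
    {'price': [123, 2185, 40], 'stock': [5, 0, 12]},
    index=['book', 'computer', 'pen'],
)

print(table.loc['book':'computer'])             # both end labels are included
print(table.loc['pen', 'price'])                # a single cell: 40
print(table.loc[['pen', 'book'], ['stock']])    # reordered rows, still a table
```

Note that a one-element list such as ['stock'] keeps the result a table, whereas the bare label 'stock' would reduce it to a column.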
iloc
The second indexer, which we will show briefly, is iloc. It can do the same as loc, except that it works not with keys but with the positions of rows or columns. It therefore behaves like indexing in NumPy.
actors
actors.iloc[0, 0]
Because iloc works with numbers, negative numbers and ranges behave as in standard Python:
actors.iloc[-1, 1]
actors.iloc[:, 0:1]
Indexing with a list, however, works as with loc:
actors.iloc[[0, -1, 3], [-1, 1, 0]]
Both loc and iloc also work on columns (Series), so they can be combined:
actors.iloc[-1].loc['name']
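With the default index 0, 1, 2, ... labels and positions coincide, which hides the difference between the two indexers. A small sketch with a made-up Series whose labels deliberately differ from positions:

```python
import pandas

# Index labels deliberately differ from positions.
values = pandas.Series(['a', 'b', 'c', 'd'], index=[10, 20, 30, 40])

print(values.loc[20])               # label 20 -> 'b'
print(values.iloc[1])               # position 1 -> 'b'
print(values.loc[10:30].tolist())   # labels, end included -> ['a', 'b', 'c']
print(values.iloc[0:2].tolist())    # positions, end excluded -> ['a', 'b']
print(values.iloc[-1])              # last position -> 'd'
```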
In the previous section we touched on indexes, the names of individual columns or rows. Now let's look at what can be done with them. Load the same table again:
actors = pandas.read_csv('static/actors.csv', index_col=None)
actors
This table has two keys: one for rows, index, and another for columns, called columns.
actors.index
actors.columns
A key can be changed by assigning a column (or another sequence) to it:
actors.index = actors['name']
actors
actors.index
We can then look rows up by this key. If we want lookups to be efficient (which makes sense if there were millions of rows), it is a good idea to sort the table by the index first:
actors = actors.sort_index()
actors
actors.loc[['Eric', 'Graham']]
Beware of the situation where the values in the key are not unique. Pandas supports this, but the behavior may not be what you expect:
actors.loc['Terry']
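The surprise with duplicate keys is that the type of the result depends on how many rows match. A sketch on a small made-up table shows the difference and two ways to guard against it:

```python
import pandas

people = pandas.DataFrame(
    {'birth': [1942, 1943, 1940]},
    index=['Terry', 'Eric', 'Terry'],
)

print(people.index.is_unique)       # False
print(type(people.loc['Eric']))     # one match: a Series (a single row)
print(type(people.loc['Terry']))    # two matches: a DataFrame

# Asking with a list gives a DataFrame whatever the number of matches.
print(people.loc[['Eric']])
```

Checking index.is_unique up front, or always indexing with a list, keeps the result type predictable.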
A slightly more advanced way of setting a key is the set_index method. It is most often used to move columns into the key, but the documentation describes other options too.
Now move two columns into the key at once:
indexed_actors = actors.set_index(['name', 'birth'])
indexed_actors
This creates a multi-level key:
indexed_actors.index
Rows of a table with a multi-level key can be selected either level by level, or with a tuple:
indexed_actors.loc['Terry']
indexed_actors.loc['Terry'].loc[1940]
indexed_actors.loc[('Terry', 1942)]
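The same selections can be tried on a self-contained table. This sketch uses three made-up rows and additionally shows xs, which selects on an inner level without naming the outer one:

```python
import pandas

people = pandas.DataFrame({
    'name': ['Terry', 'Terry', 'Eric'],
    'birth': [1942, 1940, 1943],
    'alive': [True, True, True],
}).set_index(['name', 'birth']).sort_index()

print(people.loc['Terry'])               # all rows with name == 'Terry'
print(people.loc[('Terry', 1940)])       # one row, returned as a Series
print(people.xs(1943, level='birth'))    # select on the inner level only
```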
Besides selecting data, keys have another property: if we add a new column that has a key of its own to the table, its values are aligned to the table's rows by that key:
indexed_actors
last_names = pandas.Series(['Gilliam', 'Jones', 'Cleveland'],
index=[('Terry', 1940), ('Terry', 1942), ('Carol', 1942)])
last_names
indexed_actors['last_name'] = last_names
indexed_actors
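The alignment also works with an ordinary single-level key. A minimal sketch with made-up values, where the new column arrives in a different order and with one row missing:

```python
import pandas

table = pandas.DataFrame({'price': [123, 2185, 40]},
                         index=['book', 'computer', 'pen'])

# The new values arrive in a different order and 'computer' is missing.
stock = pandas.Series([12, 5], index=['pen', 'book'])

table['stock'] = stock   # matched by label, not by position
print(table)
```

The row for 'computer' has no matching label in stock, so it receives a missing value.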
In the last example we see that Pandas fills in NaN for unknown values, i.e. "Not a Number", a value that plays a similar role to NULL in SQL or None in Python. It means that the information is missing, unavailable, or does not even make sense. The vast majority of operations with NaN again yield NaN:
'(' + indexed_actors['last_name'] + ')'
NaN also behaves oddly in comparisons; (NaN == NaN) is false. To detect missing values we have the isnull() method:
indexed_actors['last_name'].isnull()
To get rid of NaN we have two options. Either we fill them in using the fillna method with a value such as 0, False or, for more readable output, an empty string:
indexed_actors.fillna('')
Or we can drop all rows that contain any NaN:
indexed_actors.dropna()
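The behavior of NaN can be checked on a tiny standalone column. The sketch below uses made-up numbers; note that aggregations such as sum skip NaN by default, unlike element-wise operations.

```python
import math
import pandas

nan = float('nan')
print(nan == nan)          # False: NaN is not equal even to itself
print(math.isnan(nan))     # True

scores = pandas.Series([1.5, nan, 3.0])
print(scores.isnull().tolist())     # [False, True, False]
print(scores.sum())                 # 4.5: aggregations skip NaN by default
print(scores.fillna(0).tolist())    # [1.5, 0.0, 3.0]
print(scores.dropna().tolist())     # [1.5, 3.0]
```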
Unfortunately there is a certain inconsistency between NaN and the words null and na in function names. C'est la vie.
Sometimes we have several related tables that need to be joined together. For this, DataFrame has the merge() method, which can do operations similar to JOIN in SQL.
actors = pandas.read_csv('static/actors.csv', index_col=None)
actors
spouses = pandas.read_csv('static/spouses.csv', index_col=None)
spouses
actors.merge(spouses)
If the tables being joined have columns with the same names, Pandas joins them on those columns. The documentation explains how to specify explicitly which keys to join on, what to do when matching values are missing from one of the tables, and so on.
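As an illustration of those options, here is a sketch that names the join column explicitly with on and keeps unmatched rows with how='left'. The two tables are hypothetical; the columns of the real spouses.csv may differ.

```python
import pandas

# Hypothetical tables; the columns of the real spouses.csv may differ.
people = pandas.DataFrame({'name': ['Eric', 'Graham', 'John'],
                           'birth': [1943, 1941, 1939]})
pets = pandas.DataFrame({'name': ['Eric', 'John', 'John'],
                         'pet': ['cat', 'dog', 'parrot']})

inner = people.merge(pets, on='name')               # only names found in both tables
left = people.merge(pets, on='name', how='left')    # keep every person

print(inner)
print(left)
```

In the left join Graham stays in the result with NaN in the pet column, and John appears twice because he matches two rows.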
SQL fans may also want to look at the comparison between SQL and Pandas.
We have reached the point where a simple table no longer suffices. Let's create a bigger one: fictional sales of an e-shop, in the format we might get from an SQL database or a data file.
Among other things we will use date_range, which creates calendar ranges. Here, and in other cases where it is clear that a value should be interpreted as a date, Pandas lets us give dates as strings instead of datetime objects:
import itertools
import random
random.seed(0)
# 'M' means month end; pandas 2.2 and later prefer the spelling 'ME'
months = pandas.date_range('2015-01', '2016-12', freq='M')
categories = ['Electronics', 'Power Tools', 'Clothing']
data = pandas.DataFrame([{'month': a, 'category': b, 'sales': random.randint(-1000, 10000)}
for a, b in itertools.product(months, categories)
if random.randrange(20) > 0])
The table is fairly long (although in data analysis they tend to be even longer). Let's look at some general information:
# First few rows (head(10), for example, would show more of them)
data.head()
# Total number of rows
len(data)
data['sales'].describe()
Using set_index we choose which columns to treat as headers:
indexed = data.set_index(['category', 'month'])
indexed.head()
If we want to turn this data into a table with categories in rows and months in columns, we can use the unstack method, which "moves" the inner level of the row index into the columns and arranges the data accordingly.
We can of course use any level of the key; see the documentation for unstack and its inverse operation stack.
unstacked = indexed.unstack('month')
unstacked
Now the column key has two levels, but the sales level is redundant. We can get rid of it using MultiIndex.droplevel.
unstacked.columns = unstacked.columns.droplevel()
unstacked
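The round trip between the long and the wide layout can be sketched on three made-up rows. Unstacking a single column (a Series) avoids the extra column level, which is a different route from the droplevel step used above:

```python
import pandas

sales = pandas.DataFrame({
    'category': ['Clothing', 'Clothing', 'Electronics'],
    'month': ['2015-01', '2015-02', '2015-01'],
    'sales': [100, 150, 900],
}).set_index(['category', 'month'])

wide = sales['sales'].unstack('month')   # months become columns
print(wide)

long_again = wide.stack()                # back to one value per row
print(long_again)
```

The combination Electronics / 2015-02 does not exist in the input, so it shows up as NaN in the wide table. Whether stack keeps or drops such missing entries depends on the pandas version.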
And now we can analyze the data. How much was spent on electronics in total?
unstacked.loc['Electronics'].sum()
How did all electrical devices do in three particular months?
unstacked.loc[['Electronics', 'Power Tools'], '2016-03':'2016-05']
And how did clothing sell?
unstacked.loc['Clothing']
The stack and unstack methods are probably the most useful, but they are still just one of the ways to reshape tables in Pandas. More demanding students will find further options in the documentation.
If the matplotlib library is installed, Pandas can use it to draw plots. The setup differs slightly between Jupyter Notebook and the command line.
If you use Jupyter Notebook, turn on plotting integration with:
import matplotlib
# Turn on plot display (the percent sign introduces an IPython "magic" shortcut):
%matplotlib inline
and then you can directly use the plot() method, which without further arguments plots the table's data against the index:
unstacked.loc['Clothing'].dropna().plot()
If you are on the command line, first call plot() and then either save the plot or look at it:
# Setup
import matplotlib.pyplot
# Plot
unstacked.loc['Clothing'].plot()
# Save before showing: once the window opened by show() is closed, the figure is gone
matplotlib.pyplot.savefig('graph.png')
matplotlib.pyplot.show()
The show and savefig functions work with the "current" plot, typically the last one drawn. Note that show discards the current plot once its window is closed; before another show or savefig the plot has to be drawn again.
Combined with other Series and DataFrame methods, plots make it possible to get a quick overview of the data:
# How did cumulative sales develop in each category?
# `.T` transposes the table (swaps rows and columns)
# `cumsum()` computes a running total down each column
unstacked.T.fillna(0).cumsum().plot()
# How did the categories compare in March, April and May 2016?
unstacked.loc[:, '2016-03':'2016-05'].plot.bar(legend=False)
More information is, as usual, in the documentation.
A frequently used operation for simplifying a table is groupby, which merges rows that have the same value in some column and aggregates the merged data in some way.
data.head()
The result of groupby() by itself is just an object:
data.groupby('category')
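... on which we have to call the appropriate aggregating function. Here, for example, is the sum of sales by category:
# Select the numeric column first; recent pandas versions no longer skip the non-numeric month column silently
data.groupby('category')[['sales']].sum()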
Or the number of records:
data.groupby('category').count()
Groupby can aggregate by several columns at once (even though it does not make much sense in our example):
data.groupby(['category', 'month']).sum().head()
If we want to apply several functions at once, we pass a list of them to the agg method. Common functions can be given just by name; otherwise we pass the function or method directly:
data.groupby('category')[['sales']].agg(['mean', 'median', sum, pandas.Series.kurtosis])
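A self-contained sketch of agg on a made-up table, mixing functions given by name with a custom one (spread is a hypothetical helper written for this example):

```python
import pandas

orders = pandas.DataFrame({
    'category': ['Clothing', 'Clothing', 'Electronics', 'Electronics'],
    'sales': [100, 300, 900, 500],
})

def spread(values):
    """Difference between the largest and smallest value in a group."""
    return values.max() - values.min()

summary = orders.groupby('category')['sales'].agg(['sum', 'mean', spread])
print(summary)
```

Each entry of the list becomes one column of the result; a custom function contributes a column named after the function.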
Or we can use a shortcut for basic analysis:
g = data.groupby('month')
g.describe()
And a gem to finish: you can also aggregate by columns that are not in the table. The following code splits the data into weak, average, and strong months according to how many whole tens of thousands we earned in the given month (the bin size is 10000), and finds the total earnings from the weak, average, and strong months:
bin_size = 10000
by_month = data.groupby('month')[['sales']].sum()
by_thousands = by_month.groupby(by_month['sales'] // bin_size * bin_size).agg(['count', 'sum'])
by_thousands
by_thousands[('sales', 'sum')].plot()
{ "data": { "sessionMaterial": { "id": "session-material:2018/mipyt-zima:pandas:1", "title": "Pandas", "html": "\n \n \n\n <div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Na dnešní lekci si do virtuálního prostředí nainstalujte následující balíčky.\nMůžete použít prostředí z lekce o NumPy.</p>\n<div class=\"highlight\"><pre><span></span><span class=\"gp\">$ </span>python -m pip install --upgrade pip\n<span class=\"gp\">$ </span>python -m pip install notebook pandas matplotlib\n</pre></div><p>Pro případ, že by vaše verze <code>pip</code>-u neuměla <em>wheels</em> nebo na PyPI nebyly příslušné <em>wheel</em> balíčky, je dobré mít na systému nainstalovaný překladač C a Fortranu (např. <code>gcc</code>, <code>gcc-gfortran</code>) a hlavičkové soubory Pythonu (např. <code>python3-devel</code>). Jestli je ale nemáte, zkuste instalaci přímo – <em>wheels</em> pro většinu operačních systémů existují – a až kdyby to nefungovalo, instalujte překladače a hlavičky.</p>\n<p>Mezitím co se instaluje, stáhněte si do adresáře <code>static</code> potřebné soubory:\n<a href=\"/2018/mipyt-zima/intro/pandas/static/actors.csv\">actors.csv</a> a\n<a href=\"/2018/mipyt-zima/intro/pandas/static/spouses.csv\">spouses.csv</a>.</p>\n<p>A až bude nainstalováno, spusťte si nový Notebook. (Viz <a href=\"/2018/mipyt-zima/intro/notebook/\">lekce o Notebooku</a>.)</p>\n<hr>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h1>Analýza dat v Pythonu</h1>\n<p>Jedna z oblastí, kde popularita Pythonu neustále roste, je analýza dat. Co tenhle termín znamená?</p>\n<p>Máme nějaká data; je jich moc a jsou nepřehledná. 
Datový analytik je zpracuje, přeskládá, najde v nich smysl, vytvoří shrnutí toho nejdůležitějšího nebo barevnou infografiku.</p>\n<p>Ze statistických údajů o obyvatelstvu zjistíme, jak souvisí příjmy s dostupností škol. Zpracováním měření z fyzikálního experimentu ověříme, jestli platí hypotéza. Z log přístupů na webovou službu určíme, co uživatelé čtou a kde stránky opouštějí.</p>\n<p>Na podobné úkoly je možné použít jazyky vyvinuté přímo pro analýzu dat, jako R, které takovým úkolům svojí syntaxí a filozofií odpovídají víc. Python jako obecný programovací jazyk sice místy vyžaduje krkolomnější zápis, ale zato nabízí možnost data spojit s jinými oblastmi – od získávání informací z webových stránek po tvoření webových či desktopových rozhraní.</p>\n<h2>Proces analýzy dat</h2>\n<p>Práce datového analytika se většinou drží následujícího postupu:</p>\n<ul>\n<li>Formulace otázky, kterou chceme zodpovědět</li>\n<li>Identifikace dat, která můžeme použít</li>\n<li>Získání dat (stažení, převod do použitelného formátu)</li>\n<li>Uložení dat</li>\n<li>Zkoumání dat</li>\n<li>Publikace výsledků</li>\n</ul>\n<p><small>*(založeno na diagramu z knihy *Data Wrangling in Python* od Jacqueline Kazil & Katharine Jarmul, str. 3)*</small></p>\n<p>S prvními dvěma kroky Python příliš nepomůže; k těm jen poznamenám, že „Co zajímavého se z těch dat dá vyčíst?” je validní otázka. 
Na druhé dva kroky se dá s úspěchem použít pythonní standardní knihovna: <code>json</code>, <code>csv</code>, případně doinstalovat <code>requests</code>, <code>lxml</code> pro XML či <code>xlwt</code>/<code>openpyxl</code> na excelové soubory.</p>\n<p>Na zkoumání dat a přípravu výsledků pak použijeme specializovanou „datovou” knihovnu – Pandas.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h1>Pandas</h1>\n<p>Pandas slouží pro analýzu dat, které lze reprezentovat 2D tabulkou. Tento „tvar” dat najdeme v SQL databázích, souborech CSV nebo tabulkových procesorech. Stručně řečeno, co jde dělat v Excelu, jde dělat i v Pandas. (Pandas má samozřejmě funkce navíc, a hlavně umožňuje analýzu automatizovat.)</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Jak bylo řečeno u <a href=\"/2018/mipyt-zima/intro/numpy/\">NumPy</a>, analytici – cílová skupina této knihovny – mají rádi zkratky. Ve spoustě materiálů na Webu proto najdete <code>import pandas as pd</code>, případně rovnou (a bez vysvětlení) použité <code>pd</code> jako zkratku pro <code>pandas</code>. 
Tento návod ale používá plné jméno.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [1]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"kn\">import</span> <span class=\"nn\">pandas</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h2>Tabulky</h2>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Základní datový typ, který Pandas nabízí, je <code>DataFrame</code>, neboli lidově „tabulka”. Jednotlivé záznamy jsou v ní uvedeny jako řádky a části těchto záznamů jsou úhledně srovnány ve sloupcích.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Nejpoužívanější způsob, jak naplnit první DataFrame, je načtení ze souboru. Na to má Pandas sadu funkcí začínající <code>read_</code>. 
(Některé z nich potřebují další knihovny, viz dokumentace.)</p>\n<p>Jeden z nejpříjemnějších formátů je CSV:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [2]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span> <span class=\"o\">=</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">read_csv</span><span class=\"p\">(</span><span class=\"s1\">'static/actors.csv'</span><span class=\"p\">,</span> <span class=\"n\">index_col</span><span class=\"o\">=</span><span class=\"kc\">None</span><span class=\"p\">)</span>\n<span class=\"n\">actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[2]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Terry</td>\n <td>1942</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Michael</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n </tr>\n 
</tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Případně lze tabulku vytvořit ze seznamu seznamů:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [3]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">items</span> <span class=\"o\">=</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">DataFrame</span><span class=\"p\">([</span>\n <span class=\"p\">[</span><span class=\"s2\">"Book"</span><span class=\"p\">,</span> <span class=\"mi\">123</span><span class=\"p\">],</span>\n <span class=\"p\">[</span><span class=\"s2\">"Computer"</span><span class=\"p\">,</span> <span class=\"mi\">2185</span><span class=\"p\">],</span>\n<span class=\"p\">])</span>\n<span class=\"n\">items</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[3]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>0</th>\n <th>1</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Book</td>\n <td>123</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Computer</td>\n <td>2185</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div 
class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>…nebo seznamu slovníků:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [4]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">items</span> <span class=\"o\">=</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">DataFrame</span><span class=\"p\">([</span>\n <span class=\"p\">{</span><span class=\"s2\">"name"</span><span class=\"p\">:</span> <span class=\"s2\">"Book"</span><span class=\"p\">,</span> <span class=\"s2\">"price"</span><span class=\"p\">:</span> <span class=\"mi\">123</span><span class=\"p\">},</span>\n <span class=\"p\">{</span><span class=\"s2\">"name"</span><span class=\"p\">:</span> <span class=\"s2\">"Computer"</span><span class=\"p\">,</span> <span class=\"s2\">"price"</span><span class=\"p\">:</span> <span class=\"mi\">2185</span><span class=\"p\">},</span>\n<span class=\"p\">])</span>\n<span class=\"n\">items</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[4]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>price</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Book</td>\n <td>123</td>\n </tr>\n <tr>\n 
<th>1</th>\n <td>Computer</td>\n <td>2185</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>V Jupyter Notebooku se tabulka vykreslí „graficky”.\nV konzoli se vypíše textově, ale data v ní jsou stejná:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [5]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"nb\">print</span><span class=\"p\">(</span><span class=\"n\">actors</span><span class=\"p\">)</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt\"></div>\n\n\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre> name birth alive\n0 Terry 1942 True\n1 Michael 1943 True\n2 Eric 1943 True\n3 Graham 1941 False\n4 Terry 1940 True\n5 John 1939 True\n</pre>\n</div>\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Základní informace o tabulce se dají získat metodou <code>info</code>:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [6]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">info</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div 
class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt\"></div>\n\n\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre><class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 3 columns):\nname 6 non-null object\nbirth 6 non-null int64\nalive 6 non-null bool\ndtypes: bool(1), int64(1), object(1)\nmemory usage: 182.0+ bytes\n</pre>\n</div>\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Vidíme, že je to tabulka (<code>DataFrame</code>), má 6 řádků indexovaných\n(pomocí automaticky vygenerovaného indexu) od 0 do 5\na 3 sloupce: jeden s objekty, jeden s <code>int64</code> a jeden s <code>bool</code>.</p>\n<p>Tyto datové typy (<code>dtypes</code>) se doplnily automaticky podle zadaných\nhodnot. Pandas je používá hlavně pro šetření pamětí: pythonní objekt\ntypu <code>bool</code> zabírá v paměti desítky bytů, ale v <code>bool</code> sloupci\nsi každá hodnota vystačí s jedním bytem.</p>\n<p>Na rozdíl od NumPy jsou typy dynamické: když do sloupce zapíšeme „nekompatibilní”\nhodnotu, kterou Pandas neumí převést na daný typ, typ sloupce\nse automaticky zobecní.\nNěkteré automatické převody ovšem nemusí být úplně intuitivní, např. <code>None</code> na <code>NaN</code>.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h2>Sloupce</h2>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Sloupec, neboli <code>Series</code>, je druhý základní datový typ v Pandas. 
Obsahuje sérii hodnot, jako seznam, ale navíc má jméno, datový typ a „index”, který jednotlivé hodnoty pojmenovává. Sloupce se dají získat vybráním z tabulky:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [7]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">birth_years</span> <span class=\"o\">=</span> <span class=\"n\">actors</span><span class=\"p\">[</span><span class=\"s1\">'birth'</span><span class=\"p\">]</span>\n<span class=\"n\">birth_years</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[7]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 1942\n1 1943\n2 1943\n3 1941\n4 1940\n5 1939\nName: birth, dtype: int64</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [8]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"nb\">type</span><span class=\"p\">(</span><span class=\"n\">birth_years</span><span class=\"p\">)</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[8]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>pandas.core.series.Series</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [9]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight 
hl-ipython3\"><pre><span></span><span class=\"n\">birth_years</span><span class=\"o\">.</span><span class=\"n\">name</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[9]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>'birth'</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [10]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">birth_years</span><span class=\"o\">.</span><span class=\"n\">index</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[10]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>RangeIndex(start=0, stop=6, step=1)</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [11]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">birth_years</span><span class=\"o\">.</span><span class=\"n\">dtype</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[11]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>dtype('int64')</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing 
rendered_html\">\n<p>S informacemi ve sloupcích se dá počítat.\nZákladní aritmetické operace (jako sčítání či dělení) se sloupcem a skalární hodnotou (číslem, řetězcem, ...) provedou danou operaci nad každou hodnotou ve sloupci. Výsledek je nový sloupec:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [12]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">ages</span> <span class=\"o\">=</span> <span class=\"mi\">2016</span> <span class=\"o\">-</span> <span class=\"n\">birth_years</span>\n<span class=\"n\">ages</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[12]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 74\n1 73\n2 73\n3 75\n4 76\n5 77\nName: birth, dtype: int64</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [13]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">century</span> <span class=\"o\">=</span> <span class=\"n\">birth_years</span> <span class=\"o\">//</span> <span class=\"mi\">100</span> <span class=\"o\">+</span> <span class=\"mi\">1</span>\n<span class=\"n\">century</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[13]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 20\n1 20\n2 20\n3 20\n4 20\n5 20\nName: birth, dtype: int64</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing 
text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>To platí jak pro aritmetické operace (<code>+</code>, <code>-</code>, <code>*</code>, <code>/</code>, <code>//</code>, <code>%</code>, <code>**</code>), tak pro porovnávání:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [14]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">birth_years</span> <span class=\"o\">></span> <span class=\"mi\">1940</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[14]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 True\n1 True\n2 True\n3 True\n4 False\n5 False\nName: birth, dtype: bool</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [15]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">birth_years</span> <span class=\"o\">==</span> <span class=\"mi\">1940</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[15]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 False\n1 False\n2 False\n3 False\n4 True\n5 False\nName: birth, dtype: bool</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render 
border-box-sizing rendered_html\">\n<p>Když sloupec nesčítáme se skalární hodnotou (číslem), ale se sekvencí, např. seznamem nebo dalším sloupcem, operace se provede na odpovídajících prvcích. Seznam musí mít stejnou délku jako sloupec; dva sloupce se párují podle indexu, ne podle pořadí.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [16]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"p\">[</span><span class=\"s1\">'name'</span><span class=\"p\">]</span> <span class=\"o\">+</span> <span class=\"p\">[</span><span class=\"s1\">' (1)'</span><span class=\"p\">,</span> <span class=\"s1\">' (2)'</span><span class=\"p\">,</span> <span class=\"s1\">' (3)'</span><span class=\"p\">,</span> <span class=\"s1\">' (4)'</span><span class=\"p\">,</span> <span class=\"s1\">' (5)'</span><span class=\"p\">,</span> <span class=\"s1\">' (6)'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[16]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 Terry (1)\n1 Michael (2)\n2 Eric (3)\n3 Graham (4)\n4 Terry (5)\n5 John (6)\nName: name, dtype: object</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Řetězcové operace se u řetězcových sloupců schovávají pod jmenným prostorem <code>str</code>:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [17]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight 
hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"p\">[</span><span class=\"s1\">'name'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">str</span><span class=\"o\">.</span><span class=\"n\">upper</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[17]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 TERRY\n1 MICHAEL\n2 ERIC\n3 GRAHAM\n4 TERRY\n5 JOHN\nName: name, dtype: object</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>... a operace s daty a časy (<em>datetime</em>) najdeme pod <code>dt</code>.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Ze sloupců jdou vybírat prvky či podsekvence podobně jako třeba ze seznamů:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [18]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">birth_years</span><span class=\"p\">[</span><span class=\"mi\">2</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[18]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>1943</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell 
border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [19]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">birth_years</span><span class=\"p\">[</span><span class=\"mi\">2</span><span class=\"p\">:</span><span class=\"o\">-</span><span class=\"mi\">2</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[19]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>2 1943\n3 1941\nName: birth, dtype: int64</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>A navíc je lze vybírat pomocí sloupce typu <code>bool</code>, což vybere ty záznamy, u kterých je odpovídající hodnota <code>True</code>. 
Tak lze rychle vybrat hodnoty, které odpovídají nějaké podmínce:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [20]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"c1\"># Roky narození po roce 1940</span>\n<span class=\"n\">birth_years</span><span class=\"p\">[</span><span class=\"n\">birth_years</span> <span class=\"o\">></span> <span class=\"mi\">1940</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[20]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 1942\n1 1943\n2 1943\n3 1941\nName: birth, dtype: int64</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Protože Python neumožňuje předefinovat chování operátorů <code>and</code> a <code>or</code>, logické spojení operací se tradičně dělá přes bitové operátory <code>&</code> (a) a <code>|</code> (nebo). 
Ty mají ale vyšší prioritu než porovnávání, proto je potřeba jednotlivé výrazy uzavřít do závorek:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [21]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"c1\"># Roky narození v daném rozmezí</span>\n<span class=\"n\">birth_years</span><span class=\"p\">[(</span><span class=\"n\">birth_years</span> <span class=\"o\">></span> <span class=\"mi\">1940</span><span class=\"p\">)</span> <span class=\"o\">&amp;</span> <span class=\"p\">(</span><span class=\"n\">birth_years</span> <span class=\"o\">&lt;</span> <span class=\"mi\">1943</span><span class=\"p\">)]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[21]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 1942\n3 1941\nName: birth, dtype: int64</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Sloupce mají zabudovanou celou řadu operací, od základních (např. <code>column.sum()</code>, která bývá rychlejší než vestavěná funkce <code>sum()</code>) po roztodivné statistické specialitky. Kompletní seznam hledejte v <a href=\"http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html\">dokumentaci</a>. 
Povědomí o operacích, které sloupce umožňují, je základní znalost datového analytika.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [22]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"nb\">print</span><span class=\"p\">(</span><span class=\"s1\">'Součet: '</span><span class=\"p\">,</span> <span class=\"n\">birth_years</span><span class=\"o\">.</span><span class=\"n\">sum</span><span class=\"p\">())</span>\n<span class=\"nb\">print</span><span class=\"p\">(</span><span class=\"s1\">'Průměr: '</span><span class=\"p\">,</span> <span class=\"n\">birth_years</span><span class=\"o\">.</span><span class=\"n\">mean</span><span class=\"p\">())</span>\n<span class=\"nb\">print</span><span class=\"p\">(</span><span class=\"s1\">'Medián: '</span><span class=\"p\">,</span> <span class=\"n\">birth_years</span><span class=\"o\">.</span><span class=\"n\">median</span><span class=\"p\">())</span>\n<span class=\"nb\">print</span><span class=\"p\">(</span><span class=\"s1\">'Počet unikátních hodnot: '</span><span class=\"p\">,</span> <span class=\"n\">birth_years</span><span class=\"o\">.</span><span class=\"n\">nunique</span><span class=\"p\">())</span>\n<span class=\"nb\">print</span><span class=\"p\">(</span><span class=\"s1\">'Koeficient špičatosti: '</span><span class=\"p\">,</span> <span class=\"n\">birth_years</span><span class=\"o\">.</span><span class=\"n\">kurtosis</span><span class=\"p\">())</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt\"></div>\n\n\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>Součet: 11648\nPrůměr: 1941.3333333333333\nMedián: 1941.5\nPočet unikátních hodnot: 5\nKoeficient špičatosti: 
-1.4812500000001654\n</pre>\n</div>\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Zvláště mocná je metoda <code>apply</code>, která nám dovoluje aplikovat jakoukoli funkci na všechny hodnoty sloupce:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [23]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"p\">[</span><span class=\"s1\">'name'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">apply</span><span class=\"p\">(</span><span class=\"k\">lambda</span> <span class=\"n\">x</span><span class=\"p\">:</span> <span class=\"s1\">''</span><span class=\"o\">.</span><span class=\"n\">join</span><span class=\"p\">(</span><span class=\"nb\">reversed</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">)))</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[23]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 yrreT\n1 leahciM\n2 cirE\n3 maharG\n4 yrreT\n5 nhoJ\nName: name, dtype: object</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [24]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"p\">[</span><span class=\"s1\">'alive'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">apply</span><span 
class=\"p\">({</span><span class=\"kc\">True</span><span class=\"p\">:</span> <span class=\"s1\">'alive'</span><span class=\"p\">,</span> <span class=\"kc\">False</span><span class=\"p\">:</span> <span class=\"s1\">'deceased'</span><span class=\"p\">}</span><span class=\"o\">.</span><span class=\"n\">get</span><span class=\"p\">)</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[24]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 alive\n1 alive\n2 alive\n3 deceased\n4 alive\n5 alive\nName: alive, dtype: object</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h2>Tabulky a vybírání prvků</h2>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Prvky ze sloupců jdou vybírat jako u seznamů. Ale z tabulek v Pandas jde vybírat spoustou různých způsobů. 
Tradiční hranaté závorky plní několik funkcí najednou, takže někdy není na první pohled jasné, co jaké indexování znamená:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [25]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"p\">[</span><span class=\"s1\">'name'</span><span class=\"p\">]</span> <span class=\"c1\"># Jméno sloupce</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[25]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 Terry\n1 Michael\n2 Eric\n3 Graham\n4 Terry\n5 John\nName: name, dtype: object</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [26]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"p\">[</span><span class=\"mi\">1</span><span class=\"p\">:</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]</span> <span class=\"c1\"># Interval řádků</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[26]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr 
style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>1</th>\n <td>Michael</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [27]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"p\">[[</span><span class=\"s1\">'name'</span><span class=\"p\">,</span> <span class=\"s1\">'alive'</span><span class=\"p\">]]</span> <span class=\"c1\"># Seznam sloupců</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[27]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Terry</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Michael</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>John</td>\n 
<td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Toto je příklad nejednoznačného chování, které zjednodušuje život datovým analytikům, pro které je knihovna Pandas primárně určena.</p>\n<p>My, coby programátoři píšící robustní kód, budeme čisté indexování (<code>[]</code>) používat <em>jen</em> pro výběr sloupců podle jména.\nPro ostatní přístup použijeme tzv. <em>indexery</em>, jako <code>loc</code> a <code>iloc</code>.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3>Indexer <code>loc</code></h3>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Indexer <code>loc</code> zprostředkovává primárně <em>řádky</em>, a to podle <em>indexu</em>, tedy hlaviček tabulky. 
V našem příkladu jsou řádky očíslované a sloupce pojmenované, ale dále uvidíme, že v obou indexech můžou být jakékoli hodnoty.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [28]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[28]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Terry</td>\n <td>1942</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Michael</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [29]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span 
class=\"p\">[</span><span class=\"mi\">2</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[29]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>name Eric\nbirth 1943\nalive True\nName: 2, dtype: object</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Všimněte si, že <code>loc</code> není metoda: používají se s ním hranaté závorky.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Použijeme-li k indexování <em>n</em>-tici, prvním prvkem se indexují řádky a druhým sloupce – podobně jako u NumPy:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [30]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"mi\">2</span><span class=\"p\">,</span> <span class=\"s1\">'birth'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[30]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>1943</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt 
input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Na obou pozicích může být „interval”, ale na rozdíl od klasického Pythonu jsou ve výsledku obsaženy <em>obě koncové hodnoty</em>. (S indexem, který nemusí být vždy číselný, to dává smysl.)</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [31]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"mi\">2</span><span class=\"p\">:</span><span class=\"mi\">4</span><span class=\"p\">,</span> <span class=\"s1\">'birth'</span><span class=\"p\">:</span><span class=\"s1\">'alive'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[31]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1940</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div 
class=\"text_cell_render border-box-sizing rendered_html\">\n<p>When we give just a single value, the dimensionality drops: from a table to a column (or a row, which is also a Series), and from a column to a scalar value. Compare:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [32]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"mi\">2</span><span class=\"p\">:</span><span class=\"mi\">4</span><span class=\"p\">,</span> <span class=\"s1\">'name'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[32]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>2 Eric\n3 Graham\n4 Terry\nName: name, dtype: object</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [33]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"mi\">2</span><span class=\"p\">:</span><span class=\"mi\">4</span><span class=\"p\">,</span> <span class=\"s1\">'name'</span><span class=\"p\">:</span><span class=\"s1\">'name'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[33]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea 
output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>Eric</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>To select a column, put a colon in the row position, i.e. a complete slice.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [34]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[:,</span> <span class=\"s1\">'alive'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[34]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>0 True\n1 True\n2 True\n3 False\n4 True\n5 True\nName: alive, dtype: bool</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Another option is to index with a list of values. 
This lets you select, reorder, or even duplicate rows or columns:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [35]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[:,</span> <span class=\"p\">[</span><span class=\"s1\">'name'</span><span class=\"p\">,</span> <span class=\"s1\">'alive'</span><span class=\"p\">]]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[35]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Terry</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Michael</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>John</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [36]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span 
class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[[</span><span class=\"mi\">3</span><span class=\"p\">,</span> <span class=\"mi\">2</span><span class=\"p\">,</span> <span class=\"mi\">4</span><span class=\"p\">,</span> <span class=\"mi\">4</span><span class=\"p\">],</span> <span class=\"p\">:]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[36]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>3</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3>Indexer <code>iloc</code></h3>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>The second indexer we will briefly look at is <code>iloc</code>. 
It does the same as <code>loc</code>, except that it works with the positions of rows or columns rather than with the key. It therefore behaves like indexing in NumPy.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [37]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[37]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Terry</td>\n <td>1942</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Michael</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [38]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">iloc</span><span 
class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"mi\">0</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[38]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>'Terry'</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Because <code>iloc</code> works with integer positions, negative numbers and slices behave as in standard Python:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [39]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">iloc</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"mi\">1</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[39]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>1939</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [40]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">iloc</span><span class=\"p\">[:,</span> <span class=\"mi\">0</span><span 
class=\"p\">:</span><span class=\"mi\">1</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[40]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Terry</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Michael</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n </tr>\n <tr>\n <th>5</th>\n <td>John</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Indexing with a list, however, works the same way as with <code>loc</code>:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [41]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">iloc</span><span class=\"p\">[[</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"mi\">3</span><span class=\"p\">],</span> <span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span 
class=\"p\">,</span> <span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"mi\">0</span><span class=\"p\">]]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[41]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>alive</th>\n <th>birth</th>\n <th>name</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True</td>\n <td>1942</td>\n <td>Terry</td>\n </tr>\n <tr>\n <th>5</th>\n <td>True</td>\n <td>1939</td>\n <td>John</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>1941</td>\n <td>Graham</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Both <code>loc</code> and <code>iloc</code> also work on columns (Series), so they can be combined:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [42]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">iloc</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span 
class=\"s1\">'name'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[42]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>'John'</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h2>Indexes</h2>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>In the previous section we touched on indexes: the names of individual columns or rows. Now let's look at what can be done with them.\nLoad the same table again:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [43]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span> <span class=\"o\">=</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">read_csv</span><span class=\"p\">(</span><span class=\"s1\">'static/actors.csv'</span><span class=\"p\">,</span> <span class=\"n\">index_col</span><span class=\"o\">=</span><span class=\"kc\">None</span><span class=\"p\">)</span>\n<span class=\"n\">actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[43]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n 
vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Terry</td>\n <td>1942</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Michael</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>This table has two keys: one for rows, <code>index</code>, and one for columns, called <code>columns</code>.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [44]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">index</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[44]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>RangeIndex(start=0, stop=6, step=1)</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div 
class=\"input\">\n<div class=\"prompt input_prompt\">In [45]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">columns</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[45]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>Index(['name', 'birth', 'alive'], dtype='object')</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>A key can be changed by assigning a column (or another sequence) to it:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [46]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">index</span> <span class=\"o\">=</span> <span class=\"n\">actors</span><span class=\"p\">[</span><span class=\"s1\">'name'</span><span class=\"p\">]</span>\n<span class=\"n\">actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[46]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" 
class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n <tr>\n <th>name</th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Terry</th>\n <td>Terry</td>\n <td>1942</td>\n <td>True</td>\n </tr>\n <tr>\n <th>Michael</th>\n <td>Michael</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>Eric</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>Graham</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>Terry</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n <tr>\n <th>John</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [47]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">index</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[47]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>Index(['Terry', 'Michael', 'Eric', 'Graham', 'Terry', 'John'], dtype='object', name='name')</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>After that, rows can be looked up by this key. 
If we want lookups to be efficient (which matters once there are millions of rows), it is a good idea to sort the table by its index first:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [48]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span> <span class=\"o\">=</span> <span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">sort_index</span><span class=\"p\">()</span>\n<span class=\"n\">actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[48]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n <tr>\n <th>name</th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Eric</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>Graham</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>John</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n </tr>\n <tr>\n <th>Michael</th>\n <td>Michael</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>Terry</th>\n <td>Terry</td>\n <td>1942</td>\n <td>True</td>\n </tr>\n <tr>\n <th>Terry</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div 
class=\"input\">\n<div class=\"prompt input_prompt\">In [49]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[[</span><span class=\"s1\">'Eric'</span><span class=\"p\">,</span> <span class=\"s1\">'Graham'</span><span class=\"p\">]]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[49]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n <tr>\n <th>name</th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Eric</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>Graham</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Be careful, though, when the values in the key are not unique. 
Pandas supports this, but the behaviour may not be what you expect:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [50]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"s1\">'Terry'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[50]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n <tr>\n <th>name</th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Terry</th>\n <td>Terry</td>\n <td>1942</td>\n <td>True</td>\n </tr>\n <tr>\n <th>Terry</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>A slightly more advanced way to set the key is the <code>set_index</code> method. 
It is most often used to move columns into the key, but the <a href=\"http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html\">documentation</a> describes other options as well.\nNow move two columns into the key at once:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [51]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed_actors</span> <span class=\"o\">=</span> <span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">set_index</span><span class=\"p\">([</span><span class=\"s1\">'name'</span><span class=\"p\">,</span> <span class=\"s1\">'birth'</span><span class=\"p\">])</span>\n<span class=\"n\">indexed_actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[51]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>alive</th>\n </tr>\n <tr>\n <th>name</th>\n <th>birth</th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Eric</th>\n <th>1943</th>\n <td>True</td>\n </tr>\n <tr>\n <th>Graham</th>\n <th>1941</th>\n <td>False</td>\n </tr>\n <tr>\n <th>John</th>\n <th>1939</th>\n <td>True</td>\n </tr>\n <tr>\n <th>Michael</th>\n <th>1943</th>\n <td>True</td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">Terry</th>\n <th>1942</th>\n <td>True</td>\n </tr>\n <tr>\n <th>1940</th>\n <td>True</td>\n </tr>\n 
</tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>This creates a multi-level key:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [52]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed_actors</span><span class=\"o\">.</span><span class=\"n\">index</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[52]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>MultiIndex(levels=[['Eric', 'Graham', 'John', 'Michael', 'Terry'], [1939, 1940, 1941, 1942, 1943]],\n labels=[[0, 1, 2, 3, 4, 4], [4, 2, 0, 4, 3, 1]],\n names=['name', 'birth'])</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Rows of a table with a multi-level key can be selected either one level at a time or with a tuple:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [53]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed_actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"s1\">'Terry'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div 
class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[53]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>alive</th>\n </tr>\n <tr>\n <th>birth</th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>1942</th>\n <td>True</td>\n </tr>\n <tr>\n <th>1940</th>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [54]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed_actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"s1\">'Terry'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"mi\">1940</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[54]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>alive True\nName: 1940, dtype: bool</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [55]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span 
class=\"n\">indexed_actors</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[(</span><span class=\"s1\">'Terry'</span><span class=\"p\">,</span> <span class=\"mi\">1942</span><span class=\"p\">)]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[55]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>alive True\nName: (Terry, 1942), dtype: bool</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Besides selecting data, indexes have another property: when we add a new column that carries its own index, its values are matched to the table's rows by that index rather than by position:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [56]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed_actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[56]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>alive</th>\n </tr>\n <tr>\n <th>name</th>\n <th>birth</th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Eric</th>\n 
<th>1943</th>\n <td>True</td>\n </tr>\n <tr>\n <th>Graham</th>\n <th>1941</th>\n <td>False</td>\n </tr>\n <tr>\n <th>John</th>\n <th>1939</th>\n <td>True</td>\n </tr>\n <tr>\n <th>Michael</th>\n <th>1943</th>\n <td>True</td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">Terry</th>\n <th>1942</th>\n <td>True</td>\n </tr>\n <tr>\n <th>1940</th>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [57]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">last_names</span> <span class=\"o\">=</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">Series</span><span class=\"p\">([</span><span class=\"s1\">'Gilliam'</span><span class=\"p\">,</span> <span class=\"s1\">'Jones'</span><span class=\"p\">,</span> <span class=\"s1\">'Cleveland'</span><span class=\"p\">],</span>\n <span class=\"n\">index</span><span class=\"o\">=</span><span class=\"p\">[(</span><span class=\"s1\">'Terry'</span><span class=\"p\">,</span> <span class=\"mi\">1940</span><span class=\"p\">),</span> <span class=\"p\">(</span><span class=\"s1\">'Terry'</span><span class=\"p\">,</span> <span class=\"mi\">1942</span><span class=\"p\">),</span> <span class=\"p\">(</span><span class=\"s1\">'Carol'</span><span class=\"p\">,</span> <span class=\"mi\">1942</span><span class=\"p\">)])</span>\n<span class=\"n\">last_names</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[57]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>(Terry, 1940) Gilliam\n(Terry, 1942) Jones\n(Carol, 1942) Cleveland\ndtype: object</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div 
class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [58]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed_actors</span><span class=\"p\">[</span><span class=\"s1\">'last_name'</span><span class=\"p\">]</span> <span class=\"o\">=</span> <span class=\"n\">last_names</span>\n<span class=\"n\">indexed_actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[58]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>alive</th>\n <th>last_name</th>\n </tr>\n <tr>\n <th>name</th>\n <th>birth</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Eric</th>\n <th>1943</th>\n <td>True</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>Graham</th>\n <th>1941</th>\n <td>False</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>John</th>\n <th>1939</th>\n <td>True</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>Michael</th>\n <th>1943</th>\n <td>True</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">Terry</th>\n <th>1942</th>\n <td>True</td>\n <td>Jones</td>\n </tr>\n <tr>\n <th>1940</th>\n <td>True</td>\n <td>Gilliam</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing 
rendered_html\">\n<h2>NaN, also known as NULL or N/A</h2>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>In the last example we can see that Pandas fills in <code>NaN</code> ("Not a Number") for unknown values. It plays a similar role to <code>NULL</code> in SQL or <code>None</code> in Python: the information is missing, unavailable, or would not even make sense to have. The vast majority of operations on <code>NaN</code> yield <code>NaN</code> again:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [59]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"s1\">'('</span> <span class=\"o\">+</span> <span class=\"n\">indexed_actors</span><span class=\"p\">[</span><span class=\"s1\">'last_name'</span><span class=\"p\">]</span> <span class=\"o\">+</span> <span class=\"s1\">')'</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[59]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>name birth\nEric 1943 NaN\nGraham 1941 NaN\nJohn 1939 NaN\nMichael 1943 NaN\nTerry 1942 (Jones)\n 1940 (Gilliam)\nName: last_name, dtype: object</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>NaN also behaves oddly in comparisons: <code>(NaN == NaN)</code> is false. 
To find missing values, use the <code>isnull()</code> method:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [60]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed_actors</span><span class=\"p\">[</span><span class=\"s1\">'last_name'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">isnull</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[60]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>name birth\nEric 1943 True\nGraham 1941 True\nJohn 1939 True\nMichael 1943 True\nTerry 1942 False\n 1940 False\nName: last_name, dtype: bool</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>There are two ways to get rid of <code>NaN</code> values. 
We can fill them in using the <a href=\"http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html\"><code>fillna</code></a> method with a value such as <code>0</code>, <code>False</code> or, for a more readable printout, an empty string:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [61]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed_actors</span><span class=\"o\">.</span><span class=\"n\">fillna</span><span class=\"p\">(</span><span class=\"s1\">''</span><span class=\"p\">)</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[61]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>alive</th>\n <th>last_name</th>\n </tr>\n <tr>\n <th>name</th>\n <th>birth</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Eric</th>\n <th>1943</th>\n <td>True</td>\n <td></td>\n </tr>\n <tr>\n <th>Graham</th>\n <th>1941</th>\n <td>False</td>\n <td></td>\n </tr>\n <tr>\n <th>John</th>\n <th>1939</th>\n <td>True</td>\n <td></td>\n </tr>\n <tr>\n <th>Michael</th>\n <th>1943</th>\n <td>True</td>\n <td></td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">Terry</th>\n <th>1942</th>\n <td>True</td>\n <td>Jones</td>\n </tr>\n <tr>\n <th>1940</th>\n <td>True</td>\n <td>Gilliam</td>\n </tr>\n 
</tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Alternatively, we can drop every row that contains a <code>NaN</code>:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [62]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed_actors</span><span class=\"o\">.</span><span class=\"n\">dropna</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[62]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>alive</th>\n <th>last_name</th>\n </tr>\n <tr>\n <th>name</th>\n <th>birth</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th rowspan=\"2\" valign=\"top\">Terry</th>\n <th>1942</th>\n <td>True</td>\n <td>Jones</td>\n </tr>\n <tr>\n <th>1940</th>\n <td>True</td>\n <td>Gilliam</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Unfortunately, the naming is somewhat inconsistent: the value is called 
<code>NaN</code>, while function names use the words <code>null</code> or <code>na</code>. <em>C'est la vie.</em></p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h2>Merge</h2>\n<p>Sometimes we have several related tables that need to be joined together. For this, <code>DataFrame</code> has the <code>merge()</code> method, which supports operations similar to <code>JOIN</code> in SQL.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [63]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span> <span class=\"o\">=</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">read_csv</span><span class=\"p\">(</span><span class=\"s1\">'static/actors.csv'</span><span class=\"p\">,</span> <span class=\"n\">index_col</span><span class=\"o\">=</span><span class=\"kc\">None</span><span class=\"p\">)</span>\n<span class=\"n\">actors</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[63]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Terry</td>\n <td>1942</td>\n <td>True</td>\n </tr>\n <tr>\n 
<th>1</th>\n <td>Michael</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [64]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">spouses</span> <span class=\"o\">=</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">read_csv</span><span class=\"p\">(</span><span class=\"s1\">'static/spouses.csv'</span><span class=\"p\">,</span> <span class=\"n\">index_col</span><span class=\"o\">=</span><span class=\"kc\">None</span><span class=\"p\">)</span>\n<span class=\"n\">spouses</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[64]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>spouse_name</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Graham</td>\n <td>1941</td>\n <td>David Sherlock</td>\n </tr>\n <tr>\n <th>1</th>\n <td>John</td>\n <td>1939</td>\n <td>Connie Booth</td>\n </tr>\n <tr>\n <th>2</th>\n 
<td>John</td>\n <td>1939</td>\n <td>Barbara Trentham</td>\n </tr>\n <tr>\n <th>3</th>\n <td>John</td>\n <td>1939</td>\n <td>Alyce Eichelberger</td>\n </tr>\n <tr>\n <th>4</th>\n <td>John</td>\n <td>1939</td>\n <td>Jennifer Wade</td>\n </tr>\n <tr>\n <th>5</th>\n <td>Terry</td>\n <td>1940</td>\n <td>Maggie Westo</td>\n </tr>\n <tr>\n <th>6</th>\n <td>Eric</td>\n <td>1943</td>\n <td>Lyn Ashley</td>\n </tr>\n <tr>\n <th>7</th>\n <td>Eric</td>\n <td>1943</td>\n <td>Tania Kosevich</td>\n </tr>\n <tr>\n <th>8</th>\n <td>Terry</td>\n <td>1942</td>\n <td>Alison Telfer</td>\n </tr>\n <tr>\n <th>9</th>\n <td>Terry</td>\n <td>1942</td>\n <td>Anna Söderström</td>\n </tr>\n <tr>\n <th>10</th>\n <td>Michael</td>\n <td>1943</td>\n <td>Helen Gibbins</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [65]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">actors</span><span class=\"o\">.</span><span class=\"n\">merge</span><span class=\"p\">(</span><span class=\"n\">spouses</span><span class=\"p\">)</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[65]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>name</th>\n <th>birth</th>\n <th>alive</th>\n <th>spouse_name</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Terry</td>\n 
<td>1942</td>\n <td>True</td>\n <td>Alison Telfer</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Terry</td>\n <td>1942</td>\n <td>True</td>\n <td>Anna Söderström</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Michael</td>\n <td>1943</td>\n <td>True</td>\n <td>Helen Gibbins</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n <td>Lyn Ashley</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Eric</td>\n <td>1943</td>\n <td>True</td>\n <td>Tania Kosevich</td>\n </tr>\n <tr>\n <th>5</th>\n <td>Graham</td>\n <td>1941</td>\n <td>False</td>\n <td>David Sherlock</td>\n </tr>\n <tr>\n <th>6</th>\n <td>Terry</td>\n <td>1940</td>\n <td>True</td>\n <td>Maggie Westo</td>\n </tr>\n <tr>\n <th>7</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n <td>Connie Booth</td>\n </tr>\n <tr>\n <th>8</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n <td>Barbara Trentham</td>\n </tr>\n <tr>\n <th>9</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n <td>Alyce Eichelberger</td>\n </tr>\n <tr>\n <th>10</th>\n <td>John</td>\n <td>1939</td>\n <td>True</td>\n <td>Jennifer Wade</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>If the tables being merged have columns with the same names, Pandas joins them on those columns. 
The <a href=\"http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html\">documentation</a> explains how to specify explicitly which keys to join on, what to do when one of the tables lacks matching values, and so on.</p>\n<p>SQL fans may also like the <a href=\"http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html\">comparison between SQL and Pandas</a>.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h2>Reshaping data</h2>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>We are getting to the point where a simple table is no longer enough. Let's create a larger one: fictional sales in an e-shop, in a format we might get from an SQL database or a data file.</p>\n<p>Among other things, we will use <code>date_range</code>, which creates calendar intervals. 
Here, as in other cases where it is clear that a value should be interpreted as a date, Pandas lets us write dates as strings instead of <code>datetime</code> objects:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [66]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"kn\">import</span> <span class=\"nn\">itertools</span>\n<span class=\"kn\">import</span> <span class=\"nn\">random</span>\n<span class=\"n\">random</span><span class=\"o\">.</span><span class=\"n\">seed</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">)</span>\n\n<span class=\"n\">months</span> <span class=\"o\">=</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">date_range</span><span class=\"p\">(</span><span class=\"s1\">'2015-01'</span><span class=\"p\">,</span> <span class=\"s1\">'2016-12'</span><span class=\"p\">,</span> <span class=\"n\">freq</span><span class=\"o\">=</span><span class=\"s1\">'M'</span><span class=\"p\">)</span>\n<span class=\"n\">categories</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"s1\">'Electronics'</span><span class=\"p\">,</span> <span class=\"s1\">'Power Tools'</span><span class=\"p\">,</span> <span class=\"s1\">'Clothing'</span><span class=\"p\">]</span>\n<span class=\"n\">data</span> <span class=\"o\">=</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">DataFrame</span><span class=\"p\">([{</span><span class=\"s1\">'month'</span><span class=\"p\">:</span> <span class=\"n\">a</span><span class=\"p\">,</span> <span class=\"s1\">'category'</span><span class=\"p\">:</span> <span class=\"n\">b</span><span class=\"p\">,</span> <span class=\"s1\">'sales'</span><span class=\"p\">:</span> <span class=\"n\">random</span><span class=\"o\">.</span><span 
class=\"n\">randint</span><span class=\"p\">(</span><span class=\"o\">-</span><span class=\"mi\">1000</span><span class=\"p\">,</span> <span class=\"mi\">10000</span><span class=\"p\">)}</span>\n <span class=\"k\">for</span> <span class=\"n\">a</span><span class=\"p\">,</span> <span class=\"n\">b</span> <span class=\"ow\">in</span> <span class=\"n\">itertools</span><span class=\"o\">.</span><span class=\"n\">product</span><span class=\"p\">(</span><span class=\"n\">months</span><span class=\"p\">,</span> <span class=\"n\">categories</span><span class=\"p\">)</span>\n <span class=\"k\">if</span> <span class=\"n\">random</span><span class=\"o\">.</span><span class=\"n\">randrange</span><span class=\"p\">(</span><span class=\"mi\">20</span><span class=\"p\">)</span> <span class=\"o\">></span> <span class=\"mi\">0</span><span class=\"p\">])</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>The table is fairly long (although tables in data analysis tend to be even longer). Let's look at some general information:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [67]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"c1\"># The first few rows (you can also use e.g. 
head(10) to show more of them)</span>\n<span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">head</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[67]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>month</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Electronics</td>\n <td>2015-01-31</td>\n <td>5890</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Power Tools</td>\n <td>2015-01-31</td>\n <td>3242</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Clothing</td>\n <td>2015-01-31</td>\n <td>6961</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Electronics</td>\n <td>2015-02-28</td>\n <td>3969</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Power Tools</td>\n <td>2015-02-28</td>\n <td>4866</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [68]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"c1\"># Total number of rows</span>\n<span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">data</span><span class=\"p\">)</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[68]:</div>\n\n\n\n\n<div class=\"output_text output_subarea 
output_execute_result\">\n<pre>67</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [69]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">data</span><span class=\"p\">[</span><span class=\"s1\">'sales'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">describe</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[69]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>count 67.000000\nmean 4795.552239\nstd 3101.026552\nmin -735.000000\n25% 2089.000000\n50% 4448.000000\n75% 7874.000000\nmax 9817.000000\nName: sales, dtype: float64</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Using <code>set_index</code>, we choose which columns to treat as row headers:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [70]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">indexed</span> <span class=\"o\">=</span> <span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">set_index</span><span class=\"p\">([</span><span class=\"s1\">'category'</span><span class=\"p\">,</span> <span class=\"s1\">'month'</span><span class=\"p\">])</span>\n<span class=\"n\">indexed</span><span class=\"o\">.</span><span class=\"n\">head</span><span 
class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[70]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>month</th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Electronics</th>\n <th>2015-01-31</th>\n <td>5890</td>\n </tr>\n <tr>\n <th>Power Tools</th>\n <th>2015-01-31</th>\n <td>3242</td>\n </tr>\n <tr>\n <th>Clothing</th>\n <th>2015-01-31</th>\n <td>6961</td>\n </tr>\n <tr>\n <th>Electronics</th>\n <th>2015-02-28</th>\n <td>3969</td>\n </tr>\n <tr>\n <th>Power Tools</th>\n <th>2015-02-28</th>\n <td>4866</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>If we want to turn this data into a table with categories in rows and months in columns, we can use the <code>unstack</code> method, which "moves" the inner level of the row index into the columns and rearranges the data accordingly.</p>\n<p>Any level of the index can of course be used; see the <a href=\"http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html\">documentation</a> for <code>unstack</code> and for the inverse operation, <a href=\"http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.stack.html\"><code>stack</code></a>.</p>\n</div>\n</div>\n</div>\n<div 
class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [71]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">unstacked</span> <span class=\"o\">=</span> <span class=\"n\">indexed</span><span class=\"o\">.</span><span class=\"n\">unstack</span><span class=\"p\">(</span><span class=\"s1\">'month'</span><span class=\"p\">)</span>\n<span class=\"n\">unstacked</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[71]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead tr th {\n text-align: left\n }\n.lesson-content .dataframe thead tr:last-of-type th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"21\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>month</th>\n <th>2015-01-31</th>\n <th>2015-02-28</th>\n <th>2015-03-31</th>\n <th>2015-04-30</th>\n <th>2015-05-31</th>\n <th>2015-06-30</th>\n <th>2015-07-31</th>\n <th>2015-08-31</th>\n <th>2015-09-30</th>\n <th>2015-10-31</th>\n <th>...</th>\n <th>2016-02-29</th>\n <th>2016-03-31</th>\n <th>2016-04-30</th>\n <th>2016-05-31</th>\n <th>2016-06-30</th>\n <th>2016-07-31</th>\n <th>2016-08-31</th>\n <th>2016-09-30</th>\n <th>2016-10-31</th>\n <th>2016-11-30</th>\n </tr>\n <tr>\n <th>category</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n 
</tr>\n </thead>\n <tbody>\n <tr>\n <th>Clothing</th>\n <td>6961.0</td>\n <td>2578.0</td>\n <td>9131.0</td>\n <td>618.0</td>\n <td>4796.0</td>\n <td>8052.0</td>\n <td>7989.0</td>\n <td>NaN</td>\n <td>31.0</td>\n <td>7896.0</td>\n <td>...</td>\n <td>4194.0</td>\n <td>2059.0</td>\n <td>471.0</td>\n <td>5410.0</td>\n <td>8663.0</td>\n <td>9817.0</td>\n <td>6969.0</td>\n <td>-735.0</td>\n <td>4448.0</td>\n <td>-259.0</td>\n </tr>\n <tr>\n <th>Electronics</th>\n <td>5890.0</td>\n <td>3969.0</td>\n <td>1281.0</td>\n <td>7725.0</td>\n <td>4409.0</td>\n <td>4180.0</td>\n <td>6253.0</td>\n <td>NaN</td>\n <td>7086.0</td>\n <td>8298.0</td>\n <td>...</td>\n <td>6290.0</td>\n <td>2966.0</td>\n <td>9039.0</td>\n <td>1450.0</td>\n <td>3515.0</td>\n <td>8497.0</td>\n <td>349.0</td>\n <td>9324.0</td>\n <td>919.0</td>\n <td>18.0</td>\n </tr>\n <tr>\n <th>Power Tools</th>\n <td>3242.0</td>\n <td>4866.0</td>\n <td>1289.0</td>\n <td>1407.0</td>\n <td>8171.0</td>\n <td>9492.0</td>\n <td>3267.0</td>\n <td>5534.0</td>\n <td>2996.0</td>\n <td>2909.0</td>\n <td>...</td>\n <td>8769.0</td>\n <td>2012.0</td>\n <td>6807.0</td>\n <td>314.0</td>\n <td>2858.0</td>\n <td>6382.0</td>\n <td>9039.0</td>\n <td>2119.0</td>\n <td>5095.0</td>\n <td>1397.0</td>\n </tr>\n </tbody>\n</table>\n<p>3 rows × 23 columns</p>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Teď je sloupcový klíč dvouúrovňový, ale úroveň <code>sales</code> je zbytečná. 
Můžeme se jí zbavit pomocí <a href=\"http://pandas.pydata.org/pandas-docs/version/0.18.0/generated/pandas.MultiIndex.droplevel.html\"><code>MultiIndex.droplevel</code></a>.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [72]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">unstacked</span><span class=\"o\">.</span><span class=\"n\">columns</span> <span class=\"o\">=</span> <span class=\"n\">unstacked</span><span class=\"o\">.</span><span class=\"n\">columns</span><span class=\"o\">.</span><span class=\"n\">droplevel</span><span class=\"p\">()</span>\n<span class=\"n\">unstacked</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[72]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>month</th>\n <th>2015-01-31 00:00:00</th>\n <th>2015-02-28 00:00:00</th>\n <th>2015-03-31 00:00:00</th>\n <th>2015-04-30 00:00:00</th>\n <th>2015-05-31 00:00:00</th>\n <th>2015-06-30 00:00:00</th>\n <th>2015-07-31 00:00:00</th>\n <th>2015-08-31 00:00:00</th>\n <th>2015-09-30 00:00:00</th>\n <th>2015-10-31 00:00:00</th>\n <th>...</th>\n <th>2016-02-29 00:00:00</th>\n <th>2016-03-31 00:00:00</th>\n <th>2016-04-30 00:00:00</th>\n <th>2016-05-31 00:00:00</th>\n <th>2016-06-30 00:00:00</th>\n <th>2016-07-31 00:00:00</th>\n <th>2016-08-31 00:00:00</th>\n <th>2016-09-30 00:00:00</th>\n <th>2016-10-31 
00:00:00</th>\n <th>2016-11-30 00:00:00</th>\n </tr>\n <tr>\n <th>category</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Clothing</th>\n <td>6961.0</td>\n <td>2578.0</td>\n <td>9131.0</td>\n <td>618.0</td>\n <td>4796.0</td>\n <td>8052.0</td>\n <td>7989.0</td>\n <td>NaN</td>\n <td>31.0</td>\n <td>7896.0</td>\n <td>...</td>\n <td>4194.0</td>\n <td>2059.0</td>\n <td>471.0</td>\n <td>5410.0</td>\n <td>8663.0</td>\n <td>9817.0</td>\n <td>6969.0</td>\n <td>-735.0</td>\n <td>4448.0</td>\n <td>-259.0</td>\n </tr>\n <tr>\n <th>Electronics</th>\n <td>5890.0</td>\n <td>3969.0</td>\n <td>1281.0</td>\n <td>7725.0</td>\n <td>4409.0</td>\n <td>4180.0</td>\n <td>6253.0</td>\n <td>NaN</td>\n <td>7086.0</td>\n <td>8298.0</td>\n <td>...</td>\n <td>6290.0</td>\n <td>2966.0</td>\n <td>9039.0</td>\n <td>1450.0</td>\n <td>3515.0</td>\n <td>8497.0</td>\n <td>349.0</td>\n <td>9324.0</td>\n <td>919.0</td>\n <td>18.0</td>\n </tr>\n <tr>\n <th>Power Tools</th>\n <td>3242.0</td>\n <td>4866.0</td>\n <td>1289.0</td>\n <td>1407.0</td>\n <td>8171.0</td>\n <td>9492.0</td>\n <td>3267.0</td>\n <td>5534.0</td>\n <td>2996.0</td>\n <td>2909.0</td>\n <td>...</td>\n <td>8769.0</td>\n <td>2012.0</td>\n <td>6807.0</td>\n <td>314.0</td>\n <td>2858.0</td>\n <td>6382.0</td>\n <td>9039.0</td>\n <td>2119.0</td>\n <td>5095.0</td>\n <td>1397.0</td>\n </tr>\n </tbody>\n</table>\n<p>3 rows × 23 columns</p>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>A teď můžeme data analyzovat. 
Kolik se celkem utratilo za elektroniku?</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [73]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">unstacked</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"s1\">'Electronics'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">sum</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[73]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>103742.0</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Jak to vypadalo se všemi elektrickými zařízeními ve třech konkrétních měsících?</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [74]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">unstacked</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[[</span><span class=\"s1\">'Electronics'</span><span class=\"p\">,</span> <span class=\"s1\">'Power Tools'</span><span class=\"p\">],</span> <span class=\"s1\">'2016-03'</span><span class=\"p\">:</span><span class=\"s1\">'2016-05'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt 
output_prompt\">Out[74]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>month</th>\n <th>2016-03-31 00:00:00</th>\n <th>2016-04-30 00:00:00</th>\n <th>2016-05-31 00:00:00</th>\n </tr>\n <tr>\n <th>category</th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Electronics</th>\n <td>2966.0</td>\n <td>9039.0</td>\n <td>1450.0</td>\n </tr>\n <tr>\n <th>Power Tools</th>\n <td>2012.0</td>\n <td>6807.0</td>\n <td>314.0</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>A jak se prodávalo oblečení?</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [75]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">unstacked</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"s1\">'Clothing'</span><span class=\"p\">]</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[75]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>month\n2015-01-31 6961.0\n2015-02-28 2578.0\n2015-03-31 9131.0\n2015-04-30 618.0\n2015-05-31 4796.0\n2015-06-30 8052.0\n2015-07-31 
7989.0\n2015-08-31 NaN\n2015-09-30 31.0\n2015-10-31 7896.0\n2015-11-30 7016.0\n2015-12-31 7969.0\n2016-01-31 8627.0\n2016-02-29 4194.0\n2016-03-31 2059.0\n2016-04-30 471.0\n2016-05-31 5410.0\n2016-06-30 8663.0\n2016-07-31 9817.0\n2016-08-31 6969.0\n2016-09-30 -735.0\n2016-10-31 4448.0\n2016-11-30 -259.0\nName: Clothing, dtype: float64</pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Metody <code>stack</code> a <code>unstack</code> jsou sice asi nejužitečnější, ale stále jen jeden ze způsobů jak v Pandas tabulky přeskládávat. Náročnější studenti najdou další možnosti v <a href=\"http://pandas.pydata.org/pandas-docs/stable/reshaping.html\">dokumentaci</a>.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h2>Grafy</h2>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Je-li nainstalována knihovna <code>matplotlib</code>, Pandas ji umí využít k tomu, aby kreslil grafy. 
Nastavení je trochu jiné pro Jupyter Notebook a pro příkazovou řádku.</p>\n<p>Používáte-li Jupyter Notebook, zapněte integraci pro kreslení grafů pomocí:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [76]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"kn\">import</span> <span class=\"nn\">matplotlib</span>\n\n<span class=\"c1\"># Zapnout zobrazování grafů (procento uvozuje „magickou” zkratku IPythonu):</span>\n<span class=\"o\">%</span><span class=\"k\">matplotlib</span> inline\n</pre></div>\n\n</div>\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>a pak můžete přímo použít metodu <code>plot()</code>, která bez dalších argumentů vynese data z tabulky proti indexu:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [77]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">unstacked</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"s1\">'Clothing'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">dropna</span><span class=\"p\">()</span><span class=\"o\">.</span><span class=\"n\">plot</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[77]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre><matplotlib.axes._subplots.AxesSubplot at 
0x7f0a2809f9b0></pre>\n</div>\n\n</div>\n\n<div class=\"output_area\">\n\n<div class=\"prompt\"></div>\n\n\n\n\n<div class=\"output_png output_subarea \">\n<img src=\"%0A\">\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Jste-li v příkazové řádce, napřed použijte <code>plot()</code> a potom graf uložte, nebo se na něj podívejte:</p>\n<div class=\"highlight\"><pre><span></span><span class=\"c1\"># Setup</span>\n<span class=\"kn\">import</span> <span class=\"nn\">matplotlib.pyplot</span>\n\n<span class=\"c1\"># Plot</span>\n<span class=\"n\">unstacked</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[</span><span class=\"s1\">'Clothing'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">plot</span><span class=\"p\">()</span>\n<span class=\"c1\"># Uložit je potřeba dřív než zobrazit (viz poznámka níže)</span>\n<span class=\"n\">matplotlib</span><span class=\"o\">.</span><span class=\"n\">pyplot</span><span class=\"o\">.</span><span class=\"n\">savefig</span><span class=\"p\">(</span><span class=\"s1\">'graph.png'</span><span class=\"p\">)</span>\n<span class=\"n\">matplotlib</span><span class=\"o\">.</span><span class=\"n\">pyplot</span><span class=\"o\">.</span><span class=\"n\">show</span><span class=\"p\">()</span>\n</pre></div><p>Funkce <code>show</code> a <code>savefig</code> pracují s „aktuálním” grafem – typicky posledním, který se vykreslil. 
Pozor na to, že funkce <code>show</code> typicky po zavření okna aktuální graf zahodí (<code>savefig</code> ho naopak ponechá); po <code>show</code> je proto před dalším <code>show</code> nebo <code>savefig</code> potřeba graf vykreslit znovu.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>V kombinaci s dalšími funkcemi <code>Series</code> a <code>DataFrame</code> umožňují grafy získat o datech rychlý přehled:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [78]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"c1\"># Jak se postupně vyvíjely tržby v jednotlivých kategoriích?</span>\n<span class=\"c1\"># `.T` udělá transpozici tabulky (vymění řádky a sloupce)</span>\n<span class=\"c1\"># `cumsum()` spočítá průběžný součet po sloupcích</span>\n<span class=\"n\">unstacked</span><span class=\"o\">.</span><span class=\"n\">T</span><span class=\"o\">.</span><span class=\"n\">fillna</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">)</span><span class=\"o\">.</span><span class=\"n\">cumsum</span><span class=\"p\">()</span><span class=\"o\">.</span><span class=\"n\">plot</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[78]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre><matplotlib.axes._subplots.AxesSubplot at 0x7f0a23d5dc50></pre>\n</div>\n\n</div>\n\n<div class=\"output_area\">\n\n<div class=\"prompt\"></div>\n\n\n\n\n<div class=\"output_png output_subarea \">\n<img src=\"%0A\">\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div 
class=\"input\">\n<div class=\"prompt input_prompt\">In [79]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"c1\"># Jak si proti sobě stály jednotlivé kategorie v březnu, dubnu a květnu 2016?</span>\n<span class=\"n\">unstacked</span><span class=\"o\">.</span><span class=\"n\">loc</span><span class=\"p\">[:,</span> <span class=\"s1\">'2016-03'</span><span class=\"p\">:</span><span class=\"s1\">'2016-05'</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">plot</span><span class=\"o\">.</span><span class=\"n\">bar</span><span class=\"p\">(</span><span class=\"n\">legend</span><span class=\"o\">=</span><span class=\"kc\">False</span><span class=\"p\">)</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[79]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre><matplotlib.axes._subplots.AxesSubplot at 0x7f0a21c966a0></pre>\n</div>\n\n</div>\n\n<div class=\"output_area\">\n\n<div class=\"prompt\"></div>\n\n\n\n\n<div class=\"output_png output_subarea \">\n<img src=\"%0A\">\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Další informace jsou, jak už to bývá, <a href=\"http://pandas.pydata.org/pandas-docs/version/0.19.0/visualization.html\">v dokumentaci</a>.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h2>Groupby</h2>\n<p>Často používaná operace pro zjednodušení tabulky je <code>groupby</code>, která sloučí dohromady řádky se 
stejnou hodnotou v některém sloupci a sloučená data nějak agreguje.</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [80]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">head</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[80]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>month</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Electronics</td>\n <td>2015-01-31</td>\n <td>5890</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Power Tools</td>\n <td>2015-01-31</td>\n <td>3242</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Clothing</td>\n <td>2015-01-31</td>\n <td>6961</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Electronics</td>\n <td>2015-02-28</td>\n <td>3969</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Power Tools</td>\n <td>2015-02-28</td>\n <td>4866</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Samotný výsledek <code>groupby()</code> je jen objekt:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell 
rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [81]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">groupby</span><span class=\"p\">(</span><span class=\"s1\">'category'</span><span class=\"p\">)</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[81]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre><pandas.core.groupby.groupby.DataFrameGroupBy object at 0x7f0a21c8cc88></pre>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>... na který musíme zavolat příslušnou agregující funkci. 
Tady je například součet částek podle kategorie:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [82]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">groupby</span><span class=\"p\">(</span><span class=\"s1\">'category'</span><span class=\"p\">)</span><span class=\"o\">.</span><span class=\"n\">sum</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[82]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Clothing</th>\n <td>112701</td>\n </tr>\n <tr>\n <th>Electronics</th>\n <td>103742</td>\n </tr>\n <tr>\n <th>Power Tools</th>\n <td>104859</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Nebo počet záznamů:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [83]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight 
hl-ipython3\"><pre><span></span><span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">groupby</span><span class=\"p\">(</span><span class=\"s1\">'category'</span><span class=\"p\">)</span><span class=\"o\">.</span><span class=\"n\">count</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[83]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>month</th>\n <th>sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Clothing</th>\n <td>22</td>\n <td>22</td>\n </tr>\n <tr>\n <th>Electronics</th>\n <td>22</td>\n <td>22</td>\n </tr>\n <tr>\n <th>Power Tools</th>\n <td>23</td>\n <td>23</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Groupby umí agregovat podle více sloupců najednou (i když u našeho příkladu nedává velký smysl):</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [84]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">groupby</span><span class=\"p\">([</span><span 
class=\"s1\">'category'</span><span class=\"p\">,</span> <span class=\"s1\">'month'</span><span class=\"p\">])</span><span class=\"o\">.</span><span class=\"n\">sum</span><span class=\"p\">()</span><span class=\"o\">.</span><span class=\"n\">head</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[84]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>month</th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th rowspan=\"5\" valign=\"top\">Clothing</th>\n <th>2015-01-31</th>\n <td>6961</td>\n </tr>\n <tr>\n <th>2015-02-28</th>\n <td>2578</td>\n </tr>\n <tr>\n <th>2015-03-31</th>\n <td>9131</td>\n </tr>\n <tr>\n <th>2015-04-30</th>\n <td>618</td>\n </tr>\n <tr>\n <th>2015-05-31</th>\n <td>4796</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Chceme-li aplikovat více funkcí najednou, předáme jejich seznam metodě <code>agg</code>. 
Časté funkce lze předat jen jménem, jinak předáme funkci či metodu přímo:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [85]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">groupby</span><span class=\"p\">(</span><span class=\"s1\">'category'</span><span class=\"p\">)</span><span class=\"o\">.</span><span class=\"n\">agg</span><span class=\"p\">([</span><span class=\"s1\">'mean'</span><span class=\"p\">,</span> <span class=\"s1\">'median'</span><span class=\"p\">,</span> <span class=\"nb\">sum</span><span class=\"p\">,</span> <span class=\"n\">pandas</span><span class=\"o\">.</span><span class=\"n\">Series</span><span class=\"o\">.</span><span class=\"n\">kurtosis</span><span class=\"p\">])</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[85]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead tr th {\n text-align: left\n }\n.lesson-content .dataframe thead tr:last-of-type th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"4\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th></th>\n <th>mean</th>\n <th>median</th>\n <th>sum</th>\n <th>kurt</th>\n </tr>\n <tr>\n <th>category</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Clothing</th>\n <td>5122.772727</td>\n <td>6185.5</td>\n <td>112701</td>\n <td>-1.298035</td>\n </tr>\n <tr>\n 
<th>Electronics</th>\n <td>4715.545455</td>\n <td>4294.5</td>\n <td>103742</td>\n <td>-1.353210</td>\n </tr>\n <tr>\n <th>Power Tools</th>\n <td>4559.086957</td>\n <td>3769.0</td>\n <td>104859</td>\n <td>-1.044767</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Případně použijeme zkratku pro základní analýzu:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [86]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">g</span> <span class=\"o\">=</span> <span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">groupby</span><span class=\"p\">(</span><span class=\"s1\">'month'</span><span class=\"p\">)</span>\n<span class=\"n\">g</span><span class=\"o\">.</span><span class=\"n\">describe</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[86]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead tr th {\n text-align: left\n }\n.lesson-content .dataframe thead tr:last-of-type th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"8\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th></th>\n <th>count</th>\n <th>mean</th>\n <th>std</th>\n <th>min</th>\n <th>25%</th>\n <th>50%</th>\n 
<th>75%</th>\n <th>max</th>\n </tr>\n <tr>\n <th>month</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2015-01-31</th>\n <td>3.0</td>\n <td>5364.333333</td>\n <td>1914.414880</td>\n <td>3242.0</td>\n <td>4566.0</td>\n <td>5890.0</td>\n <td>6425.5</td>\n <td>6961.0</td>\n </tr>\n <tr>\n <th>2015-02-28</th>\n <td>3.0</td>\n <td>3804.333333</td>\n <td>1152.853995</td>\n <td>2578.0</td>\n <td>3273.5</td>\n <td>3969.0</td>\n <td>4417.5</td>\n <td>4866.0</td>\n </tr>\n <tr>\n <th>2015-03-31</th>\n <td>3.0</td>\n <td>3900.333333</td>\n <td>4529.891978</td>\n <td>1281.0</td>\n <td>1285.0</td>\n <td>1289.0</td>\n <td>5210.0</td>\n <td>9131.0</td>\n </tr>\n <tr>\n <th>2015-04-30</th>\n <td>3.0</td>\n <td>3250.000000</td>\n <td>3895.490855</td>\n <td>618.0</td>\n <td>1012.5</td>\n <td>1407.0</td>\n <td>4566.0</td>\n <td>7725.0</td>\n </tr>\n <tr>\n <th>2015-05-31</th>\n <td>3.0</td>\n <td>5792.000000</td>\n <td>2069.341200</td>\n <td>4409.0</td>\n <td>4602.5</td>\n <td>4796.0</td>\n <td>6483.5</td>\n <td>8171.0</td>\n </tr>\n <tr>\n <th>2015-06-30</th>\n <td>3.0</td>\n <td>7241.333333</td>\n <td>2747.220656</td>\n <td>4180.0</td>\n <td>6116.0</td>\n <td>8052.0</td>\n <td>8772.0</td>\n <td>9492.0</td>\n </tr>\n <tr>\n <th>2015-07-31</th>\n <td>3.0</td>\n <td>5836.333333</td>\n <td>2388.415653</td>\n <td>3267.0</td>\n <td>4760.0</td>\n <td>6253.0</td>\n <td>7121.0</td>\n <td>7989.0</td>\n </tr>\n <tr>\n <th>2015-08-31</th>\n <td>1.0</td>\n <td>5534.000000</td>\n <td>NaN</td>\n <td>5534.0</td>\n <td>5534.0</td>\n <td>5534.0</td>\n <td>5534.0</td>\n <td>5534.0</td>\n </tr>\n <tr>\n <th>2015-09-30</th>\n <td>3.0</td>\n <td>3371.000000</td>\n <td>3542.417960</td>\n <td>31.0</td>\n <td>1513.5</td>\n <td>2996.0</td>\n <td>5041.0</td>\n <td>7086.0</td>\n </tr>\n <tr>\n <th>2015-10-31</th>\n <td>3.0</td>\n <td>6367.666667</td>\n <td>3002.029702</td>\n <td>2909.0</td>\n <td>5402.5</td>\n 
<td>7896.0</td>\n <td>8097.0</td>\n <td>8298.0</td>\n </tr>\n <tr>\n <th>2015-11-30</th>\n <td>3.0</td>\n <td>3917.666667</td>\n <td>3273.148688</td>\n <td>494.0</td>\n <td>2368.5</td>\n <td>4243.0</td>\n <td>5629.5</td>\n <td>7016.0</td>\n </tr>\n <tr>\n <th>2015-12-31</th>\n <td>3.0</td>\n <td>5225.333333</td>\n <td>2377.587082</td>\n <td>3769.0</td>\n <td>3853.5</td>\n <td>3938.0</td>\n <td>5953.5</td>\n <td>7969.0</td>\n </tr>\n <tr>\n <th>2016-01-31</th>\n <td>3.0</td>\n <td>8453.666667</td>\n <td>536.431108</td>\n <td>7852.0</td>\n <td>8239.5</td>\n <td>8627.0</td>\n <td>8754.5</td>\n <td>8882.0</td>\n </tr>\n <tr>\n <th>2016-02-29</th>\n <td>3.0</td>\n <td>6417.666667</td>\n <td>2290.170372</td>\n <td>4194.0</td>\n <td>5242.0</td>\n <td>6290.0</td>\n <td>7529.5</td>\n <td>8769.0</td>\n </tr>\n <tr>\n <th>2016-03-31</th>\n <td>3.0</td>\n <td>2345.666667</td>\n <td>537.738164</td>\n <td>2012.0</td>\n <td>2035.5</td>\n <td>2059.0</td>\n <td>2512.5</td>\n <td>2966.0</td>\n </tr>\n <tr>\n <th>2016-04-30</th>\n <td>3.0</td>\n <td>5439.000000</td>\n <td>4444.797408</td>\n <td>471.0</td>\n <td>3639.0</td>\n <td>6807.0</td>\n <td>7923.0</td>\n <td>9039.0</td>\n </tr>\n <tr>\n <th>2016-05-31</th>\n <td>3.0</td>\n <td>2391.333333</td>\n <td>2675.235566</td>\n <td>314.0</td>\n <td>882.0</td>\n <td>1450.0</td>\n <td>3430.0</td>\n <td>5410.0</td>\n </tr>\n <tr>\n <th>2016-06-30</th>\n <td>3.0</td>\n <td>5012.000000</td>\n <td>3178.877632</td>\n <td>2858.0</td>\n <td>3186.5</td>\n <td>3515.0</td>\n <td>6089.0</td>\n <td>8663.0</td>\n </tr>\n <tr>\n <th>2016-07-31</th>\n <td>3.0</td>\n <td>8232.000000</td>\n <td>1732.765131</td>\n <td>6382.0</td>\n <td>7439.5</td>\n <td>8497.0</td>\n <td>9157.0</td>\n <td>9817.0</td>\n </tr>\n <tr>\n <th>2016-08-31</th>\n <td>3.0</td>\n <td>5452.333333</td>\n <td>4539.188621</td>\n <td>349.0</td>\n <td>3659.0</td>\n <td>6969.0</td>\n <td>8004.0</td>\n <td>9039.0</td>\n </tr>\n <tr>\n <th>2016-09-30</th>\n <td>3.0</td>\n 
<td>3569.333333</td>\n <td>5183.962802</td>\n <td>-735.0</td>\n <td>692.0</td>\n <td>2119.0</td>\n <td>5721.5</td>\n <td>9324.0</td>\n </tr>\n <tr>\n <th>2016-10-31</th>\n <td>3.0</td>\n <td>3487.333333</td>\n <td>2247.644174</td>\n <td>919.0</td>\n <td>2683.5</td>\n <td>4448.0</td>\n <td>4771.5</td>\n <td>5095.0</td>\n </tr>\n <tr>\n <th>2016-11-30</th>\n <td>3.0</td>\n <td>385.333333</td>\n <td>887.008643</td>\n <td>-259.0</td>\n <td>-120.5</td>\n <td>18.0</td>\n <td>707.5</td>\n <td>1397.0</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing text_cell rendered\"><div class=\"prompt input_prompt\">\n</div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>And a little gem to finish: you can also aggregate by values that are not a column of the table. The following code splits the data into weak, average, and strong months according to how many whole tens of thousands of crowns we earned in a given month, and computes the total sales for the weak, average, and strong months:</p>\n</div>\n</div>\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [87]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">bin_size</span> <span class=\"o\">=</span> <span class=\"mi\">10000</span>\n<span class=\"n\">by_month</span> <span class=\"o\">=</span> <span class=\"n\">data</span><span class=\"o\">.</span><span class=\"n\">groupby</span><span class=\"p\">(</span><span class=\"s1\">'month'</span><span class=\"p\">)</span><span class=\"o\">.</span><span class=\"n\">sum</span><span class=\"p\">()</span>\n<span class=\"n\">by_thousands</span> <span class=\"o\">=</span> <span class=\"n\">by_month</span><span class=\"o\">.</span><span class=\"n\">groupby</span><span class=\"p\">(</span><span class=\"n\">by_month</span><span class=\"p\">[</span><span 
class=\"s1\">'sales'</span><span class=\"p\">]</span> <span class=\"o\">//</span> <span class=\"n\">bin_size</span> <span class=\"o\">*</span> <span class=\"n\">bin_size</span><span class=\"p\">)</span><span class=\"o\">.</span><span class=\"n\">agg</span><span class=\"p\">([</span><span class=\"s1\">'count'</span><span class=\"p\">,</span> <span class=\"s1\">'sum'</span><span class=\"p\">])</span>\n<span class=\"n\">by_thousands</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[87]:</div>\n\n\n\n<div class=\"output_html rendered_html output_subarea output_execute_result\">\n<div>\n<style>.lesson-content .dataframe tbody tr th:only-of-type {\n vertical-align: middle\n }\n.lesson-content .dataframe tbody tr th {\n vertical-align: top\n }\n.lesson-content .dataframe thead tr th {\n text-align: left\n }\n.lesson-content .dataframe thead tr:last-of-type th {\n text-align: right\n }</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"2\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th></th>\n <th>count</th>\n <th>sum</th>\n </tr>\n <tr>\n <th>sales</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>5</td>\n <td>30651</td>\n </tr>\n <tr>\n <th>10000</th>\n <td>15</td>\n <td>218870</td>\n </tr>\n <tr>\n <th>20000</th>\n <td>3</td>\n <td>71781</td>\n </tr>\n </tbody>\n</table>\n</div>\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In [88]:</div>\n<div class=\"inner_cell\">\n <div class=\"input_area\">\n<div class=\" highlight hl-ipython3\"><pre><span></span><span class=\"n\">by_thousands</span><span class=\"p\">[(</span><span class=\"s1\">'sales'</span><span class=\"p\">,</span> <span class=\"s1\">'sum'</span><span class=\"p\">)]</span><span 
class=\"o\">.</span><span class=\"n\">plot</span><span class=\"p\">()</span>\n</pre></div>\n\n</div>\n</div>\n</div>\n\n<div class=\"output_wrapper\">\n<div class=\"output\">\n\n\n<div class=\"output_area\">\n\n<div class=\"prompt output_prompt\">Out[88]:</div>\n\n\n\n\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>&lt;matplotlib.axes._subplots.AxesSubplot at 0x7f0a2a54e3c8&gt;</pre>\n</div>\n\n</div>\n\n<div class=\"output_area\">\n\n<div class=\"prompt\"></div>\n\n\n\n\n<div class=\"output_png output_subarea \">\n<img src=\"%0A\" alt=\"Line plot of the summed sales for each bin of 10000\">\n</div>\n\n</div>\n\n</div>\n</div>\n\n</div>\n \n\n\n\n\n " } } }