Optimization happens on consoles, where developers manage to pull out performance levels that often aren't even attempted on PC.
I don't really get the cheering for ARM on all fronts. Until we have open source hardware in the form of, say, a RISC-V CPU plus some GPU to go with it, there's nothing to celebrate; it's progress of a sort, but more a matter of poaching customers from one side to the other than real progress.
THIS is what I call progress, although it's admittedly an expensive affair, so it will take a while before it catches on and reaches a level usable in the mainstream:
CPU:
https://www.sifive.com/products/hifive-unleashed/
https://content.riscv.org/wp-content/uploads/2017/12/Tue1224-SiFive_Freedom_U500-Kang.pdf
http://bofh.nikhef.nl/events/FOSDEM/2018/K.1.105%20(La%20Fontaine)/riscv.mp4
GPU:
http://miaowgpu.org/
https://www.phoronix.com/scan.php?page=news_item&px=GPLGPU-Detailed-Look
https://gpuopen.com/
Open source processors and graphics cards, including the microcode: that's the only acceptable future.
I rather see this heading toward a massive spread of local clouds, i.e. in the future people will keep, say, the whole internet (or a selected part of it, according to their interests) stored locally and just set up synchronization. That said, it's equally true that the cloud will keep expanding at the same time.
But then I'm already looking at storage in the range of yottabytes at the size of a pinhead... for example, we'll be able to store complete copies of arbitrary objects, people included :-)
Nvidia has killed off new use of consumer cards in data centers, which also drove prices sky high, because big companies were buying powerful desktop cards for their data centers; that's no longer allowed :-) What a bad joke. The manufacturer forbids you to plug a graphics card in at a certain kind of location... but it was hurting their Tesla sales. AMD hasn't cut this off yet.
I'd say that's secondary, though; the main cause is crypto mining, no question about it.
Let me add a completely mundane example that maps onto any kind of programming:
I write an application
I run it
it does something other than what I wanted
I label that a bug
I try again
it no longer does the first thing I didn't want, but it now does something else I didn't want either.
and so on.
and so on.
and so on.
This goes on until I remove the simple "bugs" (I consider calling these things bugs a distorted view of reality), yet bugs necessarily remain in the system, and removing the simple ones even uncovers or creates new, far more insidious ones...
That's a simple fact of any application development, and we run millions of such processes around the world. It's exactly the same with AI, except there we are also replacing the position of the programmer, giving the system the ability to modify itself. And once we let it "out" onto the internet, it will be able to do whatever it wants.
This state of affairs is inevitable; we ourselves cannot prevent it, since at the lowest level we are steering toward it with our everyday actions...
Well, it will be very funny, and some people will prefer death by their own hand :-)))
One more addition: judging from a small sample of people in the Czech Republic, I'd say most know how things have gone with man throughout history; man keeps trying new things and they keep slipping out of his reins, over and over. To me that's a general fact about human history... and in the same way the various artificial intelligences will slip out of our reins and start doing things we never wanted to happen in the first place. That's how it is with absolutely every micro-activity of man.
So for me the question isn't whether AI can become dangerous to humans; the only question for me is: when exactly the ***** will it happen :-)))
Just as one AI will modify another AI by identifying, say, redundant code, code that doesn't work, code that is dangerous, .... and so on,
it will evaluate every object on the planet the same way. AI doesn't know the meaning of a human being the way we do. It sees only a set of objects (so far) that either will fit its vision of the correct state of things or won't. If for some reason we end up in the second group, we're simply out of luck.
The point is that AI is still just a statistical model, and although we can view humans that way too, Google Brain, even if many consider it the pinnacle of AI, doesn't yet have much in common with human intelligence beyond applying an artificial model of the neuron. It still lacks much of the infrastructure of the human nervous system, and the higher brain functions.
Unfortunately it can't even have those functions, because all AI development starts in the neuroscience discipline, and as long as neuroscience hasn't described the state of things, they are hard to program, however much we developers try ourselves.
On the other hand, an AI can program its own higher brain functions, which will however have nothing whatsoever in common with the human ones :-))) So yes, we are building an artificial intelligence here that will have nothing in common with man, even though that's what we're aiming for.... and that is the risk.
I've already written elsewhere, benchmarks included; I recommend going the route of a PCIe card for 4x M.2 drives... and add drives to it as the money comes in: one card, 4x expansion.
:-)) your identity programming succeeded, hooray, since birth you've undergone a change that worked and you've already resigned on your own abilities, bravo!!!
Just because you can't conceive of entirely different systems of contributing to a "common budget" doesn't mean such systems don't exist or can't be designed.
You make the existence of firefighters and police conditional on the necessity of taxes. But those are two different groups of functionality. To have firefighters and police you don't need taxes; you need the means to provide for them. And means can surely be obtained in a thousand ways :-) Try thinking about it; if you don't, you'll have been born and died in a world of taxes.
I'm simply saying it works without them too.
So, Kingston DCP1000 -> not interested :-)
I have a different toy, in my view a more versatile one. It's a somewhat pricier toy; for system caches or temporary file systems I mostly get by with a RAM disk (most of my machines have 64GB-512GB), but capacity-wise that doesn't cover everything...
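(For reference, the RAM disk itself is a one-liner with tmpfs; a sketch, with the mount point and size as examples:)
# back /mnt/ramdisk with RAM; contents disappear on reboot
mount -t tmpfs -o size=64g tmpfs /mnt/ramdisk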
So for the higher capacities I went looking for a solution and found this; I now have it in my development machine:
1. Dell Ultra-Speed Drive Quad PCIe NVMe x16 Card Precision
It's a generic PCI Express card that you can stuff 4 arbitrary NVMe drives into. I picked it up on eBay for 6500 CZK
2. HD SAMSUNG SSD 960 EVO MZ-V6E1T0, 1TB, NVMe, M.2
That one was pricier: 12590.00 CZK
But then again, -20% VAT when buying through the company.
I've only had it a short while, so just one drive for now, but I plan to buy a second and try them in RAID 0 to see whether I can squeeze out more, even though the speeds the drive reaches on its own are already more than sufficient for my purposes (the files I process are exactly 1-50GB in size, so this flies)
Here are the results of a dummy throughput test on a single drive:
# first, write out a 1GB file
andromeda /opt # dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.14752 s, 936 MB/s
# drop the system caches
andromeda /opt # echo 3 > /proc/sys/vm/drop_caches
And now a read with the cache empty
andromeda /opt # dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.384912 s, 2.8 GB/s
And now the same thing, but without dropping the cache
andromeda /opt # dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.161535 s, 6.6 GB/s
So on a cached read I'm getting 6.6 GB/s here. That's nearly what they quote as the maximum.
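By the way, instead of dropping the caches by hand, the page cache can be bypassed altogether with O_DIRECT; a sketch on the same test file:
# read with O_DIRECT, so the page cache never gets involved
dd if=tempfile of=/dev/null bs=1M count=1024 iflag=direct status=progress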
Improvement: since the card takes 4 in total, I want to stick 4x 1TB drives in it and put them in RAID 0, but that's for later :-)
Otherwise, here is one more output of a different kind:
andromeda ~ # hdparm -Tt /dev/nvme0n1
/dev/nvme0n1:
Timing cached reads: 23776 MB in 2.00 seconds = 11899.49 MB/sec
Timing buffered disk reads: 7650 MB in 3.00 seconds = 2549.95 MB/sec
This method is independent of the disk partition alignment.
And finally I ran a random-access test with the fio utility, this time on a 4GB file:
# SYNC IO RANDOM ACCESS
andromeda /opt # fio random-read-test.fio
random-read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.15
Starting 1 process
random-read: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [r(1)] [100.0% done] [64824KB/0KB/0KB /s] [16.3K/0/0 iops] [eta 00m:00s]
random-read: (groupid=0, jobs=1): err= 0: pid=6125: Wed Sep 20 18:08:38 2017
read : io=4096.0MB, bw=64992KB/s, iops=16247, runt= 64536msec
clat (usec): min=13, max=2288, avg=60.52, stdev= 5.64
lat (usec): min=13, max=2288, avg=60.62, stdev= 5.64
clat percentiles (usec):
| 1.00th=[ 57], 5.00th=[ 58], 10.00th=[ 58], 20.00th=[ 59],
| 30.00th=[ 59], 40.00th=[ 59], 50.00th=[ 60], 60.00th=[ 60],
| 70.00th=[ 60], 80.00th=[ 61], 90.00th=[ 62], 95.00th=[ 67],
| 99.00th=[ 91], 99.50th=[ 92], 99.90th=[ 95], 99.95th=[ 98],
| 99.99th=[ 110]
lat (usec) : 20=0.01%, 50=0.01%, 100=99.96%, 250=0.04%, 750=0.01%
lat (msec) : 2=0.01%, 4=0.01%
cpu : usr=4.58%, sys=6.32%, ctx=1048635, majf=0, minf=17
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: io=4096.0MB, aggrb=64991KB/s, minb=64991KB/s, maxb=64991KB/s, mint=64536msec, maxt=64536msec
Disk stats (read/write):
nvme0n1: ios=1045941/17, merge=0/34, ticks=57687/3, in_queue=57616, util=89.58%
# Config file for the SYNC IO RANDOM ACCESS test
andromeda /opt # cat random-read-test.fio
; random read of 4096mb of data
[random-read]
rw=randread
size=4096m
directory=/opt/fio-test
# ASYNC AIO RANDOM ACCESS
andromeda /opt # fio random-read-test-aio.fio
random-read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.15
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [338.2MB/0KB/0KB /s] [86.6K/0/0 iops] [eta 00m:00s]
random-read: (groupid=0, jobs=1): err= 0: pid=11209: Wed Sep 20 18:17:49 2017
read : io=4096.0MB, bw=329120KB/s, iops=82279, runt= 12744msec
slat (usec): min=2, max=93, avg= 3.17, stdev= 1.73
clat (usec): min=28, max=23455, avg=87.64, stdev=80.48
lat (usec): min=31, max=23458, avg=90.94, stdev=80.50
clat percentiles (usec):
| 1.00th=[ 57], 5.00th=[ 71], 10.00th=[ 73], 20.00th=[ 74],
| 30.00th=[ 76], 40.00th=[ 78], 50.00th=[ 82], 60.00th=[ 88],
| 70.00th=[ 91], 80.00th=[ 95], 90.00th=[ 108], 95.00th=[ 124],
| 99.00th=[ 155], 99.50th=[ 169], 99.90th=[ 209], 99.95th=[ 243],
| 99.99th=[ 2448]
lat (usec) : 50=0.01%, 100=85.59%, 250=14.36%, 500=0.02%, 750=0.01%
lat (usec) : 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=17.83%, sys=36.17%, ctx=507915, majf=0, minf=58
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=8
Run status group 0 (all jobs):
READ: io=4096.0MB, aggrb=329119KB/s, minb=329119KB/s, maxb=329119KB/s, mint=12744msec, maxt=12744msec
Disk stats (read/write):
nvme0n1: ios=1040194/0, merge=0/0, ticks=84617/0, in_queue=84553, util=93.80%
# Config file for the ASYNC IO RANDOM ACCESS test
andromeda /opt # cat random-read-test-aio.fio
[random-read]
rw=randread
size=4096m
directory=/opt/fio-test
ioengine=libaio
iodepth=8
direct=1
invalidate=1
andromeda /opt #
So here we see the difference: ~64MB/s with synchronous IO vs ~330MB/s with asynchronous IO.
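For completeness, the same async job can also be run without a job file; fio accepts every job parameter as a command-line option (a sketch mirroring the config above):
fio --name=random-read --rw=randread --size=4096m --directory=/opt/fio-test \
    --ioengine=libaio --iodepth=8 --direct=1 --invalidate=1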
================
And finally a quick test of multithreaded access; the machine has 32 threads (16 physical cores), so I'll simulate:
4x memory-mapped query engines
1x update thread, simulating file system journaling
2x background updaters, simulating reads and writes at once, set to 20 and 40 microsecond pauses respectively to simulate some data processing (it's a dummy test); the data size per thread is 32 and 64MB respectively
And the result:
andromeda /opt # fio seven-threads-randio.fio
bgwriter: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
queryA: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
queryB: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
queryC: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
queryD: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
bgupdaterA: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
bgupdaterB: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
fio-2.15
Starting 7 processes
queryC: Laying out IO file(s) (1 file(s) / 4096MB)
queryD: Laying out IO file(s) (1 file(s) / 4096MB)
bgupdaterA: Laying out IO file(s) (1 file(s) / 32MB)
bgupdaterB: Laying out IO file(s) (1 file(s) / 64MB)
Jobs: 1 (f=1): [_(2),r(1),_(4)] [100.0% done] [35323KB/0KB/0KB /s] [8830/0/0 iops] [eta 00m:00s]
bgwriter: (groupid=0, jobs=1): err= 0: pid=11772: Wed Sep 20 18:34:23 2017
write: io=4096.0MB, bw=669910KB/s, iops=167477, runt= 6261msec
slat (usec): min=2, max=63, avg= 4.69, stdev= 2.18
clat (usec): min=18, max=6017, avg=185.43, stdev=35.27
lat (usec): min=23, max=6020, avg=190.23, stdev=35.38
clat percentiles (usec):
| 1.00th=[ 171], 5.00th=[ 171], 10.00th=[ 173], 20.00th=[ 175],
| 30.00th=[ 177], 40.00th=[ 177], 50.00th=[ 181], 60.00th=[ 185],
| 70.00th=[ 191], 80.00th=[ 197], 90.00th=[ 205], 95.00th=[ 213],
| 99.00th=[ 231], 99.50th=[ 239], 99.90th=[ 278], 99.95th=[ 342],
| 99.99th=[ 390]
lat (usec) : 20=0.01%, 50=0.01%, 100=0.01%, 250=99.79%, 500=0.20%
lat (msec) : 10=0.01%
cpu : usr=21.49%, sys=78.40%, ctx=7, majf=0, minf=12
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
queryA: (groupid=0, jobs=1): err= 0: pid=11773: Wed Sep 20 18:34:23 2017
read : io=4096.0MB, bw=41277KB/s, iops=10319, runt=101613msec
clat (usec): min=66, max=5590, avg=92.93, stdev=84.01
lat (usec): min=66, max=5591, avg=92.98, stdev=84.01
clat percentiles (usec):
| 1.00th=[ 72], 5.00th=[ 79], 10.00th=[ 79], 20.00th=[ 80],
| 30.00th=[ 81], 40.00th=[ 82], 50.00th=[ 84], 60.00th=[ 89],
| 70.00th=[ 95], 80.00th=[ 97], 90.00th=[ 100], 95.00th=[ 106],
| 99.00th=[ 143], 99.50th=[ 197], 99.90th=[ 1848], 99.95th=[ 2224],
| 99.99th=[ 2576]
lat (usec) : 100=89.32%, 250=10.25%, 500=0.17%, 750=0.06%, 1000=0.03%
lat (msec) : 2=0.11%, 4=0.08%, 10=0.01%
cpu : usr=5.49%, sys=8.59%, ctx=1048668, majf=1048576, minf=94
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
queryB: (groupid=0, jobs=1): err= 0: pid=11774: Wed Sep 20 18:34:23 2017
read : io=4096.0MB, bw=40250KB/s, iops=10062, runt=104207msec
clat (usec): min=17, max=5694, avg=93.18, stdev=84.29
lat (usec): min=17, max=5694, avg=93.21, stdev=84.29
clat percentiles (usec):
| 1.00th=[ 73], 5.00th=[ 78], 10.00th=[ 79], 20.00th=[ 80],
| 30.00th=[ 81], 40.00th=[ 82], 50.00th=[ 85], 60.00th=[ 90],
| 70.00th=[ 95], 80.00th=[ 97], 90.00th=[ 101], 95.00th=[ 106],
| 99.00th=[ 141], 99.50th=[ 189], 99.90th=[ 1800], 99.95th=[ 2224],
| 99.99th=[ 2608]
lat (usec) : 20=0.01%, 50=0.01%, 100=87.76%, 250=11.82%, 500=0.16%
lat (usec) : 750=0.05%, 1000=0.03%
lat (msec) : 2=0.11%, 4=0.08%, 10=0.01%
cpu : usr=7.93%, sys=8.39%, ctx=1048689, majf=1048576, minf=62
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
queryC: (groupid=0, jobs=1): err= 0: pid=11775: Wed Sep 20 18:34:23 2017
read : io=4096.0MB, bw=47849KB/s, iops=11962, runt= 87658msec
clat (usec): min=57, max=6160, avg=78.94, stdev=85.48
lat (usec): min=57, max=6160, avg=78.98, stdev=85.49
clat percentiles (usec):
| 1.00th=[ 60], 5.00th=[ 61], 10.00th=[ 62], 20.00th=[ 62],
| 30.00th=[ 63], 40.00th=[ 64], 50.00th=[ 67], 60.00th=[ 78],
| 70.00th=[ 81], 80.00th=[ 85], 90.00th=[ 96], 95.00th=[ 101],
| 99.00th=[ 135], 99.50th=[ 213], 99.90th=[ 1816], 99.95th=[ 2224],
| 99.99th=[ 2576]
lat (usec) : 100=94.35%, 250=5.20%, 500=0.18%, 750=0.05%, 1000=0.03%
lat (msec) : 2=0.11%, 4=0.08%, 10=0.01%
cpu : usr=7.62%, sys=9.23%, ctx=1048640, majf=1048576, minf=48
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
queryD: (groupid=0, jobs=1): err= 0: pid=11776: Wed Sep 20 18:34:23 2017
read : io=4096.0MB, bw=54710KB/s, iops=13677, runt= 76664msec
clat (usec): min=57, max=6988, avg=70.45, stdev=86.49
lat (usec): min=57, max=6988, avg=70.48, stdev=86.49
clat percentiles (usec):
| 1.00th=[ 60], 5.00th=[ 61], 10.00th=[ 61], 20.00th=[ 62],
| 30.00th=[ 62], 40.00th=[ 63], 50.00th=[ 63], 60.00th=[ 64],
| 70.00th=[ 65], 80.00th=[ 66], 90.00th=[ 71], 95.00th=[ 83],
| 99.00th=[ 124], 99.50th=[ 213], 99.90th=[ 1848], 99.95th=[ 2224],
| 99.99th=[ 2544]
lat (usec) : 100=97.17%, 250=2.38%, 500=0.18%, 750=0.05%, 1000=0.03%
lat (msec) : 2=0.11%, 4=0.08%, 10=0.01%
cpu : usr=5.58%, sys=10.84%, ctx=1048637, majf=1048576, minf=156
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
bgupdaterA: (groupid=0, jobs=1): err= 0: pid=11777: Wed Sep 20 18:34:23 2017
read : io=16824KB, bw=17955KB/s, iops=4488, runt= 937msec
slat (usec): min=3, max=35, avg= 4.58, stdev= 2.26
clat (usec): min=52, max=3446, avg=160.21, stdev=290.90
lat (usec): min=58, max=3450, avg=165.03, stdev=290.86
clat percentiles (usec):
| 1.00th=[ 55], 5.00th=[ 56], 10.00th=[ 57], 20.00th=[ 58],
| 30.00th=[ 59], 40.00th=[ 62], 50.00th=[ 71], 60.00th=[ 89],
| 70.00th=[ 119], 80.00th=[ 179], 90.00th=[ 310], 95.00th=[ 402],
| 99.00th=[ 1928], 99.50th=[ 2288], 99.90th=[ 3024], 99.95th=[ 3248],
| 99.99th=[ 3440]
write: io=15944KB, bw=17016KB/s, iops=4254, runt= 937msec
slat (usec): min=3, max=48, avg= 5.29, stdev= 2.64
clat (usec): min=4, max=102, avg=13.30, stdev= 4.47
lat (usec): min=15, max=110, avg=18.76, stdev= 5.32
clat percentiles (usec):
| 1.00th=[ 9], 5.00th=[ 11], 10.00th=[ 12], 20.00th=[ 12],
| 30.00th=[ 12], 40.00th=[ 12], 50.00th=[ 13], 60.00th=[ 13],
| 70.00th=[ 13], 80.00th=[ 14], 90.00th=[ 14], 95.00th=[ 16],
| 99.00th=[ 37], 99.50th=[ 41], 99.90th=[ 81], 99.95th=[ 97],
| 99.99th=[ 102]
lat (usec) : 10=0.65%, 20=46.45%, 50=1.48%, 100=32.58%, 250=11.44%
lat (usec) : 500=5.52%, 750=0.65%, 1000=0.16%
lat (msec) : 2=0.63%, 4=0.45%
cpu : usr=18.80%, sys=6.41%, ctx=8182, majf=0, minf=8
IO depths : 1=99.8%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=4206/w=3986/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=16
bgupdaterB: (groupid=0, jobs=1): err= 0: pid=11778: Wed Sep 20 18:34:23 2017
read : io=32684KB, bw=7351.4KB/s, iops=1837, runt= 4446msec
slat (usec): min=2, max=35, avg= 4.15, stdev= 2.40
clat (usec): min=40, max=5188, avg=440.41, stdev=680.03
lat (usec): min=58, max=5191, avg=444.85, stdev=679.99
clat percentiles (usec):
| 1.00th=[ 55], 5.00th=[ 57], 10.00th=[ 57], 20.00th=[ 59],
| 30.00th=[ 61], 40.00th=[ 71], 50.00th=[ 102], 60.00th=[ 161],
| 70.00th=[ 294], 80.00th=[ 628], 90.00th=[ 1672], 95.00th=[ 2160],
| 99.00th=[ 2512], 99.50th=[ 2640], 99.90th=[ 4384], 99.95th=[ 5088],
| 99.99th=[ 5216]
write: io=32852KB, bw=7389.2KB/s, iops=1847, runt= 4446msec
slat (usec): min=3, max=32, avg= 4.94, stdev= 2.33
clat (usec): min=0, max=109, avg=13.04, stdev= 4.17
lat (usec): min=14, max=116, avg=18.08, stdev= 4.82
clat percentiles (usec):
| 1.00th=[ 10], 5.00th=[ 11], 10.00th=[ 11], 20.00th=[ 12],
| 30.00th=[ 12], 40.00th=[ 12], 50.00th=[ 13], 60.00th=[ 13],
| 70.00th=[ 13], 80.00th=[ 13], 90.00th=[ 14], 95.00th=[ 15],
| 99.00th=[ 23], 99.50th=[ 35], 99.90th=[ 87], 99.95th=[ 101],
| 99.99th=[ 109]
lat (usec) : 2=0.01%, 10=0.47%, 20=48.25%, 50=1.32%, 100=24.66%
lat (usec) : 250=8.94%, 500=5.20%, 750=1.76%, 1000=1.34%
lat (msec) : 2=4.72%, 4=3.27%, 10=0.06%
cpu : usr=15.43%, sys=2.25%, ctx=16378, majf=0, minf=9
IO depths : 1=99.9%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=8171/w=8213/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: io=16432MB, aggrb=161474KB/s, minb=7351KB/s, maxb=54710KB/s, mint=937msec, maxt=104207msec
WRITE: io=4143.7MB, aggrb=677703KB/s, minb=7389KB/s, maxb=669909KB/s, mint=937msec, maxt=6261msec
Disk stats (read/write):
nvme0n1: ios=4205750/1060793, merge=0/64, ticks=313136/11710, in_queue=325124, util=99.10%
# Config file for the multithreaded test:
andromeda /opt # cat seven-threads-randio.fio
; seven jobs: one background writer, four query engines, two background updaters.
[global]
rw=randread
size=4096m
directory=/opt/fio-test
ioengine=libaio
iodepth=4
invalidate=1
direct=1
[bgwriter]
rw=randwrite
iodepth=32
[queryA]
iodepth=1
ioengine=mmap
direct=0
thinktime=3
[queryB]
iodepth=1
ioengine=mmap
direct=0
thinktime=5
[queryC]
iodepth=1
ioengine=mmap
direct=0
thinktime=4
[queryD]
iodepth=1
ioengine=mmap
direct=0
thinktime=2
[bgupdaterA]
rw=randrw
iodepth=16
thinktime=20
size=32m
[bgupdaterB]
rw=randrw
iodepth=16
thinktime=40
size=64m
Asynchronous IO is definitely a boost for data throughput; that holds in general in any system, and physical IO is no exception.
===================================
=== Summary: ===
===================================
So with this little gem from Dell you can build a super fast array of 4-8TB (the upper figure applies if you use 2TB NVMe drives, but one of those runs ~16000 CZK).
It probably depends on the use case, but the Dell gadget definitely buys you some flexibility: when a drive goes bad you toss it and the other 3 keep running, and so on...
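The 4-drive stripe planned above is just a few mdadm commands; a minimal sketch (the device names and mount point are assumptions, and RAID 0 of course means no redundancy):
# stripe the four NVMe drives into a single array
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
# put a file system on it and mount it
mkfs.xfs /dev/md0
mount /dev/md0 /opt/fast-array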
It doesn't seem to me that Vega lags brutally behind in raw performance; from the performance-per-watt angle, though, it looks different.
I work with several neural network frameworks and only a few of them support OpenCL; most are built on CUDA, so the software pulls toward Nvidia too, historically as well, since Nvidia provided the better base. I'm definitely rooting for AMD, whether with the HIP project (converting CUDA into pure C++ code runnable anywhere) or with the next generation. May they pull off in GPUs what they're currently showing in the CPU domain.
Otherwise, the article fails to mention one important thing the Nvidia card has (or did I overlook it?). Precisely for neural networks, various companies have started building their own dedicated accelerators (typically speeding up matrix math): Google has its own, Intel bought a company and released a USB-stick accelerator, and more will follow. The catch is that the article doesn't state that performance figure. Say the card peaks at ~25 TFLOPS in FP16; if I rewrite the application for the tensor units, the specs say up to ~120 can be squeezed out of it. And if you combine the two, nearly 150 :D That is exactly what AMD has no answer to at the moment.
AMD has something else instead: M.2 drives can be installed right on the GPU as a fast cache, up to 2TB at present, so I don't have to drag data from disk across the whole infrastructure to the GPU. That too is no small speedup, since offload (input/output) to/from the GPU is a considerable delay. But it still doesn't outweigh the tensor performance....
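As far as I know, the HIP conversion mentioned above is driven by AMD's hipify tooling; a rough sketch of the workflow, with made-up file names and assuming a ROCm install:
# translate CUDA source into portable HIP C++
hipify-perl kernel.cu > kernel_hip.cpp
# compile for whichever back end (AMD or Nvidia) is present
hipcc kernel_hip.cpp -o kernel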
Agreed, it's a total mess in there.
Quoting the article: "Alienware Area-51 will still be gaming computers, though, so the question is whether putting such a 16-core Threadripper to use in such a machine will make any sense at all."
The situation where games don't need more cores, which ends in some of them not even being able to use more, is caused mostly by many-core machines not being the standard. If my player base averages 2-4 cores, I'll be damn careful about what I develop and how. That's not because I don't know how, but because I can't deploy it.
If in 2 years (if we're lucky) we start talking about a gaming standard of 10 cores/20 threads, then with the growth in core counts games can adopt further principles that either weren't feasible before or until now had to be adapted and gutted.
As a very nice example that I think suggests itself quite naturally, take the possibility of creating independent characters, or groups of characters, i.e. separate artificial intelligences. Fundamentally this brings the option of decoupling into game design, i.e. completely separate, self-contained worlds that correlate with one another. It reminds me of the move from SOA to microservice architectures in the services world, i.e. a big deal.
I'm genuinely delighted at how AMD is firing these out; Intel could have done this long ago but didn't, or only at outrageous prices. For me as a mere mortal to get my hands on high-end chips, I used to buy ES versions around Asia for 10-15% of the price. And even so those jerks kept overpricing, crippling and trimming them, and now AMD is dumping all this out under pressure, which is great. Not that I'll put 128 threads (2 sockets of the 32-core version) to any real use on my own box.
I'm curious what they'll come out with in further graphics cards; already the new architecture is supposed to do 25 teraflops in FP16, which the Nvidia architecture can't, so with the Volta architecture they're starting to add TensorCores. The vast majority of AI frameworks are written for CUDA (Nvidia), but I sense that's about to start shifting a little, because you'll simply be able to buy a beefed-up AI card for your home at a fraction of a Tesla.
By the way, I've been buying ES versions of Xeons on eBay as a rule for a few years now, with the highest satisfaction. You can get a 16-core from 10-25 thousand. I build on Asus Z9PE boards, or now Z10PE, and in no time you have a machine with 64 CPU threads at home. I don't run games on it, and what interests me are the 24-core versions; one comes to roughly 30000 CZK + maybe some customs duty, though it can also be imported duty-free. And that gives you 96 CPU threads in a dual socket. Sure, the price climbs, it depends on the use.... on heureka.cz this chip costs 120,000, so you're getting it for a song.
They need to rake it in on CPUs so they can invest in AI chips, or try to buy Nvidia
Exactly!
A totally cheap scenario: I have two machines, each with some fast SSD, and even in that scenario 1Gbit is already the limit. I always put RAID 0 in my box; my data lives elsewhere anyway and all I care about is performance. - sure, I'm talking sequential read/write only mode. With a single SSD the disk is already 4-5x faster than the network; with RAID 0 we're getting somewhere near 1GB/s, and at that point it pays to buy two 10GbE NICs and link the machines up faster.
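Whether the wire or the array is the bottleneck is quick to verify with iperf3; a sketch (the address is an example):
# on the receiving machine
iperf3 -s
# on the sending machine, 4 parallel streams to saturate the link
iperf3 -c 10.10.10.1 -P 4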
Agreed.
But overall it's no bargain, that's for sure...
Well, that depends what kind of household you run; it's all a matter of perspective
I was only testing this following the Multipath TCP experiment, see the link in my post. As standard I have one 10GbE dual-port card in the machine, which gets you ~17-18Gbit, and that's enough. And as I answered above, most of it goes to the Hadoop cluster - RAM and independent disks.
That's also why we've now bought those 48-port 10GbE switches; you'll find one on eBay for 10,000 CZK (from America, so plus customs and shipping). If you get into it, Multipath can at times be non-CPU friendly, but it can be tuned.
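For the record, the linked experiment ran on the out-of-tree multipath-tcp.org kernel; with the MPTCP support that has since landed in mainline kernels, the setup is roughly this sketch (addresses and interface names are examples):
# enable MPTCP and allow extra subflows
sysctl -w net.mptcp.enabled=1
ip mptcp limits set subflows 2 add_addr_accepted 2
# advertise the second NIC as an additional path
ip mptcp endpoint add 10.0.1.2 dev eth1 subflow
# note: applications must open MPTCP sockets (e.g. via the mptcpize wrapper)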
I have two disk systems:
1. 1U nodes - each with 5x 6TB disks and 144GB RAM, running my Spark distributed compute cluster. The load goes straight into RAM; reading sequentially from disk tops out at some 800MB per second per node (reality is lower). These run on the Hadoop software file system with only double replication (it isn't production). Sometimes I switch that off and run with no replication at all, i.e. I get 2x the space (a sketch of changing the replication factor follows below). I have 5 of those nodes, i.e. 75TB with double replication, 150TB raw without. On production Hadoop clusters at least 3x replication is used, by the way. But my data is stored elsewhere and I generate a large part of it
2. And for the "classic" data I have a software-RAID 4U server running 3x RAID 6 arrays (each 8x 6TB disks). All 3 arrays are joined by a stripe layer, i.e. RAID 0 on top: RAID 60 (see the second sketch below). The theoretical peak is 1200 MB/s reads per array, but in reality one array reads somewhere around 600-900MB/s, x3 = 2-3GB(not Gbit)/s - I haven't benchmarked it, though. And I'll probably split it up anyway, one RAID 6 for critical data and the rest into plain RAID 0, because the current setup strikes me as rather dangerous -> fail to rebuild one array and everything is toast :-). It's managed by mdadm, but I'm considering converting all of it, or one of the arrays, to BTRFS, which I still don't quite trust, so critical data more likely on ZFS. BTRFS has a lot of cool features but hasn't matured yet. Like ZFS it also looks after data integrity, whereas mdadm only looks after the integrity of the array.
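For point 1, switching the HDFS replication factor is one config property plus one command for existing data; a sketch (the path is an example):
# default replication for newly written files is the dfs.replication property in hdfs-site.xml
# re-replicate existing data to factor 2 and wait for it to finish
hdfs dfs -setrep -w 2 /data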
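And the layering from point 2, three RAID 6 legs striped together, looks roughly like this in mdadm (device names are placeholders):
# three RAID 6 legs, 8 disks each
mdadm --create /dev/md1 --level=6 --raid-devices=8 /dev/sd[b-i]
mdadm --create /dev/md2 --level=6 --raid-devices=8 /dev/sd[j-q]
mdadm --create /dev/md3 --level=6 --raid-devices=8 /dev/sd[r-y]
# stripe across the legs -> RAID 60
mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3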
This is something more than one home user will definitely welcome, anyone who needs to connect a few machines with something beefier. Switches are disgustingly expensive, but with a crossover cable and two cheap NICs a 10gbit network is born (a sketch of the wiring follows below).
I personally built such a network at home two years ago, but on SFP+ ports. NICs, even dual-port ones, cost 1000-3000 CZK. Above all, unlike 10gbit Ethernet, there are super cheap 24- and even 48-port SFP+ switches on the market, the Quanta LB6 and LB8 respectively. They only support layers 2/3/4, but that's enough. Mainly they cost 10,000 CZK, not half a million, and for a home user that's the real LOL.
And if you need a bigger data blast, even with a single dual-port NIC you get just under 20Gbit straight away via TCP multipath; if you want more, put in more NICs. Of course multipath isn't omnipotent and some degradation occurs. Out of a theoretical 60gbit (3x dual-port NIC) you get just over 50gbit, see e.g. http://multipath-tcp.org/pmwiki.php?n=Main.50Gbps
Moving 50GB in 10 seconds isn't bad vs 8 minutes on 1gbit, and you get used to it fast :-)
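The crossover setup really is just static addresses on both ends; a sketch (interface names and addresses are examples, the jumbo MTU is optional):
# machine A
ip addr add 10.10.10.1/30 dev enp3s0
ip link set enp3s0 up mtu 9000
# machine B
ip addr add 10.10.10.2/30 dev enp3s0
ip link set enp3s0 up mtu 9000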
Agreed, I also don't know of a company that has 10gbit as the standard office Ethernet. But once devices ship with 10GbE NICs by default, it's bound to have some effect on the market and potentially bring cheaper 10GbE switches.
Server rooms are another matter; there you can scale past even 100gbit with multipath TCP as needed. The question is more about Ethernet as such. Its latency is quite high for some kinds of use, and next to it we have the proven InfiniBand and Fibre Channel...
Overall it's certainly not pointless; it just depends on the size of the data you work with. And I don't buy the argument that big data belongs on a server. For AI a data set can be 1TB or just as well 10TB, yet it may get assembled on a workstation, and today I really have no problem building a simple RAID out of 8TB/10TB disks in a workstation. When you then push that to the server and back to fine-tune it, 1Gbit vs 10Gbit simply shows; I, for one, built myself a cheap 60Gbit at home. Your notion of pointlessness stems only from your own experience, which is necessarily limited.
But I agree, at the moment it's horrendously expensive!