Who ate my memory?

Marius '95

troubleShooter
Help!
I'm copying some files on the router from one disk to another. At some point everything freezes and it reboots itself. After several tries I "caught" it at a moment right before the crash and managed to "save" it... sort of...
Here is what the situation looks like:

[Attachment: 1661972319522.png]


Code:
root@GRAPHRT:/proc# cat meminfo
MemTotal:         247724 kB
MemFree:           25848 kB
MemAvailable:      14240 kB
Buffers:               8 kB
Cached:            22472 kB
SwapCached:            0 kB
Active:            11924 kB
Inactive:          13484 kB
Active(anon):       1048 kB
Inactive(anon):     3140 kB
Active(file):      10876 kB
Inactive(file):    10344 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          2928 kB
Mapped:             3048 kB
Shmem:              1260 kB
KReclaimable:       4712 kB
Slab:              29496 kB
SReclaimable:       4712 kB
SUnreclaim:        24784 kB
KernelStack:        1448 kB
PageTables:          256 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      123860 kB
Committed_AS:       7012 kB
VmallocTotal:     770048 kB
VmallocUsed:        9952 kB
VmallocChunk:          0 kB
Percpu:              240 kB
root@GRAPHRT:/proc# cat vmstat
nr_free_pages 6462
nr_zone_inactive_anon 786
nr_zone_active_anon 262
nr_zone_inactive_file 2587
nr_zone_active_file 2719
nr_zone_unevictable 0
nr_zone_write_pending 0
nr_mlock 0
nr_page_table_pages 64
nr_bounce 0
nr_zspages 0
nr_free_cma 0
nr_inactive_anon 786
nr_active_anon 262
nr_inactive_file 2587
nr_active_file 2719
nr_unevictable 0
nr_slab_reclaimable 1178
nr_slab_unreclaimable 6179
nr_isolated_anon 0
nr_isolated_file 0
workingset_nodes 2377
workingset_refault_anon 393
workingset_refault_file 699590
workingset_activate_anon 87
workingset_activate_file 87404
workingset_restore_anon 0
workingset_restore_file 42089
workingset_nodereclaim 0
nr_anon_pages 734
nr_mapped 762
nr_file_pages 5620
nr_dirty 0
nr_writeback 0
nr_writeback_temp 0
nr_shmem 315
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_file_hugepages 0
nr_file_pmdmapped 0
nr_anon_transparent_hugepages 0
nr_vmscan_write 28826
nr_vmscan_immediate_reclaim 65582
nr_dirtied 329138
nr_written 306115
nr_kernel_misc_reclaimable 0
nr_foll_pin_acquired 0
nr_foll_pin_released 0
nr_kernel_stack 1448
nr_dirty_threshold 1124
nr_dirty_background_threshold 561
nr_unstable 0

So where did 190 MB of memory go?
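One rough way to put a number on that question is to add up the consumers that meminfo does report and see what is left over. The sketch below does this in Python on a pasted copy of the output (it is meant to run on a desktop, not on the router). The set of fields in ACCOUNTED is an assumption about which counters do not overlap, not an official formula, and the function names are made up for illustration.

```python
import re

# Fields treated as non-overlapping consumers of RAM. This selection is an
# assumption for a rough estimate, not an official accounting formula.
ACCOUNTED = ("MemFree", "Buffers", "Cached", "Slab", "AnonPages",
             "KernelStack", "PageTables", "Percpu")

def parse_meminfo(text):
    """Return {field: kB} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\w()]+):\s+(\d+)(?:\s+kB)?\s*$", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def unaccounted_kb(fields):
    """MemTotal minus every ACCOUNTED field that is present."""
    return fields["MemTotal"] - sum(fields.get(k, 0) for k in ACCOUNTED)

# Values copied from the meminfo output posted above.
SAMPLE = """\
MemTotal:         247724 kB
MemFree:           25848 kB
Buffers:               8 kB
Cached:            22472 kB
AnonPages:          2928 kB
Slab:              29496 kB
KernelStack:        1448 kB
PageTables:          256 kB
Percpu:              240 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(f"unaccounted: {unaccounted_kb(info)} kB")
```

With the posted values this leaves about 160 MB that none of these counters explain. Pages that a driver or filesystem allocates directly from the page allocator do not show up in any of them, so this only narrows the search; it does not name the culprit.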
 
Code:
root@GRAPHRT:/proc# cat vmallocinfo
[...]
0x1bc111ab-0x7448241e  131072 0xc0394ee8 pages=31 vmalloc
0x7448241e-0xb9c30497  131072 0xc038d4e0 pages=31 vmalloc
0x47c1d180-0xec46d4d4 1052672 0xc0565a3c phys=0xf8200000 ioremap
0x3f8861ac-0x03276075 1052672 0xc0565a3c phys=0xf8300000 ioremap
0x497797d6-0x183b3047 1052672 0xc0565a3c phys=0xf8400000 ioremap
0x6f3e7a30-0xd6da58d4 1052672 0xc0565a3c phys=0xf8500000 ioremap
0x4c119a47-0xe18ac24b 2613248 0xc0447b9c pages=637 vmalloc
0xc7763f7c-0x28c05fb1 2097152 0xc0c093e0 ioremap
0x1d0b10c3-0xf91fb7c7   90112 0xc0211e44 vmalloc
0xf91fb7c7-0x74dc1cda   90112 0xc0211e44 vmalloc
0xa902ece0-0xb461dd7b   16384 unpurged vm_area
0xb461dd7b-0xd44196a5   16384 unpurged vm_area
0x479ebb6f-0x4c119a47 2613248 unpurged vm_area
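
To check whether vmalloc could account for the missing memory, the region sizes can be summed per kind. This is a minimal sketch that assumes the kind is the last word of each entry, as in the excerpt above; real vmallocinfo lines can carry extra trailing fields, and the listing here is truncated ("[...]"), so the totals only cover the visible lines.

```python
import re
from collections import defaultdict

# address range, size in bytes, then the rest of the entry
LINE = re.compile(r"^0x[0-9a-f]+-0x[0-9a-f]+\s+(\d+)\s+(.*)$")

def vmalloc_totals(text):
    """Sum region sizes (bytes) per kind: vmalloc, ioremap, unpurged, ..."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip prompts, "[...]" and anything unparsable
        rest = m.group(2).split()
        if not rest:
            continue
        # Assumption: the kind is the last word, except for "unpurged vm_area".
        kind = "unpurged" if rest[0] == "unpurged" else rest[-1]
        totals[kind] += int(m.group(1))
    return dict(totals)

# Entries copied from the vmallocinfo excerpt above.
SAMPLE = """\
0x1bc111ab-0x7448241e  131072 0xc0394ee8 pages=31 vmalloc
0x7448241e-0xb9c30497  131072 0xc038d4e0 pages=31 vmalloc
0x47c1d180-0xec46d4d4 1052672 0xc0565a3c phys=0xf8200000 ioremap
0x3f8861ac-0x03276075 1052672 0xc0565a3c phys=0xf8300000 ioremap
0x497797d6-0x183b3047 1052672 0xc0565a3c phys=0xf8400000 ioremap
0x6f3e7a30-0xd6da58d4 1052672 0xc0565a3c phys=0xf8500000 ioremap
0x4c119a47-0xe18ac24b 2613248 0xc0447b9c pages=637 vmalloc
0xc7763f7c-0x28c05fb1 2097152 0xc0c093e0 ioremap
0x1d0b10c3-0xf91fb7c7   90112 0xc0211e44 vmalloc
0xf91fb7c7-0x74dc1cda   90112 0xc0211e44 vmalloc
0xa902ece0-0xb461dd7b   16384 unpurged vm_area
0xb461dd7b-0xd44196a5   16384 unpurged vm_area
0x479ebb6f-0x4c119a47 2613248 unpurged vm_area
"""

if __name__ == "__main__":
    for kind, size in sorted(vmalloc_totals(SAMPLE).items()):
        print(f"{kind:10s} {size / 1024:10.0f} kB")
```

The ioremap entries map device registers rather than RAM, so only the vmalloc and unpurged lines use memory, and in the visible part of the listing those add up to a few MB at most.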
 
Is the OS on the router 100% installed by you? Check that it doesn't have some watchdog that reboots it when memory use reaches a certain percentage, or when it starts getting sluggish. Normally, when Linux runs out of memory it gets slower and slower and kills off some unimportant stuff; it doesn't reboot right away. Well, normally it also has swap, which in your case looks disabled.

Also, htop hides kernel threads by default. Press K (Shift+K), or F2 / Display options / uncheck "Hide kernel threads".
 
Kernel threads don't show how much memory they use.
Yes, there is a hardware watchdog. That's what was resetting the system when it started swapping like crazy. I disabled swap, and that's how I managed to stop it before the OOM killer took out the console's sh or init.
 
Another one:
Code:
[92965.414528][    C0] nfsd: page allocation failure: order:2, mode:0x40820(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
[92965.424816][    C0] CPU: 0 PID: 5007 Comm: nfsd Tainted: G        W  O       6.6.61 #0
[92965.424826][    C0] Hardware name: Marvell Armada 380/385 (Device Tree)
[92965.424834][    C0] [<c010f0b8>] (unwind_backtrace) from [<c010a6d8>] (show_stack+0x10/0x28)
...
[92965.425017][    C0] Exception stack(0xf2b41a70 to 0xf2b41ab8)
[92965.425022][    C0] 1a60:                                     00000001 c2eb9080 00000418 eefd1dc0
[92965.425029][    C0] 1a80: ef620900 00000000 0000065f 00031448 ef620900 00000000 00000001 f2b41abc
[92965.425034][    C0] 1aa0: 00000001 f2b41abc c022bd80 c015ec00 60000013 ffffffff
[92965.425038][    C0] [<c0100bd8>] (__irq_svc) from [<c015ec00>] (migrate_disable+0x44/0x80)
[92965.425054][    C0] [<c015ec00>] (migrate_disable) from [<c022bd80>] (__kmap_local_pfn_prot+0x14/0x134)
[92965.425072][    C0] [<c022bd80>] (__kmap_local_pfn_prot) from [<c0262504>] (__kernel_unpoison_pages+0x58/0x188)
[92965.425085][    C0] [<c0262504>] (__kernel_unpoison_pages) from [<c024d1c0>] (post_alloc_hook+0xbc/0x154)
[92965.425094][    C0] [<c024d1c0>] (post_alloc_hook) from [<c024ef5c>] (get_page_from_freelist+0x468/0xbdc)
[92965.425104][    C0] [<c024ef5c>] (get_page_from_freelist) from [<c024fcb0>] (__alloc_pages+0x130/0xbd8)
[92965.425113][    C0] [<c024fcb0>] (__alloc_pages) from [<c01f5180>] (__filemap_get_folio+0xd8/0x3c0)
[92965.425124][    C0] [<c01f5180>] (__filemap_get_folio) from [<c0335c80>] (ext4_write_begin+0xa0/0x568)
[92965.425138][    C0] [<c0335c80>] (ext4_write_begin) from [<c01f05d0>] (generic_perform_write+0xcc/0x240)
[92965.425156][    C0] [<c01f05d0>] (generic_perform_write) from [<c031f774>] (ext4_buffered_write_iter+0x60/0x13c)
[92965.425177][    C0] [<c031f774>] (ext4_buffered_write_iter) from [<c0272a64>] (do_iter_readv_writev+0xdc/0x160)
[92965.425190][    C0] [<c0272a64>] (do_iter_readv_writev) from [<c0273c98>] (do_iter_write+0x88/0x238)
[92965.425199][    C0] [<c0273c98>] (do_iter_write) from [<c0409940>] (nfsd_vfs_write+0x154/0x4b0)
[92965.425216][    C0] [<c0409940>] (nfsd_vfs_write) from [<c0409e2c>] (nfsd_write+0x88/0xd0)
[92965.425230][    C0] [<c0409e2c>] (nfsd_write) from [<c041409c>] (nfsd3_proc_write+0xd0/0x134)
[92965.425244][    C0] [<c041409c>] (nfsd3_proc_write) from [<c0404a54>] (nfsd_dispatch+0xf4/0x170)
[92965.425256][    C0] [<c0404a54>] (nfsd_dispatch) from [<c0b44ca4>] (svc_process_common+0x390/0x5c4)
[92965.425268][    C0] [<c0b44ca4>] (svc_process_common) from [<c0b45538>] (svc_process+0xdc/0x144)
[92965.425278][    C0] [<c0b45538>] (svc_process) from [<c040320c>] (nfsd+0x78/0xd8)
[92965.425288][    C0] [<c040320c>] (nfsd) from [<c0158854>] (kthread+0xd0/0xec)
[92965.425302][    C0] [<c0158854>] (kthread) from [<c010014c>] (ret_from_fork+0x14/0x28)
[92965.425312][    C0] Exception stack(0xf2b41fb0 to 0xf2b41ff8)
[92965.425316][    C0] 1fa0:                                     00000000 00000000 00000000 00000000
[92965.425322][    C0] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[92965.425327][    C0] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[92965.425331][    C0] Mem-Info:
[92965.425335][    C0] active_anon:1773 inactive_anon:2210 isolated_anon:0
[92965.425335][    C0]  active_file:322301 inactive_file:121208 isolated_file:0
[92965.425335][    C0]  unevictable:0 dirty:649 writeback:0
[92965.425335][    C0]  slab_reclaimable:8232 slab_unreclaimable:19749
[92965.425335][    C0]  mapped:4620 shmem:275 pagetables:267
[92965.425335][    C0]  sec_pagetables:0 bounce:0
[92965.425335][    C0]  kernel_misc_reclaimable:0
[92965.425335][    C0]  free:19131 free_pcp:981 free_cma:0
[92965.425348][    C0] Node 0 active_anon:7092kB inactive_anon:8840kB active_file:1289204kB inactive_file:484832kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:18480kB dirty:2596kB writeback:0kB shmem:1100kB writeback_tmp:0kB kernel_stack:1936kB pagetables:1068kB sec_pagetables:0kB all_unreclaimable? no
[92965.425359][    C0] Normal free:68568kB boost:36864kB min:53248kB low:57344kB high:61440kB reserved_highatomic:16384KB active_anon:16kB inactive_anon:16kB active_file:42380kB inactive_file:480284kB unevictable:0kB writepending:668kB present:786432kB managed:753904kB mlocked:0kB bounce:0kB free_pcp:3924kB local_pcp:568kB free_cma:0kB
[92965.425372][    C0] lowmem_reserve[]: 0 10240 10240
[92965.425382][    C0] Normal: 13226*4kB (UMEH) 1926*8kB (UMEH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 68312kB
[92965.425417][    C0] 444619 total pagecache pages
[92965.425420][    C0] 772 pages in swap cache
[92965.425423][    C0] Free swap  = 1032012kB
[92965.425425][    C0] Total swap = 1048572kB
[92965.425428][    C0] 524288 pages RAM
[92965.425431][    C0] 327680 pages HighMem/MovableOnly
[92965.425433][    C0] 8132 pages reserved
[92965.425440][    C0] mvneta f1030000.ethernet eth1: Linux processing - Can't refill
[92965.433058][    C0] mvneta f1030000.ethernet eth1: bad rx status 0cc02000 (crc error), size=1528
A router with 2 GB of memory couldn't find 16 KB of contiguous free memory to receive a 1528-byte NFS packet into (yes, I have working jumbo frames... well... apparently only semi-working).
Why doesn't it compact 4 of those 13226 free 4 KB pages?
Why doesn't it free some of that 480 MB of unused cache?
Why doesn't it swap to that 1 GB file?
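
The "Normal:" line in the log already shows why an order:2 request can fail with 68 MB free: all of the free memory sits in 4 kB and 8 kB blocks. The sketch below parses that line and counts the free blocks large enough for a given order. The function names are illustrative, and the 4 kB page size is taken from the smallest column of the log.

```python
import re

PAGE_KB = 4  # page size implied by the smallest ("4kB") column in the log

def parse_buddy_line(line):
    """Map block size in kB -> number of free blocks, from a Mem-Info line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}

def free_blocks_for_order(free, order):
    """Count free blocks big enough for a 2**order-page allocation."""
    need_kb = PAGE_KB << order
    return sum(n for size, n in free.items() if size >= need_kb)

# Copied from the page allocation failure report above.
LINE = ("Normal: 13226*4kB (UMEH) 1926*8kB (UMEH) 0*16kB 0*32kB 0*64kB "
        "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 68312kB")

if __name__ == "__main__":
    free = parse_buddy_line(LINE)
    total_kb = sum(size * n for size, n in free.items())
    print("free in zone:", total_kb, "kB")
    for order in range(3):
        print(f"order {order}: {free_blocks_for_order(free, order)} usable blocks")
```

For order 2 (16 kB) the count is zero. The failing request was also GFP_ATOMIC, which is not allowed to sleep, so it cannot wait for compaction, cache reclaim, or swap to produce a 16 kB block.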
 
Memory is split into chunks; a page allocation failure doesn't necessarily mean 0 free, it can mean that the zone the kernel keeps "really free" for allocations has run out. There are kernel threads that manage memory by moving pages from cache to free and so on, but if your processes need a lot of memory all at once, freeing it isn't instant and isn't synchronous with your malloc call (so the "if I'm out, I take from cache" logic isn't applied inline). That logic runs asynchronously in the background, in another thread. This can happen when packets arrive over the network faster than they can be written out.

See min_free_kbytes and zone_reclaim_mode, plus everything else in https://www.kernel.org/doc/Documentation/sysctl/vm.txt
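
Both tunables are exposed as files under /proc/sys/vm, so their current values can be checked before changing anything. This is a minimal sketch; read_vm_sysctl is a hypothetical helper, and the root parameter exists only so it can be pointed at a copy of the directory. On a router without Python, running cat on the same files gives the same information.

```python
from pathlib import Path

def read_vm_sysctl(name, root="/proc/sys/vm"):
    """Return the integer value of a vm sysctl, or None if it is absent."""
    path = Path(root) / name
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        # Missing file, unreadable file, or non-numeric content.
        return None

if __name__ == "__main__":
    for name in ("min_free_kbytes", "zone_reclaim_mode"):
        print(name, "=", read_vm_sysctl(name))
```

A result of None for zone_reclaim_mode would not be surprising, since that file may be absent on kernels built without NUMA support.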
 
Kernel threads don't show how much memory they use.
Yes, there is a hardware watchdog. That's what was resetting the system when it started swapping like crazy. I disabled swap, and that's how I managed to stop it before the OOM killer took out the console's sh or init.
From which we conclude that if you mess with the kernel and disable its cache, you'll get weird behavior. I learned that back around when I was reinstalling Win 3.1 and it has never failed since, no matter which kernel or operating system.
That holds pretty much regardless of what people say, how it's supposed to work, who's crying, what hardware, whether it's raining outside or not.
 