Note - Berkeley gdbm does not have this problem.

---------- Forwarded message ----------
Date: Mon, 1 Nov 2004 16:47:05 -0400 (AST)
From: G. Keith Slade
To: Mike Shepherd
Subject: GDBM Solution

I found out that GDBM has massive overhead if you update the files
incrementally, a record here and there. I had about 500 megs of files
before I figured this out. The best way to use GDBM is not to tie until
the end of the run, and then copy everything over from the plain Perl
hashes you were building during the script. This reduced my GDBM files
from more than 700 megs down to about 40 megs.

    use GDBM_File;

    # Tie the GDBM files only once the in-memory hashes are complete.
    tie(%xinverted,      "GDBM_File", "inverted.gdbm",  &GDBM_WRCREAT, 0644);
    tie(%xfrequency,     "GDBM_File", "frequency.gdbm", &GDBM_WRCREAT, 0644);
    tie(%xdocvec,        "GDBM_File", "docvec.gdbm",    &GDBM_WRCREAT, 0644);
    tie(%xdocumentIndex, "GDBM_File", "docIndex.gdbm",  &GDBM_WRCREAT, 0644);

    # Copy each finished hash into its tied GDBM file in a single pass.
    foreach (keys %inverted)      { $xinverted{$_}      = $inverted{$_}; }
    foreach (keys %frequency)     { $xfrequency{$_}     = $frequency{$_}; }
    foreach (keys %docvec)        { $xdocvec{$_}        = $docvec{$_}; }
    foreach (keys %documentIndex) { $xdocumentIndex{$_} = $documentIndex{$_}; }

******************
G. Keith Slade, MCS Candidate

"If those employees are like most employees, they've been making personal
phone calls on company time, stealing office supplies, fudging expense
reports, lying about their accomplishments and using sick days for
vacations. Compare that to the executives who allegedly stole hundreds of
millions of dollars. The philosophical question to consider is this: Are
the executives LESS honest or just MORE effective?"
~Scott Adams~ with regards to Enron
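
For reference, here is a minimal, self-contained sketch of the pattern Keith
describes: accumulate everything in an ordinary in-memory hash, and tie the
GDBM file only at the very end to write it out in one pass. The single
%inverted hash and the tab-separated "term<TAB>posting" input format are
assumptions for illustration, not part of the original script.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use GDBM_File;

    # Build the index in a plain Perl hash first; all the many small
    # updates happen in memory, not in the GDBM file.
    my %inverted;
    while (my $line = <STDIN>) {
        chomp $line;
        my ($term, $posting) = split /\t/, $line, 2;   # assumed input format
        next unless defined $posting;
        $inverted{$term} .= " $posting";
    }

    # Only now tie the GDBM file and copy the finished hash into it.
    tie my %xinverted, 'GDBM_File', 'inverted.gdbm', &GDBM_WRCREAT, 0644
        or die "Cannot tie inverted.gdbm: $!";
    $xinverted{$_} = $inverted{$_} for keys %inverted;
    untie %xinverted;

Because the tied file is written once, key by key, rather than updated
repeatedly as records trickle in, the resulting .gdbm file avoids the
growth Keith observed with incremental updates.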