Hi,

bug 35518 contains an email with an 11+ MB attachment. Viewing this in the classic Debbugs web interface takes around 7 seconds:

  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35518

On mumi this times out. I benchmarked just the parsing step for all emails in that bug log and I see around 0.1s used by my pre-processing step (splitting the log at control characters) and around 8 seconds for parse-email. With all the slow, unoptimized post-processing that I’m doing to render things, we hit around 14 seconds for this issue.

I wonder if parse-email could possibly be made even faster.

--
Ricardo

--
guile-email mailing list
guile-email@systemreboot.net
https://lists.systemreboot.net/listinfo/guile-email
Hi,

Yes, guile-email is quite slow. I think it is because it does multiple passes of each email trying to split MIME entities and what not. For a long time, I've been meaning to rewrite guile-email to only do a single pass. It's quite a bit of work, but I'll try to finish it in a week.

Meanwhile, if you have any benchmarking scripts, do share those. It would help me get a head start on the problem.

Thanks!
Hi Arun,

> Yes, guile-email is quite slow. I think it is because it does multiple
> passes of each email trying to split MIME entities and what not. For a
> long time, I've been meaning to rewrite guile-email to only do a single
> pass. It's quite a bit of work, but I'll try to finish it in a week.

Wow, I appreciate your work!

> Meanwhile, if you have any benchmarking scripts, do share those. It
> would help me get a head start on the problem.

All I did was wrap the parse-email call for the extracted email from that Debbugs log file in statprof to see what took the most time. Since that one email is pretty big it took a few seconds to complete.

You can get the mbox here:

  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35518;mbox=yes

--
Ricardo
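For a coarse wall-clock comparison of separate phases (such as the pre-processing and parse-email figures quoted at the start of this thread), a small timing wrapper can complement statprof. The following is only a minimal sketch: `time-thunk` and `sum-to` are hypothetical names introduced here, and the workload is a stand-in, not anything from mumi or guile-email.

```scheme
(use-modules (ice-9 format))

;; Hypothetical helper: run THUNK, print the elapsed wall-clock time
;; under LABEL, and return THUNK's result.
(define (time-thunk label thunk)
  (let* ((start (get-internal-real-time))
         (result (thunk))
         (elapsed (/ (- (get-internal-real-time) start)
                     internal-time-units-per-second)))
    (format #t "~a: ~,3f s~%" label (exact->inexact elapsed))
    result))

;; Stand-in workload: sum the integers from 0 to N - 1.
(define (sum-to n)
  (let loop ((i 0) (acc 0))
    (if (= i n)
        acc
        (loop (+ i 1) (+ acc i)))))

;; With guile-email loaded, the thunk could instead wrap the
;; parse-email call on the emails extracted from the mbox.
(define total
  (time-thunk "stand-in workload" (lambda () (sum-to 100000))))
```

Wrapping each phase in its own `time-thunk` call gives one figure per phase, while statprof remains the better tool for finding out which procedures inside a phase take the time.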
> All I did was wrap the parse-email call for the extracted email from
> that Debbugs log file in statprof to see what took the most time. Since
> that one email is pretty big it took a few seconds to complete.
>
> You can get the mbox here:
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35518;mbox=yes

Ok, thanks!
Hi,

I've made some improvements. I got the following snippet down from around 16s to 6s.

--8<---------------cut here---------------start------------->8---
(statprof (lambda ()
            (map parse-email
                 (call-with-input-file "large-base64-attachment.mbox"
                   mbox->emails))))
--8<---------------cut here---------------end--------------->8---

Specifically, I made the following improvements to guile-email.

- I rewrote the base64 decoder from scratch to be a little faster. Earlier, I was using the decoder I had copied from Guix.
- I eliminated several unnecessary bytevector<->string conversions.
- I rewrote read-bytes-till in (email utils) to process multiple bytes at a time, instead of byte by byte.

There is still scope for improvement, but do test and let me know if this serves your purpose for now. Also, I am curious to see your new benchmark of parse-email to compare with the 8s you reported earlier.

Cheers!
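To illustrate the last point in the list above, here is a minimal sketch of reading from a binary port up to a delimiter, first byte by byte and then a chunk at a time. This is not guile-email's actual read-bytes-till, whose signature and behaviour are not shown in this thread; `read-until-byte/slow`, `read-until-byte` and the helper procedures are hypothetical names, and the sketch assumes a single-byte delimiter and a default chunk size of 4096 bytes.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; Byte-by-byte version: one get-u8 call per byte read.
(define (read-until-byte/slow port delimiter)
  (let loop ((acc '()))
    (let ((b (get-u8 port)))
      (if (or (eof-object? b) (= b delimiter))
          (u8-list->bytevector (reverse acc))
          (loop (cons b acc))))))

;; Return the index of BYTE in BV, or #f if it does not occur.
(define (bytevector-index bv byte)
  (let ((len (bytevector-length bv)))
    (let loop ((i 0))
      (cond ((= i len) #f)
            ((= (bytevector-u8-ref bv i) byte) i)
            (else (loop (+ i 1)))))))

;; Return a fresh bytevector holding the first COUNT bytes of BV.
(define (bytevector-prefix bv count)
  (let ((out (make-bytevector count)))
    (bytevector-copy! bv 0 out 0 count)
    out))

;; Join a list of bytevectors into one fresh bytevector.
(define (concatenate-bytevectors bvs)
  (let ((out (make-bytevector (apply + (map bytevector-length bvs)))))
    (let loop ((bvs bvs) (offset 0))
      (if (null? bvs)
          out
          (let ((len (bytevector-length (car bvs))))
            (bytevector-copy! (car bvs) 0 out offset len)
            (loop (cdr bvs) (+ offset len)))))))

;; Chunked version: read CHUNK-SIZE bytes at a time, scan each chunk
;; for DELIMITER, and push the bytes after the delimiter back onto PORT.
;; Returns the bytes before the delimiter; the delimiter is consumed.
;; At end of file, returns whatever was read so far.
(define* (read-until-byte port delimiter #:optional (chunk-size 4096))
  (let loop ((chunks '()))
    (let ((chunk (get-bytevector-n port chunk-size)))
      (if (eof-object? chunk)
          (concatenate-bytevectors (reverse chunks))
          (let ((len (bytevector-length chunk))
                (pos (bytevector-index chunk delimiter)))
            (cond
             (pos
              (when (< (+ pos 1) len)
                (unget-bytevector port chunk (+ pos 1) (- len pos 1)))
              (concatenate-bytevectors
               (reverse (cons (bytevector-prefix chunk pos) chunks))))
             (else (loop (cons chunk chunks)))))))))
```

Both versions should return the same bytes for the same input. The chunked one trades a push-back of the overshoot for far fewer port operations, which is the kind of saving the rewrite describes, though the real gains depend on the actual implementation.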
> --8<---------------cut here---------------start------------->8---
> (statprof (lambda ()
>             (map parse-email
>                  (call-with-input-file "large-base64-attachment.mbox"
>                    mbox->emails))))
> --8<---------------cut here---------------end--------------->8---

I forgot to mention. The "large-base64-attachment.mbox" in this snippet is the mbox of bug 35518 that you mentioned earlier.