I really like it when Python makes it easy for me to do something, like changing my catalog to take a file-like object rather than a string and having it work through the content iteratively. Now I can hand the catalog a large local file without worrying (all that much) about memory, and I can also hand it a urllib.urlopen'd file. I've indexed about 25 megabytes of content via the web, and the index currently takes 25 megabytes to store (in a ZODB FileStorage). That's not particularly good, even considering that I'm currently indexing stop/common words too. A search does take less than a second though, and that's the important part for now; it's designed for web log entries, after all. But I'll look into it. I also need to involve mxTidy (if available) to take care of files that seem to hang my StrippingParser...
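To give a rough idea of what "take a file-like object and handle it iteratively" can look like, here is a minimal sketch. It is not my actual catalog code: `index_stream`, `WORD_RE` and the plain dict-of-sets index are made-up names for illustration, nothing is stored in ZODB, and no stop words are filtered. It's written for Python 3, where `urllib.urlopen` lives at `urllib.request.urlopen` and hands back bytes rather than strings.

```python
import io
import re
from collections import defaultdict

# Hypothetical tokenizer: runs of lowercase letters and digits count as words.
WORD_RE = re.compile(r"[a-z0-9]+")


def index_stream(fileobj, doc_id, index=None):
    """Add the words of a file-like object to an inverted index, line by line.

    Only one line is held in memory at a time, so large files are fine.
    """
    if index is None:
        index = defaultdict(set)
    for line in fileobj:
        # urlopen-style objects yield bytes; text-mode local files yield str.
        if isinstance(line, bytes):
            line = line.decode("utf-8", errors="replace")
        for word in WORD_RE.findall(line.lower()):
            index[word].add(doc_id)
    return index


# Anything iterable by line works: open(...), urlopen(...), or in-memory buffers.
index = index_stream(io.StringIO("Python makes it easy\nto index a file\n"), "entry-1")
index = index_stream(io.BytesIO(b"Easy indexing over the web\n"), "entry-2", index)
```

Because the function only ever iterates over lines, the caller decides where the content comes from, and the indexing code never needs to know whether it is reading from disk or from the network.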
The past long weekends and public holidays have allowed me to start writing some code again, and nope (Neil's Object Publishing Experiment) is the codename for what I've been playing with. It uses the Twisted/ZODB integration from The Shuttleworth Foundation's SchoolTool (headed by Zope developer Steve Alexander) to provide a really simple object publishing environment, stealing many ideas/concepts from Zope 3.