I am trying to retrofit an application I wrote so that it uses SQLObject.
The application, SearchLog <http://jonathanscorner.com/etc/searchlog/>,
presently loads a number of documents and keeps track of which keywords
occur in each and how often. It currently implements Boolean keyword search
with wildcard and phrase support (although Boolean/wildcard/phrase
searching is not a high priority now) and returns documents sorted by the
user's choice of last programmatically-updated view, last modification, or
a relevance score that depends on the keywords' proportion in the document
(the relevance is not a high priority now).
At present, it's a memory hog and runs slowly with a few hundred documents,
so I'd like to retrofit some of the basic functionality and possibly the
"optional" functionality so that it can run quickly and without per-document
memory overhead.
I have the following two classes:
    import sqlobject

    class database_document(sqlobject.SQLObject):
        date_last_modified = sqlobject.DateCol()
        date_last_viewed = sqlobject.DateCol()
        #excerpt = sqlobject.StringCol()
        relative_filename = sqlobject.StringCol()
        section = sqlobject.StringCol()
        text = sqlobject.StringCol()
        tokenized = sqlobject.StringCol()

        # Store the token list as one space-separated string.
        def _set_tokenized(self, token_list):
            value = " ".join(token_list)
            self._SO_set_tokenized(value)

    class database_document_keyword(sqlobject.SQLObject):
        document_filename = sqlobject.StringCol()
        document_id = sqlobject.ForeignKey('database_document')
        word = sqlobject.StringCol()
        frequency = sqlobject.IntCol()
        total = sqlobject.IntCol()

        # Share of the document's tokens taken up by this word.
        def _get_proportion(self):
            return float(self.frequency) / float(max(1, self.total))
The database_document_keyword is intended as an optimization to circumvent
the inefficiency of SQL searching for "% foo %" in the tokenized form.
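
To make the intent concrete, here is a minimal sketch of how the keyword rows for one document might be derived from its token list. The helper name keyword_rows is hypothetical, and it returns plain tuples rather than SQLObject instances; it only illustrates the frequency/total bookkeeping described above.

```python
from collections import Counter

def keyword_rows(document_id, token_list):
    """Return one (document_id, word, frequency, total) tuple per distinct word.

    Hypothetical helper: 'total' is the number of tokens in the document,
    matching the denominator used by _get_proportion.
    """
    counts = Counter(token_list)
    total = len(token_list)
    return [(document_id, word, frequency, total)
            for word, frequency in sorted(counts.items())]

rows = keyword_rows(1, ["foo", "bar", "foo"])
```

Each tuple would correspond to one database_document_keyword row, so a lookup for "foo" becomes an exact match on the word column instead of a substring scan of the tokenized text.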
Given those classes, which should be straightforward to generate and
populate, what is the best way to say "Give me all database_documents
containing all of the following keywords, sorted by this timestamp."?
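
For reference, this is roughly the query I have in mind, sketched in plain SQL through the standard sqlite3 module rather than SQLObject. The table and column names are assumed to mirror the classes above (SQLObject may name them differently), and the function name is hypothetical. Whether SQLObject can express this directly is the open question.

```python
import sqlite3

def documents_with_all_keywords(conn, keywords,
                                order_column="date_last_modified"):
    """Return (id, relative_filename) for documents containing every keyword.

    Sketch only: assumes tables named database_document and
    database_document_keyword with the columns used below.
    """
    # Column names cannot be bound as parameters, so whitelist them.
    if order_column not in ("date_last_modified", "date_last_viewed"):
        raise ValueError("unsupported sort column: %r" % (order_column,))
    keywords = sorted(set(keywords))
    if not keywords:
        return []
    placeholders = ", ".join("?" for _ in keywords)
    sql = f"""
        SELECT d.id, d.relative_filename
        FROM database_document AS d
        JOIN database_document_keyword AS k ON k.document_id = d.id
        WHERE k.word IN ({placeholders})
        GROUP BY d.id, d.relative_filename, d.{order_column}
        HAVING COUNT(DISTINCT k.word) = ?
        ORDER BY d.{order_column} DESC
    """
    # A document qualifies only if it matched as many distinct words
    # as were asked for.
    return conn.execute(sql, keywords + [len(keywords)]).fetchall()
```

The idea is to keep only the keyword rows matching any requested word, group them by document, and retain the documents whose count of distinct matched words equals the number of words requested.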
--
++ Jonathan Hayward, jonathan.hayward@???