-

@ mleku
2025-05-12 09:29:49
#realy #devstr #progressreport
i have now finished drafting the indexing part of the full text reverse index. it extracts words from every event in the database using a unicode tokenizer and, for each word, stores the list of database serials of the events it appears in as the value of a key. the key includes the word and a counter of how many references that word has, so the count can be read without decoding the whole list of event serials.
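to make the layout concrete, here is a minimal sketch in Go of that kind of index. it is not the realy code: the in-memory map standing in for the database, the key layout (word, a zero byte, then a big-endian 32-bit count), the fixed 8-byte serials and every function name are assumptions for illustration, and the tokenizer is just a split on anything that is not a unicode letter or number.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
	"unicode"
)

// tokenize lowercases text and splits it on every rune that is not a
// unicode letter or number. A stand-in for a real unicode tokenizer.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	})
}

// index maps a word to the serials of the events that contain it.
// Hypothetical in-memory stand-in for the key/value store.
type index map[string][]uint64

// add records the event serial once for each distinct word in text.
func (ix index) add(serial uint64, text string) {
	seen := map[string]bool{}
	for _, w := range tokenize(text) {
		if seen[w] {
			continue
		}
		seen[w] = true
		ix[w] = append(ix[w], serial)
	}
}

// encodeKey builds word || 0x00 || big-endian count (assumed layout).
func encodeKey(word string, count uint32) []byte {
	k := make([]byte, len(word)+1+4)
	copy(k, word)
	binary.BigEndian.PutUint32(k[len(word)+1:], count)
	return k
}

// encodeValue packs the serials as fixed 8-byte big-endian integers.
func encodeValue(serials []uint64) []byte {
	v := make([]byte, 8*len(serials))
	for i, s := range serials {
		binary.BigEndian.PutUint64(v[8*i:], s)
	}
	return v
}

// countFromKey reads the reference count from the key alone,
// without touching the value that holds the serial list.
func countFromKey(key []byte) uint32 {
	return binary.BigEndian.Uint32(key[len(key)-4:])
}

func main() {
	ix := index{}
	ix.add(1, "Hello, wörld! hello")
	ix.add(2, "hello again")
	for word, serials := range ix {
		key := encodeKey(word, uint32(len(serials)))
		val := encodeValue(serials)
		fmt.Printf("%q: count from key=%d, value bytes=%d\n",
			word, countFromKey(key), len(val))
	}
}
```

one consequence of putting the count in the key, at least in this sketch, is that adding a reference changes the key, so an update in a real key/value store would mean deleting the old key and writing a new one.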
it still needs to be integrated into the indexing process. i will probably need to split out the part that generates the relative frequency table so that it just runs on a ticker every 10 minutes or so, since this data doesn't need to be perfectly precise. updating the reverse index for each newly added event takes hardly any time, but it doesn't make sense to recalculate the relative frequencies that often: new events won't change the historical proportions of occurrence by much, so they won't greatly affect the relevance sorting.
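a rough Go sketch of that split, under assumptions: the `freqTable` type, its methods and the definition of relative frequency (a word's reference count divided by the total of all counts) are hypothetical, not taken from realy. cheap per-event updates only bump counts, while the frequency table is rebuilt by a separate goroutine on a `time.Ticker`.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// freqTable keeps live counts plus a possibly stale frequency table.
type freqTable struct {
	mu     sync.RWMutex
	counts map[string]uint32
	freqs  map[string]float64
}

func newFreqTable() *freqTable {
	return &freqTable{
		counts: map[string]uint32{},
		freqs:  map[string]float64{},
	}
}

// addWord is the cheap per-event update: it only bumps a counter.
func (f *freqTable) addWord(w string) {
	f.mu.Lock()
	f.counts[w]++
	f.mu.Unlock()
}

// recalc rebuilds the whole relative frequency table from the counts.
func (f *freqTable) recalc() {
	f.mu.Lock()
	defer f.mu.Unlock()
	var total uint64
	for _, c := range f.counts {
		total += uint64(c)
	}
	fresh := make(map[string]float64, len(f.counts))
	if total > 0 {
		for w, c := range f.counts {
			fresh[w] = float64(c) / float64(total)
		}
	}
	f.freqs = fresh
}

// freq returns the frequency as of the last recalculation.
func (f *freqTable) freq(w string) float64 {
	f.mu.RLock()
	defer f.mu.RUnlock()
	return f.freqs[w]
}

// run recalculates on every tick until ctx is cancelled.
func (f *freqTable) run(ctx context.Context, every time.Duration) {
	tk := time.NewTicker(every)
	defer tk.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-tk.C:
			f.recalc()
		}
	}
}

func main() {
	ft := newFreqTable()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	// Short interval for the demo; the post suggests about 10 minutes.
	go ft.run(ctx, 50*time.Millisecond)

	for _, w := range []string{"nostr", "nostr", "relay", "index"} {
		ft.addWord(w)
	}
	fmt.Println("before tick:", ft.freq("nostr")) // table may still be stale
	time.Sleep(200 * time.Millisecond)
	fmt.Println("after tick:", ft.freq("nostr"))
}
```

queries read whatever the last tick produced, which fits the point above: slightly stale proportions should barely change the relevance ordering.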