from HTYP, the free directory anyone can edit if they can prove to me that they're not a spambot

Some notes[1] from a friend who successfully reverse-engineered Nextdoor's web client to build a Firefox plugin that can read a post, all of its comments, and a full set of details on each:

I am just running my brain in circles thinking way too hard and conceptually overengineering how to do the query retrieval and caching... like, whether to provide the URLs of the Nextdoor pages for the queries I'll need and [whether] it...

  • extracts the mapping from those to chunk names from the ./apps/app.jsx in the app chunk and fetches each of those chunks at that time and updates the gql cache based on the modules parsed out of each chunk, or maybe instead I
  • have it initially do a run through downloading all the chunks for everything and store an index of what modules are in each chunk and one of which queries are defined by which modules, and of those modules that define queries which modules those modules import and which queries each query defined within references, so that when I ask it for a given query it determines what module it's defined in, and what chunk that module is from, and if the cached module contents of that chunk haven't been accessed yet this run then it checks if the version of that chunk on the server is still at the last recorded URL for it...
    • and if it is not, then if the contents of the chunk URL mapping cache have not been accessed this session, it checks if the version of the "runtime" chunk on the server has updated since last being retrieved using the URL retrieved from the HTML...
      • and if the URL has changed or the hash is different extracts the new content from it and uses that content to update the URL mapping, and then gets the new URL for the original chunk, and then it retrieves that chunk from the server and if the hash is different from the last value it parses the module definitions out of that chunk and updates them, and if the hash of the module contents has been updated since the query was last accessed then it parses the module contents and uses that to update the query, and then goes through the list of modules imported by the module that query is in and updates each by the same process if needed and then searches each for each of the definitions referenced by the query and finds them and uses the contents of the query and the referenced queries and their references, recursively updating everything by the same process, to produce the final query.

...but then that seems maybe overcomplicated and a waste of effort when I could just manually get the needed queries and save them locally, even though that's less robust to API changes, given how much less complex it is
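
The second (index-everything) design from the notes above can be sketched roughly as follows, in Python rather than the extension's JavaScript. This is not her code, only an illustration under assumptions: every name here (`ChunkIndex`, `QueryDef`, `toy_parse`, the `name|refs|body` chunk format) is made up, the module layer and the "runtime" chunk URL mapping are collapsed into a fixed `chunk_of` table, and fetching is a plain callable so nothing touches the network.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class QueryDef:
    body: str              # text of one query or fragment definition
    references: List[str]  # names of other definitions it depends on


def toy_parse(source: str) -> Dict[str, QueryDef]:
    """Parse a made-up chunk format: one 'name|ref1,ref2|body' per line."""
    defs = {}
    for line in source.splitlines():
        name, refs, body = line.split("|", 2)
        defs[name] = QueryDef(body, [r for r in refs.split(",") if r])
    return defs


@dataclass
class ChunkIndex:
    fetch: Callable[[str], str]                  # chunk name -> chunk source
    parse: Callable[[str], Dict[str, QueryDef]]  # chunk source -> definitions
    chunk_of: Dict[str, str]                     # definition name -> chunk name
    hashes: Dict[str, str] = field(default_factory=dict)
    defs: Dict[str, QueryDef] = field(default_factory=dict)
    checked: Set[str] = field(default_factory=set)  # chunks verified this run

    def refresh(self, chunk: str) -> None:
        """Re-fetch a chunk at most once per run; re-parse only if its hash changed."""
        if chunk in self.checked:
            return
        self.checked.add(chunk)
        source = self.fetch(chunk)
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if self.hashes.get(chunk) != digest:
            self.hashes[chunk] = digest
            self.defs.update(self.parse(source))

    def resolve(self, name: str, seen: Optional[Set[str]] = None) -> str:
        """Return a definition followed by everything it references, recursively."""
        seen = set() if seen is None else seen
        if name in seen:  # already emitted, or a reference cycle
            return ""
        seen.add(name)
        self.refresh(self.chunk_of[name])
        definition = self.defs[name]
        parts = [definition.body]
        parts += [self.resolve(ref, seen) for ref in definition.references]
        return "\n".join(p for p in parts if p)


# Stand-in for the server: two chunks, one query that uses one fragment.
server = {
    "feed": "FeedQuery|PostFields|query Feed { posts { ...PostFields } }",
    "shared": "PostFields||fragment PostFields on Post { id }",
}
index = ChunkIndex(
    fetch=server.__getitem__,
    parse=toy_parse,
    chunk_of={"FeedQuery": "feed", "PostFields": "shared"},
)
print(index.resolve("FeedQuery"))
```

Clearing `index.checked` starts a new "run": each chunk is then fetched once more and re-parsed only if its hash differs from the recorded one. Even in this stripped-down form the bookkeeping is noticeable, which is the trade-off the notes are weighing.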

She eventually decided to hardcode the queries, but a few weeks later the API did in fact change, which required extension modifications to keep it working.
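
For comparison, the hardcoded approach might look something like the sketch below. It is a guess at the shape, not her code: the query name and fields are placeholders (the real Nextdoor queries are not reproduced here), and `build_request` and `looks_stale` are hypothetical helpers. The only things assumed about the wire format are the usual GraphQL conventions: a JSON request body with `query`, `operationName`, and `variables`, and a response carrying `data` and/or `errors`.

```python
import json
from typing import Any, Dict

# Placeholder query text; the real Nextdoor queries are not reproduced here.
QUERIES: Dict[str, str] = {
    "PostWithComments": (
        "query PostWithComments($id: ID!) "
        "{ post(id: $id) { id comments { id } } }"
    ),
}


def build_request(name: str, variables: Dict[str, Any]) -> str:
    """Serialize a saved query in the conventional GraphQL-over-HTTP JSON shape."""
    return json.dumps(
        {"operationName": name, "query": QUERIES[name], "variables": variables}
    )


def looks_stale(response: Dict[str, Any]) -> bool:
    """Heuristic: errors with no data suggest the saved query no longer fits the API."""
    return bool(response.get("errors")) and not response.get("data")
```

A check like `looks_stale` does not make the saved queries any more robust; it only makes an API change show up as an explicit signal that the saved text needs updating by hand.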

She also noted[2] that the "local database of the webpack content" is about 142 MiB, although the raw code is only about half that.


  1. 2022-02-10 private discussion on Discord
  2. 2022-02-12 private discussion on Discord