Incident metadata and transcripts are stored indefinitely in a PostgreSQL database. We've logged over 250k incidents since January. I'm planning to combine the incident data with other sources for historical analysis, search, and the like.
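For a sense of what that storage might look like, here is a minimal sketch; the table and column names are my own assumptions, not the project's actual schema:

import psycopg2  # assumes a reachable PostgreSQL instance

# Hypothetical schema; the real table layout isn't described in the thread.
DDL = """
CREATE TABLE IF NOT EXISTS incidents (
    incident_id BIGSERIAL PRIMARY KEY,
    feed_name   TEXT NOT NULL,          -- source scanner feed
    started_at  TIMESTAMPTZ NOT NULL,   -- when the call began
    transcript  TEXT NOT NULL,          -- Whisper output
    audio_url   TEXT                    -- link back to the recording
);
CREATE INDEX IF NOT EXISTS incidents_started_at_idx ON incidents (started_at);
"""

with psycopg2.connect("dbname=incidents") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)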
> are you doing this real time or in a batch mode/daily scrape?
I transcribe the previous day's audio starting at 1 AM every day, which is about as much as my old MacBook can handle. I'm in the process of applying for the Broadcastify Calls API, which should give me access to real-time feeds, but it will be a challenge to get the compute needed to handle real-time transcription.
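Roughly, the nightly batch job looks like this (a sketch using the open-source whisper package; the archive layout and model size are assumptions on my part):

import glob
import whisper  # pip install openai-whisper

# Hypothetical batch pass over yesterday's archived call audio.
model = whisper.load_model("base")  # small enough for an old MacBook

for path in sorted(glob.glob("archive/yesterday/*.mp3")):  # placeholder path
    result = model.transcribe(path)
    print(path, result["text"])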
> personally it would be useful to maybe subscribe to key words and get notifications for specific cities
This is a feature I have planned. It will come with a major disclaimer about Whisper transcript hallucinations.
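Something like this, probably (a toy sketch; the subscription model and names here are hypothetical, not the planned implementation):

# Hypothetical keyword subscriptions per city; purely illustrative.
SUBSCRIPTIONS = {"austin": ["structure fire", "amber alert"]}

def matching_keywords(city, transcript):
    # Return the subscribed keywords that appear in a new transcript.
    text = transcript.lower()
    return [kw for kw in SUBSCRIPTIONS.get(city, []) if kw in text]

print(matching_keywords("austin", "Engine 12 responding to a structure fire on 5th"))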
> I'm in the process of applying for the Broadcastify Calls API, which should give me access to real-time feeds
I'd be really interested to hear how you get on with this - I've been wanting to add these kinds of feeds to https://ambiph.one but it looks like they're not issuing new licenses for the feeds and the Calls API looks like it's write-only from the docs?
I've been trying to solve a problem with implementing semantic search in my YouTube search engine, yt-fts (https://github.com/NotJoeMartinez/yt-fts). I've managed to substantially speed up search results by storing subtitle embeddings in Chroma, but a bigger problem has been how to properly segment the text in a way that accounts for the duration and context of word embeddings while returning precise timestamps. This is a blog post exploring what I've tried so far.
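For context, the basic Chroma setup looks something like this (a minimal sketch; the chunking and metadata shown are assumptions, not yt-fts's actual code):

import chromadb

client = chromadb.Client()
collection = client.create_collection("subtitles")

# Store each subtitle chunk with its start time as metadata, so a
# semantic hit can be mapped back to a timestamp in the video.
collection.add(
    ids=["vid1-0", "vid1-1"],
    documents=["welcome back to the channel", "today we look at embeddings"],
    metadatas=[{"video_id": "vid1", "start": 0.0}, {"video_id": "vid1", "start": 4.2}],
)

results = collection.query(query_texts=["vector search"], n_results=2)
print(results["metadatas"])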
Interesting problem and a good write up! This reminds me a little of audio processing, where there are two representations: time domain and frequency domain. I'm no expert at this, but my understanding is that if you want to search for "when" some chunk of audio happened, you first need to convert to the frequency domain via a Fourier transform. But then you lose the time info. So you can't just take the Fourier transform of the whole file, or even 10-second chunks… you have to take a bunch of short, overlapping Fourier transforms (the short-time Fourier transform): overlapping so you get the nearby context, and short so that you have a higher-resolution idea of when something occurred.
I wonder if a similar idea would work here, where you could search at various “zoom levels” - first search for an entire video that’s nearby in terms of embedding, then search within 50%-overlapped 60-second chunks, then within 50%-overlapped 1-second chunks.
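As a sketch of that multi-resolution idea (assuming word-level timestamps are available; the function and data here are hypothetical):

def overlapping_chunks(words, window, step):
    # Yield (start_time, chunk_text) for overlapping windows of words.
    # words: list of (timestamp, word) pairs; window/step are word counts.
    for i in range(0, max(len(words) - window + 1, 1), step):
        chunk = words[i : i + window]
        yield chunk[0][0], " ".join(w for _, w in chunk)

# Toy transcript: (seconds, word) pairs.
transcript = [(0.0, "we"), (0.4, "love"), (0.9, "vector"), (1.3, "search"), (1.8, "engines")]

# Coarse pass: 4-word windows at 50% overlap (step = window // 2) ...
for start, text in overlapping_chunks(transcript, window=4, step=2):
    print(f"coarse {start:4.1f}s  {text}")

# ... then a fine pass with 2-word windows for tighter timestamps.
for start, text in overlapping_chunks(transcript, window=2, step=1):
    print(f"fine   {start:4.1f}s  {text}")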
Total "love" count in 376 episodes = 7,614
Average "love"s per episode = 20.25
love_count | video_title (first 40 chars) | video_id
-----------------
107 | Sarma Melngailis: Bad Vegan, Fraud, Pris | iZjby1LkTWQ
98 | Andrew Huberman: Focus, Stress, Relation | lvh3g7eszVQ
92 | Bishop Robert Barron: Christianity and t | WgytXF0SPh0
80 | David Buss: Sex, Dating, Relationships, | sndW9hzX-wA
79 | Duncan Trussell: Comedy, Sentient Robots | jdIyNMkusLE
76 | Rana el Kaliouby: Emotion AI, Social Rob | 36_rM7wpN5A
75 | Edward Frenkel: Reality is a Paradox - M | Osh0-J3T2nY
75 | Todd Howard: Skyrim, Elder Scrolls 6, Fa | H9AAnV59ddE
74 | Travis Oliphant: NumPy, SciPy, Anaconda, | gFEE3w7F0ww
74 | Kelsi Sheren: War, Artillery, PTSD, and | PbN3HzKkW4M
-----------------
-- Top 10 full podcast episodes by number of subtitle lines containing "love"
SELECT count(s.video_id) AS love_count,
       substr(v.video_title, 1, 40),
       s.video_id
FROM Subtitles s
JOIN Videos v ON s.video_id = v.video_id
WHERE v.video_title LIKE '%Podcast%'
  AND v.video_title NOT LIKE '%Podcast Clips%'
  AND s.text LIKE '%love%'
GROUP BY s.video_id
ORDER BY love_count DESC
LIMIT 10;