LinkedInGitHubXEmailResume

created an LLM which captures group conversations as real-time images and diagrams

Timeline

2 weeks

Role

Full Stack Engineer Product Designer Hardware Engineer Industrial Designer

Budget

$300

Team

Failenn Aselta

Tools

Figma · Cursor · Gemini Raspberry Pi · React FastAPI · Linux

Project Overview

Design a handheld device that generates real-time visuals of conversation.

Buddy resolves the disconnect of group work by acting as an intermediary that captures conversations in real time through LLM-powered image generation. Built with rapid prototyping, electronics, and full-stack software development, it preserves discussions as a visual history and saves valuable concepts from being lost to misarticulation.

The Problems

01

40%

Lost Spoken Insights

(MIT, 2022)

02

65%

Claim Poor Visual Retention

(MIT Media Lab)

03

7.5 hrs

Claim Time Lost to Miscommunication

(Huang, 2018)

04

$6.25M

Claim Annual Miscommunication Cost

(Grammarly / Harris Poll, 2023)

Our Goals

01

2s

Real-time Capture

(Whisper.cpp)

02

35%

Faster Mockups

(Forrester TEI, 2024)

03

56%

More Time Saved in Meetings

(Fellow, 2024)

04

$3.5M

Equivalent Money Saved in Meetings

(Grammarly / Harris Poll, 2023)
Illustration: ineffective team collaboration
Diagram: aligning inputs through a shared focal point

Diagrams made with Napkin.ai and Figma

The challenge

One of the largest bottlenecks in design is miscommunication, what if we could create a tool to rectify this issue?

The Research

7.5 hrs per week lost to miscommunication in knowledge work.

Participant

Adam, participant portrait

Adam

31 Years Old

Product Designer

Portrait generated with Gemini

Frustrations

  • Hard to regain alignment when spoken ideas are interpreted differently by each teammate.
  • Little visibility into what was actually agreed on once a working session ends.
  • Good work still feels like it stalls when concepts are lost to misarticulation or memory.

How Might We

Improve group communication by clarifying ideas visually through real-time LLM image generation?

By implementing a technology that helps clarify ideas visually through LLM image generation.

Ideation

Low-Fi Wireframes

Early Drawings

01

Familiar Layout

Zero learning curve.

02

Visual Output

Pictures over text.

03

On-device Audio

Nothing leaves the room.

04

Passive Form

Ambient, never the focus.

Iteration 1

Selected for construction

Dial for changing through iterations, button for on/off on top, and version with an on/off and commit button on the front.

Iteration 3

Send button and an on/off on the front.

Iteration 4

Basic set up with visual as main portion.

Hand-drawn sketches

Full Figma Board

Hi-Fi Mockups

First iteration, prototyped in Figma
Second iteration, running locally, built with React and FastAPI

User Testing

Tested with a group of designers and engineers during a live working session.

Participants used Buddy across a real design critique session. Tasks included: capturing a decision, exporting a session summary, and interpreting a generated diagram.

64%Said the layout was easy to understand but felt extreme to be hardware for such a simple concept.

How do we bring this into a mobile or desktop version to save costs?

40%Said it allowed them to focus on the conversation better but now worried if it was generating correctly.

How can we monitor the images without having to backtrack?

30%Said it helped with miscommunication as they saw what they wanted and the other person.

Could this number get raised if we made the screen larger?

32%Of the time, PDF generation worked perfectly—but it often cut off words.

How do we test the backend more thoroughly?

Some thinkable quotes

“I stopped taking notes halfway through and just focused on the convo.”

“The diagram it made was close but not exactly what we were saying.”

“I'd want this on my phone so I could reference it throughout the day.”

Engineering

How the stack works end to end

Audio is captured and sent to the Whisper API for transcription. The transcript is passed to GPT-4 for interpretation, extracting the core idea from the conversation. Depending on the output type, either Mermaid.js renders a structured diagram or fal.ai generates an image. The backend runs on FastAPI, the frontend on Vite.

Diagrams created with Mermaid.js code and Python in a Figma plugin.

Diagrams made with Mermaid.js (rendered from Python)

system_prompt = f"""
You are a Visual Assistant. You generate Mermaid.js code OR Fal.ai image prompts.

CURRENT MODE: {DIAGRAM | SKETCH} (switch based on intent)

TASK:
1. ANALYZE USER INTENT:
   - Chart, graph, flow, or timeline -> output mode: "DIAGRAM"
   - Scene, photo, texture, or visual style -> output mode: "SKETCH"
   - Referring to "it"/"the image"/"that" -> use CONTEXT HISTORY

2. FOR DIAGRAMS (Mermaid):
   - Return valid Mermaid code only. No backticks.
   - Support: graph TD, mindmap, pie, sequenceDiagram, xychart-beta, gantt.

3. FOR SKETCHES (Images):
   - If refinement, keep core details and apply the change.
   - set "is_refinement": true only if editing the previous image.

Return JSON ONLY:
{ "mode": "DIAGRAM" | "SKETCH", "prompt": "...", "is_refinement": true|false }
"""
# Create a PDF summary to ensure universal accessibility
pdf_buffer = io.BytesIO()
c = canvas.Canvas(pdf_buffer, pagesize=letter)
text_obj = c.beginText(40, 750)
for line in summary_lines:
    text_obj.textLine(line)
c.drawText(text_obj)
c.save()

# Package into a downloadable ZIP artifact
zip_file.writestr("session_summary.pdf", pdf_buffer.getvalue())
return StreamingResponse(
    io.BytesIO(zip_buffer.read()),
    media_type="application/zip",
    headers={"Content-Disposition": "attachment; filename=session_export.zip"}
)

LLM Persona

Had to clearly define the LLM's persona, ultimately assigning it the role of a Visual Assistant for the cleanest outputs.

Image Generation

A major technical hurdle was training the model to generate proper images without relying on explicit keywords.

Session Export

Engineered a session-commit function that dynamically zips all generated assets and transcripts into a universal PDF. Transformed a transient AI conversation into a professional leave-behind artifact.

Hardware

Progress Photos

Building the physical enclosure on a Raspberry Pi, hardware constraints forced architectural clarity that cloud deployment never would have.

Final Product

The Finished Build

Impact

64%

found the layout intuitive on first use

40%

trusted capture enough to stop taking notes

30%

said visuals aligned the room faster than talk alone

Next Steps

  1. 01

    Mobile or desktop form factor. Bring the same layout to phone or desktop software to save costs and right-size the concept away from dedicated hardware.

  2. 02

    Live generation monitor. Surface diagram and image status in session so users trust output without leaving the conversation to verify.

  3. 03

    Larger shared display. Test screen size and placement so both people can read generated visuals without squinting or crowding the device.

  4. 04

    PDF export QA. Add automated export tests and layout checks so truncated words and broken pages are caught before users download.

What I Learned

  1. 1

    Budget Is the Real Constraint

    Budget is the most important constraint in any project. Things you want often have to be sacrificed just to get the product to market—and to earn enough to fund the next version.

  2. 2

    Scale Before You Fabricate

    Scalability should be thought about earlier. It was fun to get hardware running and learn shell scripting, but building an individual handheld for everyone at this size and scale is too expensive.

  3. 3

    Less Is More

    Complex problems often have simple solutions. It is easy to get caught up in complexity and overdesign—in the words of Mies, less is often more.

Considerations

01

Layout & Form Factor

Familiar UIHardware ShellIteration Dial

64%

said the layout was easy to understand but felt extreme as hardware for such a simple concept—mobile or desktop could reduce cost and intimidation.

02

Ambient Capture

Passive ListeningWhisper APILive Generation HUD

40%

said it let them focus on the conversation but worried whether output was generating correctly—monitoring images without backtracking is the next UX pass.

03

Visual Alignment

Diagram Modefal.ai ImagesShared Display

30%

said it reduced miscommunication because both parties saw the same artifact—a larger screen could raise alignment further.

04

Session Export

PDF ExportMermaid.jsFastAPI Backend

32%

of the time, PDF generation worked perfectly but often cut off words—backend layout QA and export tests need to run before every release.

Tradeoff made

The largest tradeoff was not including video. Image-generation APIs are priced per call, and sessions burned through credits quickly—adding live video would have overrun both budget and the two-week sprint. v1 stayed audio → diagram or still image → PDF export; video waits until usage and hardware can support it.

Every hardware decision bent toward keeping the bill of materials under $300. An external battery replaced an internal cell—lithium cost and assembly complexity were harder to justify than a sealed USB pack. The Raspberry Pi was a step down from the RAM target; more memory would have helped concurrent transcription and generation, but unit cost had to win for v1.

Bibliography

Links

  • Arias, Ernesto G., and Gerhard Fischer. "Boundary Objects: Their Role in Articulating the Task at Hand and Making Information Relevant to It." International ICSC Symposium on Interactive and Collaborative Computing. University of Colorado Boulder, 2000. l3d.colorado.edu
  • Brubaker, E. R., S. D. Sheppard, P. J. Hinds, and M. C. Yang. "Objects of Collaboration: Roles of Objects in Spanning Knowledge Boundaries in a Design Company." 34th International Conference on Design Theory and Methodology. MIT, 2022. dspace.mit.edu
  • Huang, Y.-H. "Understanding the Collaboration Difficulties Between UX Designers and Developers in Agile Environments." Masters thesis, Purdue University, 2018. Documents wasted time on rework, revisions, and misaligned handoffs; aligns with ~7.5 hrs/week lost to poor communication (Grammarly / Harris Poll, 2023). docs.lib.purdue.edu
  • Forrester Consulting. "The Total Economic Impact of Figma." Commissioned by Figma, 2024. Composite organization: 35% productivity gain in development; 60% faster ideation and creation workflows. figma.com
  • Grammarly and The Harris Poll. "State of Business Communication." 2023. Knowledge workers lose ~7.5 hours/week to poor communication; firms with 500 employees lose $6.25M/year resolving communication issues. At 56% less meeting waste, equivalent savings ≈ $3.5M/year for the same org size. grammarly.com
  • Fellow.app. "The State of Meetings Report 2024." Survey of how teams run meetings, time spent, and productivity impact. fellow.ai