qDot ([info]qdot) wrote,
@ 2007-07-08 23:27:00
Fwiktr v0.2
Fwiktr 0.2.0 is up and running

To start: I really hate the name Fwiktr. If you've got any better ideas, please comment.

The goal of Fwiktr v0.1 was to simply get a picture from flickr based on a twitter message. That goal achieved, it was time to make the interface more extensible.

I've now broken fwiktr into performing certain core functions (called services), and then providing a way to edit the input and output of those functions (called transforms). For those of you who are thinking "hey, this setup looks awful familiar", please stop reading my blogs and work and get back to fixing the grid.

Here's what a current run of fwiktr looks like:

  • Retrieve Post from Public Timeline of Twitter (via Twitter Service)
  • Run cleanup transforms on post (Twitter Post Cleanup Transform - Remove URLs and specific post service information, like @[name] symbols for twitter directed messages)
  • Run post through language parser (Lingua::Identify Service - Right tool for the job, even if it is the wrong language :) )
  • Using returned language, choose proper TreeTagger implementation and run Parts of Speech Marking on post (Treetagger ENGLISH - Nouns Only Transform)
  • Get pictures using ALL search mode for flickr API (Flickr Full AND Transform)
    • If zero pictures returned, use ANY search mode for flickr API (Flickr Fuck It Transform)
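
For the curious, the run above can be sketched roughly like this. This is a toy Python sketch and not the actual fwiktr code: the `Run` class, the function names, and the in-memory `fake_search` index are all made up for illustration, and the "keep words longer than three letters" step is only a stand-in for the real language identification and TreeTagger nouns-only pass. It does show the shape, though: each transform takes the run state and hands it back, and the search step tries ALL mode first, then falls back to ANY.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Run:
    """State carried through one pass of the (hypothetical) pipeline."""
    text: str
    tags: List[str] = field(default_factory=list)
    photos: List[str] = field(default_factory=list)
    search_mode: str = ""


def cleanup_transform(run: Run) -> Run:
    # Strip URLs and @name markers, then collapse leftover whitespace.
    text = re.sub(r"https?://\S+", " ", run.text)
    text = re.sub(r"@\w+", " ", text)
    run.text = " ".join(text.split())
    return run


def naive_tag_transform(run: Run) -> Run:
    # Stand-in for the language ID + nouns-only tagging steps:
    # just keep lowercase words longer than three letters.
    words = re.findall(r"[A-Za-z]+", run.text)
    run.tags = [w.lower() for w in words if len(w) > 3]
    return run


def make_search_transform(search: Callable[[List[str], str], List[str]]):
    def transform(run: Run) -> Run:
        # Try the strict ALL mode first; fall back to ANY if nothing matches.
        for mode in ("all", "any"):
            run.photos = search(run.tags, mode)
            run.search_mode = mode
            if run.photos:
                break
        return run
    return transform


def fake_search(tags: List[str], mode: str) -> List[str]:
    # Hypothetical in-memory photo index standing in for a real photo service.
    index = {"cat.jpg": {"cats", "window"}, "rain.jpg": {"rain"}}
    if not tags:
        return []
    match = all if mode == "all" else any
    return sorted(
        name for name, photo_tags in index.items()
        if match(tag in photo_tags for tag in tags)
    )


def run_pipeline(text: str, transforms) -> Run:
    run = Run(text=text)
    for transform in transforms:
        run = transform(run)
    return run


result = run_pipeline(
    "@bob look at the rain today http://example.com/x",
    [cleanup_transform, naive_tag_transform, make_search_transform(fake_search)],
)
print(result.search_mode, result.photos)
```

Because every step has the same signature, swapping in a different tagger or a different photo source is just a matter of changing one entry in the list.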


All of this is compiled into an XML block which is then shipped to the remote web DB. Instead of trying to wrestle with a generalized SQL schema, I decided to send data in XML form as it is easily mutable and still fairly backwards compatible as long as the DTD is kept generalized. Not to mention, it's less code I have to write at the moment, and I'm more interested in making art than in making a searchable database for the time being (and thanks to the DTD, turning the data into a DB shouldn't be much of a problem at all once I want to do that).
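
To give an idea of what "compiled into an XML block" might look like, here is a minimal Python sketch using the standard library's ElementTree. The element names (`fwiktr_run`, `post`, `language`, `tags`, `photos`) are invented for this example and are not the real fwiktr DTD; the point is only that new fields can be added as extra child elements without breaking older readers.

```python
import xml.etree.ElementTree as ET


def run_to_xml(post_text, language, tags, photo_urls):
    """Serialize one run into an XML string (hypothetical layout)."""
    root = ET.Element("fwiktr_run")
    ET.SubElement(root, "post").text = post_text
    ET.SubElement(root, "language").text = language

    tags_el = ET.SubElement(root, "tags")
    for tag in tags:
        ET.SubElement(tags_el, "tag").text = tag

    photos_el = ET.SubElement(root, "photos")
    for url in photo_urls:
        # Store each photo URL as an attribute on its own element.
        ET.SubElement(photos_el, "photo", {"url": url})

    return ET.tostring(root, encoding="unicode")


xml_block = run_to_xml(
    "look at the rain today",
    "en",
    ["rain", "today"],
    ["http://example.com/rain.jpg"],
)
print(xml_block)
```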

Aside: The XML block is now available for outside parsing; just add "&xml=1" to the end of the art request URL. I'll warn you, though, it's a complete mess right now. There are many different formats (which at least all follow the same DTD, but that's it) because the DB persisted through development of v0.2. I'm gonna write a script to clean things up, but that may be a ways off.

The really nice thing about this setup is that every level (the post retrieval, picture retrieval, language identification, and transform layers) is completely pluggable, while still providing a fairly robust pipeline for the task. I am very much interested in writing my own parts of speech tagger, language identification algorithms, and other things, but for right now, those layers run on proven software and produce output that I can compare my own implementations against once I get around to that.

Which may very well be never.

At this point, I'm pretty happy to let fwiktr run as is for a while. Before I completely call phase 1 finished, I'd like to have an RSS feed, and /maybe/ start taking post requests (i.e. you can hand it a twitter URL and it'll parse it for you).

Here are the plans for the future:

Phase 2
  • Implement small, quick user system (maybe using OpenID, 'cause that seems neat, but I'll probably just try to hook into some prebuilt simple CMS, which scares me, 'cause CMS selection is a bitch these days) to allow ratings, favorites, etc... - this is very important for AI training that I can use later
  • Create simple DB import utility to make data more accessible


Phase 3
  • Implement art overlay rendering (pasting the message back over the picture)
  • Implement seasoning system
    • The "seasoning" system is something that I'm rather looking forward to implementing. Right now, we're throwing away all URLs and emoticons. However, these are completely valid and useful parts of a message which could be used to seed more tags into a search. The emoticons could be translated into moods (though I'd have to do some statistical analysis on tags first to see if anyone actually uses that), while URLs could be descended into and scraped for some sort of context which could be fed back into the message. There's other metadata that comes along with the message that this idea could be used with too (e.g. using location plus post time to pull weather modifiers from some weather reporting service, adding "clouds" or "sun" or "rain" as tags).
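
A rough idea of how the emoticon half of seasoning could work, as a small Python sketch. Nothing here exists in fwiktr yet: the `EMOTICON_MOODS` table and the `season` function are hypothetical, and the mood words are guesses that would need the tag analysis mentioned above before anyone trusted them. URLs are pulled out and returned rather than scraped, since the scraping step is a separate problem.

```python
import re

# Hypothetical emoticon-to-mood table; the mood words are placeholders.
EMOTICON_MOODS = {
    ":)": "happy",
    ":(": "sad",
    ";)": "playful",
    ":D": "joy",
}

URL_PATTERN = re.compile(r"https?://\S+")


def season(text):
    """Split a post into (clean text, extra mood tags, URLs to scrape later)."""
    # Pull URLs out first so emoticon matching never looks inside them.
    urls = URL_PATTERN.findall(text)
    remainder = URL_PATTERN.sub(" ", text)

    extra_tags = []
    for emoticon, mood in EMOTICON_MOODS.items():
        if emoticon in remainder:
            extra_tags.append(mood)
            remainder = remainder.replace(emoticon, " ")

    # Collapse the whitespace left behind by the removals.
    return " ".join(remainder.split()), extra_tags, urls


clean, moods, urls = season("stuck in traffic :( http://example.com/map")
print(clean, moods, urls)
```

The extra tags would simply be appended to whatever the parts of speech step produced before the picture search runs.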


Phase 4
  • Start phasing out external NLP systems in favor of hand-written ones
    • I'm leaving this step until I have all the features in as I want them, otherwise I'll just get pissed off and quit and leave the project in a half finished, non-working state while I try implementing something someone else has already done. Much better to have a working model to compare against, too.


But, for right now, this is the end of this phase of fwiktr. I've got a couple of other quick one-off projects I want to work on which I'm sure will give me more ideas for this one, and at the rate this turns out pictures, I'll have lots of fun data to play with when I get back to it.





[info]prickvixen
2007-07-09 06:51 am UTC (link)
Twit-flick? Dwarf toss?



[info]qdot
2007-07-09 06:55 am UTC (link)
Wow. Dwarf Toss. Post #1 and we may already have a winner.



[info]prickvixen
2007-07-09 07:04 am UTC (link)
Not that there's any etymological connection between twit-flick and dwarf toss, except that they're vaguely similar constructions. Which is nevertheless the level of quality you can expect from the average marketing group when looking for a product name...



[info]qdot
2007-07-09 07:26 am UTC (link)
Well, I'm trying to get away from the whole twitter/flickr/web2.0 in general thing, since this will be branching out to encompass many more post/picture sources.

And really, can anything bad come of something called "Dwarf Toss"?



[info]prickvixen
2007-07-09 07:40 am UTC (link)
Well, insensitivity to a minority group, but who cares about THEM



[info]alteredhistory
2007-07-09 02:48 pm UTC (link)
> And really, can anything bad come of something called "Dwarf Toss"?

Except it isn't the dwarf you're tossing. The flickr image is being tossed to twitter, so the sense of it is backwards for the software. Dwarf Toss would work for twitter->flickr, but you're working on flickr->twitter.

There's the easy compounding of "flittr", but it doesn't excite me. A name like Dwarf Toss is the right idea.

flick->twit ... "Photo Jerk" ?



[info]yetanotherbob
2007-07-09 03:56 pm UTC (link)
Aw. I am full of fail. I was going to suggest "Hummingbird", as it's something that twitters, and also needs a good camera to capture right because it goes so fast. But that's more of an 80s/90s-ish desktop app name than a web2.0 thing. Although I suppose you could just tack on "Beta" at the end, and it'd be web2.0-y.



[info]_box_spring_hog
2007-07-11 08:14 pm UTC (link)
simplify the interface and thats my new thing to leave up on my comp when Im not working...thats about the coolest shit ever.



[info]qdot
2007-07-11 08:31 pm UTC (link)
So what exactly are you looking for in a simplified interface? Just the picture and the caption?



[info]_box_spring_hog
2007-07-12 09:06 pm UTC (link)
Yeah, just the picture and the caption...it might be fun to alter what pieces of the interface you want to show...or put them below the full screen cutoff, so you just scroll down to change the settings. It somewhat reminds me of 5 card nancy...in a weird way. Thanks for making sompn cool.


