I have been following this thread with interest. You stated your goal in one of the posts.
Goal is small Pin (and large Pout in closed "OUT" portion, with Pout dumped on [R-D or C-O] =load).
With this in mind, as an old hand at switchmode design, can I ask a question and make a few recommendations?
1) In light of your goal, what function does Rr perform? You state that it is used as a current limit. Considering this, it will certainly waste power.
2) Consider operating at higher Vin, where semiconductor Vdrop will be less significant. You will need to readjust bias.
3) Use a good quality Schottky diode rather than an LED. Determine the sweet spot value for the output load resistor. The idea is to keep current through the diode low so as not to waste power there, unless you want to calculate that power and count it as part of the output, even though it is unusable except as a heater.
4) With the output capacitor connected, there will be high peak currents in the diode. Best not to use the capacitor unless you need a filtered output. A small choke will limit peak current if Cout must be used. Consider synchronous rectification for the lowest loss in the rectifier device.
5) Readjust the turns ratio so that power is not wasted in unnecessary base drive. There is nothing magic about a 1:1 ratio. Personally I like to use the heaviest gauge wire I can for the main winding, while leaving enough room on the core for just enough turns of a thinner gauge to sustain oscillation. Since any decent transistor will have a current gain of at least 100, the drive winding carries very little current and can be thin; keep the ohmic losses in the main winding as low as possible.
6) If you can operate above 5 volts, consider using a FET instead of a BJT, as the drive requirements will be lower.
7) If size is not a problem, use a larger core with the same number of turns. The operating frequency will be lower, but this will mean less switching loss overall. Generally if you can keep the operating frequency low, core loss and transition loss will also be reduced. Judging from the waveforms in the "try3" scope shot, I would hazard a guess that you have lots of switching loss from the rise and fall slopes.
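To put rough numbers on points 1, 3 and 7 above, here is a minimal Python sketch of a loss budget. The formulas are the usual first-order estimates (I^2*R for the resistor, forward drop times average current for the diode, and a triangular overlap estimate for the switching edges). Every component value in it is a hypothetical placeholder, not a measurement from this thread, so substitute your own scope and meter readings.

```python
def resistor_loss(i_rms, r_ohms):
    """I^2 * R loss in a series resistor such as Rr (point 1)."""
    return i_rms ** 2 * r_ohms


def diode_conduction_loss(v_forward, i_avg):
    """Rectifier loss approximated as forward drop times average current (point 3)."""
    return v_forward * i_avg


def switching_loss(v_sw, i_sw, t_rise, t_fall, freq_hz):
    """Triangular-overlap estimate: 0.5 * V * I * (t_rise + t_fall) * f (point 7)."""
    return 0.5 * v_sw * i_sw * (t_rise + t_fall) * freq_hz


# Hypothetical values for illustration only (amps, ohms, volts, seconds, hertz).
losses = {
    "Rr": resistor_loss(i_rms=0.05, r_ohms=10.0),
    "diode": diode_conduction_loss(v_forward=0.3, i_avg=0.02),
    "switching": switching_loss(v_sw=3.0, i_sw=0.1, t_rise=1e-6,
                                t_fall=1e-6, freq_hz=50e3),
}

# Print the largest loss first so the worst offender is obvious.
for name, watts in sorted(losses.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {watts * 1e3:7.2f} mW")
```

Note that the switching term scales directly with frequency and with the edge times, which is why a lower operating frequency or cleaner rise and fall slopes pay off.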
Finally, I see nothing magic about the common collector configuration vs. the common emitter configuration of a standard blocking oscillator. The only slight advantage may be that base drive also appears in the inductor ramp up current, but this is two orders of magnitude less important than other lossy circuit problems.
Those are my first-cut suggestions. While some of them are more applicable to power supply design in the tens of watts range, if you are really trying to squeeze out the last bit of efficiency, they can be good "rules of thumb".
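As an illustration of the rule of thumb in point 2, the short Python sketch below shows how a fixed semiconductor drop shrinks as a fraction of the supply when Vin goes up. The 0.3 V drop and the list of supply voltages are assumptions picked for the example, not values taken from your circuit.

```python
def drop_fraction(v_in, v_drop):
    """Fraction of the supply voltage consumed by a fixed semiconductor drop."""
    if v_in <= 0:
        raise ValueError("v_in must be positive")
    return v_drop / v_in


V_DROP = 0.3  # assumed saturation or Schottky drop in volts, illustrative only

for v_in in (1.5, 3.0, 5.0, 12.0):
    print(f"Vin = {v_in:5.1f} V -> drop is {drop_fraction(v_in, V_DROP):6.1%} of Vin")
```

This is only a first-order view: it ignores how the drop itself changes with current and temperature.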
You may wish to restate your goal if the above has violated any constraints.
Where there are constraints, engineering involves compromise.
If the only goal is maximum efficiency, no holds barred, we have a lot of room to redesign.
If you must stay with that particular core, operating voltage, switching frequency and circuit configuration, then we can only make minor gains in efficiency.
Just because it has a patent application or is patented does not always mean it really works.