On Sat, 15 Mar 2008 02:36:29 -0700, Ahmad Jalil Qarshi
[quoted text, click to view] <ahmaddearNO@SPAMhotmail.com> wrote:
> Thanks Peter & Jon,
>
> I have to read the file on user request. Now the problem is that I can't
> create index as the format of file is fixed. So I think the only option
> left
> is binary search.
Well, a linear scan is always an option. Whether that performs well
enough to justify avoiding the work of implementing a binary search,
that's up to you or whoever's making the decision about how your time is
best used.
If the text file uses an encoding that has fixed-sized characters, I think
that a binary search isn't actually going to be very hard to implement.
But as Jon points out, if you have to support multiple encodings and/or
encodings with variable-length characters (e.g. UTF-8), then it gets more
complicated (a little...I'm no character-encoding expert, but I'm under
the impression that in any of the encodings, finding an end-of-line marker
while scanning backwards would not be difficult; a LF or CR/LF pair isn't
a valid sequence in any other character's data, is it?).
In either case, it's certainly possible. The question is what's the best
use of your time.
I still wonder at the requirement that this has to be done "on demand",
without an opportunity to index the file. That's an awful lot of data to
be search just once. But if "the powers that be" deem it more sensible to
make you implement a binary search on a variable-line-length text file
than to fix the problem space to better incorporate an indexing- or
database-based solution, I guess that's what you're stuck with.
I have to admit: if these happen infrequently enough that indexing a file
doesn't make sense, I wonder if they don't also happen infrequently enough
that just scanning linearly through the file wouldn't be okay. Why does
this need to be fast?
[quoted text, click to view] > Let me explain one more thing which I missed in my previous post. The
> numeric value before the [space] can be repeated thousands of time
> consequtively. [...]
>
> Now if I have to find the first occurance of a numeric value 2345. In
> that
> case binary search will help or not?
It will help just as much as it would in any other situation where a
binary search can be used. Binary search doesn't imply unique elements,
but if you have duplicated elements that does mean that you have to do
some extra work to scan and find the first matching element, since the
first element the search finds may not be the actual first element in the
sequence.